
Random Variables and Distributions

Comprehensive study notes on Random Variables and Distributions for CMI Data Science preparation. This chapter covers key concepts, formulas, and examples needed for your exam.

Overview

Welcome to the foundational chapter on Random Variables and Distributions, a cornerstone of your CMI Masters in Data Science curriculum. This chapter is absolutely critical, as it lays the theoretical and practical groundwork for understanding and applying nearly every statistical and machine learning concept you will encounter. Without a firm grasp of random variables and their distributions, topics like hypothesis testing, regression analysis, and even advanced deep learning architectures become abstract and difficult to interpret effectively.

In the CMI context, mastering this material is not just about theoretical understanding; it's about developing the intuition and analytical tools to tackle real-world data challenges. You'll learn how to mathematically model uncertainty, quantify variability, and make informed decisions based on probabilistic outcomes. This chapter directly addresses core competencies required for the CMI exams, ensuring you can correctly identify, apply, and interpret different probabilistic models crucial for data analysis and predictive modeling.

By the end of this chapter, you will possess the essential framework for reasoning about data generation processes, understanding the behavior of estimators, and interpreting the output of complex algorithms. This knowledge is indispensable for building robust data science solutions and effectively communicating insights, making it a high-yield area for your CMI success.

---

Chapter Contents

| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Random Variables | Quantify outcomes of random experiments numerically. |
| 2 | Distribution Functions | Describe probability of variable taking values. |
| 3 | Expectation and Variance | Measure central tendency and data spread. |
| 4 | Standard Distributions | Explore common, well-understood probabilistic models. |

---

Learning Objectives

❗ By the End of This Chapter

After studying this chapter, you will be able to:

  • Define and classify discrete and continuous random variables, and understand their role in modeling uncertainty.

  • Calculate and interpret Probability Mass Functions (PMFs), Probability Density Functions (PDFs), and Cumulative Distribution Functions (CDFs).

  • Compute and explain the expectation, variance, and standard deviation of random variables.

  • Identify characteristics and apply properties of key standard distributions (e.g., Bernoulli, Binomial, Poisson, Uniform, Normal, Exponential).

---

Now let's begin with Random Variables...

## Part 1: Random Variables

Introduction

In the realm of probability and statistics, a random variable serves as a fundamental concept, bridging the gap between abstract outcomes of a random experiment and numerical values. It is a function that assigns a real number to each outcome in the sample space of a random experiment. This transformation allows us to apply mathematical tools, such as algebra and calculus, to analyze the probabilities associated with these numerical outcomes.

Understanding random variables is crucial for CMI as it forms the bedrock for analyzing data, modeling uncertainty, and making predictions. In data science, almost every piece of data collected or generated can be viewed as a realization of one or more random variables, from the success rate of an algorithm to the error in a measurement. This unit will rigorously define random variables, explore their types, and detail methods for characterizing their behavior through probability distributions.

📖 Random Variable

A random variable $X$ is a function that maps each outcome $\omega$ in the sample space $\Omega$ of a random experiment to a unique real number:

$X: \Omega \to \mathbb{R}$

The set of all possible values that a random variable $X$ can take is called its range or support, denoted by $R_X$.

---

Key Concepts

1. Types of Random Variables

Random variables are primarily classified into two types based on their range:

* Discrete Random Variables: A random variable is discrete if its range $R_X$ is a finite or countably infinite set of real numbers. These variables typically arise from counting processes.
  * Examples: the number of heads in three coin flips ($R_X = \{0, 1, 2, 3\}$), the number of customers arriving at a store in an hour ($R_X = \{0, 1, 2, \dots\}$).

* Continuous Random Variables: A random variable is continuous if its range $R_X$ is an uncountably infinite set, typically an interval or a union of intervals on the real line. These variables usually arise from measurements.
  * Examples: the height of a student, the time it takes for a process to complete, the temperature of a room.

For the purposes of this chapter and the CMI exam, we will focus primarily on discrete random variables, as they are frequently encountered and directly relevant to the provided PYQ.

---

2. Probability Mass Function (PMF)

For a discrete random variable, its probability distribution is described by a Probability Mass Function (PMF). The PMF specifies the probability that the random variable takes on each of its possible values.

📖 Probability Mass Function (PMF)

For a discrete random variable $X$ with range $R_X = \{x_1, x_2, \dots\}$, the Probability Mass Function (PMF), denoted $p_X(x)$ or $P(X=x)$, is a function such that:

  • $p_X(x) \ge 0$ for all $x \in R_X$.

  • $\sum_{x \in R_X} p_X(x) = 1$.

  • $p_X(x) = 0$ for $x \notin R_X$.

The value $p_X(x)$ is the probability that the random variable $X$ takes the specific value $x$.

Worked Example:

Problem: A fair coin is flipped three times. Let $X$ be the random variable representing the number of heads obtained. Determine the PMF of $X$.

Solution:

Step 1: Identify the sample space and the values of $X$.

The sample space $\Omega$ consists of $2^3 = 8$ equally likely outcomes:
$\Omega = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$

The random variable $X$ maps each outcome to the number of heads:
$X(HHH) = 3$
$X(HHT) = X(HTH) = X(THH) = 2$
$X(HTT) = X(THT) = X(TTH) = 1$
$X(TTT) = 0$

The range of $X$ is $R_X = \{0, 1, 2, 3\}$.

Step 2: Calculate the probability for each value in $R_X$.

Since each outcome in $\Omega$ has probability $1/8$:

For $X=0$: only $TTT$ maps to $0$.
$p_X(0) = P(X=0) = P(\{TTT\}) = \frac{1}{8}$

For $X=1$: $HTT, THT, TTH$ map to $1$.
$p_X(1) = P(X=1) = P(\{HTT, THT, TTH\}) = \frac{3}{8}$

For $X=2$: $HHT, HTH, THH$ map to $2$.
$p_X(2) = P(X=2) = P(\{HHT, HTH, THH\}) = \frac{3}{8}$

For $X=3$: only $HHH$ maps to $3$.
$p_X(3) = P(X=3) = P(\{HHH\}) = \frac{1}{8}$

Step 3: Verify the properties of a PMF.

All $p_X(x) \ge 0$, and

$\sum_{x \in R_X} p_X(x) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = \frac{8}{8} = 1$

Answer: The PMF of $X$ is:
$p_X(0) = 1/8$
$p_X(1) = 3/8$
$p_X(2) = 3/8$
$p_X(3) = 1/8$
and $p_X(x) = 0$ for $x \notin \{0, 1, 2, 3\}$.
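The enumeration in this example can also be reproduced programmatically. The following is a minimal Python sketch (illustrative, not part of the original notes) that lists all eight outcomes, tallies the number of heads in each, and recovers the PMF:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# All 2^3 = 8 equally likely outcomes of three fair coin flips
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X(outcome) = number of heads; tallying realizations gives the PMF
counts = Counter(o.count("H") for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

assert pmf[0] == Fraction(1, 8) and pmf[1] == Fraction(3, 8)
assert sum(pmf.values()) == 1  # PMF property: probabilities sum to 1
```

Using `Fraction` keeps the probabilities exact, so the normalization check $\sum p_X(x) = 1$ holds with equality rather than up to floating-point error.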

---

3. Cumulative Distribution Function (CDF)

The CDF gives a cumulative view of the probabilities: it is the probability that a random variable $X$ takes a value less than or equal to a given value $x$.

📖 Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) of a random variable $X$, denoted $F_X(x)$, is defined for any real number $x$ as:

$F_X(x) = P(X \le x)$

For a discrete random variable, the CDF is obtained by summing the PMF values:

$F_X(x) = \sum_{x_i \le x} p_X(x_i)$

Properties of a CDF:

  • $0 \le F_X(x) \le 1$ for all $x \in \mathbb{R}$.

  • $F_X(x)$ is non-decreasing: if $a < b$, then $F_X(a) \le F_X(b)$.

  • $\lim_{x \to -\infty} F_X(x) = 0$.

  • $\lim_{x \to \infty} F_X(x) = 1$.

  • $F_X$ is right-continuous: $\lim_{t \to x^+} F_X(t) = F_X(x)$.

Worked Example:

Problem: Using the PMF from the previous example ($X$ = number of heads in 3 coin flips), find the CDF of $X$.

Solution:

Step 1: Recall the PMF values.
$p_X(0) = 1/8$, $p_X(1) = 3/8$, $p_X(2) = 3/8$, $p_X(3) = 1/8$

Step 2: Calculate $F_X(x)$ on each interval of $x$.

For $x < 0$:
$F_X(x) = P(X \le x) = 0$ (since $X$ cannot be negative)

For $0 \le x < 1$:
$F_X(x) = p_X(0) = \frac{1}{8}$

For $1 \le x < 2$:
$F_X(x) = p_X(0) + p_X(1) = \frac{1}{8} + \frac{3}{8} = \frac{1}{2}$

For $2 \le x < 3$:
$F_X(x) = p_X(0) + p_X(1) + p_X(2) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8}$

For $x \ge 3$:
$F_X(x) = p_X(0) + p_X(1) + p_X(2) + p_X(3) = 1$

Answer: The CDF of $X$ is:

$F_X(x) = \begin{cases} 0 & x < 0 \\ 1/8 & 0 \le x < 1 \\ 1/2 & 1 \le x < 2 \\ 7/8 & 2 \le x < 3 \\ 1 & x \ge 3 \end{cases}$
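The piecewise CDF above is a step function that jumps at each support point. A short Python sketch (the `cdf` helper is an illustrative name, not from the source) evaluates it by counting how much PMF mass lies at or below $x$:

```python
from bisect import bisect_right

# PMF of X = number of heads in 3 fair coin flips (from the worked example)
support = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]

# Running totals give the jump heights of the step-function CDF
cum = []
total = 0.0
for p in probs:
    total += p
    cum.append(total)

def cdf(x):
    """F_X(x) = P(X <= x): total PMF mass at support points <= x."""
    i = bisect_right(support, x)  # number of support points <= x
    return cum[i - 1] if i > 0 else 0.0

assert cdf(-1) == 0.0   # below the support
assert cdf(1.5) == 0.5  # flat between the jumps at 1 and 2
assert cdf(3) == 1.0    # all mass accumulated
```

Note that `bisect_right` implements the $\le$ in $P(X \le x)$: a query exactly at a support point includes that point's mass, matching the right-continuity of the CDF.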

---

4. Functions of Random Variables

Often, we are interested in a new random variable $Y$ that is a function of an existing random variable $X$, i.e., $Y = g(X)$. To find the PMF of $Y$, we identify the possible values of $Y$ and sum the probabilities of all $X$ values that map to each $Y$ value. This concept is directly tested in the provided CMI PYQ.

❗ Finding the PMF of $Y = g(X)$

Let $X$ be a discrete random variable with PMF $p_X(x)$ and range $R_X$.
Let $Y = g(X)$ be a new discrete random variable.
The range of $Y$ is $R_Y = \{y \mid y = g(x) \text{ for some } x \in R_X\}$.
The PMF of $Y$, $p_Y(y)$, is given by:

$p_Y(y) = P(Y=y) = \sum_{x \in R_X:\, g(x)=y} p_X(x)$

That is, for each value $y$ in the range of $Y$, we sum the probabilities of all $x$ values in $R_X$ that $g$ maps to $y$.

Worked Example:

Problem: Let $X$ be a discrete random variable with PMF:
$p_X(1) = 0.2$, $p_X(2) = 0.3$, $p_X(3) = 0.3$, $p_X(4) = 0.2$.
Let $Y = (X-2)^2$. Find the PMF of $Y$.

Solution:

Step 1: Identify the range of $X$ and its PMF.
$R_X = \{1, 2, 3, 4\}$ with $p_X(1) = 0.2$, $p_X(2) = 0.3$, $p_X(3) = 0.3$, $p_X(4) = 0.2$.

Step 2: Determine the possible values of $Y = (X-2)^2$ by applying $g(x) = (x-2)^2$ to each value in $R_X$.

For $x=1$: $y = (1-2)^2 = (-1)^2 = 1$
For $x=2$: $y = (2-2)^2 = 0^2 = 0$
For $x=3$: $y = (3-2)^2 = 1^2 = 1$
For $x=4$: $y = (4-2)^2 = 2^2 = 4$

The range of $Y$ is $R_Y = \{0, 1, 4\}$.

Step 3: Calculate the PMF of $Y$ for each value in $R_Y$.

For $Y=0$: only $X=2$ maps to $Y=0$.
$p_Y(0) = P(X=2) = p_X(2) = 0.3$

For $Y=1$: $X=1$ and $X=3$ map to $Y=1$.
$p_Y(1) = P(X=1 \text{ or } X=3) = p_X(1) + p_X(3) = 0.2 + 0.3 = 0.5$

For $Y=4$: only $X=4$ maps to $Y=4$.
$p_Y(4) = P(X=4) = p_X(4) = 0.2$

Step 4: Verify the PMF properties.
All $p_Y(y) \ge 0$, and

$\sum_{y \in R_Y} p_Y(y) = 0.3 + 0.5 + 0.2 = 1.0$

Answer: The PMF of $Y$ is:
$p_Y(0) = 0.3$
$p_Y(1) = 0.5$
$p_Y(4) = 0.2$
and $p_Y(y) = 0$ for $y \notin \{0, 1, 4\}$.
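The "group by $g(x)$ and sum" step generalizes to any function of a discrete random variable. A minimal Python sketch (the helper name `pmf_of_function` is illustrative) applies it to this example:

```python
from collections import defaultdict

# PMF of X from the worked example
pmf_x = {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.2}

def pmf_of_function(pmf, g):
    """PMF of Y = g(X): group x values by g(x) and sum their probabilities."""
    pmf_y = defaultdict(float)
    for x, p in pmf.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

pmf_y = pmf_of_function(pmf_x, lambda x: (x - 2) ** 2)

assert abs(pmf_y[1] - 0.5) < 1e-12  # x=1 and x=3 both map to y=1
assert set(pmf_y) == {0, 1, 4}      # range R_Y of Y = (X-2)^2
```

Because `defaultdict` accumulates mass per output value, the sketch handles non-one-to-one functions (the usual exam trap) automatically.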

---

5. Expected Value of a Discrete Random Variable

The expected value (or mean) of a random variable is a measure of its central tendency, representing the average value we would expect to observe if the experiment were repeated many times.

πŸ“ Expected Value (Mean)

For a discrete random variable XX with PMF pX(x)p_X(x) and range RXR_X, the Expected Value (or Mean), denoted by E⁑[X]\operatorname{E}[X] or μX\mu_X, is:

E⁑[X]=βˆ‘x∈RXxβ‹…pX(x)\operatorname{E}[X] = \sum_{x \in R_X} x \cdot p_X(x)

Variables:

    • XX = discrete random variable

    • xx = a specific value in the range of XX

    • pX(x)p_X(x) = probability that XX takes the value xx


When to use: To find the long-run average of a random variable, or its central location.

πŸ“ Expected Value of a Function of a Random Variable

If Y=g(X)Y = g(X) is a function of a discrete random variable XX, its expected value can be calculated directly from the PMF of XX:

E⁑[g(X)]=βˆ‘x∈RXg(x)β‹…pX(x)\operatorname{E}[g(X)] = \sum_{x \in R_X} g(x) \cdot p_X(x)

Variables:

    • g(X)g(X) = function of the random variable XX

    • xx = a specific value in the range of XX

    • pX(x)p_X(x) = probability that XX takes the value xx


When to use: To find the average value of a transformation of a random variable without first finding the PMF of Y=g(X)Y=g(X).

Properties of Expected Value:

  • E⁑[c]=c\operatorname{E}[c] = c for any constant cc.

  • E⁑[aX+b]=aE⁑[X]+b\operatorname{E}[aX + b] = a\operatorname{E}[X] + b for constants a,ba, b. (Linearity of Expectation)

  • E⁑[X+Y]=E⁑[X]+E⁑[Y]\operatorname{E}[X+Y] = \operatorname{E}[X] + \operatorname{E}[Y] (for any random variables X,YX, Y, not necessarily independent).
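These expectation formulas can be checked numerically on the coin-flip PMF from Part 1. The sketch below is illustrative only; it computes $\operatorname{E}[X]$ and $\operatorname{E}[2X+1]$ directly from the PMF and confirms linearity:

```python
# PMF of X = number of heads in 3 fair coin flips
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# E[X] = sum of x * p_X(x) over the support
mean = sum(x * p for x, p in pmf.items())

# E[g(X)] computed directly from the PMF of X, here g(x) = 2x + 1
mean_g = sum((2 * x + 1) * p for x, p in pmf.items())

assert mean == 1.5
# Linearity of expectation: E[2X + 1] = 2 E[X] + 1
assert mean_g == 2 * mean + 1
```

Computing $\operatorname{E}[g(X)]$ this way avoids first deriving the PMF of $Y = g(X)$, exactly as the formula box above states.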
Worked Example:

Problem: For the random variable $X$ (number of heads in 3 coin flips) with PMF $p_X(0) = 1/8$, $p_X(1) = 3/8$, $p_X(2) = 3/8$, $p_X(3) = 1/8$, calculate $\operatorname{E}[X]$ and $\operatorname{E}[2X+1]$.

Solution:

Step 1: Calculate $\operatorname{E}[X]$.

$\operatorname{E}[X] = \sum_{x \in R_X} x \cdot p_X(x) = (0 \cdot \frac{1}{8}) + (1 \cdot \frac{3}{8}) + (2 \cdot \frac{3}{8}) + (3 \cdot \frac{1}{8}) = \frac{12}{8} = \frac{3}{2} = 1.5$

Step 2: Calculate $\operatorname{E}[2X+1]$ using linearity of expectation.

$\operatorname{E}[2X+1] = 2\operatorname{E}[X] + 1 = 2(1.5) + 1 = 4$

Answer: $\boxed{\operatorname{E}[X] = 1.5 \text{ and } \operatorname{E}[2X+1] = 4}$

---

6. Variance of a Discrete Random Variable

The variance measures the spread or dispersion of the values of a random variable around its mean. A higher variance indicates greater variability.

📝 Variance

For a discrete random variable $X$ with PMF $p_X(x)$ and mean $\operatorname{E}[X] = \mu_X$, the Variance, denoted $\operatorname{Var}(X)$ or $\sigma_X^2$, is:

$\operatorname{Var}(X) = \operatorname{E}[(X - \mu_X)^2] = \sum_{x \in R_X} (x - \mu_X)^2 p_X(x)$

An often more convenient computational formula is:

$\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2$

where $\operatorname{E}[X^2] = \sum_{x \in R_X} x^2 p_X(x)$.

Variables:

  • $X$ = discrete random variable

  • $\mu_X$ = expected value (mean) of $X$

  • $p_X(x)$ = probability that $X$ takes the value $x$


When to use: to quantify the spread or variability of a random variable's distribution.

📖 Standard Deviation

The Standard Deviation of a random variable $X$, denoted $\sigma_X$, is the positive square root of its variance:

$\sigma_X = \sqrt{\operatorname{Var}(X)}$

Properties of Variance:

  • $\operatorname{Var}(c) = 0$ for any constant $c$.

  • $\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)$ for constants $a, b$. Note that the shift $b$ does not affect variance.

  • $\operatorname{Var}(X) \ge 0$.

Worked Example:

Problem: For the random variable $X$ (number of heads in 3 coin flips) with PMF $p_X(0) = 1/8$, $p_X(1) = 3/8$, $p_X(2) = 3/8$, $p_X(3) = 1/8$, and $\operatorname{E}[X] = 1.5$, calculate $\operatorname{Var}(X)$ and $\operatorname{Var}(2X+1)$.

Solution:

Step 1: Calculate $\operatorname{E}[X^2]$.

$\operatorname{E}[X^2] = \sum_{x \in R_X} x^2 p_X(x) = (0^2 \cdot \frac{1}{8}) + (1^2 \cdot \frac{3}{8}) + (2^2 \cdot \frac{3}{8}) + (3^2 \cdot \frac{1}{8}) = \frac{3 + 12 + 9}{8} = \frac{24}{8} = 3$

Step 2: Calculate $\operatorname{Var}(X)$ using the computational formula.

$\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2 = 3 - (1.5)^2 = 3 - 2.25 = 0.75$

Step 3: Calculate $\operatorname{Var}(2X+1)$ using the properties of variance.

$\operatorname{Var}(2X+1) = 2^2 \operatorname{Var}(X) = 4 \cdot 0.75 = 3$

Answer: $\boxed{\operatorname{Var}(X) = 0.75 \text{ and } \operatorname{Var}(2X+1) = 3}$
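The computational formula $\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2$ is easy to verify in code. A minimal sketch (the `expectation` helper is an illustrative name, not from the source):

```python
# PMF of X = number of heads in 3 fair coin flips (from the worked example)
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p_X(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                           # E[X]
var = expectation(pmf, lambda x: x**2) - mean**2  # Var(X) = E[X^2] - E[X]^2

assert mean == 1.5
assert var == 0.75
# Var(aX + b) = a^2 Var(X): the shift b leaves the spread unchanged
assert 2**2 * var == 3.0
```

The last assertion mirrors Step 3 of the worked example: scaling by $a = 2$ multiplies the variance by $a^2 = 4$, while adding the constant $1$ has no effect.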

---

7. Joint Probability and Independence of Random Variables

When dealing with multiple random variables, we often need to understand their joint behavior.

📖 Joint Probability Mass Function (Joint PMF)

For two discrete random variables $X$ and $Y$, the Joint Probability Mass Function (Joint PMF), denoted $p_{X,Y}(x,y)$ or $P(X=x, Y=y)$, is a function such that:

  • $p_{X,Y}(x,y) \ge 0$ for all $(x,y)$ in the joint range.

  • $\sum_x \sum_y p_{X,Y}(x,y) = 1$.

The value $p_{X,Y}(x,y)$ is the probability that $X$ takes value $x$ AND $Y$ takes value $y$ simultaneously.

From a joint PMF, we can derive the marginal PMFs of $X$ and $Y$:

$p_X(x) = \sum_y p_{X,Y}(x,y)$

$p_Y(y) = \sum_x p_{X,Y}(x,y)$

📖 Independence of Discrete Random Variables

Two discrete random variables $X$ and $Y$ are independent if and only if their joint PMF equals the product of their marginal PMFs for all possible values $x$ and $y$:

$p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y) \quad \text{for all } x, y$

Equivalently, $X$ and $Y$ are independent if $P(X=x, Y=y) = P(X=x)P(Y=y)$ for all $x, y$.

⚠️ Independence of X and g(X)

❌ A common mistake is assuming that if $Y = g(X)$, then $X$ and $Y$ are independent.
✅ This is generally false. If $Y$ is a non-trivial function of $X$, they are dependent: knowing the value of $X$ directly tells you the value of $Y$, which contradicts independence. The only exception is when $g(X)$ is constant, in which case $Y$ is not truly random (and independence holds vacuously for $X$ and a constant). The PYQ explicitly tests this concept.
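The warning above can be checked numerically. The sketch below (helper names are illustrative) builds the joint PMF of a fair die $X$ and $Y = X \bmod 3$, derives the marginals, and tests the product condition pair by pair:

```python
from itertools import product

# Fair six-sided die X and Y = X mod 3; Y is fully determined by X
support_x = [1, 2, 3, 4, 5, 6]
joint = {(x, x % 3): 1/6 for x in support_x}

def marginals(joint):
    """Marginal PMFs from a joint PMF: sum out the other coordinate."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

px, py = marginals(joint)

# Independence requires p(x,y) == p(x) * p(y) for ALL pairs;
# a single failing pair is enough to conclude dependence
independent = all(
    abs(joint.get((x, y), 0.0) - px[x] * py[y]) < 1e-12
    for x, y in product(px, py)
)
assert not independent  # Y = g(X) is dependent on X, as the warning states
```

For instance, $p_{X,Y}(1, 0) = 0$ while $p_X(1)\,p_Y(0) = \frac{1}{6} \cdot \frac{1}{3} \ne 0$, so the product condition fails on the very first pair checked.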

---

8. Uniform Discrete Distribution

A random variable follows a discrete uniform distribution if each value in its finite range has an equal probability of being observed. This is directly stated in the PYQ.

📖 Uniform Discrete Random Variable

A discrete random variable $X$ has a uniform distribution over a finite set of $N$ values $\{x_1, x_2, \dots, x_N\}$ if its PMF is:

$p_X(x_i) = \frac{1}{N} \quad \text{for } i = 1, 2, \dots, N$

and $p_X(x) = 0$ otherwise.

Expected Value: $\operatorname{E}[X] = \frac{1}{N} \sum_{i=1}^N x_i$
Variance: $\operatorname{Var}(X) = \frac{1}{N} \sum_{i=1}^N x_i^2 - \left(\frac{1}{N} \sum_{i=1}^N x_i\right)^2$

Worked Example:

Problem: Let $X$ be a random variable sampled uniformly at random from the set $S = \{0, 1, 2, 3, 4\}$.
a) What is the PMF of $X$?
b) Calculate $\operatorname{E}[X]$.

Solution:

Step 1: Identify the size of the set $S$.
The set $S$ has $N = 5$ elements.

Step 2: Determine the PMF.
Since $X$ is sampled uniformly, each element has probability $1/N$.

a) The PMF of $X$ is:
$p_X(x) = \frac{1}{5}$ for $x \in \{0, 1, 2, 3, 4\}$
and $p_X(x) = 0$ otherwise.

Step 3: Calculate $\operatorname{E}[X]$.

$\operatorname{E}[X] = \sum_{x \in S} x \cdot p_X(x) = \frac{1}{5}(0 + 1 + 2 + 3 + 4) = \frac{10}{5} = 2$

Answer: $\boxed{\text{a) } p_X(x) = 1/5 \text{ for } x \in \{0,1,2,3,4\}. \text{ b) } \operatorname{E}[X] = 2.}$
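Both the mean and the variance formulas from the definition box can be evaluated exactly for this example. A minimal sketch, using exact rational arithmetic (illustrative only):

```python
from fractions import Fraction

# Uniform distribution over S = {0, 1, 2, 3, 4} (from the worked example)
support = [0, 1, 2, 3, 4]
p = Fraction(1, len(support))  # each value has probability 1/N

mean = sum(x * p for x in support)          # E[X] = (1/N) * sum of x_i
var = sum(x * x * p for x in support) - mean**2  # E[X^2] - E[X]^2

assert mean == 2
assert var == 2  # E[X^2] - E[X]^2 = 30/5 - 4 = 2
```

The variance line is the uniform-distribution formula from the box above: $\frac{1}{N}\sum x_i^2 - \left(\frac{1}{N}\sum x_i\right)^2$.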

---

Problem-Solving Strategies

💡 CMI Strategy: Functions of RVs

When asked about the distribution or probability of $Y = g(X)$:

• List $R_X$ and $p_X(x)$: clearly write down the range and PMF of the original random variable $X$.

• Determine $R_Y$: for each $x \in R_X$, calculate $y = g(x)$. Collect the unique $y$ values to form $R_Y$.

• Map $X$ to $Y$: for each $y \in R_Y$, identify all $x \in R_X$ such that $g(x) = y$.

• Sum probabilities: $p_Y(y) = \sum_{x: g(x)=y} p_X(x)$.

• Verify: ensure $\sum_{y \in R_Y} p_Y(y) = 1$.

This systematic approach minimizes errors, especially when $g(X)$ is not one-to-one.

💡 CMI Strategy: Independence Check

To verify whether $X$ and $Y$ are independent:

• Calculate marginal PMFs: find $p_X(x)$ and $p_Y(y)$ from the joint PMF $p_{X,Y}(x,y)$.

• Check the condition: for all pairs $(x,y)$, verify that $p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y)$.

• One counterexample is enough: if the equality fails for even one pair $(x,y)$, then $X$ and $Y$ are dependent.

---

Common Mistakes

⚠️ Avoid These Errors

  • ❌ Assuming $X$ and $g(X)$ are independent: this is a very common trap. As discussed, knowing $X$ usually determines $g(X)$, making them dependent. For instance, if $X$ is the number of heads and $Y = X^2$, they are clearly dependent.
    ✅ Correct approach: always treat $X$ and $g(X)$ as dependent unless $g(X)$ is a constant function or independence is specifically proven.

  • ❌ Incorrectly calculating the PMF of $Y = g(X)$: forgetting to sum probabilities over all $X$ values that map to the same $Y$ value.
    ✅ Correct approach: systematically list all $x$ values, calculate their corresponding $y$ values, group the $x$ values that yield the same $y$, and sum their original $p_X(x)$ values.

  • ❌ Confusing PMF and CDF: using $P(X=x)$ when $P(X \le x)$ is required, or vice versa.
    ✅ Correct approach: remember that $p_X(x)$ is the probability of a single value, while $F_X(x)$ covers all values up to and including $x$. For discrete RVs, $P(a < X \le b) = F_X(b) - F_X(a)$.

  • ❌ Arithmetic errors with the modulo operator: misunderstanding the range of values produced by $a \bmod n$.
    ✅ Correct approach: recall that $a \bmod n$ always yields a value in $\{0, 1, \dots, n-1\}$ for positive $n$. For example, $5 \bmod 3 = 2$ and $0 \bmod 3 = 0$.

---

Practice Questions

:::question type="MCQ" question="Let $X$ be a discrete random variable with PMF $p_X(1)=0.1$, $p_X(2)=0.3$, $p_X(3)=0.4$, $p_X(4)=0.2$. Let $Y = |X-2|$. Which of the following is the correct PMF for $Y$?" options=["$p_Y(0)=0.3, p_Y(1)=0.5, p_Y(2)=0.2$","$p_Y(0)=0.1, p_Y(1)=0.3, p_Y(2)=0.4, p_Y(3)=0.2$","$p_Y(0)=0.3, p_Y(1)=0.3, p_Y(2)=0.4$","$p_Y(0)=0.3, p_Y(1)=0.6, p_Y(2)=0.1$"] answer="$p_Y(0)=0.3, p_Y(1)=0.5, p_Y(2)=0.2$" hint="Map each value of $X$ to $Y$ and sum probabilities for repeated $Y$ values." solution="
Step 1: Determine the values of $Y = |X-2|$ for each $x \in R_X$.

• If $X=1$, $Y = |1-2| = 1$.

• If $X=2$, $Y = |2-2| = 0$.

• If $X=3$, $Y = |3-2| = 1$.

• If $X=4$, $Y = |4-2| = 2$.


Step 2: Identify the range of $Y$: $R_Y = \{0, 1, 2\}$.

Step 3: Calculate the PMF of $Y$.

• $p_Y(0) = P(X=2) = p_X(2) = 0.3$.

• $p_Y(1) = P(X=1 \text{ or } X=3) = p_X(1) + p_X(3) = 0.1 + 0.4 = 0.5$.

• $p_Y(2) = P(X=4) = p_X(4) = 0.2$.


Step 4: Verify that the probabilities sum to 1: $0.3 + 0.5 + 0.2 = 1.0$.
Answer: $\boxed{p_Y(0)=0.3, p_Y(1)=0.5, p_Y(2)=0.2}$
"
:::

:::question type="NAT" question="A discrete random variable $X$ has PMF $p_X(x) = c(x+1)$ for $x \in \{0, 1, 2\}$, and $0$ otherwise. Calculate the value of $\operatorname{E}[X^2]$. (Enter your answer as a decimal rounded to two decimal places.)" answer="2.33" hint="First find the constant $c$ by ensuring the probabilities sum to 1. Then calculate $\operatorname{E}[X^2]$." solution="
Step 1: Find the constant $c$.
The probabilities must sum to 1:

$\sum_{x=0}^2 p_X(x) = c(0+1) + c(1+1) + c(2+1) = 6c = 1$

so $c = \frac{1}{6}$.

Step 2: Write out the full PMF.
$p_X(0) = \frac{1}{6}$, $p_X(1) = \frac{2}{6}$, $p_X(2) = \frac{3}{6}$

Step 3: Calculate $\operatorname{E}[X^2]$.

$\operatorname{E}[X^2] = \sum_{x \in R_X} x^2 p_X(x) = (0^2 \cdot \frac{1}{6}) + (1^2 \cdot \frac{2}{6}) + (2^2 \cdot \frac{3}{6}) = \frac{2}{6} + \frac{12}{6} = \frac{14}{6} = \frac{7}{3}$

Step 4: Convert to a decimal rounded to two places.
$7/3 \approx 2.3333\ldots$, so $\operatorname{E}[X^2] = 2.33$.
Answer: $\boxed{2.33}$
"
:::

:::question type="MSQ" question="Let $X$ be a random variable representing the outcome of rolling a fair six-sided die, so $R_X = \{1, 2, 3, 4, 5, 6\}$. Let $Y = X \bmod 3$. Which of the following statements is/are true?" options=["$P(Y=0) = 1/3$","$X$ and $Y$ are independent","$\operatorname{E}[Y] = 1$","$\operatorname{Var}(Y) = 2/3$"] answer="A,C,D" hint="Calculate the PMF of $Y$ first. Then evaluate independence, expected value, and variance." solution="
Step 1: Determine the PMF of $X$. Since the die is fair, $p_X(x) = 1/6$ for $x \in \{1, 2, 3, 4, 5, 6\}$.

Step 2: Determine the values of $Y = X \bmod 3$.

• $X=1 \Rightarrow Y=1$; $X=2 \Rightarrow Y=2$; $X=3 \Rightarrow Y=0$.

• $X=4 \Rightarrow Y=1$; $X=5 \Rightarrow Y=2$; $X=6 \Rightarrow Y=0$.

The range of $Y$ is $R_Y = \{0, 1, 2\}$.

Step 3: Calculate $p_Y(y)$.

• $P(Y=0) = p_X(3) + p_X(6) = 1/6 + 1/6 = 1/3$. (Statement A is TRUE)

• $P(Y=1) = p_X(1) + p_X(4) = 1/6 + 1/6 = 1/3$.

• $P(Y=2) = p_X(2) + p_X(5) = 1/6 + 1/6 = 1/3$.


Step 4: Evaluate the statements.

Statement A: $P(Y=0) = 1/3$. TRUE from the calculation above.

Statement B: $X$ and $Y$ are independent.
Since $Y$ is a function of $X$ ($Y = g(X)$), they are dependent. For example, if we know $X=1$, then $Y$ must be $1$, so $P(Y=1 \mid X=1) = 1 \ne P(Y=1) = 1/3$. (Statement B is FALSE)

Statement C: $\operatorname{E}[Y] = 1$.

$\operatorname{E}[Y] = \sum_{y \in R_Y} y \cdot p_Y(y) = (0 \cdot \frac{1}{3}) + (1 \cdot \frac{1}{3}) + (2 \cdot \frac{1}{3}) = 1$

(Statement C is TRUE)

Statement D: $\operatorname{Var}(Y) = 2/3$.
First, $\operatorname{E}[Y^2] = (0^2 \cdot \frac{1}{3}) + (1^2 \cdot \frac{1}{3}) + (2^2 \cdot \frac{1}{3}) = \frac{5}{3}$.
Then $\operatorname{Var}(Y) = \operatorname{E}[Y^2] - (\operatorname{E}[Y])^2 = \frac{5}{3} - 1 = \frac{2}{3}$. (Statement D is TRUE)
Answer: $\boxed{\text{A, C, D}}$
"
:::

:::question type="SUB" question="Let $X$ be a discrete random variable with PMF $p_X(x) = \frac{1}{2^x}$ for $x \in \{1, 2, 3, \dots\}$, and $0$ otherwise.
a) Prove that this is a valid PMF.
b) Derive the expression for the CDF, $F_X(x)$.
c) Calculate $\operatorname{E}[X]$." answer="a) $\sum p_X(x) = 1$. b) $F_X(x) = 1 - \frac{1}{2^{\lfloor x \rfloor}}$. c) $\operatorname{E}[X] = 2$." hint="For part a), use the sum of a geometric series. For part b), use the definition of the CDF. For part c), use the formula for $\operatorname{E}[X]$ and the sum $\sum_{k=1}^{\infty} k r^k = \frac{r}{(1-r)^2}$ for $|r|<1$." solution="
Part a) Prove that this is a valid PMF.

Step 1: Check non-negativity.
For $x \in \{1, 2, 3, \dots\}$, $p_X(x) = \frac{1}{2^x} > 0$; otherwise $p_X(x) = 0$. So $p_X(x) \ge 0$ for all $x$.

Step 2: Check that the probabilities sum to 1.

$\sum_{x=1}^{\infty} \frac{1}{2^x} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots$

This is a geometric series with first term $a = 1/2$ and common ratio $r = 1/2$, and an infinite geometric series sums to $\frac{a}{1-r}$ for $|r|<1$:

$\sum_{x=1}^{\infty} \frac{1}{2^x} = \frac{1/2}{1-1/2} = 1$

Since both conditions hold, $p_X(x)$ is a valid PMF.

Part b) Derive the expression for the CDF, $F_X(x)$.

Step 1: Apply the definition $F_X(x) = P(X \le x)$.

For $x < 1$:
$F_X(x) = 0$ (since $X$ takes no values below 1)

For $x \ge 1$:

$F_X(x) = \sum_{k=1}^{\lfloor x \rfloor} p_X(k) = \sum_{k=1}^{\lfloor x \rfloor} \frac{1}{2^k}$

This is a finite geometric series with $a = 1/2$, $r = 1/2$, and $n = \lfloor x \rfloor$ terms, whose sum is $a\,\frac{1-r^n}{1-r}$:

$F_X(x) = \frac{1/2\,(1 - (1/2)^{\lfloor x \rfloor})}{1/2} = 1 - \frac{1}{2^{\lfloor x \rfloor}}$

Thus, the CDF is:

$F_X(x) = \begin{cases} 0 & x < 1 \\ 1 - \frac{1}{2^{\lfloor x \rfloor}} & x \ge 1 \end{cases}$

Part c) Calculate $\operatorname{E}[X]$.

Step 1: Use the definition of expected value.

$\operatorname{E}[X] = \sum_{x=1}^{\infty} x \cdot p_X(x) = \sum_{x=1}^{\infty} x \cdot \frac{1}{2^x}$

Using the known sum $\sum_{k=1}^{\infty} k r^k = \frac{r}{(1-r)^2}$ for $|r|<1$ with $r = 1/2$:

$\operatorname{E}[X] = \frac{1/2}{(1-1/2)^2} = \frac{1/2}{1/4} = 2$

Therefore, $\operatorname{E}[X] = 2$.
Answer: $\boxed{\text{a) } \sum p_X(x) = 1. \text{ b) } F_X(x) = 1 - \frac{1}{2^{\lfloor x \rfloor}}. \text{ c) } \operatorname{E}[X] = 2.}$
"
:::
    "
    :::
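The series results in parts (a) and (c), and the floor-based CDF from part (b), can be sanity-checked numerically. A minimal Python sketch (truncating the infinite sums at 60 terms, where the tail below 2^-60 is negligible):

```python
from math import floor

# Truncate the infinite sums at N = 60 terms; the tail beyond 2^-60 is negligible.
N = 60
total = sum(1 / 2**x for x in range(1, N + 1))   # should be ~1 (valid PMF)
mean = sum(x / 2**x for x in range(1, N + 1))    # should be ~2 (E[X])

def cdf(x):
    """F_X(x) = 1 - 1/2^floor(x) for x >= 1, and 0 otherwise."""
    return 0.0 if x < 1 else 1 - 1 / 2**floor(x)

print(round(total, 6), round(mean, 6))  # 1.0 2.0
print(cdf(3.7))                         # 0.875 (= 1 - 1/8)
```

The truncation point is an arbitrary choice; any cutoff past ~50 terms is indistinguishable from the infinite sum in double precision.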

    ---

    Summary

    ❗ Key Takeaways for CMI

    • Random Variables Map Outcomes to Numbers: A random variable XX is a function X:Ξ©β†’RX: \Omega \to \mathbb{R}. Its range RXR_X is the set of all possible numerical values it can take.

    • PMF for Discrete RVs: The Probability Mass Function pX(x)=P(X=x)p_X(x) = P(X=x) describes the probability of a discrete random variable taking a specific value. It must satisfy pX(x)β‰₯0p_X(x) \ge 0 and βˆ‘xpX(x)=1\sum_x p_X(x) = 1.

    • CDF Provides Cumulative Probabilities: The Cumulative Distribution Function FX(x)=P(X≀x)F_X(x) = P(X \le x) gives the probability that XX is less than or equal to xx.

    • Functions of RVs are Crucial: To find the PMF of Y=g(X)Y=g(X), sum the probabilities pX(x)p_X(x) for all xx values that map to the same yy value. This is a common exam concept.

    • Expected Value and Variance: E[X]E[X] measures central tendency, and Var(X)Var(X) measures spread. Remember their formulas and properties, especially linearity of expectation and Var(aX+b)=a2Var(X)Var(aX+b) = a^2 Var(X).

    • Independence of XX and g(X)g(X) is Rare: XX and Y=g(X)Y=g(X) are generally dependent. Do not assume independence unless g(X)g(X) is a constant. Check P(X=x,Y=y)=P(X=x)P(Y=y)P(X=x, Y=y) = P(X=x)P(Y=y) for independence.

    ---

    What's Next?

    💡 Continue Learning

    This topic connects to:

      • Common Discrete Distributions: Understanding specific PMFs (e.g., Bernoulli, Binomial, Poisson) that arise from specific random experiments. These distributions are built upon the fundamental concepts of random variables.

      • Joint Distributions of Multiple Random Variables: Extending the concepts of PMF, CDF, expectation, and variance to scenarios involving two or more random variables, exploring their relationships (e.g., covariance, correlation).

      • Continuous Random Variables: While this chapter focused on discrete RVs, the principles extend to continuous RVs using Probability Density Functions (PDFs) and integrals instead of sums.


    Master these connections for comprehensive CMI preparation!

    ---

    💡 Moving Forward

    Now that you understand Random Variables, let's explore Distribution Functions which builds on these concepts.

    ---

    Part 2: Distribution Functions

    Introduction

    Distribution functions are fundamental to probability theory and statistics, providing a comprehensive way to describe the behavior of random variables. In the context of the CMI Masters in Data Science, a deep understanding of these functions is crucial for modeling real-world phenomena, performing statistical inference, and building predictive models. This topic covers the essential concepts of how probabilities are distributed across the possible values of a random variable, whether discrete or continuous. Mastery of distribution functions allows us to quantify uncertainty, calculate probabilities of events, and characterize key aspects like the central tendency and spread of data, which are indispensable skills for any data scientist.

    📖 Random Variable

    A random variable is a function that maps the outcomes of a random experiment to real numbers. Random variables can be broadly classified into two types:

      • Discrete Random Variable: A random variable whose set of possible values is finite or countably infinite.

      • Continuous Random Variable: A random variable whose set of possible values is an interval (finite or infinite) on the real number line.

    ---

    Key Concepts

    1. Probability Mass Function (PMF)

    The Probability Mass Function (PMF) is used to describe the probability distribution of a discrete random variable. It assigns a probability to each possible value that the random variable can take.

    📖 Probability Mass Function (PMF)

    For a discrete random variable XX, its Probability Mass Function (PMF), denoted by pX(x)p_X(x) or P(X=x)P(X=x), satisfies the following properties:

    • pX(x)β‰₯0p_X(x) \ge 0 for all possible values xx.

    • βˆ‘xpX(x)=1\sum_{x} p_X(x) = 1, where the sum is over all possible values of XX.

    Worked Example:

    Problem: Let XX be the number of heads in two coin tosses. Determine its PMF.

    Solution:

    Step 1: Identify the sample space and possible values of XX.

    The sample space for two coin tosses is S={HH,HT,TH,TT}S = \{HH, HT, TH, TT\}.
    The possible values for XX (number of heads) are 0,1,20, 1, 2.

    Step 2: Calculate the probability for each value of XX.

    P(X=0)=P({TT})=14P(X=0) = P(\{TT\}) = \frac{1}{4}
    P(X=1)=P({HT,TH})=24=12P(X=1) = P(\{HT, TH\}) = \frac{2}{4} = \frac{1}{2}
    P(X=2)=P({HH})=14P(X=2) = P(\{HH\}) = \frac{1}{4}

    Step 3: Write down the PMF.

    pX(x)={14,x=012,x=114,x=20,otherwisep_X(x) = \begin{cases} \frac{1}{4}, & x=0 \\ \frac{1}{2}, & x=1 \\ \frac{1}{4}, & x=2 \\ 0, & \text{otherwise} \end{cases}

    Answer: The PMF is pX(0)=1/4p_X(0)=1/4, pX(1)=1/2p_X(1)=1/2, pX(2)=1/4p_X(2)=1/4.
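This PMF can also be obtained by brute-force enumeration of the equally likely sample space; a small Python sketch:

```python
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses
# and count heads to build the PMF of X.
outcomes = list(product("HT", repeat=2))           # HH, HT, TH, TT
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, 0.0) + 1 / len(outcomes)

print({x: pmf[x] for x in sorted(pmf)})  # {0: 0.25, 1: 0.5, 2: 0.25}
```

Enumeration scales poorly but is a useful cross-check for any small finite experiment.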

    ---

    2. Probability Density Function (PDF)

    The Probability Density Function (PDF) is used to describe the probability distribution of a continuous random variable. Unlike the PMF, the PDF does not give the probability of a specific value, but rather the relative likelihood of the random variable taking on a given value. Probabilities for continuous random variables are calculated over intervals.

    📖 Probability Density Function (PDF)

    For a continuous random variable XX, its Probability Density Function (PDF), denoted by fX(x)f_X(x) or f(x)f(x), satisfies the following properties:

    • f(x)β‰₯0f(x) \ge 0 for all x∈Rx \in \mathbb{R}.

    • βˆ«βˆ’βˆžβˆžf(x)dx=1\int_{-\infty}^{\infty} f(x) dx = 1.

    The probability that XX falls into an interval [a,b][a, b] is given by P(a≀X≀b)=∫abf(x)dxP(a \le X \le b) = \int_a^b f(x) dx.

    ❗ Must Remember

    For a continuous random variable XX, the probability of XX taking any single specific value is 00. That is, P(X=x0)=0P(X=x_0) = 0 for any x0x_0. Consequently, P(a≀X≀b)=P(a<X≀b)=P(a≀X<b)=P(a<X<b)P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).

    Worked Example:

    Problem: Let XX be a continuous random variable with PDF f(x)=cx(1βˆ’x)f(x) = cx(1-x) for 0≀x≀10 \le x \le 1, and 00 otherwise.
    (a) Determine the value of cc.
    (b) Find the probability P(X>0.5)P(X > 0.5).

    Solution (a):

    Step 1: Apply the normalization property of a PDF.

    βˆ«βˆ’βˆžβˆžf(x)dx=1\int_{-\infty}^{\infty} f(x) dx = 1

    Step 2: Substitute the given PDF and integrate over its non-zero range.

    ∫01cx(1βˆ’x)dx=1\int_{0}^{1} cx(1-x) dx = 1

    Step 3: Simplify the integrand and perform the integration.

    c∫01(xβˆ’x2)dx=1c \int_{0}^{1} (x - x^2) dx = 1
    c[x22βˆ’x33]01=1c \left[ \frac{x^2}{2} - \frac{x^3}{3} \right]_{0}^{1} = 1
    c((122βˆ’133)βˆ’(022βˆ’033))=1c \left( \left( \frac{1^2}{2} - \frac{1^3}{3} \right) - \left( \frac{0^2}{2} - \frac{0^3}{3} \right) \right) = 1
    c(12βˆ’13)=1c \left( \frac{1}{2} - \frac{1}{3} \right) = 1
    c(3βˆ’26)=1c \left( \frac{3-2}{6} \right) = 1
    c(16)=1c \left( \frac{1}{6} \right) = 1

    Step 4: Solve for cc.

    c=6c = 6

    Answer (a): c=6c=6.

    Solution (b):

    Step 1: Set up the integral for P(X>0.5)P(X > 0.5) using the determined PDF.

    P(X>0.5)=∫0.51f(x)dxP(X > 0.5) = \int_{0.5}^{1} f(x) dx

    Step 2: Substitute the PDF with the value of cc.

    P(X>0.5)=∫0.516x(1βˆ’x)dxP(X > 0.5) = \int_{0.5}^{1} 6x(1-x) dx

    Step 3: Perform the integration.

    P(X>0.5)=6∫0.51(xβˆ’x2)dxP(X > 0.5) = 6 \int_{0.5}^{1} (x - x^2) dx
    P(X>0.5)=6[x22βˆ’x33]0.51P(X > 0.5) = 6 \left[ \frac{x^2}{2} - \frac{x^3}{3} \right]_{0.5}^{1}
    P(X>0.5)=6((122βˆ’133)βˆ’(0.522βˆ’0.533))P(X > 0.5) = 6 \left( \left( \frac{1^2}{2} - \frac{1^3}{3} \right) - \left( \frac{0.5^2}{2} - \frac{0.5^3}{3} \right) \right)
    P(X>0.5)=6((12βˆ’13)βˆ’(0.252βˆ’0.1253))P(X > 0.5) = 6 \left( \left( \frac{1}{2} - \frac{1}{3} \right) - \left( \frac{0.25}{2} - \frac{0.125}{3} \right) \right)
    P(X>0.5)=6(16βˆ’(18βˆ’124))P(X > 0.5) = 6 \left( \frac{1}{6} - \left( \frac{1}{8} - \frac{1}{24} \right) \right)
    P(X>0.5)=6(16βˆ’(3βˆ’124))P(X > 0.5) = 6 \left( \frac{1}{6} - \left( \frac{3-1}{24} \right) \right)
    P(X>0.5)=6(16βˆ’224)P(X > 0.5) = 6 \left( \frac{1}{6} - \frac{2}{24} \right)
    P(X>0.5)=6(16βˆ’112)P(X > 0.5) = 6 \left( \frac{1}{6} - \frac{1}{12} \right)
    P(X>0.5)=6(2βˆ’112)P(X > 0.5) = 6 \left( \frac{2-1}{12} \right)
    P(X>0.5)=6(112)P(X > 0.5) = 6 \left( \frac{1}{12} \right)
    P(X>0.5)=12P(X > 0.5) = \frac{1}{2}

    Answer (b): P(X>0.5)=0.5P(X > 0.5) = 0.5.
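Both answers can be verified with a simple midpoint-rule numeric integration; a minimal sketch (the step count is an arbitrary choice):

```python
# Midpoint-rule numeric integration of f(x) = 6x(1 - x) to check that
# c = 6 normalizes the PDF and that P(X > 0.5) = 0.5.
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 6 * x * (1 - x)
print(round(integrate(f, 0, 1), 6))    # 1.0  (normalization)
print(round(integrate(f, 0.5, 1), 6))  # 0.5  (P(X > 0.5))
```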









    [Figure: graph of the PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) on [0,1][0, 1], peaking at its maximum (0.5,1.5)(0.5, 1.5); the shaded area to the right of x=0.5x = 0.5 represents P(X>0.5)P(X > 0.5).]

    ---

    3. Cumulative Distribution Function (CDF)

    The Cumulative Distribution Function (CDF) provides the probability that a random variable XX takes a value less than or equal to a given value xx. It is defined for both discrete and continuous random variables.

    📖 Cumulative Distribution Function (CDF)

    For any random variable XX, its Cumulative Distribution Function (CDF), denoted by FX(x)F_X(x) or F(x)F(x), is defined as:

    F(x)=P(X≀x)F(x) = P(X \le x)

    Properties of a CDF:
    • 0≀F(x)≀10 \le F(x) \le 1 for all x∈Rx \in \mathbb{R}.

    • F(x)F(x) is non-decreasing: if a<ba < b, then F(a)≀F(b)F(a) \le F(b).

    • lim⁑xβ†’βˆ’βˆžF(x)=0\lim_{x \to -\infty} F(x) = 0.

    • lim⁑xβ†’βˆžF(x)=1\lim_{x \to \infty} F(x) = 1.

    • F(x)F(x) is right-continuous: lim⁑tβ†’x+F(t)=F(x)\lim_{t \to x^+} F(t) = F(x).

    For a discrete random variable XX with PMF pX(x)p_X(x):

    F(x)=βˆ‘t≀xpX(t)F(x) = \sum_{t \le x} p_X(t)

    For a continuous random variable XX with PDF fX(x)f_X(x):
    F(x)=βˆ«βˆ’βˆžxfX(t)dtF(x) = \int_{-\infty}^{x} f_X(t) dt

    Conversely, if F(x)F(x) is differentiable, then fX(x)=ddxF(x)f_X(x) = \frac{d}{dx} F(x).

    📝 Probability from CDF

    For any random variable XX:

    P(a<X≀b)=F(b)βˆ’F(a)P(a < X \le b) = F(b) - F(a)

    For a continuous random variable:
    P(X>a)=1βˆ’F(a)P(X > a) = 1 - F(a)

    Variables:

      • F(x)F(x) = Cumulative Distribution Function

      • P(X≀x)P(X \le x) = Probability that XX is less than or equal to xx


    When to use: Calculating probabilities over intervals for any type of random variable.

    Worked Example:

    Problem: For the continuous random variable XX with PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) for 0≀x≀10 \le x \le 1, and 00 otherwise, find its CDF F(x)F(x). Then, use the CDF to find P(X>0.5)P(X > 0.5).

    Solution:

    Step 1: Define F(x)F(x) for different ranges of xx.

    For x<0x < 0:

    F(x)=βˆ«βˆ’βˆžx0 dt=0F(x) = \int_{-\infty}^{x} 0 \, dt = 0

    For 0≀x≀10 \le x \le 1:

    F(x)=βˆ«βˆ’βˆžxf(t)dt=∫0x6t(1βˆ’t)dtF(x) = \int_{-\infty}^{x} f(t) dt = \int_{0}^{x} 6t(1-t) dt
    F(x)=6∫0x(tβˆ’t2)dtF(x) = 6 \int_{0}^{x} (t - t^2) dt
    F(x)=6[t22βˆ’t33]0xF(x) = 6 \left[ \frac{t^2}{2} - \frac{t^3}{3} \right]_{0}^{x}
    F(x)=6(x22βˆ’x33)F(x) = 6 \left( \frac{x^2}{2} - \frac{x^3}{3} \right)
    F(x)=3x2βˆ’2x3F(x) = 3x^2 - 2x^3

    For x>1x > 1:

    F(x)=βˆ«βˆ’βˆž1f(t)dt+∫1x0 dt=∫016t(1βˆ’t)dtF(x) = \int_{-\infty}^{1} f(t) dt + \int_{1}^{x} 0 \, dt = \int_{0}^{1} 6t(1-t) dt

    From the previous example, we know this integral equals 1.

    F(x)=1F(x) = 1

    Step 2: Combine the parts to write the full CDF.

    F(x)={0,x<03x2βˆ’2x3,0≀x≀11,x>1F(x) = \begin{cases} 0, & x < 0 \\ 3x^2 - 2x^3, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}

    Step 3: Use the CDF to find P(X>0.5)P(X > 0.5).

    P(X>0.5)=1βˆ’F(0.5)P(X > 0.5) = 1 - F(0.5)
    F(0.5)=3(0.5)2βˆ’2(0.5)3F(0.5) = 3(0.5)^2 - 2(0.5)^3
    F(0.5)=3(0.25)βˆ’2(0.125)F(0.5) = 3(0.25) - 2(0.125)
    F(0.5)=0.75βˆ’0.25F(0.5) = 0.75 - 0.25
    F(0.5)=0.5F(0.5) = 0.5
    P(X>0.5)=1βˆ’0.5=0.5P(X > 0.5) = 1 - 0.5 = 0.5

    Answer: The CDF is F(x)=3x2βˆ’2x3F(x) = 3x^2 - 2x^3 for 0≀x≀10 \le x \le 1, and P(X>0.5)=0.5P(X > 0.5) = 0.5.
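The closed-form CDF can be cross-checked against a numeric integral of the PDF; a sketch with an arbitrary step count:

```python
# Piecewise CDF F(x) = 3x^2 - 2x^3 on [0, 1], compared with midpoint-rule
# integration of the PDF f(t) = 6t(1 - t).
def F(x):
    if x < 0:
        return 0.0
    return 3 * x**2 - 2 * x**3 if x <= 1 else 1.0

def F_numeric(x, n=100_000):
    h = x / n
    return sum(6 * t * (1 - t) for t in ((i + 0.5) * h for i in range(n))) * h

print(F(0.5), 1 - F(0.5))                   # 0.5 0.5
print(abs(F(0.7) - F_numeric(0.7)) < 1e-6)  # True
```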

    ---

    4. Expected Value (Mean)

    The expected value, or mean, of a random variable is a measure of its central tendency. It represents the average value one would expect if the experiment were repeated many times.

    📖 Expected Value (Mean)

    For a discrete random variable XX with PMF pX(x)p_X(x):

    E[X]=βˆ‘xxβ‹…pX(x)E[X] = \sum_{x} x \cdot p_X(x)

    For a continuous random variable XX with PDF fX(x)f_X(x):
    E[X]=βˆ«βˆ’βˆžβˆžxβ‹…fX(x)dxE[X] = \int_{-\infty}^{\infty} x \cdot f_X(x) dx

    📝 Expected Value of a Function

    For a discrete random variable XX and a function g(X)g(X):

    E[g(X)]=βˆ‘xg(x)β‹…pX(x)E[g(X)] = \sum_{x} g(x) \cdot p_X(x)

    For a continuous random variable XX and a function g(X)g(X):
    E[g(X)]=βˆ«βˆ’βˆžβˆžg(x)β‹…fX(x)dxE[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f_X(x) dx

    Variables:

      • XX = Random variable

      • pX(x)p_X(x) = PMF of XX

      • fX(x)f_X(x) = PDF of XX

      • g(X)g(X) = A function of XX


    When to use: To find the average value of a random variable or a function of a random variable.

    Worked Example:

    Problem: Find the expected value of XX for the continuous random variable with PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) for 0≀x≀10 \le x \le 1.

    Solution:

    Step 1: Apply the formula for the expected value of a continuous random variable.

    E[X]=βˆ«βˆ’βˆžβˆžxβ‹…f(x)dxE[X] = \int_{-\infty}^{\infty} x \cdot f(x) dx

    Step 2: Substitute the PDF and integrate over its non-zero range.

    E[X]=∫01xβ‹…[6x(1βˆ’x)]dxE[X] = \int_{0}^{1} x \cdot [6x(1-x)] dx
    E[X]=6∫01(x2βˆ’x3)dxE[X] = 6 \int_{0}^{1} (x^2 - x^3) dx

    Step 3: Perform the integration.

    E[X]=6[x33βˆ’x44]01E[X] = 6 \left[ \frac{x^3}{3} - \frac{x^4}{4} \right]_{0}^{1}
    E[X]=6((133βˆ’144)βˆ’(033βˆ’044))E[X] = 6 \left( \left( \frac{1^3}{3} - \frac{1^4}{4} \right) - \left( \frac{0^3}{3} - \frac{0^4}{4} \right) \right)
    E[X]=6(13βˆ’14)E[X] = 6 \left( \frac{1}{3} - \frac{1}{4} \right)
    E[X]=6(4βˆ’312)E[X] = 6 \left( \frac{4-3}{12} \right)
    E[X]=6(112)E[X] = 6 \left( \frac{1}{12} \right)
    E[X]=12E[X] = \frac{1}{2}

    Answer: E[X]=0.5E[X] = 0.5.
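The integral above can be checked numerically with a midpoint rule; a minimal sketch:

```python
# E[g(X)] = integral of g(x) * f(x) dx for f(x) = 6x(1 - x) on [0, 1],
# approximated by the midpoint rule.
def e_of(g, n=100_000):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += g(x) * 6 * x * (1 - x)
    return total * h

print(round(e_of(lambda x: x), 6))  # 0.5
```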

    ---

    5. Variance

    The variance measures the spread or dispersion of a random variable's values around its expected value. A higher variance indicates greater variability.

    📖 Variance

    The variance of a random variable XX, denoted by Var⁑(X)\operatorname{Var}(X) or ΟƒX2\sigma^2_X, is defined as:

    Var⁑(X)=E[(Xβˆ’E[X])2]\operatorname{Var}(X) = E[(X - E[X])^2]

    An equivalent and often more convenient formula is:
    Var⁑(X)=E[X2]βˆ’(E[X])2\operatorname{Var}(X) = E[X^2] - (E[X])^2

    The standard deviation, ΟƒX\sigma_X, is the positive square root of the variance: ΟƒX=Var⁑(X)\sigma_X = \sqrt{\operatorname{Var}(X)}.

    📝 Variance Calculation

    For a discrete random variable XX:

    Var⁑(X)=βˆ‘xx2pX(x)βˆ’(βˆ‘xxpX(x))2\operatorname{Var}(X) = \sum_{x} x^2 p_X(x) - \left( \sum_{x} x p_X(x) \right)^2

    For a continuous random variable XX:
    Var⁑(X)=βˆ«βˆ’βˆžβˆžx2fX(x)dxβˆ’(βˆ«βˆ’βˆžβˆžxfX(x)dx)2\operatorname{Var}(X) = \int_{-\infty}^{\infty} x^2 f_X(x) dx - \left( \int_{-\infty}^{\infty} x f_X(x) dx \right)^2

    Variables:

      • XX = Random variable

      • pX(x)p_X(x) = PMF of XX

      • fX(x)f_X(x) = PDF of XX


    When to use: To quantify the spread or dispersion of a random variable's values.

    Worked Example:

    Problem: Find the variance of XX for the continuous random variable with PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) for 0≀x≀10 \le x \le 1. (We already found E[X]=0.5E[X] = 0.5.)

    Solution:

    Step 1: Calculate E[X2]E[X^2].

    E[X2]=βˆ«βˆ’βˆžβˆžx2β‹…f(x)dxE[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f(x) dx
    E[X2]=∫01x2β‹…[6x(1βˆ’x)]dxE[X^2] = \int_{0}^{1} x^2 \cdot [6x(1-x)] dx
    E[X2]=6∫01(x3βˆ’x4)dxE[X^2] = 6 \int_{0}^{1} (x^3 - x^4) dx
    E[X2]=6[x44βˆ’x55]01E[X^2] = 6 \left[ \frac{x^4}{4} - \frac{x^5}{5} \right]_{0}^{1}
    E[X2]=6(14βˆ’15)E[X^2] = 6 \left( \frac{1}{4} - \frac{1}{5} \right)
    E[X2]=6(5βˆ’420)E[X^2] = 6 \left( \frac{5-4}{20} \right)
    E[X2]=6(120)E[X^2] = 6 \left( \frac{1}{20} \right)
    E[X2]=620=310=0.3E[X^2] = \frac{6}{20} = \frac{3}{10} = 0.3

    Step 2: Use the variance formula Var⁑(X)=E[X2]βˆ’(E[X])2\operatorname{Var}(X) = E[X^2] - (E[X])^2.

    We have E[X2]=0.3E[X^2] = 0.3 and E[X]=0.5E[X] = 0.5.

    Var⁑(X)=0.3βˆ’(0.5)2\operatorname{Var}(X) = 0.3 - (0.5)^2
    Var⁑(X)=0.3βˆ’0.25\operatorname{Var}(X) = 0.3 - 0.25
    Var⁑(X)=0.05\operatorname{Var}(X) = 0.05

    Answer: Var⁑(X)=0.05\operatorname{Var}(X) = 0.05.
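The analytic value can also be checked by simulation: rejection sampling draws from f(x) = 6x(1-x) (which is bounded by 1.5 on [0, 1]), and the sample variance should land near 0.05. A sketch; the seed and sample size are arbitrary choices:

```python
import random

random.seed(0)

def sample():
    """Rejection sampling from f(x) = 6x(1 - x): propose x ~ U(0, 1) and
    accept with probability f(x) / 1.5, since f is bounded above by 1.5."""
    while True:
        x, u = random.random(), random.random()
        if u * 1.5 <= 6 * x * (1 - x):
            return x

xs = [sample() for _ in range(200_000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)
print(round(m, 2), round(v, 3))  # ~0.5 ~0.05
```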

    ---

    6. Properties of Expectation and Variance

    These properties simplify calculations involving sums and linear transformations of random variables.

    📝 Linearity of Expectation

    For any random variables X1,X2,…,XnX_1, X_2, \dots, X_n and constants a1,a2,…,an,ba_1, a_2, \dots, a_n, b:

    E[a1X1+a2X2+β‹―+anXn+b]=a1E[X1]+a2E[X2]+β‹―+anE[Xn]+bE[a_1 X_1 + a_2 X_2 + \dots + a_n X_n + b] = a_1 E[X_1] + a_2 E[X_2] + \dots + a_n E[X_n] + b

    A special case for a single random variable XX:
    E[aX+b]=aE[X]+bE[aX + b] = a E[X] + b

    When to use: To easily find the expected value of linear combinations of random variables, regardless of their independence.

    📝 Properties of Variance

    For any random variable XX and constants a,ba, b:

    Var⁑(aX+b)=a2Var⁑(X)\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)

    For independent random variables X1,X2,…,XnX_1, X_2, \dots, X_n:
    Var⁑(X1+X2+β‹―+Xn)=Var⁑(X1)+Var⁑(X2)+β‹―+Var⁑(Xn)\operatorname{Var}(X_1 + X_2 + \dots + X_n) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \dots + \operatorname{Var}(X_n)

    And for independent random variables X1,…,XnX_1, \dots, X_n and constants a1,…,ana_1, \dots, a_n:
    Var⁑(a1X1+β‹―+anXn)=a12Var⁑(X1)+β‹―+an2Var⁑(Xn)\operatorname{Var}(a_1 X_1 + \dots + a_n X_n) = a_1^2 \operatorname{Var}(X_1) + \dots + a_n^2 \operatorname{Var}(X_n)

    When to use: To find the variance of linear transformations or sums of independent random variables.

    ⚠️ Common Mistake

    ❌ Assuming Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) for any random variables X,YX, Y.
    ✅ The property Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) holds only if XX and YY are independent. If they are not independent, the covariance term must be included: Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)+2Cov⁑(X,Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y).
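The pitfall can be made concrete with simulated data: taking Y = X (perfectly dependent) gives Var(X+Y) = Var(2X) = 4 Var(X), not 2 Var(X). A sketch, with an arbitrary distribution and seed:

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(100_000)]  # Var(X) ~ 1

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

vx = var(xs)
v_sum = var([x + x for x in xs])  # X + Y with Y = X, fully dependent
print(round(v_sum / vx, 2))       # 4.0, not 2.0: the 2*Cov(X, X) term matters
```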

    ---

    7. Central Limit Theorem (CLT) and Normal Approximation

    The Central Limit Theorem (CLT) is one of the most powerful theorems in statistics. It explains why many natural phenomena follow a normal distribution, even if the individual components contributing to them do not.

    📖 Central Limit Theorem (CLT)

    Let X1,X2,…,XnX_1, X_2, \dots, X_n be a sequence of independent and identically distributed (i.i.d.) random variables, each with finite mean E[Xi]=ΞΌE[X_i] = \mu and finite variance Var⁑(Xi)=Οƒ2\operatorname{Var}(X_i) = \sigma^2.
    As nn approaches infinity, the distribution of the sample mean XΛ‰n=1nβˆ‘i=1nXi\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i approaches a normal distribution with mean ΞΌ\mu and variance Οƒ2n\frac{\sigma^2}{n}.
    That is, for large nn:

    XΛ‰n∼N(ΞΌ,Οƒ2n)\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)

    Equivalently, the distribution of the sum Sn=βˆ‘i=1nXiS_n = \sum_{i=1}^{n} X_i approaches a normal distribution with mean nΞΌn\mu and variance nΟƒ2n\sigma^2:
    Sn∼N(nΞΌ,nΟƒ2)S_n \sim N(n\mu, n\sigma^2)

    The standardized random variable Z=XΛ‰nβˆ’ΞΌΟƒ/nZ = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} (or Z=Snβˆ’nΞΌΟƒnZ = \frac{S_n - n\mu}{\sigma\sqrt{n}}) approaches a standard normal distribution N(0,1)N(0,1) as nβ†’βˆžn \to \infty.

    💡 Exam Shortcut

    For problems involving sums or averages of a large number of i.i.d. random variables, immediately think of applying the Central Limit Theorem to approximate the distribution as normal. This allows you to use Z-scores and standard normal tables for probability calculations.

    Worked Example:

    Problem: The time taken (in minutes) for a data scientist to complete a specific task is a random variable with mean 1515 minutes and standard deviation 44 minutes. If a data scientist completes 100100 such tasks independently, what is the approximate probability that the total time taken for these 100100 tasks is less than 14501450 minutes?

    Solution:

    Step 1: Identify the given parameters for a single task XiX_i.

    Mean E[Xi]=ΞΌ=15E[X_i] = \mu = 15 minutes.
    Standard deviation Οƒ=4\sigma = 4 minutes.
    Number of tasks n=100n = 100.

    Step 2: Define the total time S100S_{100} and apply CLT.

    The total time for 100100 tasks is S100=βˆ‘i=1100XiS_{100} = \sum_{i=1}^{100} X_i.
    By the Central Limit Theorem, for large nn, SnS_n is approximately normally distributed.

    Calculate the mean of S100S_{100}:

    E[S100]=nΞΌ=100Γ—15=1500E[S_{100}] = n\mu = 100 \times 15 = 1500

    Calculate the variance of S100S_{100}:

    Var⁑(S100)=nΟƒ2=100Γ—(42)=100Γ—16=1600\operatorname{Var}(S_{100}) = n\sigma^2 = 100 \times (4^2) = 100 \times 16 = 1600

    Calculate the standard deviation of S100S_{100}:

    ΟƒS100=1600=40\sigma_{S_{100}} = \sqrt{1600} = 40

    So, S100∼N(1500,1600)S_{100} \sim N(1500, 1600) approximately.

    Step 3: Standardize the random variable to use the Z-score.

    We want to find P(S100<1450)P(S_{100} < 1450).

    Z=S100βˆ’E[S100]ΟƒS100Z = \frac{S_{100} - E[S_{100}]}{\sigma_{S_{100}}}

    Z=1450βˆ’150040Z = \frac{1450 - 1500}{40}
    Z=βˆ’5040Z = \frac{-50}{40}
    Z=βˆ’1.25Z = -1.25

    Step 4: Look up the probability using the standard normal CDF (or Z-table).

    P(S100<1450)β‰ˆP(Z<βˆ’1.25)P(S_{100} < 1450) \approx P(Z < -1.25)

    Using a standard normal table or calculator, P(Z<βˆ’1.25)β‰ˆ0.1056P(Z < -1.25) \approx 0.1056.

    Answer: The approximate probability that the total time taken is less than 14501450 minutes is 0.10560.1056.
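The table lookup in Step 4 can be reproduced in Python using math.erf, since the standard normal CDF satisfies Phi(z) = (1 + erf(z / sqrt(2))) / 2; a small sketch:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return (1 + erf(z / sqrt(2))) / 2

z = (1450 - 1500) / 40  # standardized total time, as in Step 3
print(z)                # -1.25
print(phi(z))           # approximately 0.1056, matching the table value
```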

    ---

    8. Standardization (Z-score)

    Standardization transforms a random variable into a standard score (Z-score), which represents how many standard deviations an observation is from the mean. This is particularly useful for comparing values from different normal distributions or for using standard normal tables.

    📝 Z-score

    For a random variable XX with mean ΞΌ\mu and standard deviation Οƒ\sigma:

    Z=Xβˆ’ΞΌΟƒZ = \frac{X - \mu}{\sigma}

    Variables:

      • ZZ = Standardized score (Z-score)

      • XX = Value of the random variable

      • ΞΌ\mu = Mean of XX

      • Οƒ\sigma = Standard deviation of XX


    When to use: To transform any normally distributed variable into a standard normal variable N(0,1)N(0,1), allowing for the use of standard normal tables to find probabilities. Also used in conjunction with the CLT.
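A tiny illustration of the comparison use case (all scores, means, and standard deviations below are hypothetical numbers for illustration):

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical exams on different scales:
z_a = z_score(85, 70, 10)  # exam A: mean 70, sd 10
z_b = z_score(69, 60, 5)   # exam B: mean 60, sd 5
print(z_a, z_b)            # 1.5 1.8 -> the exam-B score is relatively stronger
```

Because both z-scores refer to the same N(0,1) scale, the raw scores become directly comparable even though the exams differ in mean and spread.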

    ---

    Problem-Solving Strategies

    💡 CMI Strategy

    • Identify Random Variable Type: First, determine if the random variable is discrete or continuous. This dictates whether to use PMF/summation or PDF/integration.

    • Check PDF/PMF Properties: For questions involving determining constants or verifying a function, always use βˆ‘pX(x)=1\sum p_X(x) = 1 (for discrete) or ∫f(x)dx=1\int f(x) dx = 1 (for continuous). Remember f(x)β‰₯0f(x) \ge 0.

    • Probability from CDF/PDF: P(a<X≀b)=F(b)βˆ’F(a)P(a < X \le b) = F(b) - F(a) for CDF. For PDF, it's ∫abf(x)dx\int_a^b f(x) dx.

    • Expectation & Variance: Remember the "shortcut" formula for variance: Var⁑(X)=E[X2]βˆ’(E[X])2\operatorname{Var}(X) = E[X^2] - (E[X])^2.

    • CLT Application: When dealing with sums or averages of a large number of independent and identically distributed random variables, the Central Limit Theorem is your go-to. This implies a normal approximation, and thus Z-scores.

    • Read Carefully: Pay attention to "total number," "average number," "more than," "less than," "at least," etc., to set up the correct integral or sum limits and inequalities.

    ---

    Common Mistakes

    ⚠️ Avoid These Errors
      • ❌ Confusing PMF and PDF: Using integration for a discrete random variable or summation for a continuous one.
    ✅ Correct: PMF for discrete (summation), PDF for continuous (integration).
      • ❌ Incorrect PDF Properties: Forgetting to check f(x)β‰₯0f(x) \ge 0 or not normalizing the integral to 1.
    ✅ Correct: Always ensure f(x)β‰₯0f(x) \ge 0 and ∫f(x)dx=1\int f(x) dx = 1.
      • ❌ Probability of a single point for continuous RV: Assuming P(X=x0)P(X=x_0) is non-zero for a continuous random variable.
    ✅ Correct: For continuous RVs, P(X=x0)=0P(X=x_0) = 0. Probabilities are over intervals.
      • ❌ Ignoring Independence for Variance Sums: Applying Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) when XX and YY are not independent.
    ✅ Correct: This property requires independence. If not independent, use Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)+2Cov⁑(X,Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y).
      • ❌ Misapplying CLT: Using CLT for small sample sizes or for variables that are not i.i.d.
    ✅ Correct: CLT is for large nn (typically nβ‰₯30n \ge 30) and i.i.d. random variables.
      • ❌ Calculation Errors in Integration/Summation: Simple algebraic or calculus mistakes when evaluating integrals or sums.
    ✅ Correct: Double-check calculations, especially definite integrals and series summations.

    ---

    Practice Questions

    :::question type="MCQ" question="Let XX be a continuous random variable with the probability density function given by:

    f(x)={keβˆ’2x,x>00,otherwisef(x) = \begin{cases} k e^{-2x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}

    What is the value of kk that makes f(x)f(x) a valid PDF?" options=["1/21/2","11","22","ee"] answer="22" hint="Use the property that the integral of a PDF over its entire range must equal 1." solution="Step 1: Apply the normalization condition for a PDF.
    βˆ«βˆ’βˆžβˆžf(x)dx=1\int_{-\infty}^{\infty} f(x) dx = 1

    Step 2: Substitute the given PDF into the integral.
    ∫0∞keβˆ’2xdx=1\int_{0}^{\infty} k e^{-2x} dx = 1

    Step 3: Evaluate the integral.
    k[βˆ’12eβˆ’2x]0∞=1k \left[ -\frac{1}{2} e^{-2x} \right]_{0}^{\infty} = 1

    k(lim⁑bβ†’βˆž(βˆ’12eβˆ’2b)βˆ’(βˆ’12e0))=1k \left( \lim_{b \to \infty} \left( -\frac{1}{2} e^{-2b} \right) - \left( -\frac{1}{2} e^{0} \right) \right) = 1

    k(0βˆ’(βˆ’12))=1k \left( 0 - \left( -\frac{1}{2} \right) \right) = 1

    k(12)=1k \left( \frac{1}{2} \right) = 1

    Step 4: Solve for kk.
    k=2k = 2

    Answer: \boxed{2}"
    :::

    :::question type="NAT" question="A discrete random variable YY has the following Probability Mass Function:

    P(Y=y)=cy+1,forΒ y=0,1,2P(Y=y) = \frac{c}{y+1}, \quad \text{for } y=0, 1, 2

    What is the value of cc (rounded to two decimal places)?" answer="0.55" hint="The sum of all probabilities in a PMF must be equal to 1." solution="Step 1: Apply the normalization condition for a PMF.
    βˆ‘y=02P(Y=y)=1\sum_{y=0}^{2} P(Y=y) = 1

    Step 2: Sum the probabilities for each possible value of YY.
    P(Y=0)=c0+1=cP(Y=0) = \frac{c}{0+1} = c

    P(Y=1)=c1+1=c2P(Y=1) = \frac{c}{1+1} = \frac{c}{2}

    P(Y=2)=c2+1=c3P(Y=2) = \frac{c}{2+1} = \frac{c}{3}

    Step 3: Set the sum equal to 1 and solve for cc.
    c+c2+c3=1c + \frac{c}{2} + \frac{c}{3} = 1

    Find a common denominator (6).
    6c6+3c6+2c6=1\frac{6c}{6} + \frac{3c}{6} + \frac{2c}{6} = 1

    11c6=1\frac{11c}{6} = 1

    c=611c = \frac{6}{11}

    Step 4: Round to two decimal places.
    cβ‰ˆ0.5454...β‰ˆ0.55c \approx 0.5454... \approx 0.55

    Answer: \boxed{0.55}"
    :::

    :::question type="MSQ" question="Let XX be a continuous random variable with CDF given by:

    F(x)={0,x<0x2,0≀x<11,xβ‰₯1F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}

    Which of the following statements is/are true?" options=["The PDF of XX is f(x)=2xf(x) = 2x for 0≀x<10 \le x < 1.","P(X≀0.5)=0.25P(X \le 0.5) = 0.25.","P(0.2<X<0.8)=0.6P(0.2 < X < 0.8) = 0.6.","E[X]=2/3E[X] = 2/3." ] answer="A,B,C,D" hint="Remember that f(x)=dF(x)/dxf(x) = dF(x)/dx for continuous random variables. Use the CDF to find probabilities. Calculate E[X]=∫xf(x)dxE[X] = \int x f(x) dx." solution="Statement A: The PDF of XX is f(x)=2xf(x) = 2x for 0≀x<10 \le x < 1.
    To find the PDF from the CDF, differentiate the CDF:
    f(x)=ddxF(x)=ddx(x2)=2xf(x) = \frac{d}{dx} F(x) = \frac{d}{dx} (x^2) = 2x

    This is valid for 0≀x<10 \le x < 1. For x<0x<0 and xβ‰₯1x \ge 1, f(x)=0f(x)=0. So, statement A is true.

    Statement B: P(X≀0.5)=0.25P(X \le 0.5) = 0.25.
    Using the CDF definition:

    P(X≀0.5)=F(0.5)P(X \le 0.5) = F(0.5)

    Since 0≀0.5<10 \le 0.5 < 1, we use F(x)=x2F(x) = x^2:
    F(0.5)=(0.5)2=0.25F(0.5) = (0.5)^2 = 0.25

    So, statement B is true.

    Statement C: P(0.2<X<0.8)=0.6P(0.2 < X < 0.8) = 0.6.
    Using the CDF property P(a<X<b)=F(b)βˆ’F(a)P(a < X < b) = F(b) - F(a):

    P(0.2<X<0.8)=F(0.8)βˆ’F(0.2)P(0.2 < X < 0.8) = F(0.8) - F(0.2)

    F(0.8)=(0.8)2=0.64F(0.8) = (0.8)^2 = 0.64

    F(0.2)=(0.2)2=0.04F(0.2) = (0.2)^2 = 0.04

    P(0.2<X<0.8)=0.64βˆ’0.04=0.60P(0.2 < X < 0.8) = 0.64 - 0.04 = 0.60

    So, statement C is true.

    Statement D: E[X]=2/3E[X] = 2/3.
    Using the PDF f(x)=2xf(x) = 2x for 0≀x<10 \le x < 1:

    E[X]=βˆ«βˆ’βˆžβˆžxf(x)dx=∫01x(2x)dxE[X] = \int_{-\infty}^{\infty} x f(x) dx = \int_{0}^{1} x (2x) dx

    E[X]=∫012x2dxE[X] = \int_{0}^{1} 2x^2 dx

    E[X]=[2x33]01E[X] = \left[ \frac{2x^3}{3} \right]_{0}^{1}

    E[X]=2(1)33βˆ’2(0)33=23E[X] = \frac{2(1)^3}{3} - \frac{2(0)^3}{3} = \frac{2}{3}

    So, statement D is true.

    All options A, B, C, D are true.
    Answer: \boxed{A,B,C,D}"
    :::

    :::question type="SUB" question="A manufacturing process produces items whose weights are independent random variables with a mean of 1010 kg and a standard deviation of 22 kg. A sample of 6464 items is taken from the production line.
    (a) What is the probability that the average weight of the items in the sample is less than 9.59.5 kg?
    (b) What is the total expected weight of the 6464 items?" answer="0.02280.0228, 640640 kg" hint="For part (a), use the Central Limit Theorem to approximate the distribution of the sample mean. For part (b), use the linearity of expectation for a sum of random variables." solution="(a) Probability that the average weight is less than 9.59.5 kg:

    Step 1: Identify parameters for a single item XiX_i.
    Mean E[Xi]=ΞΌ=10E[X_i] = \mu = 10 kg.
    Standard deviation Οƒ=2\sigma = 2 kg.
    Sample size n=64n = 64.

    Step 2: Apply the Central Limit Theorem to the sample mean Xˉn\bar{X}_n.
    For large nn, Xˉn\bar{X}_n is approximately normally distributed with:
    Mean of sample mean: E[Xˉn]=μ=10E[\bar{X}_n] = \mu = 10 kg.
    Standard deviation of sample mean (standard error): σXˉn=σn=264=28=0.25\sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{64}} = \frac{2}{8} = 0.25 kg.
    So, XΛ‰64∼N(10,(0.25)2)\bar{X}_{64} \sim N(10, (0.25)^2) approximately.

    Step 3: Standardize the value 9.59.5 kg.
    We want to find P(Xˉ64<9.5)P(\bar{X}_{64} < 9.5).

    Z=XΛ‰64βˆ’E[XΛ‰64]ΟƒXΛ‰64Z = \frac{\bar{X}_{64} - E[\bar{X}_{64}]}{\sigma_{\bar{X}_{64}}}

    Z=9.5βˆ’100.25Z = \frac{9.5 - 10}{0.25}

    Z=βˆ’0.50.25Z = \frac{-0.5}{0.25}

    Z=βˆ’2Z = -2

    Step 4: Look up the probability using the standard normal CDF.

    P(XΛ‰64<9.5)β‰ˆP(Z<βˆ’2)P(\bar{X}_{64} < 9.5) \approx P(Z < -2)

    Using a standard normal table or calculator, P(Z<βˆ’2)β‰ˆ0.0228P(Z < -2) \approx 0.0228.

    (b) Total expected weight of the 6464 items:

    Step 1: Define the total weight S64S_{64}.
    S64=βˆ‘i=164XiS_{64} = \sum_{i=1}^{64} X_i.

    Step 2: Apply the linearity of expectation.

    E[S64]=E[βˆ‘i=164Xi]=βˆ‘i=164E[Xi]E[S_{64}] = E\left[\sum_{i=1}^{64} X_i\right] = \sum_{i=1}^{64} E[X_i]

    Since each E[Xi]=ΞΌ=10E[X_i] = \mu = 10 kg:
    E[S64]=64Γ—10=640E[S_{64}] = 64 \times 10 = 640

    Answer: (a) \boxed{0.0228}, (b) \boxed{640 \text{ kg}}"
    :::

    ---

    Summary

    ❗ Key Takeaways for CMI

    • PMF vs. PDF: Discrete random variables use Probability Mass Functions (PMF) which sum to 1. Continuous random variables use Probability Density Functions (PDF) which integrate to 1.

    • CDF for All: The Cumulative Distribution Function (CDF) F(x)=P(X≀x)F(x) = P(X \le x) is defined for both discrete and continuous variables, is non-decreasing, and ranges from 0 to 1. P(a<X≀b)=F(b)βˆ’F(a)P(a < X \le b) = F(b) - F(a).

    • Expected Value & Variance: These measure central tendency and spread. Remember E[aX+b]=aE[X]+bE[aX+b] = aE[X]+b and Var⁑(aX+b)=a2Var⁑(X)\operatorname{Var}(aX+b) = a^2\operatorname{Var}(X). For independent variables, Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).

    • Central Limit Theorem (CLT): For a large number of i.i.d. random variables, their sum or average is approximately normally distributed. This is critical for inferential statistics and often tested in CMI.

    • Standardization (Z-score): Use Z=(Xβˆ’ΞΌ)/ΟƒZ = (X - \mu)/\sigma to convert any normal random variable to a standard normal N(0,1)N(0,1), which allows for probability look-ups in Z-tables.

    ---

    What's Next?

    πŸ’‘ Continue Learning

    This topic connects to:

      • Specific Probability Distributions: Understanding distribution functions is the foundation for studying named distributions like Bernoulli, Binomial, Poisson, Exponential, Uniform, and Normal distributions. Each has its own PMF/PDF and CDF.

      • Joint Distributions: Extending these concepts to multiple random variables, understanding their joint behavior, and concepts like covariance and correlation.

      • Statistical Inference: The Central Limit Theorem forms the bedrock for hypothesis testing and confidence intervals, allowing us to make inferences about population parameters from sample data.


    Master these connections for comprehensive CMI preparation!

    ---

    πŸ’‘ Moving Forward

    Now that you understand Distribution Functions, let's explore Expectation and Variance which builds on these concepts.

    ---

    Part 3: Expectation and Variance

    Introduction

    Expectation and variance are fundamental concepts in probability theory, providing concise summaries of the central tendency and spread of a random variable's distribution. The expectation, or expected value, quantifies the "average" outcome of a random variable over a large number of trials. It represents the weighted average of all possible values a random variable can take, with weights given by their respective probabilities. The variance, on the other hand, measures the dispersion or spread of the random variable's values around its expected value. A low variance indicates that values tend to be close to the mean, while a high variance suggests that values are spread out over a wider range.

    In the CMI examination, a deep understanding of expectation and variance is crucial. These concepts are extensively tested, often through complex scenarios involving multiple random variables, indicator functions, and various probability distributions. Mastery of linearity of expectation and the properties of variance is essential for efficiently solving problems that might otherwise appear intractable.

    πŸ“– Random Variable

    A random variable is a function that maps the outcomes of a random experiment to real numbers. It can be discrete (taking on a finite or countably infinite number of values) or continuous (taking on any value within a given interval).

    ---

    Key Concepts

## 1. Expectation of a Random Variable

    The expectation, also known as the expected value or mean, of a random variable XX is denoted by E[X]E[X] or ΞΌ\mu. It represents the long-run average value of the variable.

### 1.1. Discrete Random Variables

    For a discrete random variable XX with probability mass function (PMF) P(X=x)P(X=x), the expectation is calculated by summing the products of each possible value of XX and its corresponding probability.

    πŸ“ Expectation of a Discrete Random Variable
    E[X]=βˆ‘xxP(X=x)E[X] = \sum_{x} x P(X=x)

    Variables:

      • XX = discrete random variable

      • xx = a possible value of XX

      • P(X=x)P(X=x) = probability mass function (PMF) at xx


    When to use: To find the average value of a discrete random variable.

    Worked Example:

    Problem: A fair six-sided die is rolled. Let XX be the number rolled. Calculate E[X]E[X].

    Solution:

    Step 1: Identify the possible values of XX and their probabilities.
    The possible values are 1,2,3,4,5,61, 2, 3, 4, 5, 6. Since the die is fair, each outcome has a probability of 1/61/6.

    P(X=x)=16for x∈{1,2,3,4,5,6}P(X=x) = \frac{1}{6} \quad \text{for } x \in \{1, 2, 3, 4, 5, 6\}

    Step 2: Apply the formula for the expectation of a discrete random variable.

    E[X]=βˆ‘x=16xP(X=x)E[X] = \sum_{x=1}^{6} x P(X=x)
    E[X]=1β‹…16+2β‹…16+3β‹…16+4β‹…16+5β‹…16+6β‹…16E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6}

    Step 3: Simplify the expression.

    E[X]=16(1+2+3+4+5+6)E[X] = \frac{1}{6} (1+2+3+4+5+6)
    E[X]=216E[X] = \frac{21}{6}
    E[X]=3.5E[X] = 3.5

    Answer: \boxed{3.5}
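The die calculation above can be reproduced with exact rational arithmetic; a minimal sketch using Python's `fractions` module:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face x in {1,...,6} has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum over x of x * P(X = x)
expected_value = sum(x * p for x, p in pmf.items())

print(expected_value)         # 7/2
print(float(expected_value))  # 3.5
```

Using `Fraction` instead of floats avoids any rounding in the intermediate sums.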

    ---

### 1.2. Continuous Random Variables

    For a continuous random variable XX with probability density function (PDF) f(x)f(x), the expectation is calculated by integrating the product of xx and its PDF over the entire range of possible values.

    πŸ“ Expectation of a Continuous Random Variable
    E[X]=βˆ«βˆ’βˆžβˆžxf(x)dxE[X] = \int_{-\infty}^{\infty} x f(x) dx

    Variables:

      • XX = continuous random variable

      • f(x)f(x) = probability density function (PDF) of XX


    When to use: To find the average value of a continuous random variable.

    Worked Example:

    Problem: Let XX be a continuous random variable with PDF f(x)=2xf(x) = 2x for 0≀x≀10 \le x \le 1, and f(x)=0f(x) = 0 otherwise. Calculate E[X]E[X].

    Solution:

    Step 1: Identify the PDF and its range.

    f(x)=2xforΒ 0≀x≀1f(x) = 2x \quad \text{for } 0 \le x \le 1

    Step 2: Apply the formula for the expectation of a continuous random variable.

    E[X]=βˆ«βˆ’βˆžβˆžxf(x)dxE[X] = \int_{-\infty}^{\infty} x f(x) dx

    Since f(x)f(x) is non-zero only for 0≀x≀10 \le x \le 1, the integral limits change.

    E[X]=∫01x(2x)dxE[X] = \int_{0}^{1} x (2x) dx

    Step 3: Evaluate the integral.

    E[X]=∫012x2dxE[X] = \int_{0}^{1} 2x^2 dx
    E[X]=[2x33]01E[X] = \left[ \frac{2x^3}{3} \right]_{0}^{1}
    E[X]=(2(1)33)βˆ’(2(0)33)E[X] = \left( \frac{2(1)^3}{3} \right) - \left( \frac{2(0)^3}{3} \right)
    E[X]=23E[X] = \frac{2}{3}

    Answer: \boxed{\frac{2}{3}}
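The integral above can be checked numerically; a minimal sketch using a midpoint Riemann sum (no external libraries; the step count `n` is an arbitrary choice):

```python
# Numerically approximate E[X] = integral of x * f(x) dx for f(x) = 2x on [0, 1]
def pdf(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

n = 100_000
h = 1.0 / n
approx = 0.0
for i in range(n):
    m = (i + 0.5) * h          # midpoint of the i-th subinterval
    approx += m * pdf(m) * h   # contribution x * f(x) * dx

print(round(approx, 6))  # 0.666667, matching E[X] = 2/3
```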

---

### 1.3. Properties of Expectation

    Expectation has several important properties that simplify calculations, especially when dealing with sums or transformations of random variables.

    πŸ“ Properties of Expectation

    • Expectation of a constant: E[c]=cE[c] = c

    • Scalar multiplication: E[aX]=aE[X]E[aX] = a E[X]

    • Addition of a constant: E[X+b]=E[X]+bE[X + b] = E[X] + b

    • Linearity of Expectation: For any random variables X1,X2,…,XnX_1, X_2, \ldots, X_n (whether independent or dependent) and constants a1,a2,…,ana_1, a_2, \ldots, a_n:

    • E[βˆ‘i=1naiXi]=βˆ‘i=1naiE[Xi]E\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i E[X_i]

      A special case is E[X+Y]=E[X]+E[Y]E[X+Y] = E[X] + E[Y].
    • Expectation of a function of a random variable:

    For discrete XX: E[g(X)]=βˆ‘xg(x)P(X=x)E[g(X)] = \sum_{x} g(x) P(X=x)
    For continuous XX: E[g(X)]=βˆ«βˆ’βˆžβˆžg(x)f(x)dxE[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) dx

    Worked Example (Linearity of Expectation):

    Problem: A box contains 10 balls: 3 red and 7 blue. Two balls are drawn without replacement. Let XX be the number of red balls drawn. Find E[X]E[X].

    Solution:

    Step 1: Define indicator random variables.
    Let X1X_1 be an indicator variable for the first ball drawn being red.
    Let X2X_2 be an indicator variable for the second ball drawn being red.

    X1={1ifΒ theΒ firstΒ ballΒ isΒ red0ifΒ theΒ firstΒ ballΒ isΒ blueX_1 = \begin{cases} 1 & \text{if the first ball is red} \\ 0 & \text{if the first ball is blue} \end{cases}
    X2={1ifΒ theΒ secondΒ ballΒ isΒ red0ifΒ theΒ secondΒ ballΒ isΒ blueX_2 = \begin{cases} 1 & \text{if the second ball is red} \\ 0 & \text{if the second ball is blue} \end{cases}

    Step 2: Express XX as a sum of indicator variables.
    The total number of red balls drawn is X=X1+X2X = X_1 + X_2.

    Step 3: Calculate the expectation of each indicator variable.
    For X1X_1:

    P(X1=1)=310P(X_1=1) = \frac{3}{10}

    E[X1]=1β‹…P(X1=1)+0β‹…P(X1=0)=310E[X_1] = 1 \cdot P(X_1=1) + 0 \cdot P(X_1=0) = \frac{3}{10}

    For X2X_2:
    The probability that the second ball is red can be found using the Law of Total Probability:

    P(X2=1)=P(X2=1∣X1=1)P(X1=1)+P(X2=1∣X1=0)P(X1=0)P(X_2=1) = P(X_2=1 | X_1=1)P(X_1=1) + P(X_2=1 | X_1=0)P(X_1=0)

    P(X2=1)=(29)(310)+(39)(710)P(X_2=1) = \left(\frac{2}{9}\right)\left(\frac{3}{10}\right) + \left(\frac{3}{9}\right)\left(\frac{7}{10}\right)
    P(X2=1)=690+2190=2790=310P(X_2=1) = \frac{6}{90} + \frac{21}{90} = \frac{27}{90} = \frac{3}{10}

    So, E[X2]=310E[X_2] = \frac{3}{10}.

    Step 4: Apply linearity of expectation.

    E[X]=E[X1+X2]E[X] = E[X_1 + X_2]

    E[X]=E[X1]+E[X2]E[X] = E[X_1] + E[X_2]
    E[X]=310+310E[X] = \frac{3}{10} + \frac{3}{10}
    E[X]=610=35E[X] = \frac{6}{10} = \frac{3}{5}

    Answer: 35\frac{3}{5}
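The indicator-variable answer can be sanity-checked by simulation; a minimal sketch that repeatedly draws two balls without replacement (the seed and trial count are arbitrary choices):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

def red_count():
    # Box of 10 balls: 3 red ('R') and 7 blue ('B'); draw two without replacement
    balls = ['R'] * 3 + ['B'] * 7
    return random.sample(balls, 2).count('R')

trials = 200_000
estimate = sum(red_count() for _ in range(trials)) / trials
print(round(estimate, 2))  # close to E[X] = 3/5 = 0.6
```

Note that the two draws are dependent, yet the simulated mean still matches the linearity-of-expectation answer.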

    ---

## 2. Variance of a Random Variable

    The variance of a random variable XX, denoted by V(X)V(X) or Var(X)\text{Var}(X) or Οƒ2\sigma^2, measures the spread or dispersion of its values around the mean. It is the expected value of the squared deviation from the mean.

    πŸ“ Variance of a Random Variable
    V(X)=E[(Xβˆ’E[X])2]V(X) = E[(X - E[X])^2]

    Alternative (Computational) Formula:

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    Variables:

      • XX = random variable

      • E[X]E[X] = expected value of XX

      • E[X2]E[X^2] = expected value of X2X^2


    When to use: To quantify the spread of a random variable's distribution. The alternative formula is often easier for calculation.

    Derivation of V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2:

    Step 1: Start with the definition of variance.

    V(X)=E[(Xβˆ’E[X])2]V(X) = E[(X - E[X])^2]

    Step 2: Expand the squared term inside the expectation. Let ΞΌ=E[X]\mu = E[X] for simplicity.

    V(X)=E[(Xβˆ’ΞΌ)2]V(X) = E[(X - \mu)^2]
    V(X)=E[X2βˆ’2ΞΌX+ΞΌ2]V(X) = E[X^2 - 2\mu X + \mu^2]

    Step 3: Apply linearity of expectation.

    V(X)=E[X2]βˆ’E[2ΞΌX]+E[ΞΌ2]V(X) = E[X^2] - E[2\mu X] + E[\mu^2]

    Step 4: Use properties of expectation (E[aX]=aE[X]E[aX] = aE[X] and E[c]=cE[c] = c).

    V(X)=E[X2]βˆ’2ΞΌE[X]+ΞΌ2V(X) = E[X^2] - 2\mu E[X] + \mu^2

    Step 5: Substitute ΞΌ=E[X]\mu = E[X] back into the expression.

    V(X)=E[X2]βˆ’2E[X]E[X]+(E[X])2V(X) = E[X^2] - 2 E[X] E[X] + (E[X])^2
    V(X)=E[X2]βˆ’2(E[X])2+(E[X])2V(X) = E[X^2] - 2 (E[X])^2 + (E[X])^2

    Step 6: Simplify the expression.

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    ---

    Worked Example:

    Problem: A fair six-sided die is rolled. Let XX be the number rolled. Calculate V(X)V(X).

    Solution:

    Step 1: Recall E[X]E[X] from the previous example.

    E[X]=3.5E[X] = 3.5

    Step 2: Calculate E[X2]E[X^2].
    Using the formula E[g(X)]=βˆ‘xg(x)P(X=x)E[g(X)] = \sum_{x} g(x) P(X=x) with g(X)=X2g(X) = X^2.

    E[X2]=βˆ‘x=16x2P(X=x)E[X^2] = \sum_{x=1}^{6} x^2 P(X=x)
    E[X2]=12β‹…16+22β‹…16+32β‹…16+42β‹…16+52β‹…16+62β‹…16E[X^2] = 1^2 \cdot \frac{1}{6} + 2^2 \cdot \frac{1}{6} + 3^2 \cdot \frac{1}{6} + 4^2 \cdot \frac{1}{6} + 5^2 \cdot \frac{1}{6} + 6^2 \cdot \frac{1}{6}
    E[X2]=16(1+4+9+16+25+36)E[X^2] = \frac{1}{6} (1 + 4 + 9 + 16 + 25 + 36)
    E[X2]=916E[X^2] = \frac{91}{6}

    Step 3: Apply the computational formula for variance.

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2
    V(X)=916βˆ’(3.5)2V(X) = \frac{91}{6} - (3.5)^2
    V(X)=916βˆ’(72)2V(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^2
    V(X)=916βˆ’494V(X) = \frac{91}{6} - \frac{49}{4}

    Step 4: Find a common denominator and simplify.

    V(X)=18212βˆ’14712V(X) = \frac{182}{12} - \frac{147}{12}
    V(X)=3512V(X) = \frac{35}{12}

    Answer: 3512\frac{35}{12}
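The computational formula lends itself directly to code; a minimal sketch recomputing the die's variance with exact rationals:

```python
from fractions import Fraction

# PMF of a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = sum(x * p for x, p in pmf.items())       # E[X]   = 7/2
e_x2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2] = 91/6
variance = e_x2 - e_x**2                       # V(X) = E[X^2] - (E[X])^2

print(variance)  # 35/12
```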

    ---

### 2.1. Properties of Variance

    Variance also has several key properties.

    πŸ“ Properties of Variance

    • Non-negativity: V(X)β‰₯0V(X) \ge 0

    • Variance of a constant: V(c)=0V(c) = 0

    • Scalar multiplication and addition of a constant:

    • V(aX+b)=a2V(X)V(aX + b) = a^2 V(X)

    • Variance of a sum of independent random variables: If X1,X2,…,XnX_1, X_2, \ldots, X_n are independent random variables, then:

    • V[βˆ‘i=1nXi]=βˆ‘i=1nV(Xi)V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V(X_i)

      A special case is V(X+Y)=V(X)+V(Y)V(X+Y) = V(X) + V(Y) if XX and YY are independent.
    • Variance of a sum of dependent random variables: If XX and YY are dependent:

    V(X+Y)=V(X)+V(Y)+2Cov(X,Y)V(X+Y) = V(X) + V(Y) + 2 \text{Cov}(X,Y)

    where Cov(X,Y)=E[(Xβˆ’E[X])(Yβˆ’E[Y])]\text{Cov}(X,Y) = E[(X-E[X])(Y-E[Y])] is the covariance between XX and YY.

    ❗ Independence for Variance

    Unlike expectation, which is always linear (E[X+Y]=E[X]+E[Y]E[X+Y] = E[X]+E[Y] regardless of independence), the variance of a sum is only the sum of variances if the random variables are independent. If they are dependent, the covariance term must be included.
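The covariance term can be made concrete with a perfectly dependent pair Y = X (an illustrative choice, not from the text), using the fair die from the earlier examples:

```python
from fractions import Fraction

# X uniform on {1,...,6}; take Y = X, a perfectly dependent pair
support = range(1, 7)
p = Fraction(1, 6)

def E(g):
    # expectation of g(X) over the die's PMF
    return sum(g(x) * p for x in support)

var_x = E(lambda x: x**2) - E(lambda x: x)**2              # V(X) = 35/12
var_sum = E(lambda x: (2 * x)**2) - E(lambda x: 2 * x)**2  # V(X + Y) = V(2X)
cov_xy = E(lambda x: x * x) - E(lambda x: x)**2            # Cov(X, X) = V(X)

print(var_sum)                     # 35/3
print(var_x + var_x + 2 * cov_xy)  # 35/3 -- matches V(X)+V(Y)+2Cov(X,Y)
print(var_x + var_x)               # 35/6 -- wrong if independence is assumed
```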

    ---

## 3. Standard Deviation

    The standard deviation is the square root of the variance and is denoted by Οƒ\sigma. It has the same units as the random variable itself, making it more interpretable than variance in many contexts.

    πŸ“ Standard Deviation
    ΟƒX=V(X)\sigma_X = \sqrt{V(X)}

    Variables:

      • ΟƒX\sigma_X = standard deviation of XX

      • V(X)V(X) = variance of XX


    When to use: To express the spread of data in the original units of the random variable.

    ---

## 4. Indicator Random Variables

    An indicator random variable is a special type of discrete random variable that takes on a value of 11 if a particular event occurs and 00 otherwise. They are incredibly powerful when used with the linearity of expectation, especially in counting problems.

    πŸ“– Indicator Random Variable

    For an event AA, the indicator random variable IAI_A is defined as:

    IA={1ifΒ eventΒ AΒ occurs0ifΒ eventΒ AΒ doesΒ notΒ occurI_A = \begin{cases} 1 & \text{if event } A \text{ occurs} \\ 0 & \text{if event } A \text{ does not occur} \end{cases}

    πŸ“ Expectation of an Indicator Variable
    E[IA]=P(A)E[I_A] = P(A)

    Variables:

      • IAI_A = indicator random variable for event AA

      • P(A)P(A) = probability of event AA


    Why: E[IA]=1β‹…P(IA=1)+0β‹…P(IA=0)=1β‹…P(A)+0β‹…(1βˆ’P(A))=P(A)E[I_A] = 1 \cdot P(I_A=1) + 0 \cdot P(I_A=0) = 1 \cdot P(A) + 0 \cdot (1-P(A)) = P(A).

    Worked Example (Using Indicator Variables for Expectation):

    Problem: In a group of nn people, what is the expected number of pairs of people who share the same birthday (ignoring leap years)?

    Solution:

    Step 1: Define indicator variables for each possible pair of people.
    Let N=(n2)N = \binom{n}{2} be the total number of pairs of people.
    Let IijI_{ij} be an indicator variable that people ii and jj share a birthday, for 1≀i<j≀n1 \le i < j \le n.

    Iij={1ifΒ personΒ iΒ andΒ personΒ jΒ shareΒ aΒ birthday0otherwiseI_{ij} = \begin{cases} 1 & \text{if person } i \text{ and person } j \text{ share a birthday} \\ 0 & \text{otherwise} \end{cases}

    Step 2: Express the total number of shared birthdays (XX) as a sum of indicator variables.

    X=βˆ‘1≀i<j≀nIijX = \sum_{1 \le i < j \le n} I_{ij}

    Step 3: Calculate the expectation of a single indicator variable.
    Assuming each day of the year (365 days) is equally likely for a birthday.
    The probability that two specific people share a birthday is P(Iij=1)=1365P(I_{ij}=1) = \frac{1}{365}.

    E[Iij]=P(Iij=1)=1365E[I_{ij}] = P(I_{ij}=1) = \frac{1}{365}

    Step 4: Apply linearity of expectation.

    E[X]=E[βˆ‘1≀i<j≀nIij]E[X] = E\left[\sum_{1 \le i < j \le n} I_{ij}\right]
    E[X]=βˆ‘1≀i<j≀nE[Iij]E[X] = \sum_{1 \le i < j \le n} E[I_{ij}]

    Since there are (n2)\binom{n}{2} such indicator variables, and each has the same expectation:

    E[X]=(n2)β‹…1365E[X] = \binom{n}{2} \cdot \frac{1}{365}
    E[X]=n(nβˆ’1)2β‹…1365E[X] = \frac{n(n-1)}{2} \cdot \frac{1}{365}

    Answer: n(nβˆ’1)730\frac{n(n-1)}{730}
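The formula can be cross-checked by simulation; a minimal sketch for n = 23 people (the seed, trial count, and choice of n are arbitrary), counting shared-birthday pairs via per-day counts:

```python
from collections import Counter
import random

random.seed(1)  # arbitrary seed, for reproducibility only
n, days, trials = 23, 365, 50_000

def shared_pairs():
    # number of pairs sharing a birthday = sum of C(k, 2) over each day's count k
    counts = Counter(random.randrange(days) for _ in range(n))
    return sum(k * (k - 1) // 2 for k in counts.values())

estimate = sum(shared_pairs() for _ in range(trials)) / trials
exact = n * (n - 1) / (2 * days)  # n(n-1)/730

print(round(exact, 4))  # 0.6932
print(round(estimate, 2))  # close to the exact value
```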

    ---

## 5. Chebyshev's Inequality

    Chebyshev's Inequality provides a bound on the probability that a random variable deviates from its mean by a certain amount. It is a powerful tool because it applies to any probability distribution for which the mean and variance exist, without requiring knowledge of the specific distribution shape.

    πŸ“ Chebyshev's Inequality

    For any random variable XX with finite mean E[X]E[X] and finite variance V(X)V(X), and for any real number k>0k > 0:

    P(∣Xβˆ’E[X]∣β‰₯k)≀V(X)k2P(|X - E[X]| \ge k) \le \frac{V(X)}{k^2}

    Alternative form: Let k=cσk = c\sigma, where σ=V(X)\sigma = \sqrt{V(X)} is the standard deviation and c>0c > 0.

    P(∣Xβˆ’E[X]∣β‰₯cΟƒ)≀1c2P(|X - E[X]| \ge c\sigma) \le \frac{1}{c^2}

    Variables:

      • XX = random variable

      • E[X]E[X] = mean of XX

      • V(X)V(X) = variance of XX

      • kk = a positive constant representing the deviation from the mean


    When to use: To provide a general upper bound on the probability of extreme deviations from the mean when the exact distribution is unknown or complex.

    Worked Example:

    Problem: The average height of students in a university is 170170 cm with a standard deviation of 55 cm. What is the minimum percentage of students whose height is between 160160 cm and 180180 cm?

    Solution:

    Step 1: Identify the given values.
    E[X]=170E[X] = 170 cm
    ΟƒX=5\sigma_X = 5 cm
    We want to find P(160≀X≀180)P(160 \le X \le 180).

    Step 2: Rephrase the probability in terms of deviation from the mean.
    The interval [160,180][160, 180] is 170Β±10170 \pm 10. So, we are interested in P(∣Xβˆ’170βˆ£β‰€10)P(|X - 170| \le 10).

    Step 3: Apply Chebyshev's Inequality for the complementary event.
    Chebyshev's inequality gives an upper bound for P(∣Xβˆ’E[X]∣β‰₯k)P(|X - E[X]| \ge k).
    Here, k=10k = 10.

    P(∣Xβˆ’170∣β‰₯10)≀V(X)102P(|X - 170| \ge 10) \le \frac{V(X)}{10^2}

    First, calculate V(X)=ΟƒX2=52=25V(X) = \sigma_X^2 = 5^2 = 25.

    P(∣Xβˆ’170∣β‰₯10)≀25100P(|X - 170| \ge 10) \le \frac{25}{100}
    P(∣Xβˆ’170∣β‰₯10)≀14P(|X - 170| \ge 10) \le \frac{1}{4}

    Step 4: Find the probability for the desired interval.
    The probability of being within the interval is 1βˆ’P(∣Xβˆ’170∣β‰₯10)1 - P(|X - 170| \ge 10).

    P(160≀X≀180)=1βˆ’P(∣Xβˆ’170∣β‰₯10)P(160 \le X \le 180) = 1 - P(|X - 170| \ge 10)
    P(160≀X≀180)β‰₯1βˆ’14P(160 \le X \le 180) \ge 1 - \frac{1}{4}
    P(160≀X≀180)β‰₯34P(160 \le X \le 180) \ge \frac{3}{4}

    Step 5: Convert to percentage.

    34=0.75=75%\frac{3}{4} = 0.75 = 75\%

    Answer: At least 75%75\% of students have heights between 160160 cm and 180180 cm.
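Chebyshev's bound can be compared with an exact probability for a distribution we know fully; a minimal sketch using the fair die (an illustrative choice, not from the height example):

```python
from fractions import Fraction

# Fair die: E[X] = 7/2, V(X) = 35/12. A deviation of k = 5/2 catches faces 1 and 6.
mu = Fraction(7, 2)
var = Fraction(35, 12)
k = Fraction(5, 2)

exact = sum(Fraction(1, 6) for x in range(1, 7) if abs(x - mu) >= k)
bound = var / k**2  # Chebyshev upper bound V(X)/k^2

print(exact)           # 1/3
print(bound)           # 7/15
print(exact <= bound)  # True: the bound holds but is not tight
```

As the output illustrates, Chebyshev's inequality is valid for any distribution with finite variance, but the bound is typically loose compared to the exact probability.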

---

    Problem-Solving Strategies

    πŸ’‘ CMI Strategy: Linearity of Expectation with Indicators

    Many CMI problems involving counting the expected number of "events" (e.g., matching items, shared birthdays, special points) are most efficiently solved using linearity of expectation with indicator random variables.

    • Define the overall random variable XX as the quantity you need to find the expectation of.

    • Decompose XX into a sum of simpler random variables XiX_i. Often, these XiX_i will be indicator variables. For example, if XX is the number of items with property AA, define Xi=1X_i = 1 if item ii has property AA, and 00 otherwise.

    • Calculate E[Xi]E[X_i] for each individual XiX_i. For an indicator variable IAI_A, this is simply P(A)P(A).

    • Apply linearity of expectation:

    E[X]=E[βˆ‘Xi]=βˆ‘E[Xi]E[X] = E\left[\sum X_i\right] = \sum E[X_i]

    This works even if the XiX_i are dependent, which is a major advantage.

    πŸ’‘ CMI Strategy: Using V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    This formula is almost always easier for calculating variance than the definition E[(Xβˆ’E[X])2]E[(X - E[X])^2], especially for complex distributions or when E[X]E[X] is not an integer.

    • First, calculate E[X]E[X].

    • Then, calculate E[X2]E[X^2]. Remember E[X2]E[X^2] is not (E[X])2(E[X])^2. For discrete variables, it's βˆ‘x2P(X=x)\sum x^2 P(X=x); for continuous, ∫x2f(x)dx\int x^2 f(x) dx.

    • Finally, subtract (E[X])2(E[X])^2 from E[X2]E[X^2].

    πŸ’‘ CMI Strategy: Handling Conditional Information

    When a problem provides conditional probabilities or usage statistics (like in server outage problems), use the Law of Total Probability to find unconditional probabilities, which can then be used in expectation calculations.
    For example, if you need the overall probability of an event AA that depends on conditions BiB_i:

    P(A)=βˆ‘P(A∣Bi)P(Bi)P(A) = \sum P(A|B_i)P(B_i)

    Then, if XX is a value associated with AA, E[X]E[X] might involve these combined probabilities.

    ---

    Common Mistakes

    ⚠️ Avoid These Errors
      • ❌ Assuming independence for variance of a sum: Students often mistakenly write V(X+Y)=V(X)+V(Y)V(X+Y) = V(X) + V(Y) even when XX and YY are dependent.
    βœ… Correct approach: Remember that
    V(X+Y)=V(X)+V(Y)+2Cov⁑(X,Y)V(X+Y) = V(X) + V(Y) + 2\operatorname{Cov}(X,Y)
    If XX and YY are independent, Cov⁑(X,Y)=0\operatorname{Cov}(X,Y)=0, so V(X+Y)=V(X)+V(Y)V(X+Y) = V(X) + V(Y). Always check for independence.
      • ❌ Confusing E[X2]E[X^2] with (E[X])2(E[X])^2: These are generally not equal (E[X2]β‰₯(E[X])2E[X^2] \ge (E[X])^2 always).
    βœ… Correct approach: E[X2]E[X^2] is the expectation of the squared random variable. (E[X])2(E[X])^2 is the square of the expected value. Calculate them separately.
      • ❌ Incorrectly applying linearity of expectation to products: E[XY]β‰ E[X]E[Y]E[XY] \ne E[X]E[Y] unless XX and YY are independent.
    βœ… Correct approach: Linearity applies to sums. For products, use
    E[XY]=E[X]E[Y]+Cov⁑(X,Y)E[XY] = E[X]E[Y] + \operatorname{Cov}(X,Y)
    If independent, then E[XY]=E[X]E[Y]E[XY] = E[X]E[Y].
      • ❌ Misinterpreting probability in indicator variable problems: For E[IA]E[I_A], the probability P(A)P(A) must be calculated correctly, considering all conditions of the event AA.
    βœ… Correct approach: Carefully define the event AA for each indicator variable and calculate its probability precisely. This often involves basic combinatorial probability.

    ---

    Practice Questions

    :::question type="NAT" question="A company manufactures light bulbs. The lifespan of a bulb, XX, in years, has a probability density function f(x)=Ξ»eβˆ’Ξ»xf(x) = \lambda e^{-\lambda x} for xβ‰₯0x \ge 0, where Ξ»=0.5\lambda = 0.5. What is the expected lifespan of a bulb in years?" answer="2" hint="Recall the formula for the expectation of a continuous random variable and the properties of the exponential distribution." solution="Step 1: Identify the PDF and its parameter.
    The PDF is f(x)=0.5eβˆ’0.5xf(x) = 0.5 e^{-0.5x} for xβ‰₯0x \ge 0. This is an exponential distribution with rate parameter Ξ»=0.5\lambda = 0.5.

    Step 2: Apply the formula for the expectation of a continuous random variable.

    E[X]=∫0∞xf(x)dxE[X] = \int_{0}^{\infty} x f(x) dx

    E[X]=∫0∞x(0.5eβˆ’0.5x)dxE[X] = \int_{0}^{\infty} x (0.5 e^{-0.5x}) dx

    This is the mean of an exponential distribution, which is 1/Ξ»1/\lambda.

    Step 3: Calculate the expectation.

    E[X]=1Ξ»=10.5=2E[X] = \frac{1}{\lambda} = \frac{1}{0.5} = 2

    The expected lifespan is 2 years.
    Answer: \boxed{2}"
    :::

    :::question type="MCQ" question="Let XX be a discrete random variable with P(X=1)=0.2P(X=1) = 0.2, P(X=2)=0.3P(X=2) = 0.3, and P(X=3)=0.5P(X=3) = 0.5. Which of the following statements about V(X)V(X) is correct?" options=["V(X)=0.61V(X) = 0.61","V(X)=0.76V(X) = 0.76","V(X)=1.69V(X) = 1.69","V(X)=2.3V(X) = 2.3"] answer="V(X)=0.61V(X) = 0.61" hint="First calculate E[X]E[X] and E[X2]E[X^2], then use the formula V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2." solution="Step 1: Calculate E[X]E[X].

    E[X]=βˆ‘xP(X=x)E[X] = \sum x P(X=x)

    E[X]=(1)(0.2)+(2)(0.3)+(3)(0.5)E[X] = (1)(0.2) + (2)(0.3) + (3)(0.5)

    E[X]=0.2+0.6+1.5=2.3E[X] = 0.2 + 0.6 + 1.5 = 2.3

    Step 2: Calculate E[X2]E[X^2].

    E[X2]=βˆ‘x2P(X=x)E[X^2] = \sum x^2 P(X=x)

    E[X2]=(12)(0.2)+(22)(0.3)+(32)(0.5)E[X^2] = (1^2)(0.2) + (2^2)(0.3) + (3^2)(0.5)

    E[X2]=(1)(0.2)+(4)(0.3)+(9)(0.5)E[X^2] = (1)(0.2) + (4)(0.3) + (9)(0.5)

    E[X2]=0.2+1.2+4.5=5.9E[X^2] = 0.2 + 1.2 + 4.5 = 5.9

    Step 3: Calculate V(X)V(X).

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    V(X)=5.9βˆ’(2.3)2V(X) = 5.9 - (2.3)^2

    V(X)=5.9βˆ’5.29V(X) = 5.9 - 5.29

    V(X)=0.61V(X) = 0.61

    Answer: \boxed{0.61}"
    :::

    :::question type="SUB" question="A bag contains 5 red balls and 5 blue balls. Three balls are drawn without replacement. Let YY be the number of blue balls drawn. Calculate E[Y]E[Y] using indicator random variables." answer="E[Y]=1.5E[Y] = 1.5" hint="Define an indicator variable for each draw, then use linearity of expectation." solution="Step 1: Define indicator random variables.
    Let Y1Y_1, Y2Y_2, Y3Y_3 be indicator variables for the first, second, and third ball drawn being blue, respectively.

    Yi={1ifΒ theΒ i-thΒ ballΒ drawnΒ isΒ blue0otherwiseY_i = \begin{cases} 1 & \text{if the } i\text{-th ball drawn is blue} \\ 0 & \text{otherwise} \end{cases}

    Step 2: Express YY as a sum of indicator variables.

    Y=Y1+Y2+Y3Y = Y_1 + Y_2 + Y_3

    Step 3: Calculate the expectation of each indicator variable.
    For Y1Y_1:

    P(Y1=1)=510=12P(Y_1=1) = \frac{5}{10} = \frac{1}{2}

    E[Y1]=P(Y1=1)=12E[Y_1] = P(Y_1=1) = \frac{1}{2}

    For Y2Y_2:
    By symmetry, the probability that the second ball drawn is blue is the same as the first. Alternatively, using Law of Total Probability:

    P(Y2=1)=P(Y2=1∣Y1=1)P(Y1=1)+P(Y2=1∣Y1=0)P(Y1=0)P(Y_2=1) = P(Y_2=1|Y_1=1)P(Y_1=1) + P(Y_2=1|Y_1=0)P(Y_1=0)

    P(Y2=1)=(49)(510)+(59)(510)P(Y_2=1) = \left(\frac{4}{9}\right)\left(\frac{5}{10}\right) + \left(\frac{5}{9}\right)\left(\frac{5}{10}\right)

    P(Y2=1)=2090+2590=4590=12P(Y_2=1) = \frac{20}{90} + \frac{25}{90} = \frac{45}{90} = \frac{1}{2}

    So,
    E[Y2]=12E[Y_2] = \frac{1}{2}

    For Y3Y_3:
    Similarly,

    E[Y3]=12E[Y_3] = \frac{1}{2}

    Step 4: Apply linearity of expectation.

    E[Y]=E[Y1+Y2+Y3]E[Y] = E[Y_1 + Y_2 + Y_3]

    E[Y]=E[Y1]+E[Y2]+E[Y3]E[Y] = E[Y_1] + E[Y_2] + E[Y_3]

    E[Y]=12+12+12E[Y] = \frac{1}{2} + \frac{1}{2} + \frac{1}{2}

    E[Y]=32=1.5E[Y] = \frac{3}{2} = 1.5

    Answer: \boxed{1.5}"
    :::

    :::question type="MSQ" question="Let XX be a random variable with E[X]=5E[X]=5 and V(X)=4V(X)=4. Which of the following statements are correct?" options=["E[2X+3]=13E[2X+3] = 13","V(2X+3)=16V(2X+3) = 16","E[X2]=29E[X^2] = 29","P(∣Xβˆ’5∣β‰₯4)≀1/4P(|X-5| \ge 4) \le 1/4"] answer="A,B,C,D" hint="Apply the properties of expectation and variance, and Chebyshev's inequality." solution="Let's evaluate each option:

    Option A: E[2X+3]=13E[2X+3] = 13
    Using linearity of expectation:

    E[2X+3]=E[2X]+E[3]E[2X+3] = E[2X] + E[3]

    E[2X+3]=2E[X]+3E[2X+3] = 2E[X] + 3

    Given E[X]=5E[X]=5:
    E[2X+3]=2(5)+3=10+3=13E[2X+3] = 2(5) + 3 = 10 + 3 = 13

    This statement is correct.

    Option B: V(2X+3)=16V(2X+3) = 16
    Using properties of variance:

    V(aX+b)=a2V(X)V(aX+b) = a^2 V(X)

    V(2X+3)=22V(X)V(2X+3) = 2^2 V(X)

    Given V(X)=4V(X)=4:
    V(2X+3)=4(4)=16V(2X+3) = 4(4) = 16

    This statement is correct.

    Option C: E[X2]=29E[X^2] = 29
    Using the computational formula for variance:

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    We are given V(X)=4V(X)=4 and E[X]=5E[X]=5.
    4=E[X2]βˆ’(5)24 = E[X^2] - (5)^2

    4=E[X2]βˆ’254 = E[X^2] - 25

    E[X2]=4+25=29E[X^2] = 4 + 25 = 29

    This statement is correct.

    Option D: P(∣Xβˆ’5∣β‰₯4)≀1/4P(|X-5| \ge 4) \le 1/4
    Using Chebyshev's Inequality:

    P(∣Xβˆ’E[X]∣β‰₯k)≀V(X)k2P(|X - E[X]| \ge k) \le \frac{V(X)}{k^2}

    Here, E[X]=5E[X]=5, V(X)=4V(X)=4, and k=4k=4.
    P(∣Xβˆ’5∣β‰₯4)≀442P(|X-5| \ge 4) \le \frac{4}{4^2}

    P(∣Xβˆ’5∣β‰₯4)≀416P(|X-5| \ge 4) \le \frac{4}{16}

    P(∣Xβˆ’5∣β‰₯4)≀14P(|X-5| \ge 4) \le \frac{1}{4}

    This statement is correct.

    All statements are correct.
    Answer: \boxed{A,B,C,D}"
    :::

    :::question type="NAT" question="A discrete random variable Y has P(Y=0)=0.4, P(Y=1)=0.3, and P(Y=2)=0.3. What is the standard deviation of Y (round to two decimal places)?" answer="0.83" hint="Calculate E[Y] and E[Y^2], then V(Y), and finally \sigma_Y = \sqrt{V(Y)}." solution="Step 1: Calculate E[Y].

    E[Y] = \sum y P(Y=y) = (0)(0.4) + (1)(0.3) + (2)(0.3) = 0 + 0.3 + 0.6 = 0.9

    Step 2: Calculate E[Y^2].

    E[Y^2] = \sum y^2 P(Y=y) = (0^2)(0.4) + (1^2)(0.3) + (2^2)(0.3) = 0 + 0.3 + 1.2 = 1.5

    Step 3: Calculate V(Y).

    V(Y) = E[Y^2] - (E[Y])^2 = 1.5 - (0.9)^2 = 1.5 - 0.81 = 0.69

    Step 4: Calculate the standard deviation \sigma_Y.

    \sigma_Y = \sqrt{V(Y)} = \sqrt{0.69} \approx 0.83066

    Rounding to two decimal places, \sigma_Y = 0.83.
    Answer: \boxed{0.83}"
    :::
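    The four steps above are easy to verify numerically. The following is a minimal Python sketch (my addition, not part of the notes) that computes the mean, variance, and standard deviation directly from a PMF using only the standard library:

    ```python
    import math

    # PMF of Y from the question above
    pmf = {0: 0.4, 1: 0.3, 2: 0.3}

    mean = sum(y * p for y, p in pmf.items())              # E[Y]
    second_moment = sum(y**2 * p for y, p in pmf.items())  # E[Y^2]
    variance = second_moment - mean**2                     # V(Y) = E[Y^2] - (E[Y])^2
    std_dev = math.sqrt(variance)

    print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 0.9 0.69 0.83
    ```

    The same three lines work for any finite PMF; only the dictionary changes.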

    ---

    Summary

    ❗ Key Takeaways for CMI

    • Expectation (E[X]): Represents the long-run average. For discrete X,
      E[X] = \sum x P(X=x)
      For continuous X,
      E[X] = \int x f(x)\, dx

    • Linearity of Expectation:
      E\left[\sum a_i X_i\right] = \sum a_i E[X_i]
      is a powerful tool. It always holds, regardless of whether the X_i are independent or dependent. This is crucial for problems involving sums of indicator variables.

    • Variance (V(X)): Measures the spread around the mean. The computational formula
      V(X) = E[X^2] - (E[X])^2
      is generally preferred.

    • Properties of Variance:
      V(aX+b) = a^2 V(X)
      For independent random variables,
      V\left(\sum X_i\right) = \sum V(X_i)
      For dependent variables, covariance terms must be included.

    • Indicator Random Variables: I_A = 1 if event A occurs, 0 otherwise.
      E[I_A] = P(A)
      They simplify complex counting problems when combined with linearity of expectation.

    • Chebyshev's Inequality:
      P(|X - E[X]| \ge k) \le \frac{V(X)}{k^2}
      provides a general bound on deviations from the mean for any distribution with finite mean and variance.

    ---

    What's Next?

    πŸ’‘ Continue Learning

    This topic connects to:

      • Covariance and Correlation: Understanding dependence between random variables, which is essential for calculating the variance of sums of dependent variables:
        V(X+Y) = V(X) + V(Y) + 2\operatorname{Cov}(X,Y)

      • Moment Generating Functions (MGFs): MGFs are powerful tools for finding expectations and variances of random variables, especially for sums of independent random variables. They provide an alternative, often simpler, method to derive these moments.

      • Common Probability Distributions: Knowing the specific formulas for E[X] and V(X) for distributions like Binomial, Poisson, Geometric, Uniform, Normal, and Exponential is critical for applying these concepts in specific scenarios.


    Master these connections for comprehensive CMI preparation!

    ---

    πŸ’‘ Moving Forward

    Now that you understand Expectation and Variance, let's explore Standard Distributions which builds on these concepts.

    ---

    Part 4: Standard Distributions

    Introduction

    Standard distributions are fundamental building blocks in probability theory and statistics, providing models for a wide array of random phenomena encountered in data science. Each distribution describes the probabilities of different outcomes for a specific type of random variable, characterized by its parameters. Understanding these distributions is crucial for modeling real-world data, performing statistical inference, and making informed decisions.

    In the CMI exam, a deep understanding of standard discrete and continuous distributions is essential. This includes knowing their probability mass/density functions, cumulative distribution functions, expected values, variances, and how to apply them to calculate probabilities and estimate parameters in various scenarios. Mastery of these concepts forms the bedrock for advanced topics like hypothesis testing, regression analysis, and machine learning algorithms.

    πŸ“– Random Variable

    A random variable is a function that maps outcomes from a sample space to numerical values.

      • A discrete random variable can take on a finite or countably infinite number of values.

      • A continuous random variable can take on any value within a given range or interval.

    ---

    Key Concepts

    1. Discrete Distributions

    Discrete distributions model scenarios where the outcomes are countable.

    1.1 Bernoulli Distribution

    The Bernoulli distribution models a single trial with two possible outcomes: "success" (usually denoted by 1) or "failure" (usually denoted by 0).

    πŸ“ Bernoulli PMF
    P(X=x)=px(1βˆ’p)1βˆ’xforΒ x∈{0,1}P(X=x) = p^x (1-p)^{1-x} \quad \text{for } x \in \{0, 1\}

    Variables:

      • XX = Bernoulli random variable

      • pp = probability of success (0≀p≀10 \le p \le 1)

      • xx = outcome (0 or 1)


    When to use: For a single trial with binary outcome.

    Properties:

    • Mean: E[X]=pE[X] = p

    • Variance: Var(X)=p(1βˆ’p)Var(X) = p(1-p)


    ---

    1.2 Binomial Distribution

    The Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials.

    πŸ“ Binomial PMF
    P(X=k)=(nk)pk(1βˆ’p)nβˆ’kforΒ k∈{0,1,…,n}P(X=k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k \in \{0, 1, \dots, n\}

    Variables:

      • XX = Binomial random variable

      • nn = number of trials

      • kk = number of successes

      • pp = probability of success in a single trial

      • (nk)=n!k!(nβˆ’k)!\binom{n}{k} = \frac{n!}{k!(n-k)!} = binomial coefficient


    When to use: When counting the number of successes in a fixed number of independent trials, each with the same probability of success.

    Properties:

    • Mean: E[X]=npE[X] = np

    • Variance: Var(X)=np(1βˆ’p)Var(X) = np(1-p)


    Worked Example:

    Problem: A fair coin is tossed 10 times. What is the probability of getting exactly 7 heads?

    Solution:

    Step 1: Identify parameters for the Binomial distribution.
    Here, n=10 (number of tosses), k=7 (number of heads), and p=0.5 (probability of heads for a fair coin).

    Step 2: Apply the Binomial PMF.

    P(X=7) = \binom{10}{7} (0.5)^7 (1-0.5)^{10-7}

    Step 3: Calculate the binomial coefficient and simplify.

    \binom{10}{7} = \frac{10!}{7!\,3!} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120
    P(X=7) = 120 \times (0.5)^7 \times (0.5)^3 = 120 \times (0.5)^{10} = 120 \times 0.0009765625 = 0.1171875

    Answer: 0.1171875
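    The same calculation takes three lines of Python using `math.comb`; this sketch (my addition, not part of the notes) mirrors the PMF formula directly:

    ```python
    from math import comb

    def binom_pmf(k, n, p):
        """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binom_pmf(7, 10, 0.5))  # 0.1171875
    ```

    Because p = 0.5 is exactly representable in binary, the result here is exact, not merely approximate.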

    ---

    1.3 Poisson Distribution

    The Poisson distribution models the number of events occurring in a fixed interval of time or space, given a constant average rate \lambda of occurrence and that these events occur independently.

    πŸ“ Poisson PMF
    P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!} \quad \text{for } k \in \{0, 1, 2, \dots\}

    Variables:

      • X = Poisson random variable

      • k = number of events

      • \lambda = average rate of events in the interval (\lambda > 0)


    When to use: For counts of rare events over a specified interval or region.

    Properties:

    • Mean: E[X] = \lambda

    • Variance: Var(X) = \lambda


    ❗
    Poisson Approximation to Binomial

    When the number of trials n is large (n \ge 20) and the probability of success p is small (p \le 0.05), the Binomial distribution B(n, p) can be approximated by a Poisson distribution with parameter \lambda = np, which greatly simplifies calculations in that regime.


    Worked Example:

    Problem: A call center receives an average of 5 calls per hour. What is the probability of receiving exactly 3 calls in the next hour?

    Solution:

    Step 1: Identify the parameter for the Poisson distribution.
    Here, \lambda = 5 (average calls per hour) and k=3 (number of calls).

    Step 2: Apply the Poisson PMF.

    P(X=3) = \frac{e^{-5} 5^3}{3!}

    Step 3: Calculate the terms and simplify.

    P(X=3) = \frac{0.0067379 \times 125}{6} = \frac{0.8422375}{6} \approx 0.1403729

    Answer: \approx 0.1404
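    The Poisson PMF is just as easy to evaluate in code. A minimal sketch (my addition, not part of the notes) using only the standard library:

    ```python
    from math import exp, factorial

    def poisson_pmf(k, lam):
        """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
        return exp(-lam) * lam**k / factorial(k)

    # P(exactly 3 calls | average 5 calls/hour)
    print(round(poisson_pmf(3, 5), 4))  # 0.1404
    ```

    Carrying full precision for e^{-5} gives 0.140374, slightly above the hand value obtained from the truncated constant 0.0067379.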

    ---

    2. Continuous Distributions

    Continuous distributions model scenarios where the outcomes can take any value within a range.

    2.1 Uniform Distribution

    The Uniform distribution assigns equal probability to all values within a specified interval [a, b].

    πŸ“ Uniform PDF
    f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}

    Variables:

      • X = Uniform random variable

      • a = minimum value

      • b = maximum value


    When to use: When all outcomes within an interval are equally likely.

    πŸ“ Uniform CDF
    F(x) = \begin{cases} 0 & \text{for } x < a \\ \frac{x-a}{b-a} & \text{for } a \le x < b \\ 1 & \text{for } x \ge b \end{cases}

    Properties:

    • Mean: E[X] = \frac{a+b}{2}

    • Variance: Var(X) = \frac{(b-a)^2}{12}


    Worked Example:

    Problem: A random variable X is uniformly distributed between 0 and 10. What is the probability that X is between 3 and 7?

    Solution:

    Step 1: Identify parameters.
    Here, a=0 and b=10. We want P(3 < X < 7).

    Step 2: Integrate the PDF.

    P(3 < X < 7) = \int_{3}^{7} f(x)\, dx = \int_{3}^{7} \frac{1}{10-0}\, dx

    Step 3: Evaluate the integral.

    P(3 < X < 7) = \left[ \frac{x}{10} \right]_{3}^{7} = \frac{7}{10} - \frac{3}{10} = \frac{4}{10} = 0.4

    Answer: 0.4
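    In practice, interval probabilities are computed as F(b) - F(a) from the CDF rather than by re-integrating each time. A small sketch of the Uniform CDF from the formula above (my addition, not part of the notes):

    ```python
    def uniform_cdf(x, a, b):
        """CDF of Uniform(a, b), following the piecewise definition above."""
        if x < a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)

    # P(3 < X < 7) = F(7) - F(3) for X ~ Uniform(0, 10)
    prob = uniform_cdf(7, 0, 10) - uniform_cdf(3, 0, 10)
    print(round(prob, 2))  # 0.4
    ```

    The same F(b) - F(a) pattern applies to every continuous distribution in this chapter.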

    ---

    2.2 Exponential Distribution

    The Exponential distribution models the time until an event occurs in a Poisson process, where events occur continuously and independently at a constant average rate. It is memoryless.

    πŸ“ Exponential PDF
    f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0

    Variables:

      • X = Exponential random variable (time to event)

      • \lambda = rate parameter (average number of events per unit time, \lambda > 0)


    When to use: For modeling waiting times or lifetimes when the rate of occurrence is constant.

    πŸ“ Exponential CDF
    F(x) = P(X \le x) = 1 - e^{-\lambda x} \quad \text{for } x \ge 0

    Properties:

    • Mean: E[X] = \frac{1}{\lambda}

    • Variance: Var(X) = \frac{1}{\lambda^2}

    • Memoryless Property: P(X > s+t \mid X > s) = P(X > t). The future waiting time does not depend on the time already waited.


    Worked Example:

    Problem: The lifespan of a certain electronic component follows an exponential distribution with a mean lifespan of 5 years. What is the probability that a component will last less than 3 years?

    Solution:

    Step 1: Determine the rate parameter \lambda.
    Given mean E[X] = 5 years, and E[X] = 1/\lambda for the exponential distribution:

    5 = \frac{1}{\lambda} \implies \lambda = \frac{1}{5} = 0.2

    Step 2: Use the CDF to find P(X < 3).

    P(X < 3) = F(3) = 1 - e^{-0.2 \times 3} = 1 - e^{-0.6} = 1 - 0.5488 = 0.4512

    Answer: 0.4512
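    The two steps above (mean β†’ rate, then CDF) translate directly into code. A minimal sketch (my addition, not part of the notes):

    ```python
    from math import exp

    def expon_cdf(x, lam):
        """CDF of Exp(lam): P(X <= x) = 1 - e^(-lam*x) for x >= 0."""
        return 1 - exp(-lam * x) if x >= 0 else 0.0

    lam = 1 / 5              # mean lifespan of 5 years  =>  rate = 0.2 per year
    prob = expon_cdf(3, lam)
    print(round(prob, 4))  # 0.4512
    ```

    Note the conversion: for the exponential distribution the rate is always the reciprocal of the mean, a frequent source of off-by-inverse errors.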

    ---

    2.3 Normal (Gaussian) Distribution

    The Normal distribution is arguably the most important distribution in statistics. It is symmetric, bell-shaped, and characterized by its mean \mu and standard deviation \sigma. Many natural phenomena follow this distribution, and it is central to the Central Limit Theorem.

    πŸ“ Normal PDF
    f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty

    Variables:

      • X = Normal random variable

      • \mu = mean of the distribution

      • \sigma = standard deviation of the distribution (\sigma > 0)


    When to use: For modeling continuous data that clusters around a central value and is symmetric, or when applying the Central Limit Theorem.

    Properties:

    • Mean: E[X] = \mu

    • Variance: Var(X) = \sigma^2

    • Median = Mode = Mean = \mu.

    • The curve is symmetric about \mu.

    • The total area under the curve is 1.


    πŸ“–
    Standard Normal Distribution

    A Standard Normal distribution is a Normal distribution with mean \mu=0 and standard deviation \sigma=1, denoted Z \sim N(0,1). Its PDF is:

    f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}

    The Cumulative Distribution Function (CDF) of the Standard Normal distribution is denoted \Phi(z) = P(Z \le z). Its values are typically found using a Z-table.


    πŸ“ Standardization (Z-score)
    Z = \frac{X - \mu}{\sigma}

    Variables:

      • X = value from a Normal distribution

      • \mu = mean of X

      • \sigma = standard deviation of X

      • Z = corresponding value in the Standard Normal distribution


    When to use: To convert any Normal random variable X into a Standard Normal random variable Z, allowing the use of a standard Z-table to calculate probabilities.

    Worked Example (Probability Calculation):

    Problem: The height of adult males in a city is normally distributed with a mean of 175 cm and a standard deviation of 7 cm. What is the probability that a randomly selected male is between 168 cm and 182 cm tall? (Use \Phi(1)=0.8413, \Phi(-1)=0.1587)

    Solution:

    Step 1: Identify parameters and values.
    \mu = 175, \sigma = 7. We want P(168 < X < 182).

    Step 2: Standardize the values.

    z_1 = \frac{168 - 175}{7} = -1, \qquad z_2 = \frac{182 - 175}{7} = 1

    Step 3: Use the Standard Normal CDF \Phi to find the probability.

    P(168 < X < 182) = P(-1 < Z < 1) = \Phi(1) - \Phi(-1) = 0.8413 - 0.1587 = 0.6826

    Answer: 0.6826
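    Without a Z-table, \Phi(z) can be computed exactly from the error function via \Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right). A minimal sketch (my addition, not part of the notes):

    ```python
    from math import erf, sqrt

    def phi(z):
        """Standard normal CDF, computed via the error function."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma = 175, 7
    z1 = (168 - mu) / sigma   # -1.0
    z2 = (182 - mu) / sigma   #  1.0
    prob = phi(z2) - phi(z1)
    print(round(prob, 4))
    ```

    Full precision gives 0.6827 rather than the 0.6826 obtained from four-decimal table values; the difference is pure rounding in the table.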









    [Figure: Normal curve centred at \mu, with the shaded region P(x_1 < X < x_2) between x_1 and x_2.]

    ❗ Central Limit Theorem (CLT)

    For a sufficiently large sample size n, the distribution of the sample mean \bar{X} of n independent and identically distributed (i.i.d.) random variables, each with mean \mu and finite variance \sigma^2, is approximately normal, regardless of the original distribution of the individual variables.
    The sample mean \bar{X} has:

      • Mean: E[\bar{X}] = \mu

      • Standard Deviation: SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} (also called the standard error of the mean)
        So \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) approximately for large n.

    Worked Example (Normal Parameter Estimation):

    Problem: Suppose the individual scores on an exam are normally distributed with unknown mean \mu and standard deviation \sigma. A candidate fails if they score below 35% and passes with distinction if they score above 80%. In a large group, 16% fail and 2% pass with distinction. Find \mu and \sigma. (Use \Phi(-1)=0.16, \Phi(2)=0.98)

    Solution:

    Step 1: Set up equations based on the given probabilities and Z-scores.
    Let X be the score on the exam.
    We are given P(X < 35) = 0.16.
    Standardizing X=35 gives Z_1 = \frac{35 - \mu}{\sigma}.
    From the Z-table, P(Z < -1) = 0.16, so Z_1 = -1:

    \frac{35 - \mu}{\sigma} = -1 \quad (\text{Equation } 1)

    We are given P(X > 80) = 0.02, i.e. P(X \le 80) = 1 - 0.02 = 0.98.
    Standardizing X=80 gives Z_2 = \frac{80 - \mu}{\sigma}.
    From the Z-table, P(Z \le 2) = 0.98, so Z_2 = 2:

    \frac{80 - \mu}{\sigma} = 2 \quad (\text{Equation } 2)

    Step 2: Solve the system of linear equations.
    From Equation 1:

    35 - \mu = -\sigma \implies \mu - \sigma = 35 \quad (\text{Equation } 3)

    From Equation 2:

    80 - \mu = 2\sigma \implies \mu + 2\sigma = 80 \quad (\text{Equation } 4)

    Subtract Equation 3 from Equation 4:

    3\sigma = 45 \implies \sigma = 15

    Substitute \sigma = 15 into Equation 3:

    \mu = 35 + 15 = 50

    Answer: \mu = 50 and \sigma = 15.
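    The elimination step above is a two-line computation once the linear system is written down. A minimal Python sketch (my addition, not part of the notes):

    ```python
    # Equations 3 and 4 from the worked example:
    #   mu - sigma   = 35
    #   mu + 2*sigma = 80
    # Subtracting the first from the second eliminates mu: 3*sigma = 45.
    sigma = (80 - 35) / 3
    mu = 35 + sigma
    print(mu, sigma)  # 50.0 15.0
    ```

    The same pattern — turn each quantile condition into a linear equation in \mu and \sigma, then eliminate — works for any pair of normal quantiles.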

    ---

    2.4 Gamma Distribution

    The Gamma distribution is a versatile continuous distribution that generalizes the exponential distribution. It is often used to model waiting times for multiple events or the sum of independent exponentially distributed random variables.

    πŸ“ Gamma PDF
    f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \quad \text{for } x > 0

    Variables:

      • X = Gamma random variable

      • \alpha = shape parameter (\alpha > 0)

      • \beta = rate parameter (\beta > 0)

      • \Gamma(\alpha) = Gamma function, \Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t} dt. For positive integers, \Gamma(n) = (n-1)!.


    When to use: For modeling waiting times (e.g., in queuing theory), or when a variable is a sum of several independent exponential variables.

    Properties:

    • Mean: E[X] = \frac{\alpha}{\beta}

    • Variance: Var(X) = \frac{\alpha}{\beta^2}

    • If \alpha=1, the Gamma distribution reduces to the Exponential distribution with rate \beta.


    Worked Example:

    Problem: The lifespan of a device follows a Gamma distribution. Historical data suggests the mean lifespan is 6 years and the variance is 12 years^2. Find the parameters \alpha and \beta of this distribution.

    Solution:

    Step 1: Write down the equations for mean and variance in terms of \alpha and \beta.
    E[X] = \frac{\alpha}{\beta} = 6
    Var(X) = \frac{\alpha}{\beta^2} = 12

    Step 2: Solve the system of equations.
    From the mean equation, \alpha = 6\beta. Substituting into the variance equation:

    \frac{6\beta}{\beta^2} = 12 \implies \frac{6}{\beta} = 12 \implies \beta = 0.5

    Then:

    \alpha = 6 \times 0.5 = 3

    Answer: \alpha = 3 and \beta = 0.5.
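    Dividing the mean equation by the variance equation cancels \alpha and gives \beta = \text{mean}/\text{var} directly, a shortcut worth remembering. A minimal sketch of this method-of-moments fit (my addition, not part of the notes):

    ```python
    # Method of moments for Gamma(alpha, beta):
    #   mean = alpha/beta,  var = alpha/beta^2
    #   mean/var = beta  =>  beta = mean/var,  alpha = mean*beta
    mean, var = 6.0, 12.0
    beta = mean / var
    alpha = mean * beta
    print(alpha, beta)  # 3.0 0.5
    ```

    The ratio mean/var recovers the rate for any Gamma distribution, not just this example.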

    ---

    2.5 Beta Distribution

    The Beta distribution is a continuous probability distribution defined on the interval [0, 1]. It is particularly useful for modeling probabilities or proportions, as its values are naturally constrained within this range.

    πŸ“ Beta PDF
    f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1} \quad \text{for } 0 < x < 1

    Variables:

      • X = Beta random variable

      • \alpha = shape parameter (\alpha > 0)

      • \beta = shape parameter (\beta > 0)

      • B(\alpha, \beta) = Beta function, B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}


    When to use: For modeling proportions, probabilities, or quantities constrained between 0 and 1 (e.g., success rates, market share).

    Properties:

    • Mean: E[X] = \frac{\alpha}{\alpha+\beta}

    • Variance: Var(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

    • Mode: \frac{\alpha-1}{\alpha+\beta-2} (for \alpha > 1, \beta > 1)


    Worked Example:

    Problem: The proportion of defective items produced by a machine follows a Beta distribution. From historical data, the mean proportion of defective items is 0.25 and the mode is 0.20. Find the parameters \alpha and \beta of this distribution.

    Solution:

    Step 1: Write down the equations for mean and mode in terms of \alpha and \beta.
    E[X] = \frac{\alpha}{\alpha+\beta} = 0.25
    Mode = \frac{\alpha-1}{\alpha+\beta-2} = 0.20

    Step 2: Solve the system of equations.
    From the mean equation:

    \alpha = 0.25(\alpha+\beta) \implies 0.75\alpha = 0.25\beta \implies \beta = 3\alpha \quad (\text{Equation } 1)

    Substitute \beta = 3\alpha into the mode equation:

    \frac{\alpha-1}{4\alpha-2} = 0.20

    \alpha-1 = 0.20(4\alpha-2) = 0.8\alpha - 0.4

    0.2\alpha = 0.6 \implies \alpha = 3

    Substitute \alpha=3 back into Equation 1:

    \beta = 3 \times 3 = 9

    Answer: \alpha = 3 and \beta = 9.
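    The substitution above can be carried through symbolically once and reused: from the mean, \beta = \alpha(1-m)/m, and plugging that into the mode equation collapses it to a single linear equation in \alpha. A minimal sketch of the closed form (my addition, not part of the notes; valid for \alpha > 1, \beta > 1):

    ```python
    # Beta(a, b) from mean m and mode M:
    #   m = a/(a+b)        =>  b = a*(1 - m)/m,  so a + b = a/m
    #   M = (a-1)/(a+b-2)  =>  a - 1 = M*(a/m - 2)  =>  a*(1 - M/m) = 1 - 2*M
    m, M = 0.25, 0.20
    a = (1 - 2 * M) / (1 - M / m)
    b = a * (1 - m) / m
    print(round(a, 6), round(b, 6))  # 3.0 9.0
    ```

    Mean alone never pins down both parameters; it is the mean-plus-mode (or mean-plus-variance) pair that makes the system solvable.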

    ---

    Problem-Solving Strategies

    πŸ’‘ CMI Strategy

    • Identify the Distribution: Carefully read the problem statement to determine which standard distribution best models the scenario. Look for keywords (e.g., "number of successes in n trials" β†’ Binomial; "average rate of events" β†’ Poisson/Exponential; "mean and standard deviation" β†’ Normal; "proportion" β†’ Beta).

    • Extract Parameters: Identify all given parameters (\mu, \sigma, n, p, \lambda, \alpha, \beta) and what you need to find.

    • Standardize for Normal: If dealing with a Normal distribution, always standardize the variable to a Z-score to use the standard normal table/CDF.

    • Use CDF for Range Probabilities: For continuous distributions, P(a < X < b) = F(b) - F(a). For the Normal, this becomes \Phi(Z_b) - \Phi(Z_a). Remember P(X>x) = 1 - P(X \le x).

    • Parameter Estimation: If mean/variance/mode are given, set up simultaneous equations to solve for the distribution's parameters (\alpha, \beta, \mu, \sigma).

    • Approximations: Recall when Poisson can approximate Binomial (n large, p small, \lambda = np).

    ---

    Common Mistakes

    ⚠️ Avoid These Errors
      • ❌ Confusing PMF and PDF: Using integration for discrete distributions or summing for continuous distributions.
    βœ… Correct: Use the PMF for discrete variables (summation over values) and the PDF for continuous variables (integration over ranges).
      • ❌ Incorrect Z-score Calculation: Forgetting to subtract the mean or divide by the standard deviation when standardizing a normal variable.
    βœ… Correct: Always use Z = (X - \mu) / \sigma. For a sample mean, use Z = (\bar{X} - \mu) / (\sigma/\sqrt{n}).
      • ❌ Misinterpreting Z-table Values: Directly using \Phi(z) for P(Z > z).
    βœ… Correct: P(Z > z) = 1 - \Phi(z). Use symmetry: P(Z < -z) = P(Z > z) = 1 - \Phi(z).
      • ❌ Ignoring Distribution Domain: Calculating probabilities outside the valid range (e.g., negative time for Exponential, values outside [0, 1] for Beta).
    βœ… Correct: Always respect the domain of the random variable.
      • ❌ Parameter Estimation Errors: Incorrectly setting up equations for mean/variance/mode for a specific distribution.
    βœ… Correct: Memorize or correctly derive the formulas for mean, variance, and mode for each distribution.
      • ❌ Forgetting n in the CLT: When dealing with sample means, failing to divide \sigma by \sqrt{n} for the standard error.
    βœ… Correct: The standard deviation of the sample mean is \sigma_{\bar{X}} = \sigma/\sqrt{n}.

    ---

    Practice Questions

    :::question type="MCQ" question="A call center receives calls at an average rate of 20 calls per hour. What is the probability that exactly 15 calls are received in a 30-minute interval?" options=["\frac{e^{-10} 10^{15}}{15!}","\frac{e^{-20} 20^{15}}{15!}","\frac{e^{-10} 15^{10}}{10!}","\frac{e^{-20} 15^{20}}{20!}"] answer="A" hint="Adjust the average rate to match the given time interval before applying the Poisson PMF." solution="Step 1: Determine the average rate for the given interval.
    The average rate is 20 calls per hour. For a 30-minute interval (0.5 hours), the rate is:

    \lambda = 20 \text{ calls/hour} \times 0.5 \text{ hours} = 10 \text{ calls}

    Step 2: Apply the Poisson Probability Mass Function (PMF).
    The Poisson PMF is P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}.
    Here, \lambda = 10 and k = 15.

    P(X=15) = \frac{e^{-10} 10^{15}}{15!}
    The correct option is A." :::

    :::question type="NAT" question="The scores on a standardized test are normally distributed with a mean of 600 and a standard deviation of 100. If a student scores 750, what is their Z-score? Report to two decimal places." answer="1.50" hint="Use the Z-score formula Z = (X - \mu) / \sigma." solution="Step 1: Identify the given values.
    X = 750 (student's score)
    \mu = 600 (mean score)
    \sigma = 100 (standard deviation)

    Step 2: Apply the Z-score formula.

    Z = \frac{X - \mu}{\sigma} = \frac{750 - 600}{100} = \frac{150}{100} = 1.5

    The Z-score is 1.50."
    :::

    :::question type="MSQ" question="A quality control process inspects batches of 50 items. Each item has a 1% chance of being defective, independently. Which of the following statements are correct?" options=["The number of defective items in a batch follows a Binomial distribution.","The probability of finding exactly 1 defective item in a batch is \binom{50}{1} (0.01)^1 (0.99)^{49}.","The mean number of defective items in a batch is 0.5.","The Poisson approximation to this distribution would use \lambda = 50."] answer="A,B,C" hint="Identify the distribution type and its parameters. Check the conditions for Poisson approximation." solution="Statement A: The number of defective items in a fixed number of independent trials (50 items), each with a constant probability of being defective (1%), follows a Binomial distribution. This is correct.

    Statement B: For a Binomial distribution B(n,p), P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}. Here n=50, p=0.01, k=1, so P(X=1) = \binom{50}{1} (0.01)^1 (0.99)^{49}. This is correct.

    Statement C: The mean of a Binomial distribution is E[X] = np = 50 \times 0.01 = 0.5. This is correct.

    Statement D: The Poisson approximation to a Binomial distribution uses \lambda = np = 50 \times 0.01 = 0.5, not \lambda = 50. This is incorrect.

    Therefore, statements A, B, and C are correct."
    :::

    :::question type="SUB" question="The time (in minutes) a customer spends waiting for a service representative follows an exponential distribution. If 80% of customers wait longer than 5 minutes, what is the average waiting time (in minutes)? Report to two decimal places." answer="22.41" hint="Use the survival function of the exponential distribution and its relationship with the mean." solution="Step 1: Set up the probability statement.
    Let X be the waiting time, X \sim \operatorname{Exp}(\lambda).
    We are given P(X > 5) = 0.80, and P(X > x) = e^{-\lambda x}, so:

    e^{-5\lambda} = 0.80

    Step 2: Solve for \lambda.
    Taking the natural logarithm of both sides:

    -5\lambda = \ln(0.80) \approx -0.22314355

    \lambda \approx 0.04462871

    Step 3: Calculate the average waiting time (mean).
    For an Exponential distribution, E[X] = 1/\lambda:

    E[X] = \frac{1}{0.04462871} \approx 22.4070

    Rounding to two decimal places, E[X] \approx 22.41.
    The average waiting time is 22.41 minutes."
    :::
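    Questions like the one above reward carrying full precision through the computation; rounding \lambda early shifts the final answer. A minimal Python sketch of the survival-function approach (my addition, not part of the notes):

    ```python
    from math import log

    # P(X > 5) = e^(-lam*5) = 0.80  =>  lam = -ln(0.80)/5,  mean = 1/lam
    lam = -log(0.80) / 5
    mean = 1 / lam
    print(round(mean, 2))  # 22.41
    ```

    Equivalently, mean = -5 / ln(0.80), which avoids computing \lambda at all.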

    ---

    Chapter Summary

    πŸ“– Random Variables and Distributions - Key Takeaways

    To excel in CMI, a deep understanding of Random Variables and Distributions is fundamental. Here are the most crucial points you must internalize:

    • Random Variables (RVs): Understand the formal definition of a random variable as a function mapping outcomes from a sample space to real numbers. Differentiate clearly between discrete and continuous random variables and their respective characteristics.

    • Probability Mass Function (PMF), Probability Density Function (PDF), and Cumulative Distribution Function (CDF):

    • Know the definitions and properties of PMF (for discrete RVs), PDF (for continuous RVs), and CDF (for both).
      Master how to calculate probabilities using these functions, including P(a≀X≀b)=FX(b)βˆ’FX(a)P(a \le X \le b) = F_X(b) - F_X(a) for CDFs, and using integration/summation for PDFs/PMFs.
      Understand the relationship between PDF/PMF and CDF: FX(x)=βˆ‘t≀xP(X=t)F_X(x) = \sum_{t \le x} P(X=t) or FX(x)=βˆ«βˆ’βˆžxfX(t)dtF_X(x) = \int_{-\infty}^{x} f_X(t) dt, and fX(x)=FXβ€²(x)f_X(x) = F_X'(x).
    • Expectation and Variance:

    • Memorize the definitions of expectation E[X]E[X] and variance Var[X]Var[X] for both discrete and continuous RVs.
      Crucially, understand and apply their properties: Linearity of Expectation (E[aX+bY]=aE[X]+bE[Y]E[aX+bY] = aE[X]+bE[Y]) and properties of variance (Var[aX+b]=a2Var[X]Var[aX+b] = a^2Var[X]).
      Be proficient in calculating E[g(X)]E[g(X)] using the Law of the Unconscious Statistician (LOTUS).
    • Standard Distributions: Be thoroughly familiar with the key properties (parameters, PMF/PDF, mean, variance, typical shape) of the most common distributions:

      • Discrete: Bernoulli, Binomial, Poisson, Geometric.
      • Continuous: Uniform, Exponential, Normal (Gaussian), Gamma.
      • Recognize scenarios where each distribution is applicable.
    • Moment Generating Functions (MGFs):

      • Understand the definition $M_X(t) = E[e^{tX}]$.
      • Know how to use MGFs to find moments ($E[X^n] = M_X^{(n)}(0)$) and, more importantly, to uniquely identify the distribution of an RV.
      • Be familiar with the MGFs of the standard distributions.
    • Transformations of Random Variables: Master techniques for finding the PMF/PDF of a new random variable $Y = g(X)$ given the distribution of $X$. This often involves the CDF method or the change-of-variable formula for continuous RVs.
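
    One way to internalize the CDF method from the last bullet is to check a derived density against simulation. Below is a minimal sketch (illustrative, not part of the notes; `empirical_cdf` is a helper defined here, not a library call): for $X \sim \mathrm{Uniform}(0,1)$ and $Y = X^2$, the CDF method gives $F_Y(y) = P(X \le \sqrt{y}) = \sqrt{y}$, so $f_Y(y) = \frac{1}{2\sqrt{y}}$.

    ```python
    import random

    # Illustrative sketch: verify the CDF method for Y = X^2 with
    # X ~ Uniform(0, 1).  For 0 < y < 1,
    #   F_Y(y) = P(X^2 <= y) = P(X <= sqrt(y)) = sqrt(y),
    # so f_Y(y) = F_Y'(y) = 1 / (2 * sqrt(y)).

    random.seed(0)
    sample = [random.random() ** 2 for _ in range(200_000)]

    def empirical_cdf(data, y):
        """Fraction of the sample at or below y."""
        return sum(v <= y for v in data) / len(data)

    # Compare the empirical CDF of Y against the derived F_Y(y) = sqrt(y).
    max_err = max(abs(empirical_cdf(sample, y) - y ** 0.5)
                  for y in (0.04, 0.25, 0.81))
    ```

    With 200,000 draws the empirical CDF should sit within about a thousandth of $\sqrt{y}$ at each checkpoint, which is exactly what the CDF method predicts.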

    ---

    Chapter Review Questions

    :::question type="MCQ" question="Let $X$ be a continuous random variable with probability density function (PDF):

    $$f_X(x) = \begin{cases} c(1-x^2) & \text{for } -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    Which of the following statements are TRUE?
    (I) The constant $c = \frac{3}{4}$.
    (II) $P(X > 0) = \frac{1}{2}$.
    (III) $E[X] = 0$.
    (IV) $Var[X] = \frac{2}{5}$.
    " options=["A) (I) and (II) only", "B) (I), (II) and (III) only", "C) (I), (II), (III) and (IV)", "D) (I), (III) and (IV) only"] answer="B" hint="Remember the properties of a PDF: it must integrate to 1. Also, leverage symmetry to simplify calculations for expectation and probability." solution="Let's analyze each statement:

    (I) The constant $c = \frac{3}{4}$:
    For $f_X(x)$ to be a valid PDF, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

    $$\int_{-1}^{1} c(1-x^2)\,dx = 1$$

    $$c \left[ x - \frac{x^3}{3} \right]_{-1}^{1} = 1$$

    $$c \left[ \left(1 - \frac{1}{3}\right) - \left(-1 + \frac{1}{3}\right) \right] = 1$$

    $$c \left[ \frac{2}{3} - \left(-\frac{2}{3}\right) \right] = 1$$

    $$c \cdot \frac{4}{3} = 1 \implies c = \frac{3}{4}$$

    So, statement (I) is TRUE.

    (II) $P(X > 0) = \frac{1}{2}$:

    $$P(X > 0) = \int_{0}^{1} \frac{3}{4}(1-x^2)\,dx$$

    $$= \frac{3}{4} \left[ x - \frac{x^3}{3} \right]_{0}^{1}$$

    $$= \frac{3}{4} \left(1 - \frac{1}{3}\right)$$

    $$= \frac{3}{4} \cdot \frac{2}{3} = \frac{1}{2}$$

    Alternatively, since $f_X(x)$ is symmetric about $x = 0$ ($f_X(x) = f_X(-x)$), $P(X > 0) = P(X < 0) = \frac{1}{2}$.
    So, statement (II) is TRUE.

    (III) $E[X] = 0$:

    $$E[X] = \int_{-1}^{1} x \cdot \frac{3}{4}(1-x^2)\,dx = \frac{3}{4} \int_{-1}^{1} (x - x^3)\,dx$$

    Since $x - x^3$ is an odd function and the integration interval is symmetric about 0, the integral is 0.
    So, statement (III) is TRUE.

    (IV) $Var[X] = \frac{2}{5}$:
    $Var[X] = E[X^2] - (E[X])^2$. Since $E[X] = 0$, $Var[X] = E[X^2]$.

    $$E[X^2] = \int_{-1}^{1} x^2 \cdot \frac{3}{4}(1-x^2)\,dx = \frac{3}{4} \int_{-1}^{1} (x^2 - x^4)\,dx$$

    Since $x^2 - x^4$ is an even function, we can write:
    $$E[X^2] = \frac{3}{4} \cdot 2 \int_{0}^{1} (x^2 - x^4)\,dx$$

    $$= \frac{3}{2} \left[ \frac{x^3}{3} - \frac{x^5}{5} \right]_{0}^{1}$$

    $$= \frac{3}{2} \left( \frac{1}{3} - \frac{1}{5} \right)$$

    $$= \frac{3}{2} \cdot \frac{5-3}{15} = \frac{3}{2} \cdot \frac{2}{15} = \frac{1}{5}$$

    So, $Var[X] = \frac{1}{5}$. Therefore, statement (IV), $Var[X] = \frac{2}{5}$, is FALSE.

    Based on the analysis, statements (I), (II), and (III) are TRUE. The correct option is B."
    :::
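
    All four values in this solution can be double-checked numerically. Here is a quick sanity check (illustrative, not part of the original solution; `riemann` is a helper defined here, not a library function):

    ```python
    # Illustrative numerical check of statements (I)-(IV) for the PDF
    # f(x) = c * (1 - x^2) on [-1, 1], using a midpoint Riemann sum.

    def riemann(g, a, b, n=100_000):
        """Midpoint Riemann approximation of the integral of g over [a, b]."""
        h = (b - a) / n
        return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

    c = 1 / riemann(lambda x: 1 - x**2, -1.0, 1.0)             # (I)   c = 3/4
    p_pos = riemann(lambda x: c * (1 - x**2), 0.0, 1.0)        # (II)  P(X>0) = 1/2
    mean = riemann(lambda x: x * c * (1 - x**2), -1.0, 1.0)    # (III) E[X] = 0
    var = riemann(lambda x: x**2 * c * (1 - x**2), -1.0, 1.0)  # (IV)  E[X^2] = 1/5
    ```

    The approximations land on $c = 3/4$, $P(X>0) = 1/2$, $E[X] = 0$, and $E[X^2] = 1/5$ to within numerical error, confirming that statement (IV) is false.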

    :::question type="NAT" question="A fair six-sided die is rolled repeatedly. Let $X$ be the number of rolls until a '6' appears for the first time. Let $Y$ be the number of rolls until a '6' appears for the second time. Find $E[Y \mid X=1]$. (Enter your answer as a plain number)." answer="7" hint="Consider the nature of the geometric distribution and the memoryless property. If the first '6' occurs on the 1st roll, how many additional rolls are needed for the second '6'?" solution="Let $X$ be the number of rolls until the first '6' appears. $X$ follows a Geometric distribution with $p = 1/6$.
    Let $Y$ be the number of rolls until the second '6' appears.

    We are asked to find $E[Y \mid X=1]$.
    Given that the first '6' appeared on the 1st roll, roll 1 was a '6'.
    We now need the expected number of additional rolls, from roll 2 onwards, until the second '6' appears.
    Let $Z$ be the number of additional rolls needed after the 1st roll for the second '6' to appear.
    Since die rolls are independent and the probability of rolling a '6' remains $p = 1/6$ on each subsequent roll, $Z$ also follows a Geometric distribution with parameter $p = 1/6$.
    The expected value of a Geometric distribution (number of trials until the first success) is $1/p$.
    So, $E[Z] = 1/p = 1/(1/6) = 6$.

    The total number of rolls until the second '6' appears can be expressed as $Y = X + Z$.
    Therefore, $E[Y \mid X=1] = E[1 + Z \mid X=1]$.
    Since $1$ is a constant and $Z$ is independent of the event $X=1$ (the later rolls do not depend on the first roll), we have:
    $E[Y \mid X=1] = 1 + E[Z] = 1 + 6 = 7$.

    The expected value is 7."
    :::
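
    The independence argument above can also be checked by simulation. A short Monte Carlo sketch (illustrative, not part of the original solution; the function name is made up for this example):

    ```python
    import random

    # Illustrative check that E[Y | X = 1] = 7: condition on the first '6'
    # landing on roll 1, then count the total rolls until the second '6'.

    random.seed(1)

    def rolls_until_second_six_given_first_on_roll_one():
        rolls = 1                      # roll 1 is a '6' by conditioning
        while True:
            rolls += 1                 # keep rolling from roll 2 onward
            if random.randint(1, 6) == 6:
                return rolls           # index of the roll with the second '6'

    n = 100_000
    avg = sum(rolls_until_second_six_given_first_on_roll_one()
              for _ in range(n)) / n   # should be close to 7
    ```

    With 100,000 trials the sample mean settles within a few hundredths of 7, matching $1 + E[Z] = 1 + 6$.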

    :::question type="MCQ" question="Let $X$ be a random variable with Moment Generating Function (MGF) $M_X(t) = \frac{e^{2t}}{1-3t}$ for $t < \frac{1}{3}$.
    Which of the following is the variance of $X$, $Var[X]$?
    " options=["A) 3", "B) 9", "C) 11", "D) 13"] answer="B" hint="Recall that $M_X'(0) = E[X]$ and $M_X''(0) = E[X^2]$. Then use $Var[X] = E[X^2] - (E[X])^2$." solution="The Moment Generating Function (MGF) is given by $M_X(t) = e^{2t}(1-3t)^{-1}$.

    First, we find $E[X] = M_X'(0)$.
    Using the product rule $(uv)' = u'v + uv'$:
    Let $u = e^{2t}$ and $v = (1-3t)^{-1}$.
    Then $u' = 2e^{2t}$ and $v' = -(1-3t)^{-2} \cdot (-3) = 3(1-3t)^{-2}$.

    So, $M_X'(t) = 2e^{2t}(1-3t)^{-1} + 3e^{2t}(1-3t)^{-2}$.
    Now, evaluate at $t = 0$:
    $M_X'(0) = 2e^0(1-0)^{-1} + e^0 \cdot 3(1-0)^{-2} = 2 + 3 = 5$.
    So, $E[X] = 5$.

    Next, we find $E[X^2] = M_X''(0)$.
    We need to differentiate $M_X'(t) = 2e^{2t}(1-3t)^{-1} + 3e^{2t}(1-3t)^{-2}$.
    Let $M_X'(t) = A(t) + B(t)$, where $A(t) = 2e^{2t}(1-3t)^{-1}$ and $B(t) = 3e^{2t}(1-3t)^{-2}$.

    For $A(t)$:
    $A'(t) = 4e^{2t}(1-3t)^{-1} + 2e^{2t} \cdot 3(1-3t)^{-2} = 4e^{2t}(1-3t)^{-1} + 6e^{2t}(1-3t)^{-2}$.
    At $t = 0$: $A'(0) = 4 + 6 = 10$.

    For $B(t)$:
    $B'(t) = 6e^{2t}(1-3t)^{-2} + 3e^{2t} \cdot 6(1-3t)^{-3} = 6e^{2t}(1-3t)^{-2} + 18e^{2t}(1-3t)^{-3}$.
    At $t = 0$: $B'(0) = 6 + 18 = 24$.

    So, $M_X''(0) = A'(0) + B'(0) = 10 + 24 = 34$.
    Thus, $E[X^2] = 34$.

    Finally, $Var[X] = E[X^2] - (E[X])^2 = 34 - 5^2 = 34 - 25 = 9$.

    The correct option is B.

    Alternatively, recognize the MGF directly.
    The MGF of an Exponential distribution with rate $\lambda$ is $\frac{\lambda}{\lambda - t}$ (for $t < \lambda$), so the MGF of $X_1 \sim \operatorname{Exp}(1/3)$ is $\frac{1/3}{1/3 - t} = \frac{1}{1-3t}$.
    The MGF of $X_1 + 2$ is $E[e^{t(X_1+2)}] = e^{2t} E[e^{tX_1}] = \frac{e^{2t}}{1-3t}$, which matches $M_X(t)$.
    So $X$ is distributed as an $\operatorname{Exp}(1/3)$ random variable shifted by 2.
    Then $E[X_1] = 1/\lambda = 3$ and $Var[X_1] = 1/\lambda^2 = 9$, so
    $E[X] = E[X_1] + 2 = 5$ and $Var[X] = Var[X_1 + 2] = Var[X_1] = 9$.
    The variance of $X$ is 9."
    :::
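
    Rather than differentiating by hand, the first two moments can also be read off the MGF numerically. A small check (illustrative, not part of the original solution), using central finite differences at $t = 0$:

    ```python
    import math

    # Illustrative check of the moments of M(t) = e^{2t} / (1 - 3t):
    # approximate M'(0) and M''(0) with central finite differences.

    def M(t):
        return math.exp(2 * t) / (1 - 3 * t)

    h = 1e-4
    m1 = (M(h) - M(-h)) / (2 * h)            # first derivative  -> E[X]   = 5
    m2 = (M(h) - 2 * M(0) + M(-h)) / h**2    # second derivative -> E[X^2] = 34
    var = m2 - m1**2                         # 34 - 25 = 9
    ```

    The finite-difference estimates reproduce $E[X] = 5$, $E[X^2] = 34$, and $Var[X] = 9$ to several decimal places, which is a useful cross-check on the product-rule computation.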

    :::question type="NAT" question="Let $X$ be a continuous random variable uniformly distributed on the interval $(0, 2)$. Define a new random variable $Y = X^2$. Find $E[Y]$. (Enter your answer as a plain number in decimal form, rounded to two decimal places)." answer="1.33" hint="First, determine the PDF of $X$. Then, use the Law of the Unconscious Statistician (LOTUS) to compute $E[Y]$ without explicitly finding the PDF of $Y$." solution="The random variable $X$ is uniformly distributed on $(0, 2)$.
    Its PDF is given by:

    $$f_X(x) = \begin{cases} \frac{1}{2-0} = \frac{1}{2} & \text{for } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

    We want to find $E[Y]$ where $Y = X^2$.
    Using the Law of the Unconscious Statistician (LOTUS), $E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$.
    Here, $g(x) = x^2$.

    $$E[Y] = E[X^2] = \int_{0}^{2} x^2 \cdot \frac{1}{2}\,dx$$

    $$= \frac{1}{2} \left[ \frac{x^3}{3} \right]_{0}^{2}$$

    $$= \frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}$$

    Numerically, $E[Y] = \frac{4}{3} \approx 1.3333\ldots$
    Rounded to two decimal places, $E[Y] \approx 1.33$.
    The expected value of $Y$ is 1.33."
    :::
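
    LOTUS also lends itself to a direct Monte Carlo check: average $g(X) = X^2$ over draws of $X$ without ever deriving the PDF of $Y$. A short sketch (illustrative, not from the original solution):

    ```python
    import random

    # Illustrative Monte Carlo check of E[X^2] = 4/3 for X ~ Uniform(0, 2),
    # in the spirit of LOTUS: average g(X) = X^2 directly over draws of X.

    random.seed(42)
    n = 500_000
    avg = sum(random.uniform(0.0, 2.0) ** 2 for _ in range(n)) / n
    ```

    With half a million draws the sample mean sits very close to $4/3 \approx 1.33$, matching the integral computed above.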

    ---

    What's Next?

    💡 Continue Your CMI Journey

    You've mastered Random Variables and Distributions! This chapter is the bedrock for much of advanced probability theory and statistics.

    Key connections:
    • Building on Previous Learning: This chapter relies heavily on your understanding of basic probability (sample spaces, events, conditional probability, independence) and calculus (integration, differentiation) for continuous random variables. A solid grasp of set theory is also beneficial for defining events and sample spaces.
    • Paving the Way for Future Chapters: The concepts learned here are foundational for:
      • Joint Distributions: understanding how multiple random variables interact.
      • Conditional Expectation: a deeper dive into expected values given certain conditions.
      • Limit Theorems: the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT), which are crucial for statistical inference and build directly on the properties of expectation and variance of random variables.
      • Statistical Inference: chapters on estimation (e.g., maximum likelihood estimation) and hypothesis testing rely on the distributions of sample statistics.
      • Stochastic Processes: many advanced topics in probability and applied mathematics begin with discrete-time or continuous-time random variables.

    Keep practicing problems that combine these concepts, as CMI questions often integrate knowledge across multiple topics!

    🎯 Key Points to Remember

    • ✓ Master the core concepts in Random Variables and Distributions before moving to advanced topics
    • ✓ Practice with previous year questions to understand exam patterns
    • ✓ Review short notes regularly for quick revision before exams
