
Random Variables and Distributions

Comprehensive study notes on Random Variables and Distributions for CMI Data Science preparation. This chapter covers key concepts, formulas, and examples needed for your exam.

Overview

Welcome to the foundational chapter on Random Variables and Distributions, a cornerstone of your CMI Masters in Data Science curriculum. This chapter is absolutely critical, as it lays the theoretical and practical groundwork for understanding and applying nearly every statistical and machine learning concept you will encounter. Without a firm grasp of random variables and their distributions, topics like hypothesis testing, regression analysis, and even advanced deep learning architectures become abstract and difficult to interpret effectively.

In the CMI context, mastering this material is not just about theoretical understanding; it's about developing the intuition and analytical tools to tackle real-world data challenges. You'll learn how to mathematically model uncertainty, quantify variability, and make informed decisions based on probabilistic outcomes. This chapter directly addresses core competencies required for the CMI exams, ensuring you can correctly identify, apply, and interpret different probabilistic models crucial for data analysis and predictive modeling.

By the end of this chapter, you will possess the essential framework for reasoning about data generation processes, understanding the behavior of estimators, and interpreting the output of complex algorithms. This knowledge is indispensable for building robust data science solutions and effectively communicating insights, making it a high-yield area for your CMI success.

---

Chapter Contents

| # | Topic | What You'll Learn |
|---|-------|-------------------|
| 1 | Random Variables | Quantify outcomes of random experiments numerically. |
| 2 | Distribution Functions | Describe probability of variable taking values. |
| 3 | Expectation and Variance | Measure central tendency and data spread. |
| 4 | Standard Distributions | Explore common, well-understood probabilistic models. |

---

Learning Objectives

❗ By the End of This Chapter

After studying this chapter, you will be able to:

  • Define and classify discrete and continuous random variables, and understand their role in modeling uncertainty.

  • Calculate and interpret Probability Mass Functions (PMFs), Probability Density Functions (PDFs), and Cumulative Distribution Functions (CDFs).

  • Compute and explain the expectation, variance, and standard deviation of random variables.

  • Identify characteristics and apply properties of key standard distributions (e.g., Bernoulli, Binomial, Poisson, Uniform, Normal, Exponential).

---

Now let's begin with Random Variables...

## Part 1: Random Variables

Introduction

In the realm of probability and statistics, a random variable serves as a fundamental concept, bridging the gap between abstract outcomes of a random experiment and numerical values. It is a function that assigns a real number to each outcome in the sample space of a random experiment. This transformation allows us to apply mathematical tools, such as algebra and calculus, to analyze the probabilities associated with these numerical outcomes.

Understanding random variables is crucial for CMI as it forms the bedrock for analyzing data, modeling uncertainty, and making predictions. In data science, almost every piece of data collected or generated can be viewed as a realization of one or more random variables, from the success rate of an algorithm to the error in a measurement. This unit will rigorously define random variables, explore their types, and detail methods for characterizing their behavior through probability distributions.

📖 Random Variable

A random variable $X$ is a function that maps each outcome $\omega$ in the sample space $\Omega$ of a random experiment to a unique real number:

$X: \Omega \to \mathbb{R}$

The set of all possible values that a random variable $X$ can take is called its range or support, denoted by $R_X$.

---

Key Concepts

1. Types of Random Variables

Random variables are primarily classified into two types based on their range:

* Discrete Random Variables: A random variable is discrete if its range $R_X$ is a finite or countably infinite set of real numbers. These variables typically arise from counting processes.
  * Examples: the number of heads in three coin flips ($R_X = \{0, 1, 2, 3\}$), the number of customers arriving at a store in an hour ($R_X = \{0, 1, 2, \dots\}$).

* Continuous Random Variables: A random variable is continuous if its range $R_X$ is an uncountably infinite set, typically an interval or a union of intervals on the real line. These variables usually arise from measurements.
  * Examples: the height of a student, the time it takes for a process to complete, the temperature of a room.

For the purposes of this chapter and the CMI exam, we will focus primarily on discrete random variables, as they are frequently encountered and directly relevant to the provided PYQ.

---

2. Probability Mass Function (PMF)

For a discrete random variable, its probability distribution is described by a Probability Mass Function (PMF). The PMF specifies the probability that the random variable takes on each of its possible values.

📖 Probability Mass Function (PMF)

For a discrete random variable $X$ with range $R_X = \{x_1, x_2, \dots\}$, the Probability Mass Function (PMF), denoted $p_X(x)$ or $P(X=x)$, is a function such that:

  • $p_X(x) \ge 0$ for all $x \in R_X$.

  • $\sum_{x \in R_X} p_X(x) = 1$.

  • $p_X(x) = 0$ for $x \notin R_X$.

The value $p_X(x)$ is the probability that the random variable $X$ takes the specific value $x$.

Worked Example:

Problem: A fair coin is flipped three times. Let $X$ be the random variable representing the number of heads obtained. Determine the PMF of $X$.

Solution:

Step 1: Identify the sample space and the values of $X$.

The sample space $\Omega$ consists of $2^3 = 8$ equally likely outcomes:
$\Omega = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$

The random variable $X$ maps each outcome to the number of heads:
$X(HHH) = 3$
$X(HHT) = X(HTH) = X(THH) = 2$
$X(HTT) = X(THT) = X(TTH) = 1$
$X(TTT) = 0$

The range of $X$ is $R_X = \{0, 1, 2, 3\}$.

Step 2: Calculate the probability for each value in $R_X$.

Since each outcome in $\Omega$ has probability $1/8$:

For $X=0$: only $TTT$ maps to $0$.
$p_X(0) = P(X=0) = P(\{TTT\}) = \frac{1}{8}$

For $X=1$: $HTT, THT, TTH$ map to $1$.
$p_X(1) = P(X=1) = P(\{HTT, THT, TTH\}) = \frac{3}{8}$

For $X=2$: $HHT, HTH, THH$ map to $2$.
$p_X(2) = P(X=2) = P(\{HHT, HTH, THH\}) = \frac{3}{8}$

For $X=3$: only $HHH$ maps to $3$.
$p_X(3) = P(X=3) = P(\{HHH\}) = \frac{1}{8}$

Step 3: Verify the properties of a PMF.

All $p_X(x) \ge 0$, and

$\sum_{x \in R_X} p_X(x) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = \frac{8}{8} = 1$

Answer: The PMF of $X$ is:
$p_X(0) = 1/8$
$p_X(1) = 3/8$
$p_X(2) = 3/8$
$p_X(3) = 1/8$
and $p_X(x) = 0$ for $x \notin \{0, 1, 2, 3\}$.
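The enumeration in this example can also be reproduced programmatically. The following is a minimal Python sketch (illustrative, not part of the original notes) that lists all eight outcomes, tallies the number of heads in each, and recovers the PMF:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# All 2^3 = 8 equally likely outcomes of three fair coin flips
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X(outcome) = number of heads; tallying realizations gives the PMF
counts = Counter(o.count("H") for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

assert pmf[0] == Fraction(1, 8) and pmf[1] == Fraction(3, 8)
assert sum(pmf.values()) == 1  # PMF property: probabilities sum to 1
```

Using `Fraction` keeps the probabilities exact, so the normalization check $\sum p_X(x) = 1$ holds with equality rather than up to floating-point error.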

---

3. Cumulative Distribution Function (CDF)

The CDF gives a cumulative view of the probabilities: it is the probability that a random variable $X$ takes a value less than or equal to a given value $x$.

📖 Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) of a random variable $X$, denoted $F_X(x)$, is defined for any real number $x$ as:

$F_X(x) = P(X \le x)$

For a discrete random variable, the CDF is obtained by summing the PMF values:

$F_X(x) = \sum_{x_i \le x} p_X(x_i)$

Properties of a CDF:

  • $0 \le F_X(x) \le 1$ for all $x \in \mathbb{R}$.

  • $F_X(x)$ is non-decreasing: if $a < b$, then $F_X(a) \le F_X(b)$.

  • $\lim_{x \to -\infty} F_X(x) = 0$.

  • $\lim_{x \to \infty} F_X(x) = 1$.

  • $F_X$ is right-continuous: $\lim_{t \to x^+} F_X(t) = F_X(x)$.

Worked Example:

Problem: Using the PMF from the previous example ($X$ = number of heads in 3 coin flips), find the CDF of $X$.

Solution:

Step 1: Recall the PMF values.
$p_X(0) = 1/8$, $p_X(1) = 3/8$, $p_X(2) = 3/8$, $p_X(3) = 1/8$

Step 2: Calculate $F_X(x)$ on each interval of $x$.

For $x < 0$:
$F_X(x) = P(X \le x) = 0$ (since $X$ cannot be negative)

For $0 \le x < 1$:
$F_X(x) = p_X(0) = \frac{1}{8}$

For $1 \le x < 2$:
$F_X(x) = p_X(0) + p_X(1) = \frac{1}{8} + \frac{3}{8} = \frac{1}{2}$

For $2 \le x < 3$:
$F_X(x) = p_X(0) + p_X(1) + p_X(2) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8}$

For $x \ge 3$:
$F_X(x) = p_X(0) + p_X(1) + p_X(2) + p_X(3) = 1$

Answer: The CDF of $X$ is:

$F_X(x) = \begin{cases} 0 & x < 0 \\ 1/8 & 0 \le x < 1 \\ 1/2 & 1 \le x < 2 \\ 7/8 & 2 \le x < 3 \\ 1 & x \ge 3 \end{cases}$
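The piecewise CDF above is a step function that jumps at each support point. A short Python sketch (the `cdf` helper is an illustrative name, not from the source) evaluates it by counting how much PMF mass lies at or below $x$:

```python
from bisect import bisect_right

# PMF of X = number of heads in 3 fair coin flips (from the worked example)
support = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]

# Running totals give the jump heights of the step-function CDF
cum = []
total = 0.0
for p in probs:
    total += p
    cum.append(total)

def cdf(x):
    """F_X(x) = P(X <= x): total PMF mass at support points <= x."""
    i = bisect_right(support, x)  # number of support points <= x
    return cum[i - 1] if i > 0 else 0.0

assert cdf(-1) == 0.0   # below the support
assert cdf(1.5) == 0.5  # flat between the jumps at 1 and 2
assert cdf(3) == 1.0    # all mass accumulated
```

Note that `bisect_right` implements the $\le$ in $P(X \le x)$: a query exactly at a support point includes that point's mass, matching the right-continuity of the CDF.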

---

4. Functions of Random Variables

Often, we are interested in a new random variable $Y$ that is a function of an existing random variable $X$, i.e., $Y = g(X)$. To find the PMF of $Y$, we identify the possible values of $Y$ and sum the probabilities of all $X$ values that map to each $Y$ value. This concept is directly tested in the provided CMI PYQ.

❗ Finding the PMF of $Y = g(X)$

Let $X$ be a discrete random variable with PMF $p_X(x)$ and range $R_X$.
Let $Y = g(X)$ be a new discrete random variable.
The range of $Y$ is $R_Y = \{y \mid y = g(x) \text{ for some } x \in R_X\}$.
The PMF of $Y$, $p_Y(y)$, is given by:

$p_Y(y) = P(Y=y) = \sum_{x \in R_X:\, g(x)=y} p_X(x)$

That is, for each value $y$ in the range of $Y$, we sum the probabilities of all $x$ values in $R_X$ that $g$ maps to $y$.

Worked Example:

Problem: Let $X$ be a discrete random variable with PMF:
$p_X(1) = 0.2$, $p_X(2) = 0.3$, $p_X(3) = 0.3$, $p_X(4) = 0.2$.
Let $Y = (X-2)^2$. Find the PMF of $Y$.

Solution:

Step 1: Identify the range of $X$ and its PMF.
$R_X = \{1, 2, 3, 4\}$ with $p_X(1) = 0.2$, $p_X(2) = 0.3$, $p_X(3) = 0.3$, $p_X(4) = 0.2$.

Step 2: Determine the possible values of $Y = (X-2)^2$ by applying $g(x) = (x-2)^2$ to each value in $R_X$.

For $x=1$: $y = (1-2)^2 = (-1)^2 = 1$
For $x=2$: $y = (2-2)^2 = 0^2 = 0$
For $x=3$: $y = (3-2)^2 = 1^2 = 1$
For $x=4$: $y = (4-2)^2 = 2^2 = 4$

The range of $Y$ is $R_Y = \{0, 1, 4\}$.

Step 3: Calculate the PMF of $Y$ for each value in $R_Y$.

For $Y=0$: only $X=2$ maps to $Y=0$.
$p_Y(0) = P(X=2) = p_X(2) = 0.3$

For $Y=1$: $X=1$ and $X=3$ map to $Y=1$.
$p_Y(1) = P(X=1 \text{ or } X=3) = p_X(1) + p_X(3) = 0.2 + 0.3 = 0.5$

For $Y=4$: only $X=4$ maps to $Y=4$.
$p_Y(4) = P(X=4) = p_X(4) = 0.2$

Step 4: Verify the PMF properties.
All $p_Y(y) \ge 0$, and

$\sum_{y \in R_Y} p_Y(y) = 0.3 + 0.5 + 0.2 = 1.0$

Answer: The PMF of $Y$ is:
$p_Y(0) = 0.3$
$p_Y(1) = 0.5$
$p_Y(4) = 0.2$
and $p_Y(y) = 0$ for $y \notin \{0, 1, 4\}$.
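The "group by $g(x)$ and sum" step generalizes to any function of a discrete random variable. A minimal Python sketch (the helper name `pmf_of_function` is illustrative) applies it to this example:

```python
from collections import defaultdict

# PMF of X from the worked example
pmf_x = {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.2}

def pmf_of_function(pmf, g):
    """PMF of Y = g(X): group x values by g(x) and sum their probabilities."""
    pmf_y = defaultdict(float)
    for x, p in pmf.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

pmf_y = pmf_of_function(pmf_x, lambda x: (x - 2) ** 2)

assert abs(pmf_y[1] - 0.5) < 1e-12  # x=1 and x=3 both map to y=1
assert set(pmf_y) == {0, 1, 4}      # range R_Y of Y = (X-2)^2
```

Because `defaultdict` accumulates mass per output value, the sketch handles non-one-to-one functions (the usual exam trap) automatically.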

---

5. Expected Value of a Discrete Random Variable

The expected value (or mean) of a random variable is a measure of its central tendency, representing the average value we would expect to observe if the experiment were repeated many times.

πŸ“ Expected Value (Mean)

For a discrete random variable XX with PMF pX(x)p_X(x) and range RXR_X, the Expected Value (or Mean), denoted by E⁑[X]\operatorname{E}[X] or μX\mu_X, is:

E⁑[X]=βˆ‘x∈RXxβ‹…pX(x)\operatorname{E}[X] = \sum_{x \in R_X} x \cdot p_X(x)

Variables:

    • XX = discrete random variable

    • xx = a specific value in the range of XX

    • pX(x)p_X(x) = probability that XX takes the value xx


When to use: To find the long-run average of a random variable, or its central location.

πŸ“ Expected Value of a Function of a Random Variable

If Y=g(X)Y = g(X) is a function of a discrete random variable XX, its expected value can be calculated directly from the PMF of XX:

E⁑[g(X)]=βˆ‘x∈RXg(x)β‹…pX(x)\operatorname{E}[g(X)] = \sum_{x \in R_X} g(x) \cdot p_X(x)

Variables:

    • g(X)g(X) = function of the random variable XX

    • xx = a specific value in the range of XX

    • pX(x)p_X(x) = probability that XX takes the value xx


When to use: To find the average value of a transformation of a random variable without first finding the PMF of Y=g(X)Y=g(X).

Properties of Expected Value:

  • E⁑[c]=c\operatorname{E}[c] = c for any constant cc.

  • E⁑[aX+b]=aE⁑[X]+b\operatorname{E}[aX + b] = a\operatorname{E}[X] + b for constants a,ba, b. (Linearity of Expectation)

  • E⁑[X+Y]=E⁑[X]+E⁑[Y]\operatorname{E}[X+Y] = \operatorname{E}[X] + \operatorname{E}[Y] (for any random variables X,YX, Y, not necessarily independent).
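These expectation formulas can be checked numerically on the coin-flip PMF from Part 1. The sketch below is illustrative only; it computes $\operatorname{E}[X]$ and $\operatorname{E}[2X+1]$ directly from the PMF and confirms linearity:

```python
# PMF of X = number of heads in 3 fair coin flips
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# E[X] = sum of x * p_X(x) over the support
mean = sum(x * p for x, p in pmf.items())

# E[g(X)] computed directly from the PMF of X, here g(x) = 2x + 1
mean_g = sum((2 * x + 1) * p for x, p in pmf.items())

assert mean == 1.5
# Linearity of expectation: E[2X + 1] = 2 E[X] + 1
assert mean_g == 2 * mean + 1
```

Computing $\operatorname{E}[g(X)]$ this way avoids first deriving the PMF of $Y = g(X)$, exactly as the formula box above states.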
Worked Example:

Problem: For the random variable $X$ (number of heads in 3 coin flips) with PMF $p_X(0) = 1/8$, $p_X(1) = 3/8$, $p_X(2) = 3/8$, $p_X(3) = 1/8$, calculate $\operatorname{E}[X]$ and $\operatorname{E}[2X+1]$.

Solution:

Step 1: Calculate $\operatorname{E}[X]$.

$\operatorname{E}[X] = \sum_{x \in R_X} x \cdot p_X(x) = (0 \cdot \frac{1}{8}) + (1 \cdot \frac{3}{8}) + (2 \cdot \frac{3}{8}) + (3 \cdot \frac{1}{8}) = \frac{12}{8} = \frac{3}{2} = 1.5$

Step 2: Calculate $\operatorname{E}[2X+1]$ using linearity of expectation.

$\operatorname{E}[2X+1] = 2\operatorname{E}[X] + 1 = 2(1.5) + 1 = 4$

Answer: $\boxed{\operatorname{E}[X] = 1.5 \text{ and } \operatorname{E}[2X+1] = 4}$

---

6. Variance of a Discrete Random Variable

The variance measures the spread or dispersion of the values of a random variable around its mean. A higher variance indicates greater variability.

📝 Variance

For a discrete random variable $X$ with PMF $p_X(x)$ and mean $\operatorname{E}[X] = \mu_X$, the Variance, denoted $\operatorname{Var}(X)$ or $\sigma_X^2$, is:

$\operatorname{Var}(X) = \operatorname{E}[(X - \mu_X)^2] = \sum_{x \in R_X} (x - \mu_X)^2 p_X(x)$

An often more convenient computational formula is:

$\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2$

where $\operatorname{E}[X^2] = \sum_{x \in R_X} x^2 p_X(x)$.

Variables:

  • $X$ = discrete random variable

  • $\mu_X$ = expected value (mean) of $X$

  • $p_X(x)$ = probability that $X$ takes the value $x$


When to use: to quantify the spread or variability of a random variable's distribution.

📖 Standard Deviation

The Standard Deviation of a random variable $X$, denoted $\sigma_X$, is the positive square root of its variance:

$\sigma_X = \sqrt{\operatorname{Var}(X)}$

Properties of Variance:

  • $\operatorname{Var}(c) = 0$ for any constant $c$.

  • $\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)$ for constants $a, b$. Note that the shift $b$ does not affect variance.

  • $\operatorname{Var}(X) \ge 0$.

Worked Example:

Problem: For the random variable $X$ (number of heads in 3 coin flips) with PMF $p_X(0) = 1/8$, $p_X(1) = 3/8$, $p_X(2) = 3/8$, $p_X(3) = 1/8$, and $\operatorname{E}[X] = 1.5$, calculate $\operatorname{Var}(X)$ and $\operatorname{Var}(2X+1)$.

Solution:

Step 1: Calculate $\operatorname{E}[X^2]$.

$\operatorname{E}[X^2] = \sum_{x \in R_X} x^2 p_X(x) = (0^2 \cdot \frac{1}{8}) + (1^2 \cdot \frac{3}{8}) + (2^2 \cdot \frac{3}{8}) + (3^2 \cdot \frac{1}{8}) = \frac{3 + 12 + 9}{8} = \frac{24}{8} = 3$

Step 2: Calculate $\operatorname{Var}(X)$ using the computational formula.

$\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2 = 3 - (1.5)^2 = 3 - 2.25 = 0.75$

Step 3: Calculate $\operatorname{Var}(2X+1)$ using the properties of variance.

$\operatorname{Var}(2X+1) = 2^2 \operatorname{Var}(X) = 4 \cdot 0.75 = 3$

Answer: $\boxed{\operatorname{Var}(X) = 0.75 \text{ and } \operatorname{Var}(2X+1) = 3}$
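The computational formula $\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2$ is easy to verify in code. A minimal sketch (the `expectation` helper is an illustrative name, not from the source):

```python
# PMF of X = number of heads in 3 fair coin flips (from the worked example)
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p_X(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                           # E[X]
var = expectation(pmf, lambda x: x**2) - mean**2  # Var(X) = E[X^2] - E[X]^2

assert mean == 1.5
assert var == 0.75
# Var(aX + b) = a^2 Var(X): the shift b leaves the spread unchanged
assert 2**2 * var == 3.0
```

The last assertion mirrors Step 3 of the worked example: scaling by $a = 2$ multiplies the variance by $a^2 = 4$, while adding the constant $1$ has no effect.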

---

7. Joint Probability and Independence of Random Variables

When dealing with multiple random variables, we often need to understand their joint behavior.

📖 Joint Probability Mass Function (Joint PMF)

For two discrete random variables $X$ and $Y$, the Joint Probability Mass Function (Joint PMF), denoted $p_{X,Y}(x,y)$ or $P(X=x, Y=y)$, is a function such that:

  • $p_{X,Y}(x,y) \ge 0$ for all $(x,y)$ in the joint range.

  • $\sum_x \sum_y p_{X,Y}(x,y) = 1$.

The value $p_{X,Y}(x,y)$ is the probability that $X$ takes value $x$ AND $Y$ takes value $y$ simultaneously.

From a joint PMF, we can derive the marginal PMFs of $X$ and $Y$:

$p_X(x) = \sum_y p_{X,Y}(x,y)$

$p_Y(y) = \sum_x p_{X,Y}(x,y)$

📖 Independence of Discrete Random Variables

Two discrete random variables $X$ and $Y$ are independent if and only if their joint PMF equals the product of their marginal PMFs for all possible values $x$ and $y$:

$p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y) \quad \text{for all } x, y$

Equivalently, $X$ and $Y$ are independent if $P(X=x, Y=y) = P(X=x)P(Y=y)$ for all $x, y$.

⚠️ Independence of X and g(X)

❌ A common mistake is assuming that if $Y = g(X)$, then $X$ and $Y$ are independent.
✅ This is generally false. If $Y$ is a non-trivial function of $X$, they are dependent: knowing the value of $X$ directly tells you the value of $Y$, which contradicts independence. The only exception is when $g(X)$ is constant, in which case $Y$ is not truly random (and independence holds vacuously for $X$ and a constant). The PYQ explicitly tests this concept.
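The warning above can be checked numerically. The sketch below (helper names are illustrative) builds the joint PMF of a fair die $X$ and $Y = X \bmod 3$, derives the marginals, and tests the product condition pair by pair:

```python
from itertools import product

# Fair six-sided die X and Y = X mod 3; Y is fully determined by X
support_x = [1, 2, 3, 4, 5, 6]
joint = {(x, x % 3): 1/6 for x in support_x}

def marginals(joint):
    """Marginal PMFs from a joint PMF: sum out the other coordinate."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

px, py = marginals(joint)

# Independence requires p(x,y) == p(x) * p(y) for ALL pairs;
# a single failing pair is enough to conclude dependence
independent = all(
    abs(joint.get((x, y), 0.0) - px[x] * py[y]) < 1e-12
    for x, y in product(px, py)
)
assert not independent  # Y = g(X) is dependent on X, as the warning states
```

For instance, $p_{X,Y}(1, 0) = 0$ while $p_X(1)\,p_Y(0) = \frac{1}{6} \cdot \frac{1}{3} \ne 0$, so the product condition fails on the very first pair checked.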

---

8. Uniform Discrete Distribution

A random variable follows a discrete uniform distribution if each value in its finite range has an equal probability of being observed. This is directly stated in the PYQ.

📖 Uniform Discrete Random Variable

A discrete random variable $X$ has a uniform distribution over a finite set of $N$ values $\{x_1, x_2, \dots, x_N\}$ if its PMF is:

$p_X(x_i) = \frac{1}{N} \quad \text{for } i = 1, 2, \dots, N$

and $p_X(x) = 0$ otherwise.

Expected Value: $\operatorname{E}[X] = \frac{1}{N} \sum_{i=1}^N x_i$
Variance: $\operatorname{Var}(X) = \frac{1}{N} \sum_{i=1}^N x_i^2 - \left(\frac{1}{N} \sum_{i=1}^N x_i\right)^2$

Worked Example:

Problem: Let $X$ be a random variable sampled uniformly at random from the set $S = \{0, 1, 2, 3, 4\}$.
a) What is the PMF of $X$?
b) Calculate $\operatorname{E}[X]$.

Solution:

Step 1: Identify the size of the set $S$.
The set $S$ has $N = 5$ elements.

Step 2: Determine the PMF.
Since $X$ is sampled uniformly, each element has probability $1/N$.

a) The PMF of $X$ is:
$p_X(x) = \frac{1}{5}$ for $x \in \{0, 1, 2, 3, 4\}$
and $p_X(x) = 0$ otherwise.

Step 3: Calculate $\operatorname{E}[X]$.

$\operatorname{E}[X] = \sum_{x \in S} x \cdot p_X(x) = \frac{1}{5}(0 + 1 + 2 + 3 + 4) = \frac{10}{5} = 2$

Answer: $\boxed{\text{a) } p_X(x) = 1/5 \text{ for } x \in \{0,1,2,3,4\}. \text{ b) } \operatorname{E}[X] = 2.}$
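Both the mean and the variance formulas from the definition box can be evaluated exactly for this example. A minimal sketch, using exact rational arithmetic (illustrative only):

```python
from fractions import Fraction

# Uniform distribution over S = {0, 1, 2, 3, 4} (from the worked example)
support = [0, 1, 2, 3, 4]
p = Fraction(1, len(support))  # each value has probability 1/N

mean = sum(x * p for x in support)          # E[X] = (1/N) * sum of x_i
var = sum(x * x * p for x in support) - mean**2  # E[X^2] - E[X]^2

assert mean == 2
assert var == 2  # E[X^2] - E[X]^2 = 30/5 - 4 = 2
```

The variance line is the uniform-distribution formula from the box above: $\frac{1}{N}\sum x_i^2 - \left(\frac{1}{N}\sum x_i\right)^2$.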

---

Problem-Solving Strategies

💡 CMI Strategy: Functions of RVs

When asked about the distribution or probability of $Y = g(X)$:

• List $R_X$ and $p_X(x)$: clearly write down the range and PMF of the original random variable $X$.

• Determine $R_Y$: for each $x \in R_X$, calculate $y = g(x)$. Collect the unique $y$ values to form $R_Y$.

• Map $X$ to $Y$: for each $y \in R_Y$, identify all $x \in R_X$ such that $g(x) = y$.

• Sum probabilities: $p_Y(y) = \sum_{x: g(x)=y} p_X(x)$.

• Verify: ensure $\sum_{y \in R_Y} p_Y(y) = 1$.

This systematic approach minimizes errors, especially when $g(X)$ is not one-to-one.

💡 CMI Strategy: Independence Check

To verify whether $X$ and $Y$ are independent:

• Calculate marginal PMFs: find $p_X(x)$ and $p_Y(y)$ from the joint PMF $p_{X,Y}(x,y)$.

• Check the condition: for all pairs $(x,y)$, verify that $p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y)$.

• One counterexample is enough: if the equality fails for even one pair $(x,y)$, then $X$ and $Y$ are dependent.

---

Common Mistakes

⚠️ Avoid These Errors

  • ❌ Assuming $X$ and $g(X)$ are independent: this is a very common trap. As discussed, knowing $X$ usually determines $g(X)$, making them dependent. For instance, if $X$ is the number of heads and $Y = X^2$, they are clearly dependent.
    ✅ Correct approach: always treat $X$ and $g(X)$ as dependent unless $g(X)$ is a constant function or independence is specifically proven.

  • ❌ Incorrectly calculating the PMF of $Y = g(X)$: forgetting to sum probabilities over all $X$ values that map to the same $Y$ value.
    ✅ Correct approach: systematically list all $x$ values, calculate their corresponding $y$ values, group the $x$ values that yield the same $y$, and sum their original $p_X(x)$ values.

  • ❌ Confusing PMF and CDF: using $P(X=x)$ when $P(X \le x)$ is required, or vice versa.
    ✅ Correct approach: remember that $p_X(x)$ is the probability of a single value, while $F_X(x)$ covers all values up to and including $x$. For discrete RVs, $P(a < X \le b) = F_X(b) - F_X(a)$.

  • ❌ Arithmetic errors with the modulo operator: misunderstanding the range of values produced by $a \bmod n$.
    ✅ Correct approach: recall that $a \bmod n$ always yields a value in $\{0, 1, \dots, n-1\}$ for positive $n$. For example, $5 \bmod 3 = 2$ and $0 \bmod 3 = 0$.

---

Practice Questions

:::question type="MCQ" question="Let $X$ be a discrete random variable with PMF $p_X(1)=0.1$, $p_X(2)=0.3$, $p_X(3)=0.4$, $p_X(4)=0.2$. Let $Y = |X-2|$. Which of the following is the correct PMF for $Y$?" options=["$p_Y(0)=0.3, p_Y(1)=0.5, p_Y(2)=0.2$","$p_Y(0)=0.1, p_Y(1)=0.3, p_Y(2)=0.4, p_Y(3)=0.2$","$p_Y(0)=0.3, p_Y(1)=0.3, p_Y(2)=0.4$","$p_Y(0)=0.3, p_Y(1)=0.6, p_Y(2)=0.1$"] answer="$p_Y(0)=0.3, p_Y(1)=0.5, p_Y(2)=0.2$" hint="Map each value of $X$ to $Y$ and sum probabilities for repeated $Y$ values." solution="
Step 1: Determine the values of $Y = |X-2|$ for each $x \in R_X$.

• If $X=1$, $Y = |1-2| = 1$.

• If $X=2$, $Y = |2-2| = 0$.

• If $X=3$, $Y = |3-2| = 1$.

• If $X=4$, $Y = |4-2| = 2$.


Step 2: Identify the range of $Y$: $R_Y = \{0, 1, 2\}$.

Step 3: Calculate the PMF of $Y$.

• $p_Y(0) = P(X=2) = p_X(2) = 0.3$.

• $p_Y(1) = P(X=1 \text{ or } X=3) = p_X(1) + p_X(3) = 0.1 + 0.4 = 0.5$.

• $p_Y(2) = P(X=4) = p_X(4) = 0.2$.


Step 4: Verify that the probabilities sum to 1: $0.3 + 0.5 + 0.2 = 1.0$.
Answer: $\boxed{p_Y(0)=0.3, p_Y(1)=0.5, p_Y(2)=0.2}$
"
:::

:::question type="NAT" question="A discrete random variable $X$ has PMF $p_X(x) = c(x+1)$ for $x \in \{0, 1, 2\}$, and $0$ otherwise. Calculate the value of $\operatorname{E}[X^2]$. (Enter your answer as a decimal rounded to two decimal places.)" answer="2.33" hint="First find the constant $c$ by ensuring the probabilities sum to 1. Then calculate $\operatorname{E}[X^2]$." solution="
Step 1: Find the constant $c$.
The probabilities must sum to 1:

$\sum_{x=0}^2 p_X(x) = c(0+1) + c(1+1) + c(2+1) = 6c = 1$

so $c = \frac{1}{6}$.

Step 2: Write out the full PMF.
$p_X(0) = \frac{1}{6}$, $p_X(1) = \frac{2}{6}$, $p_X(2) = \frac{3}{6}$

Step 3: Calculate $\operatorname{E}[X^2]$.

$\operatorname{E}[X^2] = \sum_{x \in R_X} x^2 p_X(x) = (0^2 \cdot \frac{1}{6}) + (1^2 \cdot \frac{2}{6}) + (2^2 \cdot \frac{3}{6}) = \frac{2}{6} + \frac{12}{6} = \frac{14}{6} = \frac{7}{3}$

Step 4: Convert to a decimal rounded to two places.
$7/3 \approx 2.3333\ldots$, so $\operatorname{E}[X^2] = 2.33$.
Answer: $\boxed{2.33}$
"
:::

:::question type="MSQ" question="Let $X$ be a random variable representing the outcome of rolling a fair six-sided die, so $R_X = \{1, 2, 3, 4, 5, 6\}$. Let $Y = X \bmod 3$. Which of the following statements is/are true?" options=["$P(Y=0) = 1/3$","$X$ and $Y$ are independent","$\operatorname{E}[Y] = 1$","$\operatorname{Var}(Y) = 2/3$"] answer="A,C,D" hint="Calculate the PMF of $Y$ first. Then evaluate independence, expected value, and variance." solution="
Step 1: Determine the PMF of $X$. Since the die is fair, $p_X(x) = 1/6$ for $x \in \{1, 2, 3, 4, 5, 6\}$.

Step 2: Determine the values of $Y = X \bmod 3$.

• $X=1 \Rightarrow Y=1$; $X=2 \Rightarrow Y=2$; $X=3 \Rightarrow Y=0$.

• $X=4 \Rightarrow Y=1$; $X=5 \Rightarrow Y=2$; $X=6 \Rightarrow Y=0$.

The range of $Y$ is $R_Y = \{0, 1, 2\}$.

Step 3: Calculate $p_Y(y)$.

• $P(Y=0) = p_X(3) + p_X(6) = 1/6 + 1/6 = 1/3$. (Statement A is TRUE)

• $P(Y=1) = p_X(1) + p_X(4) = 1/6 + 1/6 = 1/3$.

• $P(Y=2) = p_X(2) + p_X(5) = 1/6 + 1/6 = 1/3$.


Step 4: Evaluate the statements.

Statement A: $P(Y=0) = 1/3$. TRUE from the calculation above.

Statement B: $X$ and $Y$ are independent.
Since $Y$ is a function of $X$ ($Y = g(X)$), they are dependent. For example, if we know $X=1$, then $Y$ must be $1$, so $P(Y=1 \mid X=1) = 1 \ne P(Y=1) = 1/3$. (Statement B is FALSE)

Statement C: $\operatorname{E}[Y] = 1$.

$\operatorname{E}[Y] = \sum_{y \in R_Y} y \cdot p_Y(y) = (0 \cdot \frac{1}{3}) + (1 \cdot \frac{1}{3}) + (2 \cdot \frac{1}{3}) = 1$

(Statement C is TRUE)

Statement D: $\operatorname{Var}(Y) = 2/3$.
First, $\operatorname{E}[Y^2] = (0^2 \cdot \frac{1}{3}) + (1^2 \cdot \frac{1}{3}) + (2^2 \cdot \frac{1}{3}) = \frac{5}{3}$.
Then $\operatorname{Var}(Y) = \operatorname{E}[Y^2] - (\operatorname{E}[Y])^2 = \frac{5}{3} - 1 = \frac{2}{3}$. (Statement D is TRUE)
Answer: $\boxed{\text{A, C, D}}$
"
:::

:::question type="SUB" question="Let $X$ be a discrete random variable with PMF $p_X(x) = \frac{1}{2^x}$ for $x \in \{1, 2, 3, \dots\}$, and $0$ otherwise.
a) Prove that this is a valid PMF.
b) Derive the expression for the CDF, $F_X(x)$.
c) Calculate $\operatorname{E}[X]$." answer="a) $\sum p_X(x) = 1$. b) $F_X(x) = 1 - \frac{1}{2^{\lfloor x \rfloor}}$. c) $\operatorname{E}[X] = 2$." hint="For part a), use the sum of a geometric series. For part b), use the definition of the CDF. For part c), use the formula for $\operatorname{E}[X]$ and the sum $\sum_{k=1}^{\infty} k r^k = \frac{r}{(1-r)^2}$ for $|r|<1$." solution="
Part a) Prove that this is a valid PMF.

Step 1: Check non-negativity.
For $x \in \{1, 2, 3, \dots\}$, $p_X(x) = \frac{1}{2^x} > 0$; otherwise $p_X(x) = 0$. So $p_X(x) \ge 0$ for all $x$.

Step 2: Check that the probabilities sum to 1.

$\sum_{x=1}^{\infty} \frac{1}{2^x} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots$

This is a geometric series with first term $a = 1/2$ and common ratio $r = 1/2$, and an infinite geometric series sums to $\frac{a}{1-r}$ for $|r|<1$:

$\sum_{x=1}^{\infty} \frac{1}{2^x} = \frac{1/2}{1-1/2} = 1$

Since both conditions hold, $p_X(x)$ is a valid PMF.

Part b) Derive the expression for the CDF, $F_X(x)$.

Step 1: Apply the definition $F_X(x) = P(X \le x)$.

For $x < 1$:
$F_X(x) = 0$ (since $X$ takes no values below 1)

For $x \ge 1$:

$F_X(x) = \sum_{k=1}^{\lfloor x \rfloor} p_X(k) = \sum_{k=1}^{\lfloor x \rfloor} \frac{1}{2^k}$

This is a finite geometric series with $a = 1/2$, $r = 1/2$, and $n = \lfloor x \rfloor$ terms, whose sum is $a\,\frac{1-r^n}{1-r}$:

$F_X(x) = \frac{1/2\,(1 - (1/2)^{\lfloor x \rfloor})}{1/2} = 1 - \frac{1}{2^{\lfloor x \rfloor}}$

Thus, the CDF is:

$F_X(x) = \begin{cases} 0 & x < 1 \\ 1 - \frac{1}{2^{\lfloor x \rfloor}} & x \ge 1 \end{cases}$

Part c) Calculate $\operatorname{E}[X]$.

Step 1: Use the definition of expected value.

$\operatorname{E}[X] = \sum_{x=1}^{\infty} x \cdot p_X(x) = \sum_{x=1}^{\infty} x \cdot \frac{1}{2^x}$

Using the known sum $\sum_{k=1}^{\infty} k r^k = \frac{r}{(1-r)^2}$ for $|r|<1$ with $r = 1/2$:

$\operatorname{E}[X] = \frac{1/2}{(1-1/2)^2} = \frac{1/2}{1/4} = 2$

Therefore, $\operatorname{E}[X] = 2$.
Answer: $\boxed{\text{a) } \sum p_X(x) = 1. \text{ b) } F_X(x) = 1 - \frac{1}{2^{\lfloor x \rfloor}}. \text{ c) } \operatorname{E}[X] = 2.}$
"
:::
    "
    :::
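The series results in parts (a) and (c), and the floor-based CDF from part (b), can be sanity-checked numerically. A minimal Python sketch (truncating the infinite sums at 60 terms, where the tail below 2^-60 is negligible):

```python
from math import floor

# Truncate the infinite sums at N = 60 terms; the tail beyond 2^-60 is negligible.
N = 60
total = sum(1 / 2**x for x in range(1, N + 1))   # should be ~1 (valid PMF)
mean = sum(x / 2**x for x in range(1, N + 1))    # should be ~2 (E[X])

def cdf(x):
    """F_X(x) = 1 - 1/2^floor(x) for x >= 1, and 0 otherwise."""
    return 0.0 if x < 1 else 1 - 1 / 2**floor(x)

print(round(total, 6), round(mean, 6))  # 1.0 2.0
print(cdf(3.7))                         # 0.875 (= 1 - 1/8)
```

The truncation point is an arbitrary choice; any cutoff past ~50 terms is indistinguishable from the infinite sum in double precision.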

    ---

    Summary

    ❗ Key Takeaways for CMI

    • Random Variables Map Outcomes to Numbers: A random variable XX is a function X:Ξ©β†’RX: \Omega \to \mathbb{R}. Its range RXR_X is the set of all possible numerical values it can take.

    • PMF for Discrete RVs: The Probability Mass Function pX(x)=P(X=x)p_X(x) = P(X=x) describes the probability of a discrete random variable taking a specific value. It must satisfy pX(x)β‰₯0p_X(x) \ge 0 and βˆ‘xpX(x)=1\sum_x p_X(x) = 1.

    • CDF Provides Cumulative Probabilities: The Cumulative Distribution Function FX(x)=P(X≀x)F_X(x) = P(X \le x) gives the probability that XX is less than or equal to xx.

    • Functions of RVs are Crucial: To find the PMF of Y=g(X)Y=g(X), sum the probabilities pX(x)p_X(x) for all xx values that map to the same yy value. This is a common exam concept.

    • Expected Value and Variance: E[X]E[X] measures central tendency, and Var(X)Var(X) measures spread. Remember their formulas and properties, especially linearity of expectation and Var(aX+b)=a2Var(X)Var(aX+b) = a^2 Var(X).

    • Independence of XX and g(X)g(X) is Rare: XX and Y=g(X)Y=g(X) are generally dependent. Do not assume independence unless g(X)g(X) is a constant. Check P(X=x,Y=y)=P(X=x)P(Y=y)P(X=x, Y=y) = P(X=x)P(Y=y) for independence.

    ---

    What's Next?

    💡 Continue Learning

    This topic connects to:

      • Common Discrete Distributions: Understanding specific PMFs (e.g., Bernoulli, Binomial, Poisson) that arise from specific random experiments. These distributions are built upon the fundamental concepts of random variables.

      • Joint Distributions of Multiple Random Variables: Extending the concepts of PMF, CDF, expectation, and variance to scenarios involving two or more random variables, exploring their relationships (e.g., covariance, correlation).

      • Continuous Random Variables: While this chapter focused on discrete RVs, the principles extend to continuous RVs using Probability Density Functions (PDFs) and integrals instead of sums.


    Master these connections for comprehensive CMI preparation!

    ---

    💡 Moving Forward

    Now that you understand Random Variables, let's explore Distribution Functions which builds on these concepts.

    ---

    Part 2: Distribution Functions

    Introduction

    Distribution functions are fundamental to probability theory and statistics, providing a comprehensive way to describe the behavior of random variables. In the context of the CMI Masters in Data Science, a deep understanding of these functions is crucial for modeling real-world phenomena, performing statistical inference, and building predictive models. This topic covers the essential concepts of how probabilities are distributed across the possible values of a random variable, whether discrete or continuous. Mastery of distribution functions allows us to quantify uncertainty, calculate probabilities of events, and characterize key aspects like the central tendency and spread of data, which are indispensable skills for any data scientist.

    📖 Random Variable

    A random variable is a function that maps the outcomes of a random experiment to real numbers. Random variables can be broadly classified into two types:

      • Discrete Random Variable: A random variable whose set of possible values is finite or countably infinite.

      • Continuous Random Variable: A random variable whose set of possible values is an interval (finite or infinite) on the real number line.

    ---

    Key Concepts

    1. Probability Mass Function (PMF)

    The Probability Mass Function (PMF) is used to describe the probability distribution of a discrete random variable. It assigns a probability to each possible value that the random variable can take.

    📖 Probability Mass Function (PMF)

    For a discrete random variable XX, its Probability Mass Function (PMF), denoted by pX(x)p_X(x) or P(X=x)P(X=x), satisfies the following properties:

    • pX(x)β‰₯0p_X(x) \ge 0 for all possible values xx.

    • βˆ‘xpX(x)=1\sum_{x} p_X(x) = 1, where the sum is over all possible values of XX.

    Worked Example:

    Problem: Let XX be the number of heads in two coin tosses. Determine its PMF.

    Solution:

    Step 1: Identify the sample space and possible values of XX.

    The sample space for two coin tosses is S={HH,HT,TH,TT}S = \{HH, HT, TH, TT\}.
    The possible values for XX (number of heads) are 0,1,20, 1, 2.

    Step 2: Calculate the probability for each value of XX.

    P(X=0)=P({TT})=14P(X=0) = P(\{TT\}) = \frac{1}{4}
    P(X=1)=P({HT,TH})=24=12P(X=1) = P(\{HT, TH\}) = \frac{2}{4} = \frac{1}{2}
    P(X=2)=P({HH})=14P(X=2) = P(\{HH\}) = \frac{1}{4}

    Step 3: Write down the PMF.

    pX(x)={14,x=012,x=114,x=20,otherwisep_X(x) = \begin{cases} \frac{1}{4}, & x=0 \\ \frac{1}{2}, & x=1 \\ \frac{1}{4}, & x=2 \\ 0, & \text{otherwise} \end{cases}

    Answer: The PMF is pX(0)=1/4p_X(0)=1/4, pX(1)=1/2p_X(1)=1/2, pX(2)=1/4p_X(2)=1/4.
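This PMF can also be obtained by brute-force enumeration of the equally likely sample space; a small Python sketch:

```python
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses
# and count heads to build the PMF of X.
outcomes = list(product("HT", repeat=2))           # HH, HT, TH, TT
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, 0.0) + 1 / len(outcomes)

print({x: pmf[x] for x in sorted(pmf)})  # {0: 0.25, 1: 0.5, 2: 0.25}
```

Enumeration scales poorly but is a useful cross-check for any small finite experiment.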

    ---

    2. Probability Density Function (PDF)

    The Probability Density Function (PDF) is used to describe the probability distribution of a continuous random variable. Unlike the PMF, the PDF does not give the probability of a specific value, but rather the relative likelihood of the random variable taking on a given value. Probabilities for continuous random variables are calculated over intervals.

    📖 Probability Density Function (PDF)

    For a continuous random variable XX, its Probability Density Function (PDF), denoted by fX(x)f_X(x) or f(x)f(x), satisfies the following properties:

    • f(x)β‰₯0f(x) \ge 0 for all x∈Rx \in \mathbb{R}.

    • βˆ«βˆ’βˆžβˆžf(x)dx=1\int_{-\infty}^{\infty} f(x) dx = 1.

    The probability that XX falls into an interval [a,b][a, b] is given by P(a≀X≀b)=∫abf(x)dxP(a \le X \le b) = \int_a^b f(x) dx.

    ❗ Must Remember

    For a continuous random variable XX, the probability of XX taking any single specific value is 00. That is, P(X=x0)=0P(X=x_0) = 0 for any x0x_0. Consequently, P(a≀X≀b)=P(a<X≀b)=P(a≀X<b)=P(a<X<b)P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).

    Worked Example:

    Problem: Let XX be a continuous random variable with PDF f(x)=cx(1βˆ’x)f(x) = cx(1-x) for 0≀x≀10 \le x \le 1, and 00 otherwise.
    (a) Determine the value of cc.
    (b) Find the probability P(X>0.5)P(X > 0.5).

    Solution (a):

    Step 1: Apply the normalization property of a PDF.

    βˆ«βˆ’βˆžβˆžf(x)dx=1\int_{-\infty}^{\infty} f(x) dx = 1

    Step 2: Substitute the given PDF and integrate over its non-zero range.

    ∫01cx(1βˆ’x)dx=1\int_{0}^{1} cx(1-x) dx = 1

    Step 3: Simplify the integrand and perform the integration.

    c∫01(xβˆ’x2)dx=1c \int_{0}^{1} (x - x^2) dx = 1
    c[x22βˆ’x33]01=1c \left[ \frac{x^2}{2} - \frac{x^3}{3} \right]_{0}^{1} = 1
    c((122βˆ’133)βˆ’(022βˆ’033))=1c \left( \left( \frac{1^2}{2} - \frac{1^3}{3} \right) - \left( \frac{0^2}{2} - \frac{0^3}{3} \right) \right) = 1
    c(12βˆ’13)=1c \left( \frac{1}{2} - \frac{1}{3} \right) = 1
    c(3βˆ’26)=1c \left( \frac{3-2}{6} \right) = 1
    c(16)=1c \left( \frac{1}{6} \right) = 1

    Step 4: Solve for cc.

    c=6c = 6

    Answer (a): c=6c=6.

    Solution (b):

    Step 1: Set up the integral for P(X>0.5)P(X > 0.5) using the determined PDF.

    P(X>0.5)=∫0.51f(x)dxP(X > 0.5) = \int_{0.5}^{1} f(x) dx

    Step 2: Substitute the PDF with the value of cc.

    P(X>0.5)=∫0.516x(1βˆ’x)dxP(X > 0.5) = \int_{0.5}^{1} 6x(1-x) dx

    Step 3: Perform the integration.

    P(X>0.5)=6∫0.51(xβˆ’x2)dxP(X > 0.5) = 6 \int_{0.5}^{1} (x - x^2) dx
    P(X>0.5)=6[x22βˆ’x33]0.51P(X > 0.5) = 6 \left[ \frac{x^2}{2} - \frac{x^3}{3} \right]_{0.5}^{1}
    P(X>0.5)=6((122βˆ’133)βˆ’(0.522βˆ’0.533))P(X > 0.5) = 6 \left( \left( \frac{1^2}{2} - \frac{1^3}{3} \right) - \left( \frac{0.5^2}{2} - \frac{0.5^3}{3} \right) \right)
    P(X>0.5)=6((12βˆ’13)βˆ’(0.252βˆ’0.1253))P(X > 0.5) = 6 \left( \left( \frac{1}{2} - \frac{1}{3} \right) - \left( \frac{0.25}{2} - \frac{0.125}{3} \right) \right)
    P(X>0.5)=6(16βˆ’(18βˆ’124))P(X > 0.5) = 6 \left( \frac{1}{6} - \left( \frac{1}{8} - \frac{1}{24} \right) \right)
    P(X>0.5)=6(16βˆ’(3βˆ’124))P(X > 0.5) = 6 \left( \frac{1}{6} - \left( \frac{3-1}{24} \right) \right)
    P(X>0.5)=6(16βˆ’224)P(X > 0.5) = 6 \left( \frac{1}{6} - \frac{2}{24} \right)
    P(X>0.5)=6(16βˆ’112)P(X > 0.5) = 6 \left( \frac{1}{6} - \frac{1}{12} \right)
    P(X>0.5)=6(2βˆ’112)P(X > 0.5) = 6 \left( \frac{2-1}{12} \right)
    P(X>0.5)=6(112)P(X > 0.5) = 6 \left( \frac{1}{12} \right)
    P(X>0.5)=12P(X > 0.5) = \frac{1}{2}

    Answer (b): P(X>0.5)=0.5P(X > 0.5) = 0.5.
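Both answers can be verified with a simple midpoint-rule numeric integration; a minimal sketch (the step count is an arbitrary choice):

```python
# Midpoint-rule numeric integration of f(x) = 6x(1 - x) to check that
# c = 6 normalizes the PDF and that P(X > 0.5) = 0.5.
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 6 * x * (1 - x)
print(round(integrate(f, 0, 1), 6))    # 1.0  (normalization)
print(round(integrate(f, 0.5, 1), 6))  # 0.5  (P(X > 0.5))
```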









    [Figure: graph of the PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) on [0,1][0, 1], peaking at its maximum (0.5,1.5)(0.5, 1.5); the shaded area to the right of x=0.5x = 0.5 represents P(X>0.5)P(X > 0.5).]

    ---

    3. Cumulative Distribution Function (CDF)

    The Cumulative Distribution Function (CDF) provides the probability that a random variable XX takes a value less than or equal to a given value xx. It is defined for both discrete and continuous random variables.

    📖 Cumulative Distribution Function (CDF)

    For any random variable XX, its Cumulative Distribution Function (CDF), denoted by FX(x)F_X(x) or F(x)F(x), is defined as:

    F(x)=P(X≀x)F(x) = P(X \le x)

    Properties of a CDF:
    • 0≀F(x)≀10 \le F(x) \le 1 for all x∈Rx \in \mathbb{R}.

    • F(x)F(x) is non-decreasing: if a<ba < b, then F(a)≀F(b)F(a) \le F(b).

    • lim⁑xβ†’βˆ’βˆžF(x)=0\lim_{x \to -\infty} F(x) = 0.

    • lim⁑xβ†’βˆžF(x)=1\lim_{x \to \infty} F(x) = 1.

    • F(x)F(x) is right-continuous: lim⁑tβ†’x+F(t)=F(x)\lim_{t \to x^+} F(t) = F(x).

    For a discrete random variable XX with PMF pX(x)p_X(x):

    F(x)=βˆ‘t≀xpX(t)F(x) = \sum_{t \le x} p_X(t)

    For a continuous random variable XX with PDF fX(x)f_X(x):
    F(x)=βˆ«βˆ’βˆžxfX(t)dtF(x) = \int_{-\infty}^{x} f_X(t) dt

    Conversely, if F(x)F(x) is differentiable, then fX(x)=ddxF(x)f_X(x) = \frac{d}{dx} F(x).

    📝 Probability from CDF

    For any random variable XX:

    P(a<X≀b)=F(b)βˆ’F(a)P(a < X \le b) = F(b) - F(a)

    For a continuous random variable:
    P(X>a)=1βˆ’F(a)P(X > a) = 1 - F(a)

    Variables:

      • F(x)F(x) = Cumulative Distribution Function

      • P(X≀x)P(X \le x) = Probability that XX is less than or equal to xx


    When to use: Calculating probabilities over intervals for any type of random variable.

    Worked Example:

    Problem: For the continuous random variable XX with PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) for 0≀x≀10 \le x \le 1, and 00 otherwise, find its CDF F(x)F(x). Then, use the CDF to find P(X>0.5)P(X > 0.5).

    Solution:

    Step 1: Define F(x)F(x) for different ranges of xx.

    For x<0x < 0:

    F(x)=βˆ«βˆ’βˆžx0 dt=0F(x) = \int_{-\infty}^{x} 0 \, dt = 0

    For 0≀x≀10 \le x \le 1:

    F(x)=βˆ«βˆ’βˆžxf(t)dt=∫0x6t(1βˆ’t)dtF(x) = \int_{-\infty}^{x} f(t) dt = \int_{0}^{x} 6t(1-t) dt
    F(x)=6∫0x(tβˆ’t2)dtF(x) = 6 \int_{0}^{x} (t - t^2) dt
    F(x)=6[t22βˆ’t33]0xF(x) = 6 \left[ \frac{t^2}{2} - \frac{t^3}{3} \right]_{0}^{x}
    F(x)=6(x22βˆ’x33)F(x) = 6 \left( \frac{x^2}{2} - \frac{x^3}{3} \right)
    F(x)=3x2βˆ’2x3F(x) = 3x^2 - 2x^3

    For x>1x > 1:

    F(x)=βˆ«βˆ’βˆž1f(t)dt+∫1x0 dt=∫016t(1βˆ’t)dtF(x) = \int_{-\infty}^{1} f(t) dt + \int_{1}^{x} 0 \, dt = \int_{0}^{1} 6t(1-t) dt

    From the previous example, we know this integral equals 1.

    F(x)=1F(x) = 1

    Step 2: Combine the parts to write the full CDF.

    F(x)={0,x<03x2βˆ’2x3,0≀x≀11,x>1F(x) = \begin{cases} 0, & x < 0 \\ 3x^2 - 2x^3, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}

    Step 3: Use the CDF to find P(X>0.5)P(X > 0.5).

    P(X>0.5)=1βˆ’F(0.5)P(X > 0.5) = 1 - F(0.5)
    F(0.5)=3(0.5)2βˆ’2(0.5)3F(0.5) = 3(0.5)^2 - 2(0.5)^3
    F(0.5)=3(0.25)βˆ’2(0.125)F(0.5) = 3(0.25) - 2(0.125)
    F(0.5)=0.75βˆ’0.25F(0.5) = 0.75 - 0.25
    F(0.5)=0.5F(0.5) = 0.5
    P(X>0.5)=1βˆ’0.5=0.5P(X > 0.5) = 1 - 0.5 = 0.5

    Answer: The CDF is F(x)=3x2βˆ’2x3F(x) = 3x^2 - 2x^3 for 0≀x≀10 \le x \le 1, and P(X>0.5)=0.5P(X > 0.5) = 0.5.
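The closed-form CDF can be cross-checked against a numeric integral of the PDF; a sketch with an arbitrary step count:

```python
# Piecewise CDF F(x) = 3x^2 - 2x^3 on [0, 1], compared with midpoint-rule
# integration of the PDF f(t) = 6t(1 - t).
def F(x):
    if x < 0:
        return 0.0
    return 3 * x**2 - 2 * x**3 if x <= 1 else 1.0

def F_numeric(x, n=100_000):
    h = x / n
    return sum(6 * t * (1 - t) for t in ((i + 0.5) * h for i in range(n))) * h

print(F(0.5), 1 - F(0.5))                   # 0.5 0.5
print(abs(F(0.7) - F_numeric(0.7)) < 1e-6)  # True
```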

    ---

    4. Expected Value (Mean)

    The expected value, or mean, of a random variable is a measure of its central tendency. It represents the average value one would expect if the experiment were repeated many times.

    📖 Expected Value (Mean)

    For a discrete random variable XX with PMF pX(x)p_X(x):

    E[X]=βˆ‘xxβ‹…pX(x)E[X] = \sum_{x} x \cdot p_X(x)

    For a continuous random variable XX with PDF fX(x)f_X(x):
    E[X]=βˆ«βˆ’βˆžβˆžxβ‹…fX(x)dxE[X] = \int_{-\infty}^{\infty} x \cdot f_X(x) dx

    📝 Expected Value of a Function

    For a discrete random variable XX and a function g(X)g(X):

    E[g(X)]=βˆ‘xg(x)β‹…pX(x)E[g(X)] = \sum_{x} g(x) \cdot p_X(x)

    For a continuous random variable XX and a function g(X)g(X):
    E[g(X)]=βˆ«βˆ’βˆžβˆžg(x)β‹…fX(x)dxE[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f_X(x) dx

    Variables:

      • XX = Random variable

      • pX(x)p_X(x) = PMF of XX

      • fX(x)f_X(x) = PDF of XX

      • g(X)g(X) = A function of XX


    When to use: To find the average value of a random variable or a function of a random variable.

    Worked Example:

    Problem: Find the expected value of XX for the continuous random variable with PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) for 0≀x≀10 \le x \le 1.

    Solution:

    Step 1: Apply the formula for the expected value of a continuous random variable.

    E[X]=βˆ«βˆ’βˆžβˆžxβ‹…f(x)dxE[X] = \int_{-\infty}^{\infty} x \cdot f(x) dx

    Step 2: Substitute the PDF and integrate over its non-zero range.

    E[X]=∫01xβ‹…[6x(1βˆ’x)]dxE[X] = \int_{0}^{1} x \cdot [6x(1-x)] dx
    E[X]=6∫01(x2βˆ’x3)dxE[X] = 6 \int_{0}^{1} (x^2 - x^3) dx

    Step 3: Perform the integration.

    E[X]=6[x33βˆ’x44]01E[X] = 6 \left[ \frac{x^3}{3} - \frac{x^4}{4} \right]_{0}^{1}
    E[X]=6((133βˆ’144)βˆ’(033βˆ’044))E[X] = 6 \left( \left( \frac{1^3}{3} - \frac{1^4}{4} \right) - \left( \frac{0^3}{3} - \frac{0^4}{4} \right) \right)
    E[X]=6(13βˆ’14)E[X] = 6 \left( \frac{1}{3} - \frac{1}{4} \right)
    E[X]=6(4βˆ’312)E[X] = 6 \left( \frac{4-3}{12} \right)
    E[X]=6(112)E[X] = 6 \left( \frac{1}{12} \right)
    E[X]=12E[X] = \frac{1}{2}

    Answer: E[X]=0.5E[X] = 0.5.
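The integral above can be checked numerically with a midpoint rule; a minimal sketch:

```python
# E[g(X)] = integral of g(x) * f(x) dx for f(x) = 6x(1 - x) on [0, 1],
# approximated by the midpoint rule.
def e_of(g, n=100_000):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += g(x) * 6 * x * (1 - x)
    return total * h

print(round(e_of(lambda x: x), 6))  # 0.5
```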

    ---

    5. Variance

    The variance measures the spread or dispersion of a random variable's values around its expected value. A higher variance indicates greater variability.

    📖 Variance

    The variance of a random variable XX, denoted by Var⁑(X)\operatorname{Var}(X) or ΟƒX2\sigma^2_X, is defined as:

    Var⁑(X)=E[(Xβˆ’E[X])2]\operatorname{Var}(X) = E[(X - E[X])^2]

    An equivalent and often more convenient formula is:
    Var⁑(X)=E[X2]βˆ’(E[X])2\operatorname{Var}(X) = E[X^2] - (E[X])^2

    The standard deviation, ΟƒX\sigma_X, is the positive square root of the variance: ΟƒX=Var⁑(X)\sigma_X = \sqrt{\operatorname{Var}(X)}.

    📝 Variance Calculation

    For a discrete random variable XX:

    Var⁑(X)=βˆ‘xx2pX(x)βˆ’(βˆ‘xxpX(x))2\operatorname{Var}(X) = \sum_{x} x^2 p_X(x) - \left( \sum_{x} x p_X(x) \right)^2

    For a continuous random variable XX:
    Var⁑(X)=βˆ«βˆ’βˆžβˆžx2fX(x)dxβˆ’(βˆ«βˆ’βˆžβˆžxfX(x)dx)2\operatorname{Var}(X) = \int_{-\infty}^{\infty} x^2 f_X(x) dx - \left( \int_{-\infty}^{\infty} x f_X(x) dx \right)^2

    Variables:

      • XX = Random variable

      • pX(x)p_X(x) = PMF of XX

      • fX(x)f_X(x) = PDF of XX


    When to use: To quantify the spread or dispersion of a random variable's values.

    Worked Example:

    Problem: Find the variance of XX for the continuous random variable with PDF f(x)=6x(1βˆ’x)f(x) = 6x(1-x) for 0≀x≀10 \le x \le 1. (We already found E[X]=0.5E[X] = 0.5.)

    Solution:

    Step 1: Calculate E[X2]E[X^2].

    E[X2]=βˆ«βˆ’βˆžβˆžx2β‹…f(x)dxE[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f(x) dx
    E[X2]=∫01x2β‹…[6x(1βˆ’x)]dxE[X^2] = \int_{0}^{1} x^2 \cdot [6x(1-x)] dx
    E[X2]=6∫01(x3βˆ’x4)dxE[X^2] = 6 \int_{0}^{1} (x^3 - x^4) dx
    E[X2]=6[x44βˆ’x55]01E[X^2] = 6 \left[ \frac{x^4}{4} - \frac{x^5}{5} \right]_{0}^{1}
    E[X2]=6(14βˆ’15)E[X^2] = 6 \left( \frac{1}{4} - \frac{1}{5} \right)
    E[X2]=6(5βˆ’420)E[X^2] = 6 \left( \frac{5-4}{20} \right)
    E[X2]=6(120)E[X^2] = 6 \left( \frac{1}{20} \right)
    E[X2]=620=310=0.3E[X^2] = \frac{6}{20} = \frac{3}{10} = 0.3

    Step 2: Use the variance formula Var⁑(X)=E[X2]βˆ’(E[X])2\operatorname{Var}(X) = E[X^2] - (E[X])^2.

    We have E[X2]=0.3E[X^2] = 0.3 and E[X]=0.5E[X] = 0.5.

    Var⁑(X)=0.3βˆ’(0.5)2\operatorname{Var}(X) = 0.3 - (0.5)^2
    Var⁑(X)=0.3βˆ’0.25\operatorname{Var}(X) = 0.3 - 0.25
    Var⁑(X)=0.05\operatorname{Var}(X) = 0.05

    Answer: Var⁑(X)=0.05\operatorname{Var}(X) = 0.05.
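The analytic value can also be checked by simulation: rejection sampling draws from f(x) = 6x(1-x) (which is bounded by 1.5 on [0, 1]), and the sample variance should land near 0.05. A sketch; the seed and sample size are arbitrary choices:

```python
import random

random.seed(0)

def sample():
    """Rejection sampling from f(x) = 6x(1 - x): propose x ~ U(0, 1) and
    accept with probability f(x) / 1.5, since f is bounded above by 1.5."""
    while True:
        x, u = random.random(), random.random()
        if u * 1.5 <= 6 * x * (1 - x):
            return x

xs = [sample() for _ in range(200_000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)
print(round(m, 2), round(v, 3))  # ~0.5 ~0.05
```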

    ---

    6. Properties of Expectation and Variance

    These properties simplify calculations involving sums and linear transformations of random variables.

    📝 Linearity of Expectation

    For any random variables X1,X2,…,XnX_1, X_2, \dots, X_n and constants a1,a2,…,an,ba_1, a_2, \dots, a_n, b:

    E[a1X1+a2X2+β‹―+anXn+b]=a1E[X1]+a2E[X2]+β‹―+anE[Xn]+bE[a_1 X_1 + a_2 X_2 + \dots + a_n X_n + b] = a_1 E[X_1] + a_2 E[X_2] + \dots + a_n E[X_n] + b

    A special case for a single random variable XX:
    E[aX+b]=aE[X]+bE[aX + b] = a E[X] + b

    When to use: To easily find the expected value of linear combinations of random variables, regardless of their independence.

    📝 Properties of Variance

    For any random variable XX and constants a,ba, b:

    Var⁑(aX+b)=a2Var⁑(X)\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)

    For independent random variables X1,X2,…,XnX_1, X_2, \dots, X_n:
    Var⁑(X1+X2+β‹―+Xn)=Var⁑(X1)+Var⁑(X2)+β‹―+Var⁑(Xn)\operatorname{Var}(X_1 + X_2 + \dots + X_n) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \dots + \operatorname{Var}(X_n)

    And for independent random variables X1,…,XnX_1, \dots, X_n and constants a1,…,ana_1, \dots, a_n:
    Var⁑(a1X1+β‹―+anXn)=a12Var⁑(X1)+β‹―+an2Var⁑(Xn)\operatorname{Var}(a_1 X_1 + \dots + a_n X_n) = a_1^2 \operatorname{Var}(X_1) + \dots + a_n^2 \operatorname{Var}(X_n)

    When to use: To find the variance of linear transformations or sums of independent random variables.

    ⚠️ Common Mistake

    ❌ Assuming Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) for any random variables X,YX, Y.
    ✅ The property Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) holds only if XX and YY are independent. If they are not independent, the covariance term must be included: Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)+2Cov⁑(X,Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y).
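The pitfall can be made concrete with simulated data: taking Y = X (perfectly dependent) gives Var(X+Y) = Var(2X) = 4 Var(X), not 2 Var(X). A sketch, with an arbitrary distribution and seed:

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(100_000)]  # Var(X) ~ 1

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

vx = var(xs)
v_sum = var([x + x for x in xs])  # X + Y with Y = X, fully dependent
print(round(v_sum / vx, 2))       # 4.0, not 2.0: the 2*Cov(X, X) term matters
```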

    ---

    7. Central Limit Theorem (CLT) and Normal Approximation

    The Central Limit Theorem (CLT) is one of the most powerful theorems in statistics. It explains why many natural phenomena follow a normal distribution, even if the individual components contributing to them do not.

    📖 Central Limit Theorem (CLT)

    Let X1,X2,…,XnX_1, X_2, \dots, X_n be a sequence of independent and identically distributed (i.i.d.) random variables, each with finite mean E[Xi]=ΞΌE[X_i] = \mu and finite variance Var⁑(Xi)=Οƒ2\operatorname{Var}(X_i) = \sigma^2.
    As nn approaches infinity, the distribution of the sample mean XΛ‰n=1nβˆ‘i=1nXi\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i approaches a normal distribution with mean ΞΌ\mu and variance Οƒ2n\frac{\sigma^2}{n}.
    That is, for large nn:

    XΛ‰n∼N(ΞΌ,Οƒ2n)\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)

    Equivalently, the distribution of the sum Sn=βˆ‘i=1nXiS_n = \sum_{i=1}^{n} X_i approaches a normal distribution with mean nΞΌn\mu and variance nΟƒ2n\sigma^2:
    Sn∼N(nΞΌ,nΟƒ2)S_n \sim N(n\mu, n\sigma^2)

    The standardized random variable Z=XΛ‰nβˆ’ΞΌΟƒ/nZ = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} (or Z=Snβˆ’nΞΌΟƒnZ = \frac{S_n - n\mu}{\sigma\sqrt{n}}) approaches a standard normal distribution N(0,1)N(0,1) as nβ†’βˆžn \to \infty.

    💡 Exam Shortcut

    For problems involving sums or averages of a large number of i.i.d. random variables, immediately think of applying the Central Limit Theorem to approximate the distribution as normal. This allows you to use Z-scores and standard normal tables for probability calculations.

    Worked Example:

    Problem: The time taken (in minutes) for a data scientist to complete a specific task is a random variable with mean 1515 minutes and standard deviation 44 minutes. If a data scientist completes 100100 such tasks independently, what is the approximate probability that the total time taken for these 100100 tasks is less than 14501450 minutes?

    Solution:

    Step 1: Identify the given parameters for a single task XiX_i.

    Mean E[Xi]=ΞΌ=15E[X_i] = \mu = 15 minutes.
    Standard deviation Οƒ=4\sigma = 4 minutes.
    Number of tasks n=100n = 100.

    Step 2: Define the total time S100S_{100} and apply CLT.

    The total time for 100100 tasks is S100=βˆ‘i=1100XiS_{100} = \sum_{i=1}^{100} X_i.
    By the Central Limit Theorem, for large nn, SnS_n is approximately normally distributed.

    Calculate the mean of S100S_{100}:

    E[S100]=nΞΌ=100Γ—15=1500E[S_{100}] = n\mu = 100 \times 15 = 1500

    Calculate the variance of S100S_{100}:

    Var⁑(S100)=nΟƒ2=100Γ—(42)=100Γ—16=1600\operatorname{Var}(S_{100}) = n\sigma^2 = 100 \times (4^2) = 100 \times 16 = 1600

    Calculate the standard deviation of S100S_{100}:

    ΟƒS100=1600=40\sigma_{S_{100}} = \sqrt{1600} = 40

    So, S100∼N(1500,1600)S_{100} \sim N(1500, 1600) approximately.

    Step 3: Standardize the random variable to use the Z-score.

    We want to find P(S100<1450)P(S_{100} < 1450).

    Z=S100βˆ’E[S100]ΟƒS100Z = \frac{S_{100} - E[S_{100}]}{\sigma_{S_{100}}}

    Z=1450βˆ’150040Z = \frac{1450 - 1500}{40}
    Z=βˆ’5040Z = \frac{-50}{40}
    Z=βˆ’1.25Z = -1.25

    Step 4: Look up the probability using the standard normal CDF (or Z-table).

    P(S100<1450)β‰ˆP(Z<βˆ’1.25)P(S_{100} < 1450) \approx P(Z < -1.25)

    Using a standard normal table or calculator, P(Z<βˆ’1.25)β‰ˆ0.1056P(Z < -1.25) \approx 0.1056.

    Answer: The approximate probability that the total time taken is less than 14501450 minutes is 0.10560.1056.
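The table lookup in Step 4 can be reproduced in Python using math.erf, since the standard normal CDF satisfies Phi(z) = (1 + erf(z / sqrt(2))) / 2; a small sketch:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return (1 + erf(z / sqrt(2))) / 2

z = (1450 - 1500) / 40  # standardized total time, as in Step 3
print(z)                # -1.25
print(phi(z))           # approximately 0.1056, matching the table value
```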

    ---

    8. Standardization (Z-score)

    Standardization transforms a random variable into a standard score (Z-score), which represents how many standard deviations an observation is from the mean. This is particularly useful for comparing values from different normal distributions or for using standard normal tables.

    📝 Z-score

    For a random variable XX with mean ΞΌ\mu and standard deviation Οƒ\sigma:

    Z=Xβˆ’ΞΌΟƒZ = \frac{X - \mu}{\sigma}

    Variables:

      • ZZ = Standardized score (Z-score)

      • XX = Value of the random variable

      • ΞΌ\mu = Mean of XX

      • Οƒ\sigma = Standard deviation of XX


    When to use: To transform any normally distributed variable into a standard normal variable N(0,1)N(0,1), allowing for the use of standard normal tables to find probabilities. Also used in conjunction with the CLT.
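A tiny illustration of the comparison use case (all scores, means, and standard deviations below are hypothetical numbers for illustration):

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical exams on different scales:
z_a = z_score(85, 70, 10)  # exam A: mean 70, sd 10
z_b = z_score(69, 60, 5)   # exam B: mean 60, sd 5
print(z_a, z_b)            # 1.5 1.8 -> the exam-B score is relatively stronger
```

Because both z-scores refer to the same N(0,1) scale, the raw scores become directly comparable even though the exams differ in mean and spread.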

    ---

    Problem-Solving Strategies

    💡 CMI Strategy

    • Identify Random Variable Type: First, determine if the random variable is discrete or continuous. This dictates whether to use PMF/summation or PDF/integration.

    • Check PDF/PMF Properties: For questions involving determining constants or verifying a function, always use βˆ‘pX(x)=1\sum p_X(x) = 1 (for discrete) or ∫f(x)dx=1\int f(x) dx = 1 (for continuous). Remember f(x)β‰₯0f(x) \ge 0.

    • Probability from CDF/PDF: P(a<X≀b)=F(b)βˆ’F(a)P(a < X \le b) = F(b) - F(a) for CDF. For PDF, it's ∫abf(x)dx\int_a^b f(x) dx.

    • Expectation & Variance: Remember the "shortcut" formula for variance: Var⁑(X)=E[X2]βˆ’(E[X])2\operatorname{Var}(X) = E[X^2] - (E[X])^2.

    • CLT Application: When dealing with sums or averages of a large number of independent and identically distributed random variables, the Central Limit Theorem is your go-to. This implies a normal approximation, and thus Z-scores.

    • Read Carefully: Pay attention to "total number," "average number," "more than," "less than," "at least," etc., to set up the correct integral or sum limits and inequalities.

    ---

    Common Mistakes

    ⚠️ Avoid These Errors
      • ❌ Confusing PMF and PDF: Using integration for a discrete random variable or summation for a continuous one.
    ✅ Correct: PMF for discrete (summation), PDF for continuous (integration).
      • ❌ Incorrect PDF Properties: Forgetting to check f(x)β‰₯0f(x) \ge 0 or not normalizing the integral to 1.
    ✅ Correct: Always ensure f(x)β‰₯0f(x) \ge 0 and ∫f(x)dx=1\int f(x) dx = 1.
      • ❌ Probability of a single point for continuous RV: Assuming P(X=x0)P(X=x_0) is non-zero for a continuous random variable.
    ✅ Correct: For continuous RVs, P(X=x0)=0P(X=x_0) = 0. Probabilities are over intervals.
      • ❌ Ignoring Independence for Variance Sums: Applying Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) when XX and YY are not independent.
    ✅ Correct: This property requires independence. If not independent, use Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)+2Cov⁑(X,Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y).
      • ❌ Misapplying CLT: Using CLT for small sample sizes or for variables that are not i.i.d.
    ✅ Correct: CLT is for large nn (typically nβ‰₯30n \ge 30) and i.i.d. random variables.
      • ❌ Calculation Errors in Integration/Summation: Simple algebraic or calculus mistakes when evaluating integrals or sums.
    ✅ Correct: Double-check calculations, especially definite integrals and series summations.

    ---

    Practice Questions

    :::question type="MCQ" question="Let XX be a continuous random variable with the probability density function given by:

    f(x)={keβˆ’2x,x>00,otherwisef(x) = \begin{cases} k e^{-2x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}

    What is the value of kk that makes f(x)f(x) a valid PDF?" options=["1/21/2","11","22","ee"] answer="22" hint="Use the property that the integral of a PDF over its entire range must equal 1." solution="Step 1: Apply the normalization condition for a PDF.
    βˆ«βˆ’βˆžβˆžf(x)dx=1\int_{-\infty}^{\infty} f(x) dx = 1

    Step 2: Substitute the given PDF into the integral.
    ∫0∞keβˆ’2xdx=1\int_{0}^{\infty} k e^{-2x} dx = 1

    Step 3: Evaluate the integral.
    k[βˆ’12eβˆ’2x]0∞=1k \left[ -\frac{1}{2} e^{-2x} \right]_{0}^{\infty} = 1

    k(lim⁑bβ†’βˆž(βˆ’12eβˆ’2b)βˆ’(βˆ’12e0))=1k \left( \lim_{b \to \infty} \left( -\frac{1}{2} e^{-2b} \right) - \left( -\frac{1}{2} e^{0} \right) \right) = 1

    k(0βˆ’(βˆ’12))=1k \left( 0 - \left( -\frac{1}{2} \right) \right) = 1

    k(12)=1k \left( \frac{1}{2} \right) = 1

    Step 4: Solve for kk.
    k=2k = 2

    Answer: \boxed{2}"
    :::

    :::question type="NAT" question="A discrete random variable YY has the following Probability Mass Function:

    P(Y=y)=cy+1,forΒ y=0,1,2P(Y=y) = \frac{c}{y+1}, \quad \text{for } y=0, 1, 2

    What is the value of cc (rounded to two decimal places)?" answer="0.55" hint="The sum of all probabilities in a PMF must be equal to 1." solution="Step 1: Apply the normalization condition for a PMF.
    βˆ‘y=02P(Y=y)=1\sum_{y=0}^{2} P(Y=y) = 1

    Step 2: Sum the probabilities for each possible value of YY.
    P(Y=0)=c0+1=cP(Y=0) = \frac{c}{0+1} = c

    P(Y=1)=c1+1=c2P(Y=1) = \frac{c}{1+1} = \frac{c}{2}

    P(Y=2)=c2+1=c3P(Y=2) = \frac{c}{2+1} = \frac{c}{3}

    Step 3: Set the sum equal to 1 and solve for cc.
    c+c2+c3=1c + \frac{c}{2} + \frac{c}{3} = 1

    Find a common denominator (6).
    6c6+3c6+2c6=1\frac{6c}{6} + \frac{3c}{6} + \frac{2c}{6} = 1

    11c6=1\frac{11c}{6} = 1

    c=611c = \frac{6}{11}

    Step 4: Round to two decimal places.
    cβ‰ˆ0.5454...β‰ˆ0.55c \approx 0.5454... \approx 0.55

    Answer: \boxed{0.55}"
    :::

    :::question type="MSQ" question="Let XX be a continuous random variable with CDF given by:

    F(x)={0,x<0x2,0≀x<11,xβ‰₯1F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}

    Which of the following statements is/are true?" options=["The PDF of XX is f(x)=2xf(x) = 2x for 0≀x<10 \le x < 1.","P(X≀0.5)=0.25P(X \le 0.5) = 0.25.","P(0.2<X<0.8)=0.6P(0.2 < X < 0.8) = 0.6.","E[X]=2/3E[X] = 2/3." ] answer="A,B,C,D" hint="Remember that f(x)=dF(x)/dxf(x) = dF(x)/dx for continuous random variables. Use the CDF to find probabilities. Calculate E[X]=∫xf(x)dxE[X] = \int x f(x) dx." solution="Statement A: The PDF of XX is f(x)=2xf(x) = 2x for 0≀x<10 \le x < 1.
    To find the PDF from the CDF, differentiate the CDF:
    f(x)=ddxF(x)=ddx(x2)=2xf(x) = \frac{d}{dx} F(x) = \frac{d}{dx} (x^2) = 2x

    This is valid for 0≀x<10 \le x < 1. For x<0x<0 and xβ‰₯1x \ge 1, f(x)=0f(x)=0. So, statement A is true.

    Statement B: P(X≀0.5)=0.25P(X \le 0.5) = 0.25.
    Using the CDF definition:

    P(X≀0.5)=F(0.5)P(X \le 0.5) = F(0.5)

    Since 0≀0.5<10 \le 0.5 < 1, we use F(x)=x2F(x) = x^2:
    F(0.5)=(0.5)2=0.25F(0.5) = (0.5)^2 = 0.25

    So, statement B is true.

    Statement C: P(0.2<X<0.8)=0.6P(0.2 < X < 0.8) = 0.6.
    Using the CDF property P(a<X<b)=F(b)βˆ’F(a)P(a < X < b) = F(b) - F(a):

    P(0.2<X<0.8)=F(0.8)βˆ’F(0.2)P(0.2 < X < 0.8) = F(0.8) - F(0.2)

    F(0.8)=(0.8)2=0.64F(0.8) = (0.8)^2 = 0.64

    F(0.2)=(0.2)2=0.04F(0.2) = (0.2)^2 = 0.04

    P(0.2<X<0.8)=0.64βˆ’0.04=0.60P(0.2 < X < 0.8) = 0.64 - 0.04 = 0.60

    So, statement C is true.

    Statement D: E[X]=2/3E[X] = 2/3.
    Using the PDF f(x)=2xf(x) = 2x for 0≀x<10 \le x < 1:

    E[X]=βˆ«βˆ’βˆžβˆžxf(x)dx=∫01x(2x)dxE[X] = \int_{-\infty}^{\infty} x f(x) dx = \int_{0}^{1} x (2x) dx

    E[X]=∫012x2dxE[X] = \int_{0}^{1} 2x^2 dx

    E[X]=[2x33]01E[X] = \left[ \frac{2x^3}{3} \right]_{0}^{1}

    E[X]=2(1)33βˆ’2(0)33=23E[X] = \frac{2(1)^3}{3} - \frac{2(0)^3}{3} = \frac{2}{3}

    So, statement D is true.

    All options A, B, C, D are true.
    Answer: \boxed{A,B,C,D}"
    :::

    :::question type="SUB" question="A manufacturing process produces items whose weights are independent random variables with a mean of 1010 kg and a standard deviation of 22 kg. A sample of 6464 items is taken from the production line.
    (a) What is the probability that the average weight of the items in the sample is less than 9.59.5 kg?
    (b) What is the total expected weight of the 6464 items?" answer="0.02280.0228, 640640 kg" hint="For part (a), use the Central Limit Theorem to approximate the distribution of the sample mean. For part (b), use the linearity of expectation for a sum of random variables." solution="(a) Probability that the average weight is less than 9.59.5 kg:

    Step 1: Identify parameters for a single item XiX_i.
    Mean E[Xi]=ΞΌ=10E[X_i] = \mu = 10 kg.
    Standard deviation Οƒ=2\sigma = 2 kg.
    Sample size n=64n = 64.

    Step 2: Apply the Central Limit Theorem to the sample mean Xˉn\bar{X}_n.
    For large nn, Xˉn\bar{X}_n is approximately normally distributed with:
    Mean of sample mean: E[Xˉn]=μ=10E[\bar{X}_n] = \mu = 10 kg.
    Standard deviation of sample mean (standard error): σXˉn=σn=264=28=0.25\sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{64}} = \frac{2}{8} = 0.25 kg.
    So, XΛ‰64∼N(10,(0.25)2)\bar{X}_{64} \sim N(10, (0.25)^2) approximately.

    Step 3: Standardize the value 9.59.5 kg.
    We want to find P(Xˉ64<9.5)P(\bar{X}_{64} < 9.5).

    Z=XΛ‰64βˆ’E[XΛ‰64]ΟƒXΛ‰64Z = \frac{\bar{X}_{64} - E[\bar{X}_{64}]}{\sigma_{\bar{X}_{64}}}

    Z=9.5βˆ’100.25Z = \frac{9.5 - 10}{0.25}

    Z=βˆ’0.50.25Z = \frac{-0.5}{0.25}

    Z=βˆ’2Z = -2

    Step 4: Look up the probability using the standard normal CDF.

    P(XΛ‰64<9.5)β‰ˆP(Z<βˆ’2)P(\bar{X}_{64} < 9.5) \approx P(Z < -2)

    Using a standard normal table or calculator, P(Z<βˆ’2)β‰ˆ0.0228P(Z < -2) \approx 0.0228.

    (b) Total expected weight of the 6464 items:

    Step 1: Define the total weight S64S_{64}.
    S64=βˆ‘i=164XiS_{64} = \sum_{i=1}^{64} X_i.

    Step 2: Apply the linearity of expectation.

    E[S64]=E[βˆ‘i=164Xi]=βˆ‘i=164E[Xi]E[S_{64}] = E\left[\sum_{i=1}^{64} X_i\right] = \sum_{i=1}^{64} E[X_i]

    Since each E[Xi]=ΞΌ=10E[X_i] = \mu = 10 kg:
    E[S64]=64Γ—10=640E[S_{64}] = 64 \times 10 = 640

    Answer: (a) \boxed{0.0228}, (b) \boxed{640 \text{ kg}}"
    :::

    ---

    Summary

    ❗ Key Takeaways for CMI

    • PMF vs. PDF: Discrete random variables use Probability Mass Functions (PMF) which sum to 1. Continuous random variables use Probability Density Functions (PDF) which integrate to 1.

    • CDF for All: The Cumulative Distribution Function (CDF) F(x)=P(X≀x)F(x) = P(X \le x) is defined for both discrete and continuous variables, is non-decreasing, and ranges from 0 to 1. P(a<X≀b)=F(b)βˆ’F(a)P(a < X \le b) = F(b) - F(a).

    • Expected Value & Variance: These measure central tendency and spread. Remember E[aX+b]=aE[X]+bE[aX+b] = aE[X]+b and Var⁑(aX+b)=a2Var⁑(X)\operatorname{Var}(aX+b) = a^2\operatorname{Var}(X). For independent variables, Var⁑(X+Y)=Var⁑(X)+Var⁑(Y)\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).

    • Central Limit Theorem (CLT): For a large number of i.i.d. random variables, their sum or average is approximately normally distributed. This is critical for inferential statistics and often tested in CMI.

    • Standardization (Z-score): Use Z=(Xβˆ’ΞΌ)/ΟƒZ = (X - \mu)/\sigma to convert any normal random variable to a standard normal N(0,1)N(0,1), which allows for probability look-ups in Z-tables.

    ---

    What's Next?

    πŸ’‘ Continue Learning

    This topic connects to:

      • Specific Probability Distributions: Understanding distribution functions is the foundation for studying named distributions like Bernoulli, Binomial, Poisson, Exponential, Uniform, and Normal distributions. Each has its own PMF/PDF and CDF.

      • Joint Distributions: Extending these concepts to multiple random variables, understanding their joint behavior, and concepts like covariance and correlation.

      • Statistical Inference: The Central Limit Theorem forms the bedrock for hypothesis testing and confidence intervals, allowing us to make inferences about population parameters from sample data.


    Master these connections for comprehensive CMI preparation!

    ---

    πŸ’‘ Moving Forward

    Now that you understand Distribution Functions, let's explore Expectation and Variance which builds on these concepts.

    ---

    Part 3: Expectation and Variance

    Introduction

    Expectation and variance are fundamental concepts in probability theory, providing concise summaries of the central tendency and spread of a random variable's distribution. The expectation, or expected value, quantifies the "average" outcome of a random variable over a large number of trials. It represents the weighted average of all possible values a random variable can take, with weights given by their respective probabilities. The variance, on the other hand, measures the dispersion or spread of the random variable's values around its expected value. A low variance indicates that values tend to be close to the mean, while a high variance suggests that values are spread out over a wider range.

    In the CMI examination, a deep understanding of expectation and variance is crucial. These concepts are extensively tested, often through complex scenarios involving multiple random variables, indicator functions, and various probability distributions. Mastery of linearity of expectation and the properties of variance is essential for efficiently solving problems that might otherwise appear intractable.

    πŸ“– Random Variable

    A random variable is a function that maps the outcomes of a random experiment to real numbers. It can be discrete (taking on a finite or countably infinite number of values) or continuous (taking on any value within a given interval).

    ---

    Key Concepts

## 1. Expectation of a Random Variable

    The expectation, also known as the expected value or mean, of a random variable XX is denoted by E[X]E[X] or ΞΌ\mu. It represents the long-run average value of the variable.

### 1.1. Discrete Random Variables

    For a discrete random variable XX with probability mass function (PMF) P(X=x)P(X=x), the expectation is calculated by summing the products of each possible value of XX and its corresponding probability.

    πŸ“ Expectation of a Discrete Random Variable
    E[X]=βˆ‘xxP(X=x)E[X] = \sum_{x} x P(X=x)

    Variables:

      • XX = discrete random variable

      • xx = a possible value of XX

      • P(X=x)P(X=x) = probability mass function (PMF) at xx


    When to use: To find the average value of a discrete random variable.

    Worked Example:

    Problem: A fair six-sided die is rolled. Let XX be the number rolled. Calculate E[X]E[X].

    Solution:

    Step 1: Identify the possible values of XX and their probabilities.
    The possible values are 1,2,3,4,5,61, 2, 3, 4, 5, 6. Since the die is fair, each outcome has a probability of 1/61/6.

    P(X=x)=16for x∈{1,2,3,4,5,6}P(X=x) = \frac{1}{6} \quad \text{for } x \in \{1, 2, 3, 4, 5, 6\}

    Step 2: Apply the formula for the expectation of a discrete random variable.

    E[X]=βˆ‘x=16xP(X=x)E[X] = \sum_{x=1}^{6} x P(X=x)
    E[X]=1β‹…16+2β‹…16+3β‹…16+4β‹…16+5β‹…16+6β‹…16E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6}

    Step 3: Simplify the expression.

    E[X]=16(1+2+3+4+5+6)E[X] = \frac{1}{6} (1+2+3+4+5+6)
    E[X]=216E[X] = \frac{21}{6}
    E[X]=3.5E[X] = 3.5

    Answer: \boxed{3.5}
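The die calculation above can be reproduced with exact rational arithmetic; a minimal sketch using Python's `fractions` module:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face x in {1,...,6} has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum over x of x * P(X = x)
expected_value = sum(x * p for x, p in pmf.items())

print(expected_value)         # 7/2
print(float(expected_value))  # 3.5
```

Using `Fraction` instead of floats avoids any rounding in the intermediate sums.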

    ---

### 1.2. Continuous Random Variables

    For a continuous random variable XX with probability density function (PDF) f(x)f(x), the expectation is calculated by integrating the product of xx and its PDF over the entire range of possible values.

    πŸ“ Expectation of a Continuous Random Variable
    E[X]=βˆ«βˆ’βˆžβˆžxf(x)dxE[X] = \int_{-\infty}^{\infty} x f(x) dx

    Variables:

      • XX = continuous random variable

      • f(x)f(x) = probability density function (PDF) of XX


    When to use: To find the average value of a continuous random variable.

    Worked Example:

    Problem: Let XX be a continuous random variable with PDF f(x)=2xf(x) = 2x for 0≀x≀10 \le x \le 1, and f(x)=0f(x) = 0 otherwise. Calculate E[X]E[X].

    Solution:

    Step 1: Identify the PDF and its range.

    f(x)=2xforΒ 0≀x≀1f(x) = 2x \quad \text{for } 0 \le x \le 1

    Step 2: Apply the formula for the expectation of a continuous random variable.

    E[X]=βˆ«βˆ’βˆžβˆžxf(x)dxE[X] = \int_{-\infty}^{\infty} x f(x) dx

    Since f(x)f(x) is non-zero only for 0≀x≀10 \le x \le 1, the integral limits change.

    E[X]=∫01x(2x)dxE[X] = \int_{0}^{1} x (2x) dx

    Step 3: Evaluate the integral.

    E[X]=∫012x2dxE[X] = \int_{0}^{1} 2x^2 dx
    E[X]=[2x33]01E[X] = \left[ \frac{2x^3}{3} \right]_{0}^{1}
    E[X]=(2(1)33)βˆ’(2(0)33)E[X] = \left( \frac{2(1)^3}{3} \right) - \left( \frac{2(0)^3}{3} \right)
    E[X]=23E[X] = \frac{2}{3}

    Answer: \boxed{\frac{2}{3}}
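The integral above can be checked numerically; a minimal sketch using a midpoint Riemann sum (no external libraries; the step count `n` is an arbitrary choice):

```python
# Numerically approximate E[X] = integral of x * f(x) dx for f(x) = 2x on [0, 1]
def pdf(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

n = 100_000
h = 1.0 / n
approx = 0.0
for i in range(n):
    m = (i + 0.5) * h          # midpoint of the i-th subinterval
    approx += m * pdf(m) * h   # contribution x * f(x) * dx

print(round(approx, 6))  # 0.666667, matching E[X] = 2/3
```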

---

### 1.3. Properties of Expectation

    Expectation has several important properties that simplify calculations, especially when dealing with sums or transformations of random variables.

    πŸ“ Properties of Expectation

    • Expectation of a constant: E[c]=cE[c] = c

    • Scalar multiplication: E[aX]=aE[X]E[aX] = a E[X]

    • Addition of a constant: E[X+b]=E[X]+bE[X + b] = E[X] + b

    • Linearity of Expectation: For any random variables X1,X2,…,XnX_1, X_2, \ldots, X_n (whether independent or dependent) and constants a1,a2,…,ana_1, a_2, \ldots, a_n:

    • E[βˆ‘i=1naiXi]=βˆ‘i=1naiE[Xi]E\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} a_i E[X_i]

      A special case is E[X+Y]=E[X]+E[Y]E[X+Y] = E[X] + E[Y].
    • Expectation of a function of a random variable:

    For discrete XX: E[g(X)]=βˆ‘xg(x)P(X=x)E[g(X)] = \sum_{x} g(x) P(X=x)
    For continuous XX: E[g(X)]=βˆ«βˆ’βˆžβˆžg(x)f(x)dxE[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) dx

    Worked Example (Linearity of Expectation):

    Problem: A box contains 10 balls: 3 red and 7 blue. Two balls are drawn without replacement. Let XX be the number of red balls drawn. Find E[X]E[X].

    Solution:

    Step 1: Define indicator random variables.
    Let X1X_1 be an indicator variable for the first ball drawn being red.
    Let X2X_2 be an indicator variable for the second ball drawn being red.

    X1={1ifΒ theΒ firstΒ ballΒ isΒ red0ifΒ theΒ firstΒ ballΒ isΒ blueX_1 = \begin{cases} 1 & \text{if the first ball is red} \\ 0 & \text{if the first ball is blue} \end{cases}
    X2={1ifΒ theΒ secondΒ ballΒ isΒ red0ifΒ theΒ secondΒ ballΒ isΒ blueX_2 = \begin{cases} 1 & \text{if the second ball is red} \\ 0 & \text{if the second ball is blue} \end{cases}

    Step 2: Express XX as a sum of indicator variables.
    The total number of red balls drawn is X=X1+X2X = X_1 + X_2.

    Step 3: Calculate the expectation of each indicator variable.
    For X1X_1:

    P(X1=1)=310P(X_1=1) = \frac{3}{10}

    E[X1]=1β‹…P(X1=1)+0β‹…P(X1=0)=310E[X_1] = 1 \cdot P(X_1=1) + 0 \cdot P(X_1=0) = \frac{3}{10}

    For X2X_2:
    The probability that the second ball is red can be found using the Law of Total Probability:

    P(X2=1)=P(X2=1∣X1=1)P(X1=1)+P(X2=1∣X1=0)P(X1=0)P(X_2=1) = P(X_2=1 | X_1=1)P(X_1=1) + P(X_2=1 | X_1=0)P(X_1=0)

    P(X2=1)=(29)(310)+(39)(710)P(X_2=1) = \left(\frac{2}{9}\right)\left(\frac{3}{10}\right) + \left(\frac{3}{9}\right)\left(\frac{7}{10}\right)
    P(X2=1)=690+2190=2790=310P(X_2=1) = \frac{6}{90} + \frac{21}{90} = \frac{27}{90} = \frac{3}{10}

    So, E[X2]=310E[X_2] = \frac{3}{10}.

    Step 4: Apply linearity of expectation.

    E[X]=E[X1+X2]E[X] = E[X_1 + X_2]

    E[X]=E[X1]+E[X2]E[X] = E[X_1] + E[X_2]
    E[X]=310+310E[X] = \frac{3}{10} + \frac{3}{10}
    E[X]=610=35E[X] = \frac{6}{10} = \frac{3}{5}

    Answer: 35\frac{3}{5}
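The indicator-variable answer can be sanity-checked by simulation; a minimal sketch that repeatedly draws two balls without replacement (the seed and trial count are arbitrary choices):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

def red_count():
    # Box of 10 balls: 3 red ('R') and 7 blue ('B'); draw two without replacement
    balls = ['R'] * 3 + ['B'] * 7
    return random.sample(balls, 2).count('R')

trials = 200_000
estimate = sum(red_count() for _ in range(trials)) / trials
print(round(estimate, 2))  # close to E[X] = 3/5 = 0.6
```

Note that the two draws are dependent, yet the simulated mean still matches the linearity-of-expectation answer.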

    ---

## 2. Variance of a Random Variable

    The variance of a random variable XX, denoted by V(X)V(X) or Var(X)\text{Var}(X) or Οƒ2\sigma^2, measures the spread or dispersion of its values around the mean. It is the expected value of the squared deviation from the mean.

    πŸ“ Variance of a Random Variable
    V(X)=E[(Xβˆ’E[X])2]V(X) = E[(X - E[X])^2]

    Alternative (Computational) Formula:

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    Variables:

      • XX = random variable

      • E[X]E[X] = expected value of XX

      • E[X2]E[X^2] = expected value of X2X^2


    When to use: To quantify the spread of a random variable's distribution. The alternative formula is often easier for calculation.

    Derivation of V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2:

    Step 1: Start with the definition of variance.

    V(X)=E[(Xβˆ’E[X])2]V(X) = E[(X - E[X])^2]

    Step 2: Expand the squared term inside the expectation. Let ΞΌ=E[X]\mu = E[X] for simplicity.

    V(X)=E[(Xβˆ’ΞΌ)2]V(X) = E[(X - \mu)^2]
    V(X)=E[X2βˆ’2ΞΌX+ΞΌ2]V(X) = E[X^2 - 2\mu X + \mu^2]

    Step 3: Apply linearity of expectation.

    V(X)=E[X2]βˆ’E[2ΞΌX]+E[ΞΌ2]V(X) = E[X^2] - E[2\mu X] + E[\mu^2]

    Step 4: Use properties of expectation (E[aX]=aE[X]E[aX] = aE[X] and E[c]=cE[c] = c).

    V(X)=E[X2]βˆ’2ΞΌE[X]+ΞΌ2V(X) = E[X^2] - 2\mu E[X] + \mu^2

    Step 5: Substitute ΞΌ=E[X]\mu = E[X] back into the expression.

    V(X)=E[X2]βˆ’2E[X]E[X]+(E[X])2V(X) = E[X^2] - 2 E[X] E[X] + (E[X])^2
    V(X)=E[X2]βˆ’2(E[X])2+(E[X])2V(X) = E[X^2] - 2 (E[X])^2 + (E[X])^2

    Step 6: Simplify the expression.

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    ---

    Worked Example:

    Problem: A fair six-sided die is rolled. Let XX be the number rolled. Calculate V(X)V(X).

    Solution:

    Step 1: Recall E[X]E[X] from the previous example.

    E[X]=3.5E[X] = 3.5

    Step 2: Calculate E[X2]E[X^2].
    Using the formula E[g(X)]=βˆ‘xg(x)P(X=x)E[g(X)] = \sum_{x} g(x) P(X=x) with g(X)=X2g(X) = X^2.

    E[X2]=βˆ‘x=16x2P(X=x)E[X^2] = \sum_{x=1}^{6} x^2 P(X=x)
    E[X2]=12β‹…16+22β‹…16+32β‹…16+42β‹…16+52β‹…16+62β‹…16E[X^2] = 1^2 \cdot \frac{1}{6} + 2^2 \cdot \frac{1}{6} + 3^2 \cdot \frac{1}{6} + 4^2 \cdot \frac{1}{6} + 5^2 \cdot \frac{1}{6} + 6^2 \cdot \frac{1}{6}
    E[X2]=16(1+4+9+16+25+36)E[X^2] = \frac{1}{6} (1 + 4 + 9 + 16 + 25 + 36)
    E[X2]=916E[X^2] = \frac{91}{6}

    Step 3: Apply the computational formula for variance.

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2
    V(X)=916βˆ’(3.5)2V(X) = \frac{91}{6} - (3.5)^2
    V(X)=916βˆ’(72)2V(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^2
    V(X)=916βˆ’494V(X) = \frac{91}{6} - \frac{49}{4}

    Step 4: Find a common denominator and simplify.

    V(X)=18212βˆ’14712V(X) = \frac{182}{12} - \frac{147}{12}
    V(X)=3512V(X) = \frac{35}{12}

    Answer: 3512\frac{35}{12}
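The computational formula lends itself directly to code; a minimal sketch recomputing the die's variance with exact rationals:

```python
from fractions import Fraction

# PMF of a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = sum(x * p for x, p in pmf.items())       # E[X]   = 7/2
e_x2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2] = 91/6
variance = e_x2 - e_x**2                       # V(X) = E[X^2] - (E[X])^2

print(variance)  # 35/12
```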

    ---

### 2.1. Properties of Variance

    Variance also has several key properties.

    πŸ“ Properties of Variance

    • Non-negativity: V(X)β‰₯0V(X) \ge 0

    • Variance of a constant: V(c)=0V(c) = 0

    • Scalar multiplication and addition of a constant:

    • V(aX+b)=a2V(X)V(aX + b) = a^2 V(X)

    • Variance of a sum of independent random variables: If X1,X2,…,XnX_1, X_2, \ldots, X_n are independent random variables, then:

    • V[βˆ‘i=1nXi]=βˆ‘i=1nV(Xi)V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V(X_i)

      A special case is V(X+Y)=V(X)+V(Y)V(X+Y) = V(X) + V(Y) if XX and YY are independent.
    • Variance of a sum of dependent random variables: If XX and YY are dependent:

    V(X+Y)=V(X)+V(Y)+2Cov(X,Y)V(X+Y) = V(X) + V(Y) + 2 \text{Cov}(X,Y)

    where Cov(X,Y)=E[(Xβˆ’E[X])(Yβˆ’E[Y])]\text{Cov}(X,Y) = E[(X-E[X])(Y-E[Y])] is the covariance between XX and YY.

    ❗ Independence for Variance

    Unlike expectation, which is always linear (E[X+Y]=E[X]+E[Y]E[X+Y] = E[X]+E[Y] regardless of independence), the variance of a sum is only the sum of variances if the random variables are independent. If they are dependent, the covariance term must be included.
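The covariance term can be made concrete with a perfectly dependent pair Y = X (an illustrative choice, not from the text), using the fair die from the earlier examples:

```python
from fractions import Fraction

# X uniform on {1,...,6}; take Y = X, a perfectly dependent pair
support = range(1, 7)
p = Fraction(1, 6)

def E(g):
    # expectation of g(X) over the die's PMF
    return sum(g(x) * p for x in support)

var_x = E(lambda x: x**2) - E(lambda x: x)**2              # V(X) = 35/12
var_sum = E(lambda x: (2 * x)**2) - E(lambda x: 2 * x)**2  # V(X + Y) = V(2X)
cov_xy = E(lambda x: x * x) - E(lambda x: x)**2            # Cov(X, X) = V(X)

print(var_sum)                     # 35/3
print(var_x + var_x + 2 * cov_xy)  # 35/3 -- matches V(X)+V(Y)+2Cov(X,Y)
print(var_x + var_x)               # 35/6 -- wrong if independence is assumed
```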

    ---

## 3. Standard Deviation

    The standard deviation is the square root of the variance and is denoted by Οƒ\sigma. It has the same units as the random variable itself, making it more interpretable than variance in many contexts.

    πŸ“ Standard Deviation
    ΟƒX=V(X)\sigma_X = \sqrt{V(X)}

    Variables:

      • ΟƒX\sigma_X = standard deviation of XX

      • V(X)V(X) = variance of XX


    When to use: To express the spread of data in the original units of the random variable.

    ---

## 4. Indicator Random Variables

    An indicator random variable is a special type of discrete random variable that takes on a value of 11 if a particular event occurs and 00 otherwise. They are incredibly powerful when used with the linearity of expectation, especially in counting problems.

    πŸ“– Indicator Random Variable

    For an event AA, the indicator random variable IAI_A is defined as:

    IA={1ifΒ eventΒ AΒ occurs0ifΒ eventΒ AΒ doesΒ notΒ occurI_A = \begin{cases} 1 & \text{if event } A \text{ occurs} \\ 0 & \text{if event } A \text{ does not occur} \end{cases}

    πŸ“ Expectation of an Indicator Variable
    E[IA]=P(A)E[I_A] = P(A)

    Variables:

      • IAI_A = indicator random variable for event AA

      • P(A)P(A) = probability of event AA


    Why: E[IA]=1β‹…P(IA=1)+0β‹…P(IA=0)=1β‹…P(A)+0β‹…(1βˆ’P(A))=P(A)E[I_A] = 1 \cdot P(I_A=1) + 0 \cdot P(I_A=0) = 1 \cdot P(A) + 0 \cdot (1-P(A)) = P(A).

    Worked Example (Using Indicator Variables for Expectation):

    Problem: In a group of nn people, what is the expected number of pairs of people who share the same birthday (ignoring leap years)?

    Solution:

    Step 1: Define indicator variables for each possible pair of people.
    Let N=(n2)N = \binom{n}{2} be the total number of pairs of people.
    Let IijI_{ij} be an indicator variable that people ii and jj share a birthday, for 1≀i<j≀n1 \le i < j \le n.

    Iij={1ifΒ personΒ iΒ andΒ personΒ jΒ shareΒ aΒ birthday0otherwiseI_{ij} = \begin{cases} 1 & \text{if person } i \text{ and person } j \text{ share a birthday} \\ 0 & \text{otherwise} \end{cases}

    Step 2: Express the total number of shared birthdays (XX) as a sum of indicator variables.

    X=βˆ‘1≀i<j≀nIijX = \sum_{1 \le i < j \le n} I_{ij}

    Step 3: Calculate the expectation of a single indicator variable.
    Assuming each day of the year (365 days) is equally likely for a birthday.
    The probability that two specific people share a birthday is P(Iij=1)=1365P(I_{ij}=1) = \frac{1}{365}.

    E[Iij]=P(Iij=1)=1365E[I_{ij}] = P(I_{ij}=1) = \frac{1}{365}

    Step 4: Apply linearity of expectation.

    E[X]=E[βˆ‘1≀i<j≀nIij]E[X] = E\left[\sum_{1 \le i < j \le n} I_{ij}\right]
    E[X]=βˆ‘1≀i<j≀nE[Iij]E[X] = \sum_{1 \le i < j \le n} E[I_{ij}]

    Since there are (n2)\binom{n}{2} such indicator variables, and each has the same expectation:

    E[X]=(n2)β‹…1365E[X] = \binom{n}{2} \cdot \frac{1}{365}
    E[X]=n(nβˆ’1)2β‹…1365E[X] = \frac{n(n-1)}{2} \cdot \frac{1}{365}

    Answer: n(nβˆ’1)730\frac{n(n-1)}{730}
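The formula can be cross-checked by simulation; a minimal sketch for n = 23 people (the seed, trial count, and choice of n are arbitrary), counting shared-birthday pairs via per-day counts:

```python
from collections import Counter
import random

random.seed(1)  # arbitrary seed, for reproducibility only
n, days, trials = 23, 365, 50_000

def shared_pairs():
    # number of pairs sharing a birthday = sum of C(k, 2) over each day's count k
    counts = Counter(random.randrange(days) for _ in range(n))
    return sum(k * (k - 1) // 2 for k in counts.values())

estimate = sum(shared_pairs() for _ in range(trials)) / trials
exact = n * (n - 1) / (2 * days)  # n(n-1)/730

print(round(exact, 4))  # 0.6932
print(round(estimate, 2))  # close to the exact value
```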

    ---

## 5. Chebyshev's Inequality

    Chebyshev's Inequality provides a bound on the probability that a random variable deviates from its mean by a certain amount. It is a powerful tool because it applies to any probability distribution for which the mean and variance exist, without requiring knowledge of the specific distribution shape.

    πŸ“ Chebyshev's Inequality

    For any random variable XX with finite mean E[X]E[X] and finite variance V(X)V(X), and for any real number k>0k > 0:

    P(∣Xβˆ’E[X]∣β‰₯k)≀V(X)k2P(|X - E[X]| \ge k) \le \frac{V(X)}{k^2}

    Alternative form: Let k=cσk = c\sigma, where σ=V(X)\sigma = \sqrt{V(X)} is the standard deviation and c>0c > 0.

    P(∣Xβˆ’E[X]∣β‰₯cΟƒ)≀1c2P(|X - E[X]| \ge c\sigma) \le \frac{1}{c^2}

    Variables:

      • XX = random variable

      • E[X]E[X] = mean of XX

      • V(X)V(X) = variance of XX

      • kk = a positive constant representing the deviation from the mean


    When to use: To provide a general upper bound on the probability of extreme deviations from the mean when the exact distribution is unknown or complex.

    Worked Example:

    Problem: The average height of students in a university is 170170 cm with a standard deviation of 55 cm. What is the minimum percentage of students whose height is between 160160 cm and 180180 cm?

    Solution:

    Step 1: Identify the given values.
    E[X]=170E[X] = 170 cm
    ΟƒX=5\sigma_X = 5 cm
    We want to find P(160≀X≀180)P(160 \le X \le 180).

    Step 2: Rephrase the probability in terms of deviation from the mean.
    The interval [160,180][160, 180] is 170Β±10170 \pm 10. So, we are interested in P(∣Xβˆ’170βˆ£β‰€10)P(|X - 170| \le 10).

    Step 3: Apply Chebyshev's Inequality for the complementary event.
    Chebyshev's inequality gives an upper bound for P(∣Xβˆ’E[X]∣β‰₯k)P(|X - E[X]| \ge k).
    Here, k=10k = 10.

    P(∣Xβˆ’170∣β‰₯10)≀V(X)102P(|X - 170| \ge 10) \le \frac{V(X)}{10^2}

    First, calculate V(X)=ΟƒX2=52=25V(X) = \sigma_X^2 = 5^2 = 25.

    P(∣Xβˆ’170∣β‰₯10)≀25100P(|X - 170| \ge 10) \le \frac{25}{100}
    P(∣Xβˆ’170∣β‰₯10)≀14P(|X - 170| \ge 10) \le \frac{1}{4}

    Step 4: Find the probability for the desired interval.
    The probability of being within the interval is 1βˆ’P(∣Xβˆ’170∣β‰₯10)1 - P(|X - 170| \ge 10).

    P(160≀X≀180)=1βˆ’P(∣Xβˆ’170∣β‰₯10)P(160 \le X \le 180) = 1 - P(|X - 170| \ge 10)
    P(160≀X≀180)β‰₯1βˆ’14P(160 \le X \le 180) \ge 1 - \frac{1}{4}
    P(160≀X≀180)β‰₯34P(160 \le X \le 180) \ge \frac{3}{4}

    Step 5: Convert to percentage.

    34=0.75=75%\frac{3}{4} = 0.75 = 75\%

    Answer: At least 75%75\% of students have heights between 160160 cm and 180180 cm.
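Chebyshev's bound can be compared with an exact probability for a distribution we know fully; a minimal sketch using the fair die (an illustrative choice, not from the height example):

```python
from fractions import Fraction

# Fair die: E[X] = 7/2, V(X) = 35/12. A deviation of k = 5/2 catches faces 1 and 6.
mu = Fraction(7, 2)
var = Fraction(35, 12)
k = Fraction(5, 2)

exact = sum(Fraction(1, 6) for x in range(1, 7) if abs(x - mu) >= k)
bound = var / k**2  # Chebyshev upper bound V(X)/k^2

print(exact)           # 1/3
print(bound)           # 7/15
print(exact <= bound)  # True: the bound holds but is not tight
```

As the output illustrates, Chebyshev's inequality is valid for any distribution with finite variance, but the bound is typically loose compared to the exact probability.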

---

    Problem-Solving Strategies

    πŸ’‘ CMI Strategy: Linearity of Expectation with Indicators

    Many CMI problems involving counting the expected number of "events" (e.g., matching items, shared birthdays, special points) are most efficiently solved using linearity of expectation with indicator random variables.

    • Define the overall random variable XX as the quantity you need to find the expectation of.

    • Decompose XX into a sum of simpler random variables XiX_i. Often, these XiX_i will be indicator variables. For example, if XX is the number of items with property AA, define Xi=1X_i = 1 if item ii has property AA, and 00 otherwise.

    • Calculate E[Xi]E[X_i] for each individual XiX_i. For an indicator variable IAI_A, this is simply P(A)P(A).

    • Apply linearity of expectation:

    E[X]=E[βˆ‘Xi]=βˆ‘E[Xi]E[X] = E\left[\sum X_i\right] = \sum E[X_i]

    This works even if the XiX_i are dependent, which is a major advantage.

    πŸ’‘ CMI Strategy: Using V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    This formula is almost always easier for calculating variance than the definition E[(Xβˆ’E[X])2]E[(X - E[X])^2], especially for complex distributions or when E[X]E[X] is not an integer.

    • First, calculate E[X]E[X].

    • Then, calculate E[X2]E[X^2]. Remember E[X2]E[X^2] is not (E[X])2(E[X])^2. For discrete variables, it's βˆ‘x2P(X=x)\sum x^2 P(X=x); for continuous, ∫x2f(x)dx\int x^2 f(x) dx.

    • Finally, subtract (E[X])2(E[X])^2 from E[X2]E[X^2].

    πŸ’‘ CMI Strategy: Handling Conditional Information

    When a problem provides conditional probabilities or usage statistics (like in server outage problems), use the Law of Total Probability to find unconditional probabilities, which can then be used in expectation calculations.
    For example, if you need the overall probability of an event AA that depends on conditions BiB_i:

    P(A)=βˆ‘P(A∣Bi)P(Bi)P(A) = \sum P(A|B_i)P(B_i)

    Then, if XX is a value associated with AA, E[X]E[X] might involve these combined probabilities.

    ---

    Common Mistakes

    ⚠️ Avoid These Errors
      • ❌ Assuming independence for variance of a sum: Students often mistakenly write V(X+Y)=V(X)+V(Y)V(X+Y) = V(X) + V(Y) even when XX and YY are dependent.
    βœ… Correct approach: Remember that
    V(X+Y)=V(X)+V(Y)+2Cov⁑(X,Y)V(X+Y) = V(X) + V(Y) + 2\operatorname{Cov}(X,Y)
    If XX and YY are independent, Cov⁑(X,Y)=0\operatorname{Cov}(X,Y)=0, so V(X+Y)=V(X)+V(Y)V(X+Y) = V(X) + V(Y). Always check for independence.
      • ❌ Confusing E[X2]E[X^2] with (E[X])2(E[X])^2: These are generally not equal (E[X2]β‰₯(E[X])2E[X^2] \ge (E[X])^2 always).
    βœ… Correct approach: E[X2]E[X^2] is the expectation of the squared random variable. (E[X])2(E[X])^2 is the square of the expected value. Calculate them separately.
      • ❌ Incorrectly applying linearity of expectation to products: E[XY]β‰ E[X]E[Y]E[XY] \ne E[X]E[Y] unless XX and YY are independent.
    βœ… Correct approach: Linearity applies to sums. For products, use
    E[XY]=E[X]E[Y]+Cov⁑(X,Y)E[XY] = E[X]E[Y] + \operatorname{Cov}(X,Y)
    If independent, then E[XY]=E[X]E[Y]E[XY] = E[X]E[Y].
      • ❌ Misinterpreting probability in indicator variable problems: For E[IA]E[I_A], the probability P(A)P(A) must be calculated correctly, considering all conditions of the event AA.
    βœ… Correct approach: Carefully define the event AA for each indicator variable and calculate its probability precisely. This often involves basic combinatorial probability.

    ---

    Practice Questions

    :::question type="NAT" question="A company manufactures light bulbs. The lifespan of a bulb, XX, in years, has a probability density function f(x)=Ξ»eβˆ’Ξ»xf(x) = \lambda e^{-\lambda x} for xβ‰₯0x \ge 0, where Ξ»=0.5\lambda = 0.5. What is the expected lifespan of a bulb in years?" answer="2" hint="Recall the formula for the expectation of a continuous random variable and the properties of the exponential distribution." solution="Step 1: Identify the PDF and its parameter.
    The PDF is f(x)=0.5eβˆ’0.5xf(x) = 0.5 e^{-0.5x} for xβ‰₯0x \ge 0. This is an exponential distribution with rate parameter Ξ»=0.5\lambda = 0.5.

    Step 2: Apply the formula for the expectation of a continuous random variable.

    E[X]=∫0∞xf(x)dxE[X] = \int_{0}^{\infty} x f(x) dx

    E[X]=∫0∞x(0.5eβˆ’0.5x)dxE[X] = \int_{0}^{\infty} x (0.5 e^{-0.5x}) dx

    This is the mean of an exponential distribution, which is 1/Ξ»1/\lambda.

    Step 3: Calculate the expectation.

    E[X]=1Ξ»=10.5=2E[X] = \frac{1}{\lambda} = \frac{1}{0.5} = 2

    The expected lifespan is 2 years.
    Answer: \boxed{2}"
    :::

    :::question type="MCQ" question="Let XX be a discrete random variable with P(X=1)=0.2P(X=1) = 0.2, P(X=2)=0.3P(X=2) = 0.3, and P(X=3)=0.5P(X=3) = 0.5. Which of the following statements about V(X)V(X) is correct?" options=["V(X)=0.61V(X) = 0.61","V(X)=0.76V(X) = 0.76","V(X)=1.69V(X) = 1.69","V(X)=2.3V(X) = 2.3"] answer="V(X)=0.61V(X) = 0.61" hint="First calculate E[X]E[X] and E[X2]E[X^2], then use the formula V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2." solution="Step 1: Calculate E[X]E[X].

    E[X]=βˆ‘xP(X=x)E[X] = \sum x P(X=x)

    E[X]=(1)(0.2)+(2)(0.3)+(3)(0.5)E[X] = (1)(0.2) + (2)(0.3) + (3)(0.5)

    E[X]=0.2+0.6+1.5=2.3E[X] = 0.2 + 0.6 + 1.5 = 2.3

    Step 2: Calculate E[X2]E[X^2].

    E[X2]=βˆ‘x2P(X=x)E[X^2] = \sum x^2 P(X=x)

    E[X2]=(12)(0.2)+(22)(0.3)+(32)(0.5)E[X^2] = (1^2)(0.2) + (2^2)(0.3) + (3^2)(0.5)

    E[X2]=(1)(0.2)+(4)(0.3)+(9)(0.5)E[X^2] = (1)(0.2) + (4)(0.3) + (9)(0.5)

    E[X2]=0.2+1.2+4.5=5.9E[X^2] = 0.2 + 1.2 + 4.5 = 5.9

    Step 3: Calculate V(X)V(X).

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    V(X)=5.9βˆ’(2.3)2V(X) = 5.9 - (2.3)^2

    V(X)=5.9βˆ’5.29V(X) = 5.9 - 5.29

    V(X)=0.61V(X) = 0.61

    Answer: \boxed{0.61}"
    :::

    :::question type="SUB" question="A bag contains 5 red balls and 5 blue balls. Three balls are drawn without replacement. Let YY be the number of blue balls drawn. Calculate E[Y]E[Y] using indicator random variables." answer="E[Y]=1.5E[Y] = 1.5" hint="Define an indicator variable for each draw, then use linearity of expectation." solution="Step 1: Define indicator random variables.
    Let Y1Y_1, Y2Y_2, Y3Y_3 be indicator variables for the first, second, and third ball drawn being blue, respectively.

    Yi={1ifΒ theΒ i-thΒ ballΒ drawnΒ isΒ blue0otherwiseY_i = \begin{cases} 1 & \text{if the } i\text{-th ball drawn is blue} \\ 0 & \text{otherwise} \end{cases}

    Step 2: Express YY as a sum of indicator variables.

    Y=Y1+Y2+Y3Y = Y_1 + Y_2 + Y_3

    Step 3: Calculate the expectation of each indicator variable.
    For Y1Y_1:

    P(Y1=1)=510=12P(Y_1=1) = \frac{5}{10} = \frac{1}{2}

    E[Y1]=P(Y1=1)=12E[Y_1] = P(Y_1=1) = \frac{1}{2}

    For Y2Y_2:
    By symmetry, the probability that the second ball drawn is blue is the same as the first. Alternatively, using Law of Total Probability:

    P(Y2=1)=P(Y2=1∣Y1=1)P(Y1=1)+P(Y2=1∣Y1=0)P(Y1=0)P(Y_2=1) = P(Y_2=1|Y_1=1)P(Y_1=1) + P(Y_2=1|Y_1=0)P(Y_1=0)

    P(Y2=1)=(49)(510)+(59)(510)P(Y_2=1) = \left(\frac{4}{9}\right)\left(\frac{5}{10}\right) + \left(\frac{5}{9}\right)\left(\frac{5}{10}\right)

    P(Y2=1)=2090+2590=4590=12P(Y_2=1) = \frac{20}{90} + \frac{25}{90} = \frac{45}{90} = \frac{1}{2}

    So,
    E[Y2]=12E[Y_2] = \frac{1}{2}

    For Y3Y_3:
    Similarly,

    E[Y3]=12E[Y_3] = \frac{1}{2}

    Step 4: Apply linearity of expectation.

    E[Y]=E[Y1+Y2+Y3]E[Y] = E[Y_1 + Y_2 + Y_3]

    E[Y]=E[Y1]+E[Y2]+E[Y3]E[Y] = E[Y_1] + E[Y_2] + E[Y_3]

    E[Y]=12+12+12E[Y] = \frac{1}{2} + \frac{1}{2} + \frac{1}{2}

    E[Y]=32=1.5E[Y] = \frac{3}{2} = 1.5

    Answer: \boxed{1.5}"
    :::

    :::question type="MSQ" question="Let XX be a random variable with E[X]=5E[X]=5 and V(X)=4V(X)=4. Which of the following statements are correct?" options=["E[2X+3]=13E[2X+3] = 13","V(2X+3)=16V(2X+3) = 16","E[X2]=29E[X^2] = 29","P(∣Xβˆ’5∣β‰₯4)≀1/4P(|X-5| \ge 4) \le 1/4"] answer="A,B,C,D" hint="Apply the properties of expectation and variance, and Chebyshev's inequality." solution="Let's evaluate each option:

    Option A: E[2X+3]=13E[2X+3] = 13
    Using linearity of expectation:

    E[2X+3]=E[2X]+E[3]E[2X+3] = E[2X] + E[3]

    E[2X+3]=2E[X]+3E[2X+3] = 2E[X] + 3

    Given E[X]=5E[X]=5:
    E[2X+3]=2(5)+3=10+3=13E[2X+3] = 2(5) + 3 = 10 + 3 = 13

    This statement is correct.

    Option B: V(2X+3)=16V(2X+3) = 16
    Using properties of variance:

    V(aX+b)=a2V(X)V(aX+b) = a^2 V(X)

    V(2X+3)=22V(X)V(2X+3) = 2^2 V(X)

    Given V(X)=4V(X)=4:
    V(2X+3)=4(4)=16V(2X+3) = 4(4) = 16

    This statement is correct.

    Option C: E[X2]=29E[X^2] = 29
    Using the computational formula for variance:

    V(X)=E[X2]βˆ’(E[X])2V(X) = E[X^2] - (E[X])^2

    We are given V(X)=4V(X)=4 and E[X]=5E[X]=5.
    4=E[X2]βˆ’(5)24 = E[X^2] - (5)^2

    4=E[X2]βˆ’254 = E[X^2] - 25

    E[X2]=4+25=29E[X^2] = 4 + 25 = 29

    This statement is correct.

    Option D: P(∣Xβˆ’5∣β‰₯4)≀1/4P(|X-5| \ge 4) \le 1/4
    Using Chebyshev's Inequality:

    P(∣Xβˆ’E[X]∣β‰₯k)≀V(X)k2P(|X - E[X]| \ge k) \le \frac{V(X)}{k^2}

    Here, E[X]=5E[X]=5, V(X)=4V(X)=4, and k=4k=4.
    P(∣Xβˆ’5∣β‰₯4)≀442P(|X-5| \ge 4) \le \frac{4}{4^2}

    P(∣Xβˆ’5∣β‰₯4)≀416P(|X-5| \ge 4) \le \frac{4}{16}

    P(∣Xβˆ’5∣β‰₯4)≀14P(|X-5| \ge 4) \le \frac{1}{4}

    This statement is correct.

    All statements are correct.
    Answer: \boxed{A,B,C,D}"
    :::

    :::question type="NAT" question="A discrete random variable Y has P(Y=0)=0.4, P(Y=1)=0.3, and P(Y=2)=0.3. What is the standard deviation of Y (round to two decimal places)?" answer="0.83" hint="Calculate E[Y] and E[Y^2], then V(Y), and finally \sigma_Y = \sqrt{V(Y)}." solution="Step 1: Calculate E[Y].

    E[Y] = \sum y P(Y=y) = (0)(0.4) + (1)(0.3) + (2)(0.3) = 0 + 0.3 + 0.6 = 0.9

    Step 2: Calculate E[Y^2].

    E[Y^2] = \sum y^2 P(Y=y) = (0^2)(0.4) + (1^2)(0.3) + (2^2)(0.3) = 0 + 0.3 + 1.2 = 1.5

    Step 3: Calculate V(Y).

    V(Y) = E[Y^2] - (E[Y])^2 = 1.5 - (0.9)^2 = 1.5 - 0.81 = 0.69

    Step 4: Calculate the standard deviation \sigma_Y.

    \sigma_Y = \sqrt{V(Y)} = \sqrt{0.69} \approx 0.83066

    Rounding to two decimal places, \sigma_Y = 0.83.
    Answer: \boxed{0.83}"
    :::
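    The four steps above are easy to verify numerically. The following is a minimal Python sketch (my addition, not part of the notes) that computes the mean, variance, and standard deviation directly from a PMF using only the standard library:

    ```python
    import math

    # PMF of Y from the question above
    pmf = {0: 0.4, 1: 0.3, 2: 0.3}

    mean = sum(y * p for y, p in pmf.items())              # E[Y]
    second_moment = sum(y**2 * p for y, p in pmf.items())  # E[Y^2]
    variance = second_moment - mean**2                     # V(Y) = E[Y^2] - (E[Y])^2
    std_dev = math.sqrt(variance)

    print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 0.9 0.69 0.83
    ```

    The same three lines work for any finite PMF; only the dictionary changes.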

    ---

    Summary

    ❗ Key Takeaways for CMI

    • Expectation (E[X]): Represents the long-run average. For discrete X,
      E[X] = \sum x P(X=x)
      For continuous X,
      E[X] = \int x f(x)\, dx

    • Linearity of Expectation:
      E\left[\sum a_i X_i\right] = \sum a_i E[X_i]
      is a powerful tool. It always holds, regardless of whether the X_i are independent or dependent. This is crucial for problems involving sums of indicator variables.

    • Variance (V(X)): Measures the spread around the mean. The computational formula
      V(X) = E[X^2] - (E[X])^2
      is generally preferred.

    • Properties of Variance:
      V(aX+b) = a^2 V(X)
      For independent random variables,
      V\left(\sum X_i\right) = \sum V(X_i)
      For dependent variables, covariance terms must be included.

    • Indicator Random Variables: I_A = 1 if event A occurs, 0 otherwise.
      E[I_A] = P(A)
      They simplify complex counting problems when combined with linearity of expectation.

    • Chebyshev's Inequality:
      P(|X - E[X]| \ge k) \le \frac{V(X)}{k^2}
      provides a general bound on deviations from the mean for any distribution with finite mean and variance.

    ---

    What's Next?

    πŸ’‘ Continue Learning

    This topic connects to:

      • Covariance and Correlation: Understanding dependence between random variables, which is essential for calculating the variance of sums of dependent variables:
        V(X+Y) = V(X) + V(Y) + 2\operatorname{Cov}(X,Y)

      • Moment Generating Functions (MGFs): MGFs are powerful tools for finding expectations and variances of random variables, especially for sums of independent random variables. They provide an alternative, often simpler, method to derive these moments.

      • Common Probability Distributions: Knowing the specific formulas for E[X] and V(X) for distributions like Binomial, Poisson, Geometric, Uniform, Normal, and Exponential is critical for applying these concepts in specific scenarios.


    Master these connections for comprehensive CMI preparation!

    ---

    πŸ’‘ Moving Forward

    Now that you understand Expectation and Variance, let's explore Standard Distributions which builds on these concepts.

    ---

    Part 4: Standard Distributions

    Introduction

    Standard distributions are fundamental building blocks in probability theory and statistics, providing models for a wide array of random phenomena encountered in data science. Each distribution describes the probabilities of different outcomes for a specific type of random variable, characterized by its parameters. Understanding these distributions is crucial for modeling real-world data, performing statistical inference, and making informed decisions.

    In the CMI exam, a deep understanding of standard discrete and continuous distributions is essential. This includes knowing their probability mass/density functions, cumulative distribution functions, expected values, variances, and how to apply them to calculate probabilities and estimate parameters in various scenarios. Mastery of these concepts forms the bedrock for advanced topics like hypothesis testing, regression analysis, and machine learning algorithms.

    πŸ“– Random Variable

    A random variable is a function that maps outcomes from a sample space to numerical values.

      • A discrete random variable can take on a finite or countably infinite number of values.

      • A continuous random variable can take on any value within a given range or interval.

    ---

    Key Concepts

    1. Discrete Distributions

    Discrete distributions model scenarios where the outcomes are countable.

    1.1 Bernoulli Distribution

    The Bernoulli distribution models a single trial with two possible outcomes: "success" (usually denoted by 1) or "failure" (usually denoted by 0).

    πŸ“ Bernoulli PMF
    P(X=x)=px(1βˆ’p)1βˆ’xforΒ x∈{0,1}P(X=x) = p^x (1-p)^{1-x} \quad \text{for } x \in \{0, 1\}

    Variables:

      • XX = Bernoulli random variable

      • pp = probability of success (0≀p≀10 \le p \le 1)

      • xx = outcome (0 or 1)


    When to use: For a single trial with binary outcome.

    Properties:

    • Mean: E[X]=pE[X] = p

    • Variance: Var(X)=p(1βˆ’p)Var(X) = p(1-p)


    ---

    1.2 Binomial Distribution

    The Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials.

    πŸ“ Binomial PMF
    P(X=k)=(nk)pk(1βˆ’p)nβˆ’kforΒ k∈{0,1,…,n}P(X=k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k \in \{0, 1, \dots, n\}

    Variables:

      • XX = Binomial random variable

      • nn = number of trials

      • kk = number of successes

      • pp = probability of success in a single trial

      • (nk)=n!k!(nβˆ’k)!\binom{n}{k} = \frac{n!}{k!(n-k)!} = binomial coefficient


    When to use: When counting the number of successes in a fixed number of independent trials, each with the same probability of success.

    Properties:

    • Mean: E[X]=npE[X] = np

    • Variance: Var(X)=np(1βˆ’p)Var(X) = np(1-p)


    Worked Example:

    Problem: A fair coin is tossed 10 times. What is the probability of getting exactly 7 heads?

    Solution:

    Step 1: Identify parameters for the Binomial distribution.
    Here, n=10 (number of tosses), k=7 (number of heads), and p=0.5 (probability of heads for a fair coin).

    Step 2: Apply the Binomial PMF.

    P(X=7) = \binom{10}{7} (0.5)^7 (1-0.5)^{10-7}

    Step 3: Calculate the binomial coefficient and simplify.

    \binom{10}{7} = \frac{10!}{7!\,3!} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120
    P(X=7) = 120 \times (0.5)^7 \times (0.5)^3 = 120 \times (0.5)^{10} = 120 \times 0.0009765625 = 0.1171875

    Answer: 0.1171875
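    The same calculation takes three lines of Python using `math.comb`; this sketch (my addition, not part of the notes) mirrors the PMF formula directly:

    ```python
    from math import comb

    def binom_pmf(k, n, p):
        """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binom_pmf(7, 10, 0.5))  # 0.1171875
    ```

    Because p = 0.5 is exactly representable in binary, the result here is exact, not merely approximate.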

    ---

    1.3 Poisson Distribution

    The Poisson distribution models the number of events occurring in a fixed interval of time or space, given a constant average rate \lambda of occurrence and that these events occur independently.

    πŸ“ Poisson PMF
    P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!} \quad \text{for } k \in \{0, 1, 2, \dots\}

    Variables:

      • X = Poisson random variable

      • k = number of events

      • \lambda = average rate of events in the interval (\lambda > 0)


    When to use: For counts of rare events over a specified interval or region.

    Properties:

    • Mean: E[X] = \lambda

    • Variance: Var(X) = \lambda


    ❗
    Poisson Approximation to Binomial

    When the number of trials n is large (n \ge 20) and the probability of success p is small (p \le 0.05), the Binomial distribution B(n, p) can be approximated by a Poisson distribution with parameter \lambda = np, which greatly simplifies calculations in that regime.


    Worked Example:

    Problem: A call center receives an average of 5 calls per hour. What is the probability of receiving exactly 3 calls in the next hour?

    Solution:

    Step 1: Identify the parameter for the Poisson distribution.
    Here, \lambda = 5 (average calls per hour) and k=3 (number of calls).

    Step 2: Apply the Poisson PMF.

    P(X=3) = \frac{e^{-5} 5^3}{3!}

    Step 3: Calculate the terms and simplify.

    P(X=3) = \frac{0.0067379 \times 125}{6} = \frac{0.8422375}{6} \approx 0.1403729

    Answer: \approx 0.1404
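    The Poisson PMF is just as easy to evaluate in code. A minimal sketch (my addition, not part of the notes) using only the standard library:

    ```python
    from math import exp, factorial

    def poisson_pmf(k, lam):
        """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
        return exp(-lam) * lam**k / factorial(k)

    # P(exactly 3 calls | average 5 calls/hour)
    print(round(poisson_pmf(3, 5), 4))  # 0.1404
    ```

    Carrying full precision for e^{-5} gives 0.140374, slightly above the hand value obtained from the truncated constant 0.0067379.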

    ---

    2. Continuous Distributions

    Continuous distributions model scenarios where the outcomes can take any value within a range.

    2.1 Uniform Distribution

    The Uniform distribution assigns equal probability to all values within a specified interval [a, b].

    πŸ“ Uniform PDF
    f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}

    Variables:

      • X = Uniform random variable

      • a = minimum value

      • b = maximum value


    When to use: When all outcomes within an interval are equally likely.

    πŸ“ Uniform CDF
    F(x) = \begin{cases} 0 & \text{for } x < a \\ \frac{x-a}{b-a} & \text{for } a \le x < b \\ 1 & \text{for } x \ge b \end{cases}

    Properties:

    • Mean: E[X] = \frac{a+b}{2}

    • Variance: Var(X) = \frac{(b-a)^2}{12}


    Worked Example:

    Problem: A random variable X is uniformly distributed between 0 and 10. What is the probability that X is between 3 and 7?

    Solution:

    Step 1: Identify parameters.
    Here, a=0 and b=10. We want P(3 < X < 7).

    Step 2: Integrate the PDF.

    P(3 < X < 7) = \int_{3}^{7} f(x)\, dx = \int_{3}^{7} \frac{1}{10-0}\, dx

    Step 3: Evaluate the integral.

    P(3 < X < 7) = \left[ \frac{x}{10} \right]_{3}^{7} = \frac{7}{10} - \frac{3}{10} = \frac{4}{10} = 0.4

    Answer: 0.4
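    In practice, interval probabilities are computed as F(b) - F(a) from the CDF rather than by re-integrating each time. A small sketch of the Uniform CDF from the formula above (my addition, not part of the notes):

    ```python
    def uniform_cdf(x, a, b):
        """CDF of Uniform(a, b), following the piecewise definition above."""
        if x < a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)

    # P(3 < X < 7) = F(7) - F(3) for X ~ Uniform(0, 10)
    prob = uniform_cdf(7, 0, 10) - uniform_cdf(3, 0, 10)
    print(round(prob, 2))  # 0.4
    ```

    The same F(b) - F(a) pattern applies to every continuous distribution in this chapter.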

    ---

    2.2 Exponential Distribution

    The Exponential distribution models the time until an event occurs in a Poisson process, where events occur continuously and independently at a constant average rate. It is memoryless.

    πŸ“ Exponential PDF
    f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0

    Variables:

      • X = Exponential random variable (time to event)

      • \lambda = rate parameter (average number of events per unit time, \lambda > 0)


    When to use: For modeling waiting times or lifetimes when the rate of occurrence is constant.

    πŸ“ Exponential CDF
    F(x) = P(X \le x) = 1 - e^{-\lambda x} \quad \text{for } x \ge 0

    Properties:

    • Mean: E[X] = \frac{1}{\lambda}

    • Variance: Var(X) = \frac{1}{\lambda^2}

    • Memoryless Property: P(X > s+t \mid X > s) = P(X > t). The future waiting time does not depend on the time already waited.


    Worked Example:

    Problem: The lifespan of a certain electronic component follows an exponential distribution with a mean lifespan of 5 years. What is the probability that a component will last less than 3 years?

    Solution:

    Step 1: Determine the rate parameter \lambda.
    Given mean E[X] = 5 years, and E[X] = 1/\lambda for the exponential distribution:

    5 = \frac{1}{\lambda} \implies \lambda = \frac{1}{5} = 0.2

    Step 2: Use the CDF to find P(X < 3).

    P(X < 3) = F(3) = 1 - e^{-0.2 \times 3} = 1 - e^{-0.6} = 1 - 0.5488 = 0.4512

    Answer: 0.4512
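    The two steps above (mean β†’ rate, then CDF) translate directly into code. A minimal sketch (my addition, not part of the notes):

    ```python
    from math import exp

    def expon_cdf(x, lam):
        """CDF of Exp(lam): P(X <= x) = 1 - e^(-lam*x) for x >= 0."""
        return 1 - exp(-lam * x) if x >= 0 else 0.0

    lam = 1 / 5              # mean lifespan of 5 years  =>  rate = 0.2 per year
    prob = expon_cdf(3, lam)
    print(round(prob, 4))  # 0.4512
    ```

    Note the conversion: for the exponential distribution the rate is always the reciprocal of the mean, a frequent source of off-by-inverse errors.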

    ---

    2.3 Normal (Gaussian) Distribution

    The Normal distribution is arguably the most important distribution in statistics. It is symmetric, bell-shaped, and characterized by its mean \mu and standard deviation \sigma. Many natural phenomena follow this distribution, and it is central to the Central Limit Theorem.

    πŸ“ Normal PDF
    f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty

    Variables:

      • X = Normal random variable

      • \mu = mean of the distribution

      • \sigma = standard deviation of the distribution (\sigma > 0)


    When to use: For modeling continuous data that clusters around a central value and is symmetric, or when applying the Central Limit Theorem.

    Properties:

    • Mean: E[X] = \mu

    • Variance: Var(X) = \sigma^2

    • Median = Mode = Mean = \mu.

    • The curve is symmetric about \mu.

    • The total area under the curve is 1.


    πŸ“–
    Standard Normal Distribution

    A Standard Normal distribution is a Normal distribution with mean \mu=0 and standard deviation \sigma=1, denoted Z \sim N(0,1). Its PDF is:

    f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}

    The Cumulative Distribution Function (CDF) of the Standard Normal distribution is denoted \Phi(z) = P(Z \le z). Its values are typically found using a Z-table.


    πŸ“ Standardization (Z-score)
    Z = \frac{X - \mu}{\sigma}

    Variables:

      • X = value from a Normal distribution

      • \mu = mean of X

      • \sigma = standard deviation of X

      • Z = corresponding value in the Standard Normal distribution


    When to use: To convert any Normal random variable X into a Standard Normal random variable Z, allowing the use of a standard Z-table to calculate probabilities.

    Worked Example (Probability Calculation):

    Problem: The height of adult males in a city is normally distributed with a mean of 175 cm and a standard deviation of 7 cm. What is the probability that a randomly selected male is between 168 cm and 182 cm tall? (Use \Phi(1)=0.8413, \Phi(-1)=0.1587)

    Solution:

    Step 1: Identify parameters and values.
    \mu = 175, \sigma = 7. We want P(168 < X < 182).

    Step 2: Standardize the values.

    z_1 = \frac{168 - 175}{7} = -1, \qquad z_2 = \frac{182 - 175}{7} = 1

    Step 3: Use the Standard Normal CDF \Phi to find the probability.

    P(168 < X < 182) = P(-1 < Z < 1) = \Phi(1) - \Phi(-1) = 0.8413 - 0.1587 = 0.6826

    Answer: 0.6826
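    Without a Z-table, \Phi(z) can be computed exactly from the error function via \Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right). A minimal sketch (my addition, not part of the notes):

    ```python
    from math import erf, sqrt

    def phi(z):
        """Standard normal CDF, computed via the error function."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma = 175, 7
    z1 = (168 - mu) / sigma   # -1.0
    z2 = (182 - mu) / sigma   #  1.0
    prob = phi(z2) - phi(z1)
    print(round(prob, 4))
    ```

    Full precision gives 0.6827 rather than the 0.6826 obtained from four-decimal table values; the difference is pure rounding in the table.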









    [Figure: Normal curve centred at \mu, with the shaded region P(x_1 < X < x_2) between x_1 and x_2.]

    ❗ Central Limit Theorem (CLT)

    For a sufficiently large sample size n, the distribution of the sample mean \bar{X} of n independent and identically distributed (i.i.d.) random variables, each with mean \mu and finite variance \sigma^2, is approximately normal, regardless of the original distribution of the individual variables.
    The sample mean \bar{X} has:

      • Mean: E[\bar{X}] = \mu

      • Standard Deviation: SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} (also called the standard error of the mean)
        So \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) approximately for large n.

    Worked Example (Normal Parameter Estimation):

    Problem: Suppose the individual scores on an exam are normally distributed with unknown mean \mu and standard deviation \sigma. A candidate fails if they score below 35% and passes with distinction if they score above 80%. In a large group, 16% fail and 2% pass with distinction. Find \mu and \sigma. (Use \Phi(-1)=0.16, \Phi(2)=0.98)

    Solution:

    Step 1: Set up equations based on the given probabilities and Z-scores.
    Let X be the score on the exam.
    We are given P(X < 35) = 0.16.
    Standardizing X=35 gives Z_1 = \frac{35 - \mu}{\sigma}.
    From the Z-table, P(Z < -1) = 0.16, so Z_1 = -1:

    \frac{35 - \mu}{\sigma} = -1 \quad (\text{Equation } 1)

    We are given P(X > 80) = 0.02, i.e. P(X \le 80) = 1 - 0.02 = 0.98.
    Standardizing X=80 gives Z_2 = \frac{80 - \mu}{\sigma}.
    From the Z-table, P(Z \le 2) = 0.98, so Z_2 = 2:

    \frac{80 - \mu}{\sigma} = 2 \quad (\text{Equation } 2)

    Step 2: Solve the system of linear equations.
    From Equation 1:

    35 - \mu = -\sigma \implies \mu - \sigma = 35 \quad (\text{Equation } 3)

    From Equation 2:

    80 - \mu = 2\sigma \implies \mu + 2\sigma = 80 \quad (\text{Equation } 4)

    Subtract Equation 3 from Equation 4:

    3\sigma = 45 \implies \sigma = 15

    Substitute \sigma = 15 into Equation 3:

    \mu = 35 + 15 = 50

    Answer: \mu = 50 and \sigma = 15.
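    The elimination step above is a two-line computation once the linear system is written down. A minimal Python sketch (my addition, not part of the notes):

    ```python
    # Equations 3 and 4 from the worked example:
    #   mu - sigma   = 35
    #   mu + 2*sigma = 80
    # Subtracting the first from the second eliminates mu: 3*sigma = 45.
    sigma = (80 - 35) / 3
    mu = 35 + sigma
    print(mu, sigma)  # 50.0 15.0
    ```

    The same pattern — turn each quantile condition into a linear equation in \mu and \sigma, then eliminate — works for any pair of normal quantiles.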

    ---

    2.4 Gamma Distribution

    The Gamma distribution is a versatile continuous distribution that generalizes the exponential distribution. It is often used to model waiting times for multiple events or the sum of independent exponentially distributed random variables.

    πŸ“ Gamma PDF
    f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \quad \text{for } x > 0

    Variables:

      • X = Gamma random variable

      • \alpha = shape parameter (\alpha > 0)

      • \beta = rate parameter (\beta > 0)

      • \Gamma(\alpha) = Gamma function, \Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t} dt. For positive integers, \Gamma(n) = (n-1)!.


    When to use: For modeling waiting times (e.g., in queuing theory), or when a variable is a sum of several independent exponential variables.

    Properties:

    • Mean: E[X] = \frac{\alpha}{\beta}

    • Variance: Var(X) = \frac{\alpha}{\beta^2}

    • If \alpha=1, the Gamma distribution reduces to the Exponential distribution with rate \beta.


    Worked Example:

    Problem: The lifespan of a device follows a Gamma distribution. Historical data suggests the mean lifespan is 6 years and the variance is 12 years^2. Find the parameters \alpha and \beta of this distribution.

    Solution:

    Step 1: Write down the equations for mean and variance in terms of \alpha and \beta.
    E[X] = \frac{\alpha}{\beta} = 6
    Var(X) = \frac{\alpha}{\beta^2} = 12

    Step 2: Solve the system of equations.
    From the mean equation, \alpha = 6\beta. Substituting into the variance equation:

    \frac{6\beta}{\beta^2} = 12 \implies \frac{6}{\beta} = 12 \implies \beta = 0.5

    Then:

    \alpha = 6 \times 0.5 = 3

    Answer: \alpha = 3 and \beta = 0.5.
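    Dividing the mean equation by the variance equation cancels \alpha and gives \beta = \text{mean}/\text{var} directly, a shortcut worth remembering. A minimal sketch of this method-of-moments fit (my addition, not part of the notes):

    ```python
    # Method of moments for Gamma(alpha, beta):
    #   mean = alpha/beta,  var = alpha/beta^2
    #   mean/var = beta  =>  beta = mean/var,  alpha = mean*beta
    mean, var = 6.0, 12.0
    beta = mean / var
    alpha = mean * beta
    print(alpha, beta)  # 3.0 0.5
    ```

    The ratio mean/var recovers the rate for any Gamma distribution, not just this example.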

    ---

    2.5 Beta Distribution

    The Beta distribution is a continuous probability distribution defined on the interval [0, 1]. It is particularly useful for modeling probabilities or proportions, as its values are naturally constrained within this range.

    πŸ“ Beta PDF
    f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1} \quad \text{for } 0 < x < 1

    Variables:

      • X = Beta random variable

      • \alpha = shape parameter (\alpha > 0)

      • \beta = shape parameter (\beta > 0)

      • B(\alpha, \beta) = Beta function, B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}


    When to use: For modeling proportions, probabilities, or quantities constrained between 0 and 1 (e.g., success rates, market share).

    Properties:

    • Mean: E[X] = \frac{\alpha}{\alpha+\beta}

    • Variance: Var(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

    • Mode: \frac{\alpha-1}{\alpha+\beta-2} (for \alpha > 1, \beta > 1)


    Worked Example:

    Problem: The proportion of defective items produced by a machine follows a Beta distribution. From historical data, the mean proportion of defective items is 0.25 and the mode is 0.20. Find the parameters \alpha and \beta of this distribution.

    Solution:

    Step 1: Write down the equations for mean and mode in terms of \alpha and \beta.
    E[X] = \frac{\alpha}{\alpha+\beta} = 0.25
    Mode = \frac{\alpha-1}{\alpha+\beta-2} = 0.20

    Step 2: Solve the system of equations.
    From the mean equation:

    \alpha = 0.25(\alpha+\beta) \implies 0.75\alpha = 0.25\beta \implies \beta = 3\alpha \quad (\text{Equation } 1)

    Substitute \beta = 3\alpha into the mode equation:

    \frac{\alpha-1}{4\alpha-2} = 0.20

    \alpha-1 = 0.20(4\alpha-2) = 0.8\alpha - 0.4

    0.2\alpha = 0.6 \implies \alpha = 3

    Substitute \alpha=3 back into Equation 1:

    \beta = 3 \times 3 = 9

    Answer: \alpha = 3 and \beta = 9.
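    The substitution above can be carried through symbolically once and reused: from the mean, \beta = \alpha(1-m)/m, and plugging that into the mode equation collapses it to a single linear equation in \alpha. A minimal sketch of the closed form (my addition, not part of the notes; valid for \alpha > 1, \beta > 1):

    ```python
    # Beta(a, b) from mean m and mode M:
    #   m = a/(a+b)        =>  b = a*(1 - m)/m,  so a + b = a/m
    #   M = (a-1)/(a+b-2)  =>  a - 1 = M*(a/m - 2)  =>  a*(1 - M/m) = 1 - 2*M
    m, M = 0.25, 0.20
    a = (1 - 2 * M) / (1 - M / m)
    b = a * (1 - m) / m
    print(round(a, 6), round(b, 6))  # 3.0 9.0
    ```

    Mean alone never pins down both parameters; it is the mean-plus-mode (or mean-plus-variance) pair that makes the system solvable.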

    ---

    Problem-Solving Strategies

    πŸ’‘ CMI Strategy

    • Identify the Distribution: Carefully read the problem statement to determine which standard distribution best models the scenario. Look for keywords (e.g., "number of successes in n trials" β†’ Binomial; "average rate of events" β†’ Poisson/Exponential; "mean and standard deviation" β†’ Normal; "proportion" β†’ Beta).

    • Extract Parameters: Identify all given parameters (\mu, \sigma, n, p, \lambda, \alpha, \beta) and what you need to find.

    • Standardize for Normal: If dealing with a Normal distribution, always standardize the variable to a Z-score to use the standard normal table/CDF.

    • Use CDF for Range Probabilities: For continuous distributions, P(a < X < b) = F(b) - F(a). For the Normal, this becomes \Phi(Z_b) - \Phi(Z_a). Remember P(X>x) = 1 - P(X \le x).

    • Parameter Estimation: If mean/variance/mode are given, set up simultaneous equations to solve for the distribution's parameters (\alpha, \beta, \mu, \sigma).

    • Approximations: Recall when Poisson can approximate Binomial (n large, p small, \lambda = np).

    ---

    Common Mistakes

    ⚠️ Avoid These Errors
      • ❌ Confusing PMF and PDF: Using integration for discrete distributions or summing for continuous distributions.
    βœ… Correct: Use the PMF for discrete variables (summation over values) and the PDF for continuous variables (integration over ranges).
      • ❌ Incorrect Z-score Calculation: Forgetting to subtract the mean or divide by the standard deviation when standardizing a normal variable.
    βœ… Correct: Always use Z = (X - \mu) / \sigma. For a sample mean, use Z = (\bar{X} - \mu) / (\sigma/\sqrt{n}).
      • ❌ Misinterpreting Z-table Values: Directly using \Phi(z) for P(Z > z).
    βœ… Correct: P(Z > z) = 1 - \Phi(z). Use symmetry: P(Z < -z) = P(Z > z) = 1 - \Phi(z).
      • ❌ Ignoring Distribution Domain: Calculating probabilities outside the valid range (e.g., negative time for Exponential, values outside [0, 1] for Beta).
    βœ… Correct: Always respect the domain of the random variable.
      • ❌ Parameter Estimation Errors: Incorrectly setting up equations for mean/variance/mode for a specific distribution.
    βœ… Correct: Memorize or correctly derive the formulas for mean, variance, and mode for each distribution.
      • ❌ Forgetting n in the CLT: When dealing with sample means, failing to divide \sigma by \sqrt{n} for the standard error.
    βœ… Correct: The standard deviation of the sample mean is \sigma_{\bar{X}} = \sigma/\sqrt{n}.

    ---

    Practice Questions

    :::question type="MCQ" question="A call center receives calls at an average rate of 20 calls per hour. What is the probability that exactly 15 calls are received in a 30-minute interval?" options=["\frac{e^{-10} 10^{15}}{15!}","\frac{e^{-20} 20^{15}}{15!}","\frac{e^{-10} 15^{10}}{10!}","\frac{e^{-20} 15^{20}}{20!}"] answer="A" hint="Adjust the average rate to match the given time interval before applying the Poisson PMF." solution="Step 1: Determine the average rate for the given interval.
    The average rate is 20 calls per hour. For a 30-minute interval (0.5 hours), the rate is:

    \lambda = 20 \text{ calls/hour} \times 0.5 \text{ hours} = 10 \text{ calls}

    Step 2: Apply the Poisson Probability Mass Function (PMF).
    The Poisson PMF is P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}.
    Here, \lambda = 10 and k = 15.

    P(X=15) = \frac{e^{-10} 10^{15}}{15!}
    The correct option is A." :::

    :::question type="NAT" question="The scores on a standardized test are normally distributed with a mean of 600 and a standard deviation of 100. If a student scores 750, what is their Z-score? Report to two decimal places." answer="1.50" hint="Use the Z-score formula Z = (X - \mu) / \sigma." solution="Step 1: Identify the given values.
    X = 750 (student's score)
    \mu = 600 (mean score)
    \sigma = 100 (standard deviation)

    Step 2: Apply the Z-score formula.

    Z = \frac{X - \mu}{\sigma} = \frac{750 - 600}{100} = \frac{150}{100} = 1.5

    The Z-score is 1.50."
    :::

    :::question type="MSQ" question="A quality control process inspects batches of 50 items. Each item has a 1% chance of being defective, independently. Which of the following statements are correct?" options=["The number of defective items in a batch follows a Binomial distribution.","The probability of finding exactly 1 defective item in a batch is \binom{50}{1} (0.01)^1 (0.99)^{49}.","The mean number of defective items in a batch is 0.5.","The Poisson approximation to this distribution would use \lambda = 50."] answer="A,B,C" hint="Identify the distribution type and its parameters. Check the conditions for Poisson approximation." solution="Statement A: The number of defective items in a fixed number of independent trials (50 items), each with a constant probability of being defective (1%), follows a Binomial distribution. This is correct.

    Statement B: For a Binomial distribution B(n,p), P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}. Here n=50, p=0.01, k=1, so P(X=1) = \binom{50}{1} (0.01)^1 (0.99)^{49}. This is correct.

    Statement C: The mean of a Binomial distribution is E[X] = np = 50 \times 0.01 = 0.5. This is correct.

    Statement D: The Poisson approximation to a Binomial distribution uses \lambda = np = 50 \times 0.01 = 0.5, not \lambda = 50. This is incorrect.

    Therefore, statements A, B, and C are correct."
    :::

    :::question type="SUB" question="The time (in minutes) a customer spends waiting for a service representative follows an exponential distribution. If 80% of customers wait longer than 5 minutes, what is the average waiting time (in minutes)? Report to two decimal places." answer="22.41" hint="Use the survival function of the exponential distribution and its relationship with the mean." solution="Step 1: Set up the probability statement.
    Let X be the waiting time, X \sim \operatorname{Exp}(\lambda).
    We are given P(X > 5) = 0.80, and P(X > x) = e^{-\lambda x}, so:

    e^{-5\lambda} = 0.80

    Step 2: Solve for \lambda.
    Taking the natural logarithm of both sides:

    -5\lambda = \ln(0.80) \approx -0.22314355

    \lambda \approx 0.04462871

    Step 3: Calculate the average waiting time (mean).
    For an Exponential distribution, E[X] = 1/\lambda:

    E[X] = \frac{1}{0.04462871} \approx 22.4070

    Rounding to two decimal places, E[X] \approx 22.41.
    The average waiting time is 22.41 minutes."
    :::
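    Questions like the one above reward carrying full precision through the computation; rounding \lambda early shifts the final answer. A minimal Python sketch of the survival-function approach (my addition, not part of the notes):

    ```python
    from math import log

    # P(X > 5) = e^(-lam*5) = 0.80  =>  lam = -ln(0.80)/5,  mean = 1/lam
    lam = -log(0.80) / 5
    mean = 1 / lam
    print(round(mean, 2))  # 22.41
    ```

    Equivalently, mean = -5 / ln(0.80), which avoids computing \lambda at all.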

    ---

    Chapter Summary

    πŸ“– Random Variables and Distributions - Key Takeaways

    To excel in CMI, a deep understanding of Random Variables and Distributions is fundamental. Here are the most crucial points you must internalize:

    • Random Variables (RVs): Understand the formal definition of a random variable as a function mapping outcomes from a sample space to real numbers. Differentiate clearly between discrete and continuous random variables and their respective characteristics.

    • Probability Mass Function (PMF), Probability Density Function (PDF), and Cumulative Distribution Function (CDF):

    • Know the definitions and properties of PMF (for discrete RVs), PDF (for continuous RVs), and CDF (for both).
      Master how to calculate probabilities using these functions, including P(a≀X≀b)=FX(b)βˆ’FX(a)P(a \le X \le b) = F_X(b) - F_X(a) for CDFs, and using integration/summation for PDFs/PMFs.
      Understand the relationship between PDF/PMF and CDF: FX(x)=βˆ‘t≀xP(X=t)F_X(x) = \sum_{t \le x} P(X=t) or FX(x)=βˆ«βˆ’βˆžxfX(t)dtF_X(x) = \int_{-\infty}^{x} f_X(t) dt, and fX(x)=FXβ€²(x)f_X(x) = F_X'(x).
    • Expectation and Variance:

    • Memorize the definitions of expectation E[X]E[X] and variance Var[X]Var[X] for both discrete and continuous RVs.
      Crucially, understand and apply their properties: Linearity of Expectation (E[aX+bY]=aE[X]+bE[Y]E[aX+bY] = aE[X]+bE[Y]) and properties of variance (Var[aX+b]=a2Var[X]Var[aX+b] = a^2Var[X]).
      Be proficient in calculating E[g(X)]E[g(X)] using the Law of the Unconscious Statistician (LOTUS).
    • Standard Distributions: Be thoroughly familiar with the key properties (parameters, PMF/PDF, mean, variance, typical shape) of the most common distributions:

      • Discrete: Bernoulli, Binomial, Poisson, Geometric.
      • Continuous: Uniform, Exponential, Normal (Gaussian), Gamma.
      • Recognize scenarios where each distribution is applicable.
    • Moment Generating Functions (MGFs):

      • Understand the definition $M_X(t) = E[e^{tX}]$.
      • Know how to use MGFs to find moments ($E[X^n] = M_X^{(n)}(0)$) and, more importantly, to uniquely identify the distribution of an RV.
      • Be familiar with the MGFs of the standard distributions.
    • Transformations of Random Variables: Master techniques for finding the PMF/PDF of a new random variable $Y = g(X)$ given the distribution of $X$. This often involves the CDF method or the change-of-variable formula for continuous RVs.
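
    One way to internalize the CDF method from the last bullet is to check a derived density against simulation. Below is a minimal sketch (illustrative, not part of the notes; `empirical_cdf` is a helper defined here, not a library call): for $X \sim \mathrm{Uniform}(0,1)$ and $Y = X^2$, the CDF method gives $F_Y(y) = P(X \le \sqrt{y}) = \sqrt{y}$, so $f_Y(y) = \frac{1}{2\sqrt{y}}$.

    ```python
    import random

    # Illustrative sketch: verify the CDF method for Y = X^2 with
    # X ~ Uniform(0, 1).  For 0 < y < 1,
    #   F_Y(y) = P(X^2 <= y) = P(X <= sqrt(y)) = sqrt(y),
    # so f_Y(y) = F_Y'(y) = 1 / (2 * sqrt(y)).

    random.seed(0)
    sample = [random.random() ** 2 for _ in range(200_000)]

    def empirical_cdf(data, y):
        """Fraction of the sample at or below y."""
        return sum(v <= y for v in data) / len(data)

    # Compare the empirical CDF of Y against the derived F_Y(y) = sqrt(y).
    max_err = max(abs(empirical_cdf(sample, y) - y ** 0.5)
                  for y in (0.04, 0.25, 0.81))
    ```

    With 200,000 draws the empirical CDF should sit within about a thousandth of $\sqrt{y}$ at each checkpoint, which is exactly what the CDF method predicts.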

    ---

    Chapter Review Questions

    :::question type="MCQ" question="Let $X$ be a continuous random variable with probability density function (PDF):

    $$f_X(x) = \begin{cases} c(1-x^2) & \text{for } -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    Which of the following statements are TRUE?
    (I) The constant $c = \frac{3}{4}$.
    (II) $P(X > 0) = \frac{1}{2}$.
    (III) $E[X] = 0$.
    (IV) $Var[X] = \frac{2}{5}$.
    " options=["A) (I) and (II) only", "B) (I), (II) and (III) only", "C) (I), (II), (III) and (IV)", "D) (I), (III) and (IV) only"] answer="B" hint="Remember the properties of a PDF: it must integrate to 1. Also, leverage symmetry to simplify calculations for expectation and probability." solution="Let's analyze each statement:

    (I) The constant $c = \frac{3}{4}$:
    For $f_X(x)$ to be a valid PDF, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

    $$\int_{-1}^{1} c(1-x^2)\,dx = 1$$

    $$c \left[ x - \frac{x^3}{3} \right]_{-1}^{1} = 1$$

    $$c \left[ \left(1 - \frac{1}{3}\right) - \left(-1 + \frac{1}{3}\right) \right] = 1$$

    $$c \left[ \frac{2}{3} - \left(-\frac{2}{3}\right) \right] = 1$$

    $$c \cdot \frac{4}{3} = 1 \implies c = \frac{3}{4}$$

    So, statement (I) is TRUE.

    (II) $P(X > 0) = \frac{1}{2}$:

    $$P(X > 0) = \int_{0}^{1} \frac{3}{4}(1-x^2)\,dx$$

    $$= \frac{3}{4} \left[ x - \frac{x^3}{3} \right]_{0}^{1}$$

    $$= \frac{3}{4} \left(1 - \frac{1}{3}\right)$$

    $$= \frac{3}{4} \cdot \frac{2}{3} = \frac{1}{2}$$

    Alternatively, since $f_X(x)$ is symmetric about $x = 0$ ($f_X(x) = f_X(-x)$), $P(X > 0) = P(X < 0) = \frac{1}{2}$.
    So, statement (II) is TRUE.

    (III) $E[X] = 0$:

    $$E[X] = \int_{-1}^{1} x \cdot \frac{3}{4}(1-x^2)\,dx = \frac{3}{4} \int_{-1}^{1} (x - x^3)\,dx$$

    Since $x - x^3$ is an odd function and the integration interval is symmetric about 0, the integral is 0.
    So, statement (III) is TRUE.

    (IV) $Var[X] = \frac{2}{5}$:
    $Var[X] = E[X^2] - (E[X])^2$. Since $E[X] = 0$, $Var[X] = E[X^2]$.

    $$E[X^2] = \int_{-1}^{1} x^2 \cdot \frac{3}{4}(1-x^2)\,dx = \frac{3}{4} \int_{-1}^{1} (x^2 - x^4)\,dx$$

    Since $x^2 - x^4$ is an even function, we can write:
    $$E[X^2] = \frac{3}{4} \cdot 2 \int_{0}^{1} (x^2 - x^4)\,dx$$

    $$= \frac{3}{2} \left[ \frac{x^3}{3} - \frac{x^5}{5} \right]_{0}^{1}$$

    $$= \frac{3}{2} \left( \frac{1}{3} - \frac{1}{5} \right)$$

    $$= \frac{3}{2} \cdot \frac{5-3}{15} = \frac{3}{2} \cdot \frac{2}{15} = \frac{1}{5}$$

    So, $Var[X] = \frac{1}{5}$. Therefore, statement (IV), $Var[X] = \frac{2}{5}$, is FALSE.

    Based on the analysis, statements (I), (II), and (III) are TRUE. The correct option is B."
    :::
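
    All four values in this solution can be double-checked numerically. Here is a quick sanity check (illustrative, not part of the original solution; `riemann` is a helper defined here, not a library function):

    ```python
    # Illustrative numerical check of statements (I)-(IV) for the PDF
    # f(x) = c * (1 - x^2) on [-1, 1], using a midpoint Riemann sum.

    def riemann(g, a, b, n=100_000):
        """Midpoint Riemann approximation of the integral of g over [a, b]."""
        h = (b - a) / n
        return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

    c = 1 / riemann(lambda x: 1 - x**2, -1.0, 1.0)             # (I)   c = 3/4
    p_pos = riemann(lambda x: c * (1 - x**2), 0.0, 1.0)        # (II)  P(X>0) = 1/2
    mean = riemann(lambda x: x * c * (1 - x**2), -1.0, 1.0)    # (III) E[X] = 0
    var = riemann(lambda x: x**2 * c * (1 - x**2), -1.0, 1.0)  # (IV)  E[X^2] = 1/5
    ```

    The approximations land on $c = 3/4$, $P(X>0) = 1/2$, $E[X] = 0$, and $E[X^2] = 1/5$ to within numerical error, confirming that statement (IV) is false.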

    :::question type="NAT" question="A fair six-sided die is rolled repeatedly. Let $X$ be the number of rolls until a '6' appears for the first time. Let $Y$ be the number of rolls until a '6' appears for the second time. Find $E[Y \mid X=1]$. (Enter your answer as a plain number)." answer="7" hint="Consider the nature of the geometric distribution and the memoryless property. If the first '6' occurs on the 1st roll, how many additional rolls are needed for the second '6'?" solution="Let $X$ be the number of rolls until the first '6' appears. $X$ follows a Geometric distribution with $p = 1/6$.
    Let $Y$ be the number of rolls until the second '6' appears.

    We are asked to find $E[Y \mid X=1]$.
    Given that the first '6' appeared on the 1st roll, roll 1 was a '6'.
    We now need the expected number of additional rolls, from roll 2 onwards, until the second '6' appears.
    Let $Z$ be the number of additional rolls needed after the 1st roll for the second '6' to appear.
    Since die rolls are independent and the probability of rolling a '6' remains $p = 1/6$ on each subsequent roll, $Z$ also follows a Geometric distribution with parameter $p = 1/6$.
    The expected value of a Geometric distribution (number of trials until the first success) is $1/p$.
    So, $E[Z] = 1/p = 1/(1/6) = 6$.

    The total number of rolls until the second '6' appears can be expressed as $Y = X + Z$.
    Therefore, $E[Y \mid X=1] = E[1 + Z \mid X=1]$.
    Since $1$ is a constant and $Z$ is independent of the event $X=1$ (the later rolls do not depend on the first roll), we have:
    $E[Y \mid X=1] = 1 + E[Z] = 1 + 6 = 7$.

    The expected value is 7."
    :::
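
    The independence argument above can also be checked by simulation. A short Monte Carlo sketch (illustrative, not part of the original solution; the function name is made up for this example):

    ```python
    import random

    # Illustrative check that E[Y | X = 1] = 7: condition on the first '6'
    # landing on roll 1, then count the total rolls until the second '6'.

    random.seed(1)

    def rolls_until_second_six_given_first_on_roll_one():
        rolls = 1                      # roll 1 is a '6' by conditioning
        while True:
            rolls += 1                 # keep rolling from roll 2 onward
            if random.randint(1, 6) == 6:
                return rolls           # index of the roll with the second '6'

    n = 100_000
    avg = sum(rolls_until_second_six_given_first_on_roll_one()
              for _ in range(n)) / n   # should be close to 7
    ```

    With 100,000 trials the sample mean settles within a few hundredths of 7, matching $1 + E[Z] = 1 + 6$.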

    :::question type="MCQ" question="Let $X$ be a random variable with Moment Generating Function (MGF) $M_X(t) = \frac{e^{2t}}{1-3t}$ for $t < \frac{1}{3}$.
    Which of the following is the variance of $X$, $Var[X]$?
    " options=["A) 3", "B) 9", "C) 11", "D) 13"] answer="B" hint="Recall that $M_X'(0) = E[X]$ and $M_X''(0) = E[X^2]$. Then use $Var[X] = E[X^2] - (E[X])^2$." solution="The Moment Generating Function (MGF) is given by $M_X(t) = e^{2t}(1-3t)^{-1}$.

    First, we find $E[X] = M_X'(0)$.
    Using the product rule $(uv)' = u'v + uv'$:
    Let $u = e^{2t}$ and $v = (1-3t)^{-1}$.
    Then $u' = 2e^{2t}$ and $v' = -(1-3t)^{-2} \cdot (-3) = 3(1-3t)^{-2}$.

    So, $M_X'(t) = 2e^{2t}(1-3t)^{-1} + 3e^{2t}(1-3t)^{-2}$.
    Now, evaluate at $t = 0$:
    $M_X'(0) = 2e^0(1-0)^{-1} + e^0 \cdot 3(1-0)^{-2} = 2 + 3 = 5$.
    So, $E[X] = 5$.

    Next, we find $E[X^2] = M_X''(0)$.
    We need to differentiate $M_X'(t) = 2e^{2t}(1-3t)^{-1} + 3e^{2t}(1-3t)^{-2}$.
    Let $M_X'(t) = A(t) + B(t)$, where $A(t) = 2e^{2t}(1-3t)^{-1}$ and $B(t) = 3e^{2t}(1-3t)^{-2}$.

    For $A(t)$:
    $A'(t) = 4e^{2t}(1-3t)^{-1} + 2e^{2t} \cdot 3(1-3t)^{-2} = 4e^{2t}(1-3t)^{-1} + 6e^{2t}(1-3t)^{-2}$.
    At $t = 0$: $A'(0) = 4 + 6 = 10$.

    For $B(t)$:
    $B'(t) = 6e^{2t}(1-3t)^{-2} + 3e^{2t} \cdot 6(1-3t)^{-3} = 6e^{2t}(1-3t)^{-2} + 18e^{2t}(1-3t)^{-3}$.
    At $t = 0$: $B'(0) = 6 + 18 = 24$.

    So, $M_X''(0) = A'(0) + B'(0) = 10 + 24 = 34$.
    Thus, $E[X^2] = 34$.

    Finally, $Var[X] = E[X^2] - (E[X])^2 = 34 - 5^2 = 34 - 25 = 9$.

    The correct option is B.

    Alternatively, recognize the MGF directly.
    The MGF of an Exponential distribution with rate $\lambda$ is $\frac{\lambda}{\lambda - t}$ (for $t < \lambda$), so the MGF of $X_1 \sim \operatorname{Exp}(1/3)$ is $\frac{1/3}{1/3 - t} = \frac{1}{1-3t}$.
    The MGF of $X_1 + 2$ is $E[e^{t(X_1+2)}] = e^{2t} E[e^{tX_1}] = \frac{e^{2t}}{1-3t}$, which matches $M_X(t)$.
    So $X$ is distributed as an $\operatorname{Exp}(1/3)$ random variable shifted by 2.
    Then $E[X_1] = 1/\lambda = 3$ and $Var[X_1] = 1/\lambda^2 = 9$, so
    $E[X] = E[X_1] + 2 = 5$ and $Var[X] = Var[X_1 + 2] = Var[X_1] = 9$.
    The variance of $X$ is 9."
    :::
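
    Rather than differentiating by hand, the first two moments can also be read off the MGF numerically. A small check (illustrative, not part of the original solution), using central finite differences at $t = 0$:

    ```python
    import math

    # Illustrative check of the moments of M(t) = e^{2t} / (1 - 3t):
    # approximate M'(0) and M''(0) with central finite differences.

    def M(t):
        return math.exp(2 * t) / (1 - 3 * t)

    h = 1e-4
    m1 = (M(h) - M(-h)) / (2 * h)            # first derivative  -> E[X]   = 5
    m2 = (M(h) - 2 * M(0) + M(-h)) / h**2    # second derivative -> E[X^2] = 34
    var = m2 - m1**2                         # 34 - 25 = 9
    ```

    The finite-difference estimates reproduce $E[X] = 5$, $E[X^2] = 34$, and $Var[X] = 9$ to several decimal places, which is a useful cross-check on the product-rule computation.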

    :::question type="NAT" question="Let $X$ be a continuous random variable uniformly distributed on the interval $(0, 2)$. Define a new random variable $Y = X^2$. Find $E[Y]$. (Enter your answer as a plain number in decimal form, rounded to two decimal places)." answer="1.33" hint="First, determine the PDF of $X$. Then, use the Law of the Unconscious Statistician (LOTUS) to compute $E[Y]$ without explicitly finding the PDF of $Y$." solution="The random variable $X$ is uniformly distributed on $(0, 2)$.
    Its PDF is given by:

    $$f_X(x) = \begin{cases} \frac{1}{2-0} = \frac{1}{2} & \text{for } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

    We want to find $E[Y]$ where $Y = X^2$.
    Using the Law of the Unconscious Statistician (LOTUS), $E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$.
    Here, $g(x) = x^2$.

    $$E[Y] = E[X^2] = \int_{0}^{2} x^2 \cdot \frac{1}{2}\,dx$$

    $$= \frac{1}{2} \left[ \frac{x^3}{3} \right]_{0}^{2}$$

    $$= \frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}$$

    Numerically, $E[Y] = \frac{4}{3} \approx 1.3333\ldots$
    Rounded to two decimal places, $E[Y] \approx 1.33$.
    The expected value of $Y$ is 1.33."
    :::
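
    LOTUS also lends itself to a direct Monte Carlo check: average $g(X) = X^2$ over draws of $X$ without ever deriving the PDF of $Y$. A short sketch (illustrative, not from the original solution):

    ```python
    import random

    # Illustrative Monte Carlo check of E[X^2] = 4/3 for X ~ Uniform(0, 2),
    # in the spirit of LOTUS: average g(X) = X^2 directly over draws of X.

    random.seed(42)
    n = 500_000
    avg = sum(random.uniform(0.0, 2.0) ** 2 for _ in range(n)) / n
    ```

    With half a million draws the sample mean sits very close to $4/3 \approx 1.33$, matching the integral computed above.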

    ---

    What's Next?

    💡 Continue Your CMI Journey

    You've mastered Random Variables and Distributions! This chapter is the bedrock for much of advanced probability theory and statistics.

    Key connections:
    • Building on Previous Learning: This chapter relies heavily on your understanding of basic probability (sample spaces, events, conditional probability, independence) and calculus (integration, differentiation) for continuous random variables. A solid grasp of set theory is also beneficial for defining events and sample spaces.
    • Paving the Way for Future Chapters: The concepts learned here are foundational for:
      • Joint Distributions: understanding how multiple random variables interact.
      • Conditional Expectation: a deeper dive into expected values given certain conditions.
      • Limit Theorems: the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT), which are crucial for statistical inference and build directly on the properties of expectation and variance of random variables.
      • Statistical Inference: chapters on estimation (e.g., maximum likelihood estimation) and hypothesis testing rely on the distributions of sample statistics.
      • Stochastic Processes: many advanced topics in probability and applied mathematics begin with discrete-time or continuous-time random variables.

    Keep practicing problems that combine these concepts, as CMI questions often integrate knowledge across multiple topics!

    🎯 Key Points to Remember

    • ✓ Master the core concepts in Random Variables and Distributions before moving to advanced topics
    • ✓ Practice with previous year questions to understand exam patterns
    • ✓ Review short notes regularly for quick revision before exams
