Continuous Probability Distributions
Overview
Having established the foundations of probability for discrete random variables, we now extend our inquiry to the domain of continuous random variables. Unlike their discrete counterparts, which assume a countable number of distinct values, continuous variables can take on any value within a given range. This conceptual shift necessitates a different mathematical framework for describing probability. We can no longer assign a non-zero probability to a single point; instead, we must consider the probability that a variable falls within a specific interval. This is accomplished through the Probability Density Function (PDF), a central concept that defines the relative likelihood of a variable taking on a particular value.
In this chapter, we shall explore the essential properties of continuous random variables and their distributions. We will begin by defining the Probability Density Function and its counterpart, the Cumulative Distribution Function (CDF), which are the primary tools for analyzing continuous phenomena. Subsequently, we will examine several paramount distributions that are fundamental to both theoretical and applied statistics: the Uniform, Exponential, and Normal distributions. A thorough understanding of these distributions is indispensable for the GATE examination, as they form the basis for modeling a vast array of processes in data science and artificial intelligence, from service times in queuing theory to measurement errors in experimental data. Mastery of the concepts presented herein is critical for solving a significant class of problems encountered in the examination.
---
Chapter Contents
| # | Topic | What You'll Learn |
|---|------------------------------------|-----------------------------------------------------|
| 1 | Probability Density Function (PDF) | Describing probability over a continuous interval |
| 2 | Cumulative Distribution Function (CDF) | Calculating cumulative probability up to a value |
| 3 | Uniform Distribution | Modeling equiprobable outcomes in a range |
| 4 | Exponential Distribution | Modeling the time between independent events |
| 5 | Normal and Standard Normal Distribution | Analyzing the ubiquitous bell-shaped curve |
| 6 | Conditional PDF | Finding probability density given another event |
---
Learning Objectives
After completing this chapter, you will be able to:
- Define the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) for continuous random variables and articulate the relationship between them.
- Calculate probabilities, expected values, and variances for key continuous distributions, namely the Uniform and Exponential distributions.
- Analyze and solve problems involving the Normal distribution by applying its properties and utilizing the Standard Normal distribution for probability computations.
- Formulate and compute conditional probabilities for continuous random variables using the definition of a Conditional PDF.
---
We now turn our attention to the Probability Density Function (PDF)...
## Part 1: Probability Density Function (PDF)
Introduction
In our study of random variables, we have previously encountered discrete random variables, whose probabilities are described by a Probability Mass Function (PMF). We now turn our attention to continuous random variables, which can take on any value within a given range. Unlike their discrete counterparts, the probability that a continuous random variable equals any single specific value is zero. This necessitates a different mathematical construct to describe their probability distribution.
The Probability Density Function, or PDF, serves this purpose. It provides a way to describe the relative likelihood for a continuous random variable to take on a given value. The probability of the variable falling within a particular range of values is given by the integral of this function over that range, that is, by the area under the graph of the PDF. Understanding the PDF is fundamental to mastering continuous probability distributions, a cornerstone of probability and statistics.
For a continuous random variable $X$, the Probability Density Function, denoted by $f_X(x)$, is a function that satisfies the following properties:
- The function is non-negative for all possible values of $x$: $f_X(x) \ge 0$ for all $x \in \mathbb{R}$.
- The total area under the curve of the function is equal to 1:
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
The probability that $X$ falls within an interval $[a, b]$ is given by the integral of the PDF over that interval:
$$P(a \le X \le b) = \int_{a}^{b} f_X(x)\,dx$$
---
Key Concepts
#
## 1. Properties of a PDF
A function can be considered a valid PDF if and only if it satisfies the two foundational properties stated in the definition. Let us re-examine them, as they are the basis for many problems in the GATE examination.
Property 1: Non-negativity
The value of the PDF, $f_X(x)$, must always be greater than or equal to zero: $f_X(x) \ge 0$ for all $x$. This is intuitive, as it relates to probability density, which cannot be negative.
Property 2: Total Area is Unity
The integral of the PDF over its entire domain (from $-\infty$ to $\infty$) must equal 1: $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$. This signifies that the total probability of the random variable taking on some value is 1, or 100%.
These two properties are the primary checks for determining the validity of a given function as a PDF.
The value of a PDF at a specific point, $f_X(x_0)$, is not a probability. It is a measure of probability density. Consequently, it is possible for $f_X(x)$ to be greater than 1 for some values of $x$. The only constraint is that the total integral (area) over the entire domain must be exactly 1.
#
## 2. Calculating Probabilities from a PDF
For a continuous random variable $X$, the probability of it taking any single, specific value is zero. That is, $P(X = c) = 0$ for any constant $c$. This is because the region under the curve at a single point is an infinitesimally thin line, which has zero area.
It follows that for any $a \le b$:
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b) = \int_a^b f_X(x)\,dx$$
The inclusion or exclusion of the endpoints does not change the probability for a continuous random variable. The probability is found by integrating the PDF over the specified interval.
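These two facts are easy to verify numerically. The sketch below uses a hypothetical density $f(x) = \frac{3}{8}x^2$ on $[0, 2]$ (chosen only for illustration) and `scipy.integrate.quad` to check that the total area is 1 and that an interval probability is unaffected by its endpoints:

```python
# Numerical check of PDF properties for a hypothetical density
# f(x) = (3/8) x^2 on [0, 2], zero elsewhere.
from scipy.integrate import quad

f = lambda x: 0.375 * x**2

total, _ = quad(f, 0, 2)    # normalization: should be 1
p_open, _ = quad(f, 1, 2)   # P(1 < X < 2); identical to P(1 <= X <= 2),
                            # since single points carry zero probability

print(round(total, 6))   # 1.0
print(round(p_open, 6))  # 0.875
```

Because the endpoints contribute zero area, no separate computation is needed for the closed interval.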
Worked Example:
Problem: A continuous random variable has a PDF given by for , and otherwise. Find the value of the constant and then calculate .
Solution:
Step 1: Use the property that the total area under the PDF is 1 to find .
Step 2: Set up the integral over the defined range of the function.
Step 3: Evaluate the integral.
Step 4: Solve for .
Answer for k: The value of the constant is . The PDF is for .
---
Now, calculate .
Step 1: Set up the integral for the desired probability.
Step 2: Evaluate the integral.
Step 3: Simplify the expression.
Result:
#
## 3. Relationship with Cumulative Distribution Function (CDF)
The PDF is intrinsically linked to the Cumulative Distribution Function (CDF), denoted $F_X(x)$. The CDF gives the total probability that the random variable $X$ is less than or equal to a particular value $x$:
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$$
Variables:
- $F_X(x)$ = Cumulative Distribution Function at point $x$
- $f_X(t)$ = Probability Density Function
When to use: To find the cumulative probability up to a point $x$.
Conversely, the PDF can be obtained by differentiating the CDF. This relationship is a direct consequence of the Fundamental Theorem of Calculus:
$$f_X(x) = \frac{d}{dx} F_X(x)$$
Variables:
- $f_X(x)$ = Probability Density Function at point $x$
- $F_X(x)$ = Cumulative Distribution Function
When to use: To find the density function when the cumulative function is known.
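The derivative relationship can be illustrated with a centered finite difference. The sketch below assumes the standard exponential distribution, $F(x) = 1 - e^{-x}$ with $f(x) = e^{-x}$, purely as a convenient example:

```python
# Recover the PDF from the CDF by numerical differentiation,
# using the Exp(1) distribution: F(x) = 1 - exp(-x), f(x) = exp(-x).
import math

def F(x):
    """CDF of the Exp(1) distribution."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def f_numeric(x, h=1e-6):
    """Centered-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

x = 1.5
print(abs(f_numeric(x) - math.exp(-x)) < 1e-6)  # True: f = dF/dx
```

The numerical derivative of the CDF matches the known density to high precision, mirroring $f_X(x) = F_X'(x)$.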
---
Common Mistakes
- ❌ Confusing PDF value with Probability: Thinking that $f_X(a) = P(X = a)$. For a continuous variable, $P(X = a) = 0$.
- ❌ Assuming $f_X(x) \le 1$: Believing that the PDF value can never exceed 1.
- ❌ Incorrect Integration Limits: Using incorrect bounds when calculating probabilities or normalizing the function.
---
Practice Questions
:::question type="NAT" question="A continuous random variable X has a probability density function given by $f(x) = cx^2$ for $0 \le x \le 2$ and $f(x) = 0$ otherwise. What is the value of the constant $c$?" answer="0.375" hint="The total integral of a valid PDF over its domain must be equal to 1. Set up the integral and solve for c." solution="
Step 1: To be a valid PDF, the total integral must equal 1.
$$\int_0^2 cx^2\,dx = 1$$
Step 2: Factor out the constant and perform the integration.
$$c\left[\frac{x^3}{3}\right]_0^2 = 1$$
Step 3: Apply the limits of integration.
$$c \cdot \frac{8}{3} = 1$$
Step 4: Simplify and solve for $c$.
$$c = \frac{3}{8} = 0.375$$
Result: $c = 0.375$
"
:::
:::question type="MCQ" question="Which of the following functions can be a valid probability density function (PDF)?" options=[" for "," for "," for "," for "] answer=" for " hint="Check the two conditions for a valid PDF for each option: non-negativity and total integral equal to 1." solution="
Let's check each option:
A) for
Thus, this is a valid PDF.
B) for
C) for
D) for
Therefore, only the first option is a valid PDF.
"
:::
:::question type="NAT" question="For the PDF $f(x) = \frac{3}{8}x^2$ for $0 \le x \le 2$, and $f(x) = 0$ otherwise, calculate the probability $P(X \ge 1)$." answer="0.875" hint="The probability is the integral of the PDF from 1 to the upper bound of the domain, which is 2." solution="
Step 1: Set up the integral for the required probability.
Since the PDF is zero for $x > 2$, the integral becomes:
$$P(X \ge 1) = \int_1^2 \frac{3}{8}x^2\,dx$$
Step 2: Evaluate the integral.
$$= \frac{3}{8}\left[\frac{x^3}{3}\right]_1^2 = \frac{1}{8}\left[x^3\right]_1^2$$
Step 3: Apply the limits of integration.
$$= \frac{1}{8}(8 - 1) = \frac{7}{8}$$
Result: $P(X \ge 1) = 0.875$
"
:::
:::question type="MSQ" question="Let $f(x)$ be the probability density function of a continuous random variable $X$. Which of the following statements are ALWAYS true?" options=["$f(x) \le 1$ for all $x$","$\int_{-\infty}^{\infty} f(x)\,dx = 1$","$P(X = a) = f(a)$ for any constant $a$","$f(x)$ can be obtained by differentiating the Cumulative Distribution Function $F(x)$"] answer="$\int_{-\infty}^{\infty} f(x)\,dx = 1$, $f(x)$ can be obtained by differentiating the Cumulative Distribution Function $F(x)$" hint="Recall the fundamental properties and definitions related to a PDF and its relationship with the CDF." solution="
Let's evaluate each statement:
- "$f(x) \le 1$ for all $x$": This is false. A PDF value can be greater than 1. For example, the uniform distribution on $[0, \frac{1}{2}]$ has $f(x) = 2$.
- "$\int_{-\infty}^{\infty} f(x)\,dx = 1$": This is true by the definition of a probability density function. It represents the total probability over the entire sample space.
- "$P(X = a) = f(a)$ for any constant $a$": This is false. For any continuous random variable, the probability of it taking a single specific value is zero, i.e., $P(X = a) = 0$. The value $f(a)$ is the probability density at that point, not the probability.
- "$f(x)$ can be obtained by differentiating the Cumulative Distribution Function $F(x)$": This is true. The relationship is given by $f(x) = \frac{dF(x)}{dx}$. This is a fundamental property connecting the PDF and CDF.
"
:::
---
Summary
- Two Defining Properties: A function is a valid PDF if and only if it is non-negative ($f(x) \ge 0$) and its total integral over the real line is one ($\int_{-\infty}^{\infty} f(x)\,dx = 1$). These are essential for validation and normalization problems.
- Probability as Area: The probability that a continuous random variable lies in an interval is calculated by integrating the PDF over that interval: $P(a \le X \le b) = \int_a^b f(x)\,dx$. The probability at a single point is always zero.
- PDF-CDF Relationship: The PDF is the derivative of the CDF ($f(x) = F'(x)$), and the CDF is the integral of the PDF ($F(x) = \int_{-\infty}^{x} f(t)\,dt$). This is a critical relationship for converting between the two representations of a distribution.
---
What's Next?
A solid understanding of the Probability Density Function is the gateway to more advanced topics in continuous distributions. This topic connects directly to:
- Cumulative Distribution Function (CDF): The CDF is the integral of the PDF. Mastering the interplay between them is crucial for solving a wide range of probability problems.
- Expectation and Variance of Continuous Variables: The concepts of mean (expected value) and variance are defined using integrals involving the PDF. For instance, $E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$.
- Named Continuous Distributions: The PDF is the defining function for all standard continuous distributions you will encounter, such as the Normal, Exponential, and Uniform distributions. Each has a specific functional form for its PDF.
Master these connections for comprehensive GATE preparation!
---
Now that you understand Probability Density Function (PDF), let's explore Cumulative Distribution Function (CDF) which builds on these concepts.
---
## Part 2: Cumulative Distribution Function (CDF)
Introduction
In the study of probability and statistics, our primary objective is often to characterize the behavior of random variables. While the Probability Mass Function (PMF) serves this purpose for discrete random variables and the Probability Density Function (PDF) for continuous ones, the Cumulative Distribution Function (CDF) provides a more universal and fundamental description. The CDF, denoted by $F_X(x)$, elegantly unifies the description of both discrete and continuous random variables, offering a complete picture of their probability distribution.
The power of the CDF lies in its definition: it captures the total accumulated probability up to a certain value, $x$. This cumulative perspective allows us to directly answer questions of the form, "What is the probability that the random variable takes on a value less than or equal to $x$?" From this single function, we can derive a wealth of information, including probabilities over specific intervals, key statistical measures like the median and other quantiles, and even the underlying PDF for continuous variables. A thorough understanding of the CDF is therefore indispensable for mastering probability distributions, a cornerstone of the GATE DA syllabus.
For any random variable $X$, the Cumulative Distribution Function (CDF), denoted as $F_X(x)$, is defined as the probability that $X$ will take a value less than or equal to $x$. Mathematically, this is expressed as:
$$F_X(x) = P(X \le x)$$
where $x$ can be any real number, i.e., $x \in (-\infty, \infty)$.
---
Key Concepts
#
## 1. Properties of a Cumulative Distribution Function
Any function that is a CDF must satisfy a set of fundamental properties. These properties are not arbitrary; they are direct consequences of the axioms of probability and the definition of the CDF. For the GATE examination, recognizing whether a given function can be a valid CDF is a common type of problem.
Let us enumerate these essential properties for a CDF, $F_X(x)$:
1. Boundedness: $0 \le F_X(x) \le 1$ for all $x$. This is because the CDF represents a probability, which must lie in this range.
2. Monotonicity: $F_X(x)$ is non-decreasing; if $x_1 < x_2$, then $F_X(x_1) \le F_X(x_2)$. This property makes intuitive sense: as we increase the value of $x$, the cumulative probability can only increase or stay the same; it can never decrease.
3. Limiting Values: $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$. The first limit indicates that the probability of observing a value less than or equal to a very small number is negligible. The second limit shows that the probability of observing a value less than or equal to a very large number is a certainty, as the random variable must take on some value.
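These three properties can be spot-checked numerically for any concrete CDF. The sketch below uses the standard normal CDF from `scipy.stats` (chosen only as a familiar example):

```python
# Numerical illustration of the three CDF properties,
# using the standard normal CDF as an example.
from scipy.stats import norm

xs = [-10, -1, 0, 1, 10]
vals = [norm.cdf(x) for x in xs]

assert all(0 <= v <= 1 for v in vals)               # 1. bounded in [0, 1]
assert all(a <= b for a, b in zip(vals, vals[1:]))  # 2. non-decreasing
assert norm.cdf(-10) < 1e-9                         # 3. limit 0 at -infinity
assert norm.cdf(10) > 1 - 1e-9                      #    limit 1 at +infinity
print("all CDF properties hold")
```

Any valid CDF, discrete or continuous, would pass the same checks.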
The following diagram provides a visual representation of a typical CDF for a continuous random variable, illustrating these properties.
---
#
## 2. Calculating Probabilities from a CDF
The primary utility of the CDF is in calculating probabilities for a random variable falling within a certain range. For a continuous random variable $X$ and constants $a$ and $b$ such that $a < b$, we have the following relationships:
$$P(X \le a) = F_X(a)$$
$$P(X > a) = 1 - F_X(a)$$
$$P(a < X \le b) = F_X(b) - F_X(a)$$
Variables:
- $F_X(\cdot)$ = The CDF of the random variable $X$.
- $a, b$ = Real-valued constants.
When to use: These formulas are used whenever a probability calculation is required for a random variable for which the CDF is known.
For a continuous random variable, the probability of it taking on any single specific value is zero, i.e., $P(X = c) = 0$. Consequently, the inclusion or exclusion of endpoints in an interval does not change the probability.
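The identity $P(a < X \le b) = F_X(b) - F_X(a)$ can be sanity-checked by simulation. The sketch below assumes an Exp(1) variable, whose CDF is $F(x) = 1 - e^{-x}$, and compares the exact interval probability against a seeded Monte Carlo estimate:

```python
# Check P(a < X <= b) = F(b) - F(a) for X ~ Exp(1) by simulation.
import math
import random

random.seed(0)
F = lambda x: 1 - math.exp(-x)   # Exp(1) CDF
a, b = 0.5, 2.0

exact = F(b) - F(a)
n = 200_000
hits = sum(1 for _ in range(n) if a < random.expovariate(1.0) <= b)

print(round(exact, 4))  # 0.4712
print(abs(hits / n - exact) < 0.01)
```

The empirical frequency of the interval agrees with the CDF difference to within sampling error.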
Worked Example:
Problem: A continuous random variable has the following CDF:
Calculate the probability .
Solution:
Step 1: Identify the required probability and the relevant formula.
We need to calculate . The appropriate formula is .
Here, and .
Step 2: Evaluate the CDF at the upper bound, .
The value lies in the interval , so we use the functional form .
Step 3: Evaluate the CDF at the lower bound, .
The value also lies in the interval , so we again use .
Step 4: Calculate the difference to find the probability.
Answer: The probability is .
---
#
## 3. Quantiles and Median from a CDF
The CDF provides a direct way to find quantiles of a distribution. A quantile is a value below which a certain proportion of the observations fall.
The $p$-th quantile of a random variable $X$ is the value $x_p$ such that the probability of the variable being less than or equal to $x_p$ is $p$. It is the solution to the equation:
$$F_X(x_p) = p$$
where $0 < p < 1$.
A particularly important quantile is the median, which corresponds to the 50th percentile ($p = 0.5$).
The median, $m$, of a continuous random variable is the value that satisfies the equation:
$$F_X(m) = 0.5$$
Variables:
- $F_X(\cdot)$ = The CDF of the random variable $X$.
- $m$ = The median of the distribution.
When to use: Use this formula when asked to find the median of a random variable, given its CDF. This was tested directly in PYQ 2025.1.
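When $F_X(m) = 0.5$ has no closed-form solution, a root-finder does the job. A sketch using `scipy.optimize.brentq` on a hypothetical lifetime CDF $F(x) = 1 - e^{-x/2}$ (an assumption for illustration; its median is $2\ln 2$):

```python
# Solve F(m) = 0.5 numerically for a hypothetical CDF F(x) = 1 - exp(-x/2).
import math
from scipy.optimize import brentq

F = lambda x: 1 - math.exp(-x / 2)

# Find the root of F(m) - 0.5 on a bracket where the sign changes.
median = brentq(lambda x: F(x) - 0.5, 0, 100)

print(round(median, 4))  # 1.3863, i.e. 2*ln(2)
```

The same pattern solves for any quantile: replace 0.5 with the desired $p$.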
Worked Example:
Problem: The lifetime of an electronic component, in years, is a random variable with the CDF:
Find the median lifetime of the component.
Solution:
Step 1: Set up the equation for the median, .
According to the definition, the median is the value of for which .
Step 2: Substitute the appropriate functional form of the CDF.
Since the lifetime must be positive, we use the form for .
Step 3: Solve the equation for .
Step 4: Take the natural logarithm of both sides to isolate the exponent.
Recall that .
Answer: The median lifetime of the component is years, which is approximately years.
---
#
## 4. Probabilities of Transformed Variables
A more advanced type of question involves finding the probability of a function of a random variable, such as $X^2$ or $|X|$. The key to solving such problems is to convert the condition on the transformed variable back into a condition on the original variable $X$.
Consider the problem of finding $P(g(X) \le a)$. The first step is always to find the set of $x$ values for which the inequality $g(x) \le a$ holds. This typically results in an interval or a union of intervals for $X$.
Example Transformation:
To find $P(X^2 \le a)$ for $a > 0$:
The inequality $X^2 \le a$ is equivalent to $-\sqrt{a} \le X \le \sqrt{a}$.
Therefore, we must calculate:
$$P(X^2 \le a) = F_X(\sqrt{a}) - F_X(-\sqrt{a})$$
This was the core concept tested in PYQ 2025.1.
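The transformation identity can be checked empirically. The sketch below assumes a standard normal $X$ (any distribution with a known CDF would work) and compares $F_X(\sqrt{a}) - F_X(-\sqrt{a})$ against a simulated estimate of $P(X^2 \le a)$:

```python
# Verify P(X^2 <= a) = F(sqrt(a)) - F(-sqrt(a)) for a standard normal X.
import math
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
a = 2.0

exact = norm.cdf(math.sqrt(a)) - norm.cdf(-math.sqrt(a))
x = rng.standard_normal(500_000)
estimate = np.mean(x**2 <= a)

print(round(exact, 4))  # 0.8427
print(abs(estimate - exact) < 0.01)
```

Note that the CDF is evaluated at $\pm\sqrt{a}$, never at $a$ itself; evaluating $F_X(a)$ directly is exactly the mistake warned against below.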
Worked Example:
Problem: Let be a random variable with the CDF:
Calculate .
Solution:
Step 1: Convert the probability statement about into one about .
The inequality is equivalent to the union of two separate events: or . These are mutually exclusive events.
Step 2: Express these probabilities using the CDF.
Step 3: Evaluate the CDF at the required points.
The point is in the interval .
The point is also in the interval .
Step 4: Substitute these values back into the probability expression.
Answer: The probability is .
---
Problem-Solving Strategies
When working with a piecewise CDF, the first and most critical step is to determine which interval the value of interest, $a$, falls into.
- Identify the value: For a calculation like $F_X(a)$, identify $a$.
- Locate the interval: Look at the conditions (e.g., $a \le x < b$) and find the one that the value satisfies.
- Apply the correct formula: Use only the expression corresponding to that specific interval.
This systematic check prevents using the wrong part of the function, a very common error under exam pressure.
For problems involving transformed variables like $X^2$ or $|X|$:
- Isolate the inequality: Focus only on the inequality part, e.g., $X^2 \le 4$.
- Solve for X: Solve this algebraic inequality to find the equivalent range for $X$, e.g., $-2 \le X \le 2$.
- Translate to CDF: Convert the resulting interval(s) for $X$ into a CDF expression, e.g., $F_X(2) - F_X(-2)$.
This turns a complex probability problem into a standard algebraic manipulation followed by a simple CDF calculation.
- $X^2 \le a \iff -\sqrt{a} \le X \le \sqrt{a}$
- $|X| \le a \iff -a \le X \le a$
- $X^2 > a \iff X < -\sqrt{a}$ or $X > \sqrt{a}$
---
Common Mistakes
- ❌ Incorrect Interval Probability: Calculating $P(a < X \le b)$ as $F_X(a) - F_X(b)$. This is a sign reversal error; the correct expression is $F_X(b) - F_X(a)$.
- ❌ Confusing $P(X > a)$ with $F_X(a)$: Forgetting that $P(X > a)$ is $1 - F_X(a)$.
- ❌ Applying the Wrong Piece of a Function: In a piecewise CDF, using a formula for an interval where the given value does not belong.
- ❌ Ignoring the Transformation: Trying to compute $P(X^2 \le a)$ by calculating $F_X(a)$. This ignores that the condition is on $X^2$, not $X$.
---
Practice Questions
:::question type="MCQ" question="The cumulative distribution function of a continuous random variable is given by . What is the value of ?" options=["","","",""] answer="" hint="Use the property that the CDF must approach 1 at the upper bound of its support. What must be the value of ?" solution="
Step 1: A valid CDF must be continuous and satisfy . For this piecewise function, this means that at the point , the function must equal 1.
Step 2: Use the given functional form for the interval and set it equal to 1 at .
Step 3: Solve for .
Result: The value of is .
"
:::
:::question type="NAT" question="A random variable $X$ has the CDF $F_X(x) = \frac{x^2}{16}$ for $0 \le x \le 4$, with $F_X(x) = 0$ for $x < 0$ and $F_X(x) = 1$ for $x > 4$. Calculate the value of the third quartile (75th percentile) of this distribution." answer="3.464" hint="The third quartile, $Q_3$, is the value of $x$ for which $F_X(x) = 0.75$. Set up the equation and solve for $x$." solution="
Step 1: The third quartile, denoted as $Q_3$ or $x_{0.75}$, is the value such that $F_X(Q_3) = 0.75$.
Step 2: Since $0 < 0.75 < 1$, the value must lie in the interval $[0, 4]$. We use the corresponding part of the CDF.
$$\frac{Q_3^2}{16} = 0.75$$
Step 3: Solve the equation for $Q_3$.
$$Q_3^2 = 12 \implies Q_3 = \sqrt{12}$$
Step 4: Simplify the result.
$$Q_3 = 2\sqrt{3} \approx 3.464$$
Result: The value of the third quartile is approximately 3.464.
"
:::
:::question type="MCQ" question="Let be a random variable with the CDF for . What is the value of ?" options=["","","",""] answer="" hint="Use the complement rule ." solution="
Step 1: We need to compute . Using the properties of CDF, this is equal to .
Step 2: Evaluate the CDF at .
Step 3: Substitute this value back into the probability expression.
Result: The value of is .
"
:::
:::question type="MSQ" question="Which of the following functions can be a valid Cumulative Distribution Function (CDF) for some random variable?" options=["$F(x) = 1 - e^{-x}$ for $x \ge 0$, and $0$ for $x < 0$","$F(x) = 0$ for $x < 0$, $0.5$ for $0 \le x \le 1$, and $1$ for $x > 1$","$F(x) = x$ for $x \ge 0$, and $0$ for $x < 0$","$F(x) = 0.5(1 - e^{-x})$ for $x \ge 0$, and $0$ for $x < 0$"] answer="$F(x) = 1 - e^{-x}$ for $x \ge 0$, and $0$ for $x < 0$" hint="Check each option against the core properties of a CDF: 1) Bounded between 0 and 1. 2) Non-decreasing and right-continuous. 3) Limits are 0 and 1." solution="
Analysis of Options:
- Option A: $F(x) = 1 - e^{-x}$ starts at 0 and approaches 1 as $x \to \infty$. On $(0, \infty)$ its derivative is $e^{-x} > 0$, so it is non-decreasing, and it is continuous everywhere. This is a valid CDF.
- Option B: At $x = 1$ the function value is $0.5$, while the limit from the right is $1$. A CDF must be right-continuous, i.e., $\lim_{t \to x^+} F(t) = F(x)$. This is violated, so it is not a valid CDF.
- Option C: $F(x) = x$ is unbounded; it exceeds 1 for $x > 1$, violating the requirement $0 \le F(x) \le 1$. Not a valid CDF.
- Option D: $\lim_{x \to \infty} F(x) = 0.5$, not $1$. Not a valid CDF.
Therefore, only the function in Option A is a valid CDF.
"
:::
---
Summary
- Definition is Key: The CDF is $F_X(x) = P(X \le x)$. Nearly every problem can be traced back to this fundamental definition.
- Know the Properties: A function is a valid CDF only if it is non-decreasing, bounded between 0 and 1, and has limits of 0 and 1 at $-\infty$ and $+\infty$ respectively.
- Master Probability Calculations: Be fluent in using the CDF to find probabilities: $P(a < X \le b) = F_X(b) - F_X(a)$ and $P(X > a) = 1 - F_X(a)$.
- Solve for Quantiles: The median $m$ is found by solving $F_X(m) = 0.5$. This is a common problem pattern.
- Handle Transformations: For problems involving $g(X)$, always convert the inequality on $g(X)$ back to an equivalent inequality or interval for $X$ before applying the CDF.
---
What's Next?
A strong grasp of the Cumulative Distribution Function is foundational for understanding other key topics in probability and statistics.
- Probability Density Function (PDF): For continuous random variables, the PDF is the derivative of the CDF ($f(x) = F'(x)$). Understanding the CDF helps in deriving and interpreting the PDF.
- Expectation and Variance: While not calculated directly from the CDF in introductory methods, the CDF defines the distribution for which we calculate moments like mean (expectation) and variance.
- Joint Distributions: The concept of a CDF extends to multiple random variables with the Joint CDF, $F_{X,Y}(x, y) = P(X \le x, Y \le y)$, which is crucial for understanding covariance and correlation.
---
Now that you understand Cumulative Distribution Function (CDF), let's explore Uniform Distribution which builds on these concepts.
---
## Part 3: Uniform Distribution
Introduction
In the study of continuous probability distributions, the Uniform Distribution holds a position of fundamental importance due to its simplicity and intuitive nature. It models a scenario where a continuous random variable can assume any value within a specified range with equal likelihood. We encounter this concept implicitly in situations like a computer's random number generator, which aims to produce values where each number in its output range has the same chance of being selected.
For the GATE examination, a thorough understanding of the Uniform Distribution is essential, not only as a standalone topic but also as a building block for more complex problems involving joint distributions and transformations of random variables. We shall explore its defining functionsβthe Probability Density Function (PDF) and Cumulative Distribution Function (CDF)βand derive its primary statistical measures, namely the mean and variance. A key focus will be on problems involving multiple independent uniform random variables, a common pattern in competitive examinations.
A continuous random variable $X$ is said to follow a Uniform Distribution over the interval $[a, b]$, denoted as $X \sim U(a, b)$, if its probability is distributed evenly across this interval. The parameters $a$ and $b$ are the minimum and maximum possible values of $X$, respectively, with $a < b$.
---
Key Concepts
#
## 1. Probability Density Function (PDF)
For a continuous random variable, the Probability Density Function, $f_X(x)$, describes the relative likelihood of the variable taking on a particular value. The probability of the variable falling within a specific range is given by the integral of the PDF over that range.
For a random variable $X \sim U(a, b)$, the PDF must be a constant, say $c$, over the interval $[a, b]$ and zero elsewhere. To be a valid PDF, the total area under the curve must equal 1. We can determine the value of $c$ as follows:
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
Since $f_X(x) = 0$ for $x \notin [a, b]$, this simplifies to:
$$\int_a^b c\,dx = c(b - a) = 1 \implies c = \frac{1}{b - a}$$
This gives us the formal definition of the PDF for a uniform distribution.
The PDF for a random variable $X \sim U(a, b)$ is given by:
$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
Variables:
- $a$: The lower bound of the interval.
- $b$: The upper bound of the interval.
When to use: To find the probability of $X$ falling within a sub-interval $[c, d] \subseteq [a, b]$ by integrating $f_X(x)$ from $c$ to $d$.
The graphical representation of the uniform PDF is a simple rectangle, which makes calculating probabilities straightforward.
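Because the PDF is a flat rectangle, any interval probability reduces to a length ratio: $P(c \le X \le d) = \frac{d - c}{b - a}$. A quick check with `scipy.stats.uniform` (which is parameterized by `loc=a`, `scale=b-a`), using an assumed interval $[2, 10]$:

```python
# Interval probability for a Uniform(2, 10) variable as a length ratio.
from scipy.stats import uniform

a, b = 2.0, 10.0
X = uniform(loc=a, scale=b - a)   # Uniform(2, 10)

p = X.cdf(6.0) - X.cdf(3.0)       # P(3 <= X <= 6)
print(p)                          # 0.375 = (6 - 3) / (10 - 2)
```

The result matches the geometric ratio of the sub-interval length to the full interval length.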
Worked Example:
Problem: A random variable is uniformly distributed over the interval . Calculate the probability .
Solution:
Step 1: Identify the distribution parameters and the PDF.
The random variable is .
Here, and . The PDF is:
Step 2: Set up the integral for the required probability.
The probability is the area under the PDF curve from to .
Step 3: Substitute the PDF and evaluate the integral.
Since the interval is entirely within the support , we use .
Step 4: Compute the final answer.
Answer: The probability is .
---
#
## 2. Cumulative Distribution Function (CDF)
The Cumulative Distribution Function, $F_X(x)$, gives the probability that the random variable takes on a value less than or equal to $x$. It is defined as $F_X(x) = P(X \le x)$. We can find the CDF by integrating the PDF from $-\infty$ to $x$.
For $X \sim U(a, b)$, we consider three cases: $x < a$, $a \le x \le b$, and $x > b$.
The CDF for a random variable $X \sim U(a, b)$ is a piecewise function:
$$F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & x > b \end{cases}$$
Application: Useful for finding probabilities of the form $P(X \le x)$ or $P(x_1 < X \le x_2)$.
The CDF of a uniform distribution increases linearly from 0 to 1 over its support.
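The three-case definition translates directly into code. A minimal sketch of the piecewise uniform CDF, checked at one point in each case (the interval $[0, 4]$ is an assumption for illustration):

```python
# Direct implementation of the piecewise Uniform(a, b) CDF.
def uniform_cdf(x, a, b):
    if x < a:
        return 0.0                # case 1: below the support
    if x > b:
        return 1.0                # case 3: above the support
    return (x - a) / (b - a)      # case 2: linear ramp on [a, b]

print(uniform_cdf(-1, 0, 4))  # 0.0
print(uniform_cdf(1, 0, 4))   # 0.25
print(uniform_cdf(9, 0, 4))   # 1.0
```

Note the linear ramp between the endpoints, matching the straight-line shape described above.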
---
#
## 3. Mean and Variance
The mean, or expected value, of a distribution represents its center of mass. The variance measures the spread or dispersion of the distribution around its mean.
#
### Mean (Expected Value)
The expected value is calculated as:
$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$
For $X \sim U(a, b)$:
Step 1: Set up the integral with the uniform PDF.
$$E[X] = \int_a^b x \cdot \frac{1}{b - a}\,dx$$
Step 2: Factor out the constant and integrate.
$$E[X] = \frac{1}{b - a}\left[\frac{x^2}{2}\right]_a^b$$
Step 3: Substitute the limits and simplify.
$$E[X] = \frac{b^2 - a^2}{2(b - a)} = \frac{(b - a)(b + a)}{2(b - a)} = \frac{a + b}{2}$$
This result is intuitive: the mean of a uniform distribution is simply the midpoint of the interval.
#
### Variance
The variance, $\mathrm{Var}(X)$, is defined as $\mathrm{Var}(X) = E[X^2] - (E[X])^2$. We first need to compute $E[X^2]$.
Step 1: Calculate $E[X^2]$.
$$E[X^2] = \int_a^b x^2 \cdot \frac{1}{b - a}\,dx = \frac{1}{b - a}\left[\frac{x^3}{3}\right]_a^b = \frac{b^3 - a^3}{3(b - a)}$$
Using the algebraic identity $b^3 - a^3 = (b - a)(b^2 + ab + a^2)$:
$$E[X^2] = \frac{a^2 + ab + b^2}{3}$$
Step 2: Substitute into the variance formula.
$$\mathrm{Var}(X) = \frac{a^2 + ab + b^2}{3} - \left(\frac{a + b}{2}\right)^2$$
Step 3: Find a common denominator and simplify.
$$\mathrm{Var}(X) = \frac{4(a^2 + ab + b^2) - 3(a + b)^2}{12} = \frac{a^2 - 2ab + b^2}{12} = \frac{(b - a)^2}{12}$$
For a random variable $X \sim U(a, b)$:
Mean: $E[X] = \dfrac{a + b}{2}$
Variance: $\mathrm{Var}(X) = \dfrac{(b - a)^2}{12}$
Variables:
- $a$: The lower bound of the interval.
- $b$: The upper bound of the interval.
When to use: In any problem asking for the central tendency or spread of a uniformly distributed variable.
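Both closed forms can be confirmed by direct numerical integration. A sketch assuming $X \sim U(1, 5)$, where the formulas predict a mean of $3$ and a variance of $\frac{16}{12} \approx 1.3333$:

```python
# Verify E[X] = (a+b)/2 and Var(X) = (b-a)^2/12 for U(1, 5) by integration.
from scipy.integrate import quad

a, b = 1.0, 5.0
pdf = lambda x: 1 / (b - a)

mean, _ = quad(lambda x: x * pdf(x), a, b)          # first moment
second, _ = quad(lambda x: x**2 * pdf(x), a, b)     # second moment
var = second - mean**2                              # Var = E[X^2] - (E[X])^2

print(round(mean, 6))  # 3.0       = (1 + 5) / 2
print(round(var, 6))   # 1.333333  = (5 - 1)^2 / 12
```

The numerical moments match the closed-form results exactly, up to quadrature tolerance.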
Worked Example:
Problem: A random variable follows a uniform distribution . Find its mean and standard deviation.
Solution:
Step 1: Identify parameters and .
Here, and .
Step 2: Calculate the mean using the formula .
Step 3: Calculate the variance using the formula .
Step 4: Calculate the standard deviation, which is the square root of the variance.
Answer: The mean is and the standard deviation is .
---
## 4. Joint Distribution of Independent Uniform Variables
A frequent type of problem in GATE involves two or more independent random variables. If X and Y are independent, their joint PDF is the product of their individual PDFs: f(x, y) = f_X(x) · f_Y(y).
The support of this joint distribution is a rectangle in the xy-plane. The joint PDF is constant over this rectangle, which allows us to calculate probabilities of the form P(g(X, Y) ≤ c) by finding the area of the region, within the support rectangle, where the condition holds, and dividing it by the total area of the support rectangle.
Worked Example:
Problem: Let and be two independent random variables. Find the probability .
Solution:
Method 1: Double Integration
Step 1: Define the joint PDF.
for .
for .
The joint PDF is:
The support is the rectangle defined by and .
Step 2: Set up the double integral over the region of interest.
We need to integrate over the region where within the support rectangle.
The limits of integration are determined by the intersection of the region and the rectangle. Because the condition line crosses the rectangle's boundary, the upper limit of the inner integral changes partway through the range of the outer variable, so the integral must be split into two pieces.
Splitting the integral makes this approach tedious and error-prone; the geometric method is faster.
Method 2: Geometric Approach
Step 1: Draw the support rectangle.
The support is a rectangle in the -plane with vertices at .
The total area of this rectangle is:
Step 2: Draw the region of interest, , within the support rectangle.
The line passes through the rectangle. We are interested in the area below this line.
Step 3: Calculate the area of the favorable region.
The region within the rectangle is a polygon. It's easier to calculate the area of the unfavorable region () and subtract it from the total area.
The unfavorable region, where the condition fails, is the triangle cut off by the condition line in one corner of the rectangle. Its vertices are the points where the line intersects the rectangle's boundary, together with the enclosed corner of the rectangle; identifying these intersection points correctly is the only delicate step.
Once the triangle's area is found:
Favorable Area = Total Area - Unfavorable Area.
Step 4: Calculate the probability.
The joint PDF is constant () over the rectangle. Therefore, the probability is the ratio of the favorable area to the total area.
Result:
---
Problem-Solving Strategies
For problems involving two independent uniform random variables, and , always use the geometric method. It is faster and less error-prone than double integration.
- Draw the Box: Sketch the xy-plane and draw the support rectangle defined by the ranges of X and Y. Calculate its total area (width × height).
- Draw the Line/Curve: Draw the equation representing the condition (e.g., x + y = c, y = x) over the rectangle.
- Identify the Favorable Region: Shade the area within the rectangle that satisfies the probability inequality (e.g., X + Y ≤ c, Y > X).
- Calculate Area: Compute the area of the shaded region using standard geometric formulas (area of a triangle, rectangle, or trapezoid).
- Find the Ratio: The required probability is P = Favorable Area / Total Area.
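The strategy can be cross-checked by simulation. The sketch below assumes a hypothetical setup, X ~ U(0, 4) and Y ~ U(0, 2) with the condition X + Y ≤ 4 (illustrative values, not from the worked example), and compares a Monte Carlo estimate against the area ratio:

```python
import random

# Monte Carlo cross-check of the geometric-area method.
# Hypothetical setup: X ~ U(0, 4), Y ~ U(0, 2), estimate P(X + Y <= 4).
random.seed(42)
n = 200_000
hits = sum(
    1 for _ in range(n)
    if random.uniform(0, 4) + random.uniform(0, 2) <= 4
)
estimate = hits / n

# Geometric method: total area = 4 * 2 = 8. The unfavorable region
# (x + y > 4) is the triangle with vertices (2, 2), (4, 2), (4, 0),
# whose area is 0.5 * 2 * 2 = 2. Hence P = (8 - 2) / 8 = 0.75.
exact = (8 - 2) / 8
assert abs(estimate - exact) < 0.01
```
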
---
Common Mistakes
- ❌ Forgetting the Support: Calculating a probability integral outside the interval [a, b]; the density is zero there, so intervals of integration must be clipped to the support.
- ❌ Confusing PDF and Probability: Stating that the probability of a single point, P(X = c), equals the density f(c) = 1/(b - a); for a continuous variable, P(X = c) = 0.
- ❌ Incorrect Geometric Area: Miscalculating the favorable area in joint distribution problems. A common error is failing to find the correct intersection points of the condition line with the support rectangle's boundaries.
---
Practice Questions
:::question type="MCQ" question="A random variable is uniformly distributed on the interval . What is the probability ?" options=["0.3", "0.4", "0.6", "0.7"] answer="0.6" hint="The condition is equivalent to or ." solution="
Step 1: Define the PDF.
For , we have .
The PDF is for .
Step 2: Express the probability in terms of disjoint intervals.
.
Since these events are disjoint, we can add their probabilities:
.
Step 3: Calculate each probability.
.
.
Step 4: Sum the probabilities.
.
Result:
The correct option is 0.6.
"
:::
:::question type="NAT" question="The mean of a uniformly distributed random variable is 10 and its variance is 12. If the lower bound of the distribution is positive, what is the value of its upper bound?" answer="16" hint="Set up a system of two equations using the formulas for mean and variance: (a + b)/2 = 10 and (b - a)²/12 = 12." solution="
Step 1: Write down the equations for mean and variance.
Given E(X) = 10 and Var(X) = 12.
For X ~ U(a, b): (a + b)/2 = 10 and (b - a)²/12 = 12.
Step 2: Solve for b - a.
Since (b - a)² = 144 and b > a, we have b - a = 12.
Step 3: Solve the system of linear equations.
We have:
1) a + b = 20
2) b - a = 12
Adding the two equations: 2b = 32, so b = 16.
Step 4: Find the value of a to confirm.
Substitute b = 16 into equation (1): a = 20 - 16 = 4.
The distribution is U(4, 16). This satisfies the condition that the lower bound is positive.
Result:
The value of the upper bound is 16.
"
:::
:::question type="MSQ" question="Let . Which of the following statements is/are correct?" options=["The mean of is 2.", "The standard deviation of is .", ".", "The median of is 2."] answer="The mean of is 2.,.,The median of is 2." hint="Calculate the mean, standard deviation, a conditional probability, and the median. Remember that for a symmetric distribution like the uniform, mean = median." solution="
Option A: Mean
. This statement is correct.
Option B: Standard Deviation
.
Standard Deviation .
The statement says the standard deviation is , which is the variance. This statement is incorrect.
Option C: Conditional Probability
.
The event is simply .
So, .
.
.
. This statement is correct.
Option D: Median
The median is the value such that .
For a uniform distribution, the CDF is .
We need to solve .
For any symmetric distribution, the mean equals the median. This statement is correct.
Result:
The correct options are A, C, and D.
"
:::
:::question type="NAT" question="Let and be independent random variables with and . The probability is _________ (rounded off to two decimal places)." answer="0.25" hint="Use the geometric area method. Draw the support rectangle . Then draw the line and find the area of the region where within the rectangle." solution="
Step 1: Define the support rectangle and its area.
The support for the joint distribution is the rectangle defined by and .
Total Area = .
Step 2: Draw the line for the condition .
This line passes through and , which are on the boundary of the rectangle.
Step 3: Identify the favorable region.
We want the area where . This is the region "above" the line .
Within the support rectangle, this region is a triangle in the upper-right corner.
The vertices of this triangle are found by intersecting the line with the rectangle's boundary; intersection points lying outside the rectangle are discarded, and the enclosed corner of the rectangle completes the triangle.
Taking one edge of the rectangle as the base and the perpendicular distance from the opposite vertex as the height:
Favorable Area = (1/2) × base × height.
Step 4: Calculate the probability.
Result:
The probability is 0.25.
"
:::
---
Summary
- PDF and its Shape: The PDF of X ~ U(a, b) is a constant, f(x) = 1/(b - a), over the interval [a, b] and zero elsewhere. Probabilities are calculated as lengths of sub-intervals divided by the total length of the interval.
- Mean and Variance Formulas: These must be memorized. The mean is the midpoint, E(X) = (a + b)/2. The variance is related to the square of the interval's length, Var(X) = (b - a)²/12.
- Geometric Method for Joint Distributions: For problems with two independent uniform variables, always prefer the geometric (area) method over double integration. The probability is the ratio of the favorable area to the total area of the support rectangle. This is a critical time-saving technique.
---
What's Next?
A solid grasp of the Uniform Distribution provides a foundation for understanding other continuous distributions and related concepts.
- Exponential Distribution: While the uniform distribution models events with constant probability over a range, the exponential distribution models the time between events in a Poisson process. It is characterized by its memoryless property, a key contrast to the uniform distribution.
- Normal Distribution: This is arguably the most important distribution in statistics. Understanding the simple, bounded nature of the uniform distribution helps appreciate the properties of the unbounded, bell-shaped normal curve.
- Transformations of Random Variables: A common advanced topic involves finding the distribution of a new random variable Y = g(X), where X is uniform. For instance, if U ~ U(0, 1), what is the distribution of Y = -ln(U)? (It is the exponential distribution with rate 1.)
---
Now that you understand the Uniform Distribution, let's explore the Exponential Distribution, which builds on these concepts.
---
Part 4: Exponential Distribution
Introduction
The Exponential distribution is a continuous probability distribution of paramount importance in the study of stochastic processes. It is frequently employed to model the time elapsed between events in a Poisson point process, wherein events occur continuously and independently at a constant average rate. For instance, the time until a radioactive particle decays, the interval between consecutive arrivals at a service desk, or the lifespan of an electronic component that does not age (i.e., its failure rate is constant over time) can often be described by this distribution.
In the context of the GATE examination, a thorough understanding of the exponential distribution is essential. Questions typically probe its fundamental properties, such as its probability density function, mean, variance, and the unique memoryless property. We shall explore these characteristics in detail, providing the necessary mathematical framework and problem-solving techniques to master this topic.
A continuous random variable X is said to follow an Exponential distribution with a rate parameter λ > 0 if its probability density function (PDF) is given by:
f(x) = λe^(-λx) for x ≥ 0, and f(x) = 0 otherwise.
We denote this as X ~ Exp(λ). The parameter λ represents the rate at which events occur.
---
Key Concepts
## 1. Probability Density and Cumulative Distribution Functions
The probability density function (PDF), , describes the relative likelihood for the random variable to take on a given value . As with all continuous distributions, the probability of falling within a specific interval is found by integrating the PDF over that interval.
The cumulative distribution function (CDF), , gives the probability that the random variable is less than or equal to a value . We can derive the CDF by integrating the PDF from its lower bound (which is 0 for the exponential distribution) up to .
For x ≥ 0:
F(x) = ∫ from 0 to x of λe^(-λt) dt = 1 - e^(-λx)
Thus, the complete CDF is:
F(x) = 1 - e^(-λx) for x ≥ 0, and F(x) = 0 for x < 0.
Variables:
- x = The value of the random variable
- λ = The rate parameter
Application: Used to find the probability P(X ≤ x). The probability of X being in an interval (a, b] is F(b) - F(a).
The shapes of the PDF and CDF are characteristic. The PDF starts at f(0) = λ and decays exponentially, while the CDF starts at 0 and increases asymptotically towards 1.
Worked Example:
Problem: The lifetime of a certain type of battery is exponentially distributed with a rate parameter failures per hour. What is the probability that the battery will last between 10 and 20 hours?
Solution:
Step 1: Identify the given parameters.
We are given . We need to find .
Step 2: Use the CDF to express the probability.
The required probability is P(10 < X < 20) = F(20) - F(10).
Step 3: Calculate the CDF values.
The CDF is F(x) = 1 - e^(-λx), so F(20) - F(10) = e^(-10λ) - e^(-20λ).
Step 4: Compute the final probability.
Using the approximations and :
Answer: The probability is approximately .
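As a quick numerical check of this CDF-based calculation, the sketch below assumes a hypothetical rate λ = 0.05 failures per hour (the example's value was elided):

```python
import math

# Interval probability via the exponential CDF F(x) = 1 - exp(-lam * x).
# The rate lam = 0.05 failures/hour is hypothetical.
lam = 0.05

def cdf(x: float) -> float:
    return 1 - math.exp(-lam * x)

p = cdf(20) - cdf(10)             # P(10 < X < 20)
assert abs(p - (math.exp(-0.5) - math.exp(-1.0))) < 1e-12
print(round(p, 4))                # prints 0.2387
```
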
---
## 2. Mean, Variance, and Standard Deviation
The moments of the exponential distribution are simple functions of the rate parameter . The mean, or expected value, represents the average waiting time until an event occurs. The variance measures the spread of the distribution around the mean.
For a random variable X ~ Exp(λ):
Mean (Expectation): E(X) = 1/λ
Variance: Var(X) = 1/λ²
Standard Deviation: σ = 1/λ
When to use: These are fundamental properties. GATE questions often provide a relationship between the mean and variance to force you to solve for .
We observe a critical relationship for the exponential distribution: the mean is equal to the standard deviation. Furthermore, the variance is the square of the mean: Var(X) = (E[X])².
Let us briefly consider the derivation for the mean, E(X) = ∫ from 0 to ∞ of x·λe^(-λx) dx. It requires integration by parts.
Using integration by parts, ∫ u dv = uv - ∫ v du, let u = x and dv = λe^(-λx) dx.
Then du = dx and v = -e^(-λx), so E(X) = [-x e^(-λx)] from 0 to ∞ + ∫ from 0 to ∞ of e^(-λx) dx = 0 + 1/λ = 1/λ.
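These moment formulas can be checked by simulation; the sketch below uses a hypothetical rate λ = 2.0:

```python
import random
import statistics

# Simulation check that E[X] = 1/lam and Var(X) = 1/lam^2.
# The rate lam = 2.0 is hypothetical.
random.seed(0)
lam = 2.0
sample = [random.expovariate(lam) for _ in range(200_000)]

assert abs(statistics.mean(sample) - 1 / lam) < 0.01          # mean ~ 0.5
assert abs(statistics.variance(sample) - 1 / lam**2) < 0.01   # var ~ 0.25
```
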
Worked Example:
Problem: Let be an exponentially distributed random variable. If the variance of is 4 times its mean, what is the value of the rate parameter ?
Solution:
Step 1: State the given relationship in terms of the formulas for mean and variance.
We are given Var(X) = 4·E(X).
Step 2: Substitute the formulas for an exponential distribution.
We know E(X) = 1/λ and Var(X) = 1/λ², so the condition becomes 1/λ² = 4/λ.
Step 3: Solve the equation for λ.
Assuming λ > 0, we can multiply both sides by λ²: 1 = 4λ, giving λ = 1/4.
Answer: The rate parameter is λ = 0.25.
---
## 3. The Survival Function and Memoryless Property
The Survival Function, S(x) = P(X > x), gives the probability that the random variable takes a value greater than x. It is the complement of the CDF: S(x) = 1 - F(x).
For the exponential distribution, this yields a particularly simple and useful form:
S(x) = e^(-λx)
For any problem asking for P(X > a) or P(X ≥ a), immediately use the survival function: P(X > a) = e^(-λa). This is significantly faster than calculating 1 - F(a) or integrating the PDF from a to ∞. Note that for any continuous distribution, P(X > a) = P(X ≥ a).
This leads to the most defining characteristic of the exponential distribution: the memoryless property. This property states that the probability of an event occurring in a future interval is independent of how much time has already elapsed.
For any s, t ≥ 0, an exponentially distributed random variable X satisfies:
P(X > s + t | X > s) = P(X > t)
Proof:
By the definition of conditional probability,
P(X > s + t | X > s) = P(X > s + t and X > s) / P(X > s).
The event "X > s + t and X > s" is equivalent to the event "X > s + t". Thus,
P(X > s + t | X > s) = P(X > s + t) / P(X > s).
Using the survival function S(x) = e^(-λx):
P(X > s + t | X > s) = e^(-λ(s + t)) / e^(-λs) = e^(-λt).
Since P(X > t) = e^(-λt), we have proven the property.
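The algebra of this proof can be mirrored numerically with the survival function (the values of λ, s, and t below are hypothetical):

```python
import math

# Numerical illustration of memorylessness via the survival function
# S(x) = exp(-lam * x). The values lam, s, t are hypothetical.
lam, s, t = 0.3, 5.0, 2.0
S = lambda x: math.exp(-lam * x)

conditional = S(s + t) / S(s)            # P(X > s + t | X > s)
assert abs(conditional - S(t)) < 1e-12   # equals P(X > t)
```
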
Worked Example:
Problem: The lifetime of a light bulb follows an exponential distribution. It is known that the probability of a bulb lasting more than 1000 hours is . What is the probability that it will last for at least another 500 hours, given that it has already survived 1000 hours?
Solution:
Step 1: Translate the problem into a conditional probability statement.
We need to find P(X > 1500 | X > 1000).
Step 2: Apply the memoryless property.
The memoryless property states P(X > s + t | X > s) = P(X > t).
Here, s = 1000 and t = 500, so the required probability equals P(X > 500).
Step 3: Use the given information to find .
We are given . Using the survival function:
Step 4: Calculate the required probability P(X > 500).
Answer: The probability is .
---
## 4. Relationship with the Geometric Distribution
The exponential distribution is the continuous analogue of the discrete geometric distribution. This relationship becomes explicit when we discretize an exponential random variable using the floor function.
Let X ~ Exp(λ) and define a discrete random variable Y = ⌊X⌋. The random variable Y represents the number of full integer time units completed before the event occurs. We wish to find the probability mass function (PMF) of Y, which is P(Y = k) for any non-negative integer k.
The event Y = k is equivalent to the event k ≤ X < k + 1. Therefore,
P(Y = k) = P(k ≤ X < k + 1) = e^(-λk) - e^(-λ(k + 1)) = e^(-λk)·(1 - e^(-λ)).
If we let p = 1 - e^(-λ), then e^(-λ) = 1 - p. The PMF becomes:
P(Y = k) = (1 - p)^k · p, for k = 0, 1, 2, …
This is the PMF of a Geometric distribution (counting failures before the first success) with success probability p.
If X ~ Exp(λ), then the discrete random variable Y = ⌊X⌋ follows a Geometric distribution with parameter p = 1 - e^(-λ).
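This connection can be verified by simulation: the empirical PMF of ⌊X⌋ should match the geometric PMF. A sketch with a hypothetical rate λ = 0.7:

```python
import math
import random

# Empirical check: Y = floor(X) with X ~ Exp(lam) has PMF
# P(Y = k) = (1 - p)^k * p, where p = 1 - exp(-lam).
# The rate lam = 0.7 is hypothetical.
random.seed(1)
lam = 0.7
n = 200_000
counts = {}
for _ in range(n):
    k = math.floor(random.expovariate(lam))
    counts[k] = counts.get(k, 0) + 1

p = 1 - math.exp(-lam)
for k in range(4):
    theoretical = (1 - p) ** k * p
    assert abs(counts.get(k, 0) / n - theoretical) < 0.01
```
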
---
Problem-Solving Strategies
When faced with an exponential distribution problem in GATE, follow these steps:
- Identify the Parameter: The problem will give you λ, the mean (1/λ), or information to find it (e.g., a probability such as P(X > a)). Your first step is always to secure the value of λ.
- Use the Survival Function: For any probability of the form P(X > a) or P(X ≥ a), immediately write it as e^(-λa). This is the most efficient calculation method.
- Recognize the Memoryless Property: If a question includes conditional phrasing like "given that it has already lasted for s hours," the memoryless property is almost certainly being tested. The past becomes irrelevant.
- Check for Mean/Variance Relationships: A common question pattern involves an algebraic relationship between E(X) and Var(X). Know that E(X) = 1/λ and Var(X) = 1/λ², and solve the resulting equation.
---
Common Mistakes
- ❌ Confusing the Rate and the Mean: Students often mistake λ for the mean. Remember, the mean is 1/λ. A high rate implies a low mean waiting time.
- ❌ Incorrect Variance Formula: The variance is 1/λ², not 1/λ. This means Var(X) = (E[X])².
- ❌ Using PDF as Probability: Calculating f(a) does not give you P(X = a). For a continuous variable, the probability of any single point is zero. Probabilities are found by integrating the PDF over an interval.
- ❌ Ignoring the Memoryless Property: For a probability like P(X > s + t | X > s), calculating the full conditional probability formula is slow and error-prone. The correct and fast approach is to recognize it as P(X > t).
---
Practice Questions
:::question type="NAT" question="The time to failure of a computer chip is modeled by an exponential distribution. The mean time to failure (MTTF) is 2000 hours. What is the probability that a chip will fail before 500 hours? (Round off to two decimal places)." answer="0.22" hint="First, find the rate parameter λ from the mean. Then, use the CDF F(x) = P(X ≤ x) to find the required probability." solution="
Step 1: Find the rate parameter λ.
The mean is given as E(X) = 2000 hours. We know that for an exponential distribution, E(X) = 1/λ, so λ = 1/2000 = 0.0005 per hour.
Step 2: Calculate the probability P(X < 500).
This is given by the CDF: F(500) = 1 - e^(-λ·500) = 1 - e^(-500/2000) = 1 - e^(-0.25).
Step 3: Compute the final value.
Using a calculator, e^(-0.25) ≈ 0.7788, so the probability is approximately 0.2212.
Result:
Rounding to two decimal places, the probability is 0.22.
"
:::
:::question type="MCQ" question="Let be a random variable following an exponential distribution such that . What is the variance of ?" options=["","","",""] answer="" hint="Use the given probability equality to find the value of λ. The variance is ." solution="
Step 1: Set up the equation from the given information.
We are given .
This can be written using the CDF and Survival function:
Step 2: Substitute the formulas for the exponential distribution.
Step 3: Solve for .
Taking the natural logarithm of both sides:
Step 4: Calculate the variance.
The variance is given by .
Result:
The variance of is .
"
:::
:::question type="MSQ" question="A random variable follows an exponential distribution with mean . Which of the following statements is/are correct?" options=["The rate parameter .","The variance .","The probability .","The median of the distribution is less than the mean."] answer="The rate parameter .,The probability .,The median of the distribution is less than the mean." hint="Calculate each property based on the given mean. For the median , solve ." solution="
Option A: The rate parameter λ = 0.5.
Given E(X) = 2. We know E(X) = 1/λ.
So, 1/λ = 2, which gives λ = 0.5.
This statement is correct.
Option B: The variance Var(X) = 2.
The variance is Var(X) = 1/λ².
Since λ = 0.5, Var(X) = 1/(0.5)² = 4.
The statement says the variance is 2, which is incorrect.
Option C: The probability .
We use the survival function S(x) = e^(-λx) with λ = 0.5.
.
This statement is correct.
Option D: The median of the distribution is less than the mean.
The median is the value m for which F(m) = 0.5.
1 - e^(-λm) = 0.5
e^(-λm) = 0.5
-λm = ln(0.5), so m = ln(2)/λ.
With λ = 0.5, m = 2·ln(2) ≈ 1.386.
The mean is 2. We need to compare m ≈ 1.386 with 2.
Since ln(2) ≈ 0.693 < 1, we have m = 2·ln(2) < 2.
So, the median is less than the mean (2).
This statement is correct.
"
:::
:::question type="NAT" question="The inter-arrival time of customers at a service counter follows an exponential distribution. It is observed that the probability of waiting more than 10 minutes for the next arrival is . What is the expected number of arrivals in a 60-minute period?" answer="12" hint="First, find the rate parameter λ from the survival function. Remember that λ is the rate of arrivals per unit of time (minutes in this case). The expected number of arrivals in a period T is λT." solution="
Step 1: Find the rate parameter λ.
We are given the survival probability P(X > 10), where X is the waiting time in minutes.
The survival function is P(X > x) = e^(-λx), so P(X > 10) = e^(-10λ).
Equating the exponent of the given probability with -10λ gives λ = 0.2.
This means the rate of arrivals is 0.2 customers per minute.
Step 2: Calculate the expected number of arrivals in 60 minutes.
The number of arrivals in a fixed time interval of length T follows a Poisson distribution with parameter λT, and the expected number of arrivals is this parameter: λT = 0.2 × 60 = 12.
Result:
The expected number of arrivals in a 60-minute period is 12.
"
:::
---
Summary
- Core Formulas: The PDF is f(x) = λe^(-λx) for x ≥ 0. The Mean is 1/λ and the Variance is 1/λ². These are non-negotiable facts to memorize.
- Survival Function is Key: The probability P(X > x) is simply e^(-λx). This is the fastest tool for computing tail probabilities and is frequently tested.
- Memoryless Property: The distribution "forgets" its past: P(X > s + t | X > s) = P(X > t). Recognize this property in conditional probability questions to simplify them instantly.
- Discretization yields Geometric: If X ~ Exp(λ), then Y = ⌊X⌋ follows a Geometric distribution with parameter p = 1 - e^(-λ). This connects the continuous and discrete domains.
---
What's Next?
This topic connects to:
- Poisson Distribution: The Exponential distribution models the time between events in a Poisson process, while the Poisson distribution models the number of events in a fixed interval of time. They are two sides of the same coin. If inter-arrival times are Exp(λ), the count of arrivals in time t is Poisson(λt).
- Gamma Distribution: The Gamma distribution is a generalization of the Exponential distribution. The sum of n independent and identically distributed Exp(λ) random variables follows a Gamma distribution with shape parameter n and rate parameter λ.
- Weibull Distribution: The Weibull distribution is another generalization used in reliability analysis. Unlike the exponential distribution's constant failure rate λ, the Weibull distribution allows for failure rates that increase or decrease over time.
---
Now that you understand the Exponential Distribution, let's explore the Normal and Standard Normal Distribution, which builds on these concepts.
---
Part 5: Normal and Standard Normal Distribution
Introduction
Among the family of continuous probability distributions, the Normal Distribution holds a position of paramount importance. Its significance in the fields of statistics, data science, and numerous scientific disciplines can scarcely be overstated. Characterized by its symmetric, bell-shaped curve, the normal distribution provides a remarkably accurate model for a vast array of natural phenomena, from physical measurements to experimental errors. We find its familiar form describing distributions of human height, blood pressure, and measurement errors in scientific instruments.
For the GATE Data Science and Artificial Intelligence examination, a firm grasp of the normal distribution is not merely beneficial; it is essential. Many statistical techniques, including hypothesis testing and the construction of confidence intervals, are founded upon the assumption of normality. In this chapter, we will undertake a rigorous examination of the properties of the general normal distribution. We will then introduce a pivotal transformation that leads us to the Standard Normal Distribution, a standardized form that simplifies calculations and allows for universal comparison. Our focus will remain steadfastly on the theoretical underpinnings and practical applications most relevant to the GATE syllabus.
A continuous random variable X is said to follow a Normal Distribution with parameters μ (mean) and σ² (variance) if its probability density function (PDF) is given by:
f(x) = (1 / (σ√(2π))) · e^(-(x - μ)² / (2σ²))
This is denoted as X ~ N(μ, σ²). The domain of the variable is -∞ < x < ∞.
---
Key Concepts
## 1. Properties of the Normal Distribution
The normal distribution is defined by two parameters: the mean, , which determines the center or location of the distribution, and the standard deviation, , which dictates the spread or dispersion of the distribution. A larger results in a flatter, more spread-out curve, while a smaller yields a taller, more concentrated curve.
Several key properties arise from its definition:
- The curve is symmetric about its mean, .
- The mean, median, and mode of the distribution are all equal and located at the central peak.
- The total area under the curve is equal to 1, as required for any probability density function.
- The curve is asymptotic to the horizontal axis; it approaches the axis but never touches it as tends towards .
A particularly useful property for quick estimation is the Empirical Rule, or the 68-95-99.7 rule.
The Empirical Rule states that for a normally distributed variable:
- Approximately 68% of the data falls within one standard deviation of the mean (μ ± σ).
- Approximately 95% of the data falls within two standard deviations of the mean (μ ± 2σ).
- Approximately 99.7% of the data falls within three standard deviations of the mean (μ ± 3σ).
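These percentages follow from the standard normal CDF, since P(|Z| ≤ k) = erf(k/√2); a quick check with Python's math.erf:

```python
import math

# The 68-95-99.7 rule from the standard normal CDF:
# P(|Z| <= k) = erf(k / sqrt(2)).
def within(k: float) -> float:
    return math.erf(k / math.sqrt(2))

assert abs(within(1) - 0.6827) < 0.001   # ~68%
assert abs(within(2) - 0.9545) < 0.001   # ~95%
assert abs(within(3) - 0.9973) < 0.001   # ~99.7%
```
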
---
## 2. Standardization and the Z-score
While the normal distribution is powerful, its dependence on specific and values makes direct comparison between different normal distributions cumbersome. Consider two students, one scoring 80 on a test with a mean of 70 and a standard deviation of 5, and another scoring 85 on a test with a mean of 75 and a standard deviation of 10. To determine who performed better relative to their peers, we must standardize their scores.
This process, known as standardization, transforms a value x from any normal distribution into a standard score, or z-score, via z = (x - μ)/σ. The z-score measures how many standard deviations an observation is from the mean.
Variables:
- x = The value of the random variable
- μ = The mean of the distribution
- σ = The standard deviation of the distribution
When to use: To convert any value from a normal distribution into a standard normal score for comparison or probability calculation.
The random variable resulting from this transformation will always have a mean of 0 and a variance of 1. This new distribution is called the Standard Normal Distribution.
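A simulation makes this concrete: standardizing draws from any normal distribution yields values with mean approximately 0 and variance approximately 1. The parameters below are hypothetical:

```python
import random
import statistics

# Standardizing draws from N(mu, sigma^2) gives mean ~0 and variance ~1.
# The parameters mu = 70, sigma = 5 are hypothetical.
random.seed(3)
mu, sigma = 70.0, 5.0
xs = [random.gauss(mu, sigma) for _ in range(200_000)]
zs = [(x - mu) / sigma for x in xs]

assert abs(statistics.mean(zs)) < 0.02
assert abs(statistics.variance(zs) - 1.0) < 0.02
```
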
Worked Example:
Problem: The scores on a competitive exam are normally distributed with a mean of 500 and a standard deviation of 100. A candidate scores 620. Calculate the z-score for this candidate.
Solution:
Step 1: Identify the given parameters.
We are given: μ = 500, σ = 100, X = 620.
Step 2: Apply the z-score formula z = (X - μ) / σ.
Step 3: Substitute the given values into the formula: z = (620 - 500) / 100.
Step 4: Compute the final value: z = 120 / 100 = 1.2.
Answer: The z-score for the candidate is 1.2. This indicates the candidate's score is 1.2 standard deviations above the mean score.
---
## 3. The Standard Normal Distribution
The Standard Normal Distribution is the cornerstone of calculations involving normal variables. It is a special case of the normal distribution where the mean is 0 and the standard deviation (and variance) is 1.
A random variable Z is said to have a Standard Normal Distribution if it follows a normal distribution with a mean of 0 and a variance of 1, denoted Z ~ N(0, 1). Its probability density function, often denoted by φ(z), is:
φ(z) = (1 / √(2π)) · e^(-z²/2)
for -∞ < z < ∞.
Probabilities for any normal random variable can be found by first converting to a standard normal variable and then using a standard normal probability table (or computational tool). For instance, to find P(X ≤ x), we calculate the corresponding z-score z = (x - μ)/σ and then find P(Z ≤ z).
---
## 4. Properties and Moments of the Standard Normal Distribution
A deep understanding of the properties of the standard normal variable is crucial, especially for questions involving transformations of random variables.
The most fundamental properties are its mean and variance:
- Mean: E(Z) = 0
- Variance: Var(Z) = 1
From the definition of variance, Var(Z) = E[Z²] - (E[Z])², we can immediately deduce an important result.
Since Var(Z) = 1 and E(Z) = 0:
E[Z²] = 1
This value, E[Z²] = 1, is the second raw moment of the standard normal distribution. We can generalize this to higher-order moments. The moments of a distribution describe its shape. For the standard normal distribution, due to its symmetry about 0, all odd-order central moments (and raw moments) are zero.
The even-order moments are non-zero. The fourth raw moment, E[Z⁴] = 3, is another value worth committing to memory for GATE.
To summarize the key moments for GATE: E[Z] = 0, E[Z²] = 1, E[Z³] = 0, E[Z⁴] = 3.
Worked Example:
Problem: Let be a standard normal random variable. A new random variable is defined as . Calculate the variance of .
Solution:
Step 1: Recall the formula for variance.
The variance of is given by . We must first compute and .
Step 2: Calculate the expected value of , .
By linearity of expectation:
We know that and the expectation of a constant is the constant itself.
Step 3: Calculate the expected value of , .
First, we find the expression for .
Now, we take the expectation.
By linearity of expectation:
We use the known moments and .
Step 4: Compute the variance of .
Answer: The variance of is .
---
## 5. The Chi-Squared Distribution Connection
A profound and frequently tested connection exists between the standard normal distribution and another important distribution: the Chi-Squared () distribution.
If Z is a standard normal random variable, Z ~ N(0, 1), then the random variable Z² follows a Chi-Squared distribution with 1 degree of freedom. This is denoted as:
Z² ~ χ²(1)
This relationship provides a powerful shortcut for solving problems involving the square of a standard normal variable.
For a random variable W that follows a Chi-Squared distribution with k degrees of freedom, W ~ χ²(k):
- Mean: E(W) = k
- Variance: Var(W) = 2k
Let us apply this to the case of Z². Here, the degrees of freedom k = 1.
- Mean of Z²: E(Z²) = 1. This confirms our earlier finding from moments.
- Variance of Z²: Var(Z²) = 2 × 1 = 2.
This result is extremely useful. If a question asks for the variance of Z², where Z ~ N(0, 1), we can immediately state the answer is 2 without calculating moments.
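Both routes to Var(Z²) = 2, via moments and via the chi-squared connection, can be checked by simulation:

```python
import random
import statistics

# Monte Carlo check of the standard normal moments and the chi-squared
# link: E[Z^2] = 1, E[Z^4] = 3, hence Var(Z^2) = 3 - 1 = 2.
random.seed(7)
zs = [random.gauss(0, 1) for _ in range(200_000)]
z2 = [z * z for z in zs]

assert abs(statistics.mean(z2) - 1.0) < 0.02                  # E[Z^2]
assert abs(statistics.mean([z**4 for z in zs]) - 3.0) < 0.15  # E[Z^4]
assert abs(statistics.variance(z2) - 2.0) < 0.1               # Var(Z^2)
```
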
---
Problem-Solving Strategies
Nearly all problems involving a general normal distribution are best solved by first converting the relevant values to z-scores. This transforms the problem into the simpler context of the standard normal distribution Z ~ N(0, 1), where properties are well-defined and tables/formulas are readily applicable.
For questions involving functions of a standard normal variable, direct computation of variance requires knowing the moments of Z. For GATE, memorizing the first four raw moments (E[Z] = 0, E[Z²] = 1, E[Z³] = 0, E[Z⁴] = 3) provides a direct path to the solution and saves considerable time.
---
Common Mistakes
- ❌ Using Variance in Z-score Formula: A common error is to use the variance σ² in the denominator of the z-score formula instead of the standard deviation σ.
- ❌ Confusing Z and Z²: The properties of a standard normal variable Z (mean 0, variance 1) are different from those of its square Z² (mean 1, variance 2).
- ❌ Incorrectly Calculating Expectations: When finding the expectation of a function of Z, students sometimes forget that expectation is linear and must be applied term by term.
---
Practice Questions
:::question type="MCQ" question="The heights of adult males in a city are normally distributed with a mean of 175 cm and a standard deviation of 7 cm. What is the z-score for a male with a height of 161 cm?" options=["-2.0", "-1.5", "1.5", "2.0"] answer="-2.0" hint="Use the z-score formula ." solution="
Step 1: Identify the given values.
μ = 175 cm
σ = 7 cm
X = 161 cm
Step 2: Apply the z-score formula z = (X - μ) / σ.
Step 3: Substitute the values and compute: z = (161 - 175) / 7 = -14 / 7 = -2.0.
Result:
The z-score is -2.0.
"
:::
:::question type="NAT" question="In a quality control process, the diameter of a manufactured bolt is normally distributed with a mean of 20 mm and a standard deviation of 0.1 mm. A particular bolt has a z-score of 1.5. What is the diameter of this bolt in mm?" answer="20.15" hint="Rearrange the z-score formula to solve for X: ." solution="
Step 1: Identify the given values.
μ = 20 mm
σ = 0.1 mm
z = 1.5
Step 2: Use the rearranged z-score formula X = μ + zσ.
Step 3: Substitute the values and calculate: X = 20 + 1.5 × 0.1 = 20.15 mm.
Result:
The diameter of the bolt is 20.15 mm.
"
:::
:::question type="MSQ" question="Let $X$ be a random variable following a normal distribution $N(\mu, \sigma^2)$. Which of the following statements are ALWAYS true?" options=["The distribution is symmetric about its mean $\mu$.","Approximately 95% of the values lie within the range $\mu \pm \sigma$.","The mean, median, and mode are all equal.","The variance must be greater than the mean."] answer="The distribution is symmetric about its mean $\mu$.,The mean, median, and mode are all equal." hint="Recall the fundamental properties of the normal distribution and the Empirical Rule." solution="
- Option A: Correct. A defining characteristic of the normal distribution is its symmetry about the mean $\mu$.
- Option B: Incorrect. The Empirical Rule states that approximately 95% of values lie within two standard deviations ($\mu \pm 2\sigma$), not one. Approximately 68% of values lie within one standard deviation.
- Option C: Correct. For any normal distribution, the mean, median, and mode coincide at the center of the distribution, $x = \mu$.
- Option D: Incorrect. There is no required relationship between the mean and variance. The mean can be positive, negative, or zero, and the variance must be positive, but one is not constrained by the other. For example, $N(-5, 2)$ and $N(10, 0.5)$ are both valid normal distributions.
"
:::
:::question type="MCQ" question="Let $Z$ be a standard normal random variable, $Z \sim N(0, 1)$. What is the variance of the random variable $Y = 4Z^2$?" options=["4","8","16","32"] answer="32" hint="Use the property that $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$. First, find the variance of $Z^2$." solution="
Step 1: Identify the random variable of interest.
We need to find $\operatorname{Var}(4Z^2)$.
Step 2: Use the property of variance for a scaled random variable.
The property states that $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$. Here, our random variable is $Z^2$ and the scaling constant is $a = 4$.
Step 3: Determine the variance of $Z^2$.
We know that if $Z \sim N(0, 1)$, then $Z^2$ follows a Chi-Squared distribution with 1 degree of freedom, $\chi^2_1$. The variance of a $\chi^2_k$ distribution is $2k$.
For $k = 1$, $\operatorname{Var}(Z^2) = 2$.
Alternatively, using moments:
$\operatorname{Var}(Z^2) = E[Z^4] - (E[Z^2])^2 = 3 - 1^2 = 2$.
Step 4: Calculate the final variance.
$\operatorname{Var}(4Z^2) = 4^2 \cdot \operatorname{Var}(Z^2) = 16 \times 2 = 32$
Result:
The variance of $Y = 4Z^2$ is 32.
"
:::
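The chi-squared shortcut in the solution above can be verified by simulation. This sketch assumes the variable in the question is $Y = 4Z^2$ (consistent with the stated answer of 32):

```python
import numpy as np

# Monte Carlo check: Var(4 Z^2) = 16 * Var(Z^2) = 16 * 2 = 32,
# assuming the random variable in the question is Y = 4 Z^2.
rng = np.random.default_rng(1)
z = rng.standard_normal(2_000_000)
v = np.var(4 * z**2)
print(v)  # close to 32
```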
:::question type="NAT" question="If $Z$ is a standard normal random variable, calculate the value of $E[(Z+2)^2]$." answer="5" hint="Expand the expression and then apply the linearity of expectation using the known moments of $Z$." solution="
Step 1: Expand the expression inside the expectation.
$(Z+2)^2 = Z^2 + 4Z + 4$
Step 2: Apply the expectation operator.
$E[(Z+2)^2] = E[Z^2 + 4Z + 4]$
Step 3: Use the linearity of expectation.
$E[(Z+2)^2] = E[Z^2] + 4E[Z] + 4$
Step 4: Substitute the known moments of the standard normal distribution.
We know $E[Z] = 0$ and $E[Z^2] = 1$.
$E[(Z+2)^2] = 1 + 4(0) + 4 = 5$
Result:
The value of $E[(Z+2)^2]$ is 5.
"
:::
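The linearity-of-expectation steps can likewise be confirmed numerically, assuming the expression in the question above is $(Z+2)^2$ (consistent with the answer 5):

```python
import numpy as np

# Check E[(Z + 2)^2] = E[Z^2] + 4 E[Z] + 4 = 1 + 0 + 4 = 5 by simulation,
# assuming the expression in the question is (Z + 2)^2.
rng = np.random.default_rng(2)
z = rng.standard_normal(1_000_000)
m = np.mean((z + 2) ** 2)
print(m)  # close to 5
```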
---
Summary
- Standardization is Fundamental: The z-score formula, $z = \frac{x - \mu}{\sigma}$, is the essential tool for converting any normal random variable $X \sim N(\mu, \sigma^2)$ into the standard normal variable $Z \sim N(0, 1)$, which is the basis for most calculations.
- Know Standard Normal Moments: For problems involving transformations of $Z$, you must know its key moments: $E[Z] = 0$, $E[Z^2] = 1$, $E[Z^3] = 0$, and $E[Z^4] = 3$. All odd moments are zero.
- The $Z^2$ to $\chi^2$ Connection: The square of a standard normal variable, $Z^2$, follows a Chi-squared distribution with 1 degree of freedom, $\chi^2_1$. This implies $E[Z^2] = 1$ and $\operatorname{Var}(Z^2) = 2$. This is a powerful shortcut.
---
What's Next?
This topic connects to several other critical areas in probability and statistics. Mastering these connections will provide a more comprehensive understanding for GATE.
- Central Limit Theorem (CLT): The normal distribution's importance is cemented by the CLT, which states that the distribution of the sample mean of a large number of independent, identically distributed random variables will be approximately normal, regardless of the underlying distribution. This is a cornerstone of statistical inference.
- Hypothesis Testing: The z-score is the foundation for the z-test, a fundamental procedure in hypothesis testing used to determine if there is a significant difference between a sample mean and a population mean when the population variance is known.
- Other Continuous Distributions: Compare the properties of the normal distribution with other key continuous distributions in the GATE syllabus, such as the Uniform and Exponential distributions, to understand their different applications and characteristics.
---
Now that you understand Normal and Standard Normal Distribution, let's explore Conditional PDF which builds on these concepts.
---
Part 6: Conditional PDF
Introduction
In our study of probability, we often encounter scenarios involving multiple random variables where the behavior of one variable is influenced by the value of another. While the joint probability density function (PDF) describes their behavior together, we frequently need to analyze the distribution of one variable under the condition that another variable has taken a specific value. This leads us to the concept of the conditional probability density function.
The conditional PDF provides a complete probabilistic description of a continuous random variable given the knowledge of another. It is analogous to the concept of conditional probability, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, extended to the context of continuous distributions. Mastering this concept is essential for understanding more advanced topics such as Bayesian inference and stochastic processes, where updating our beliefs based on new information is a central theme.
---
Let $X$ and $Y$ be two continuous random variables with a joint PDF denoted by $f_{X,Y}(x, y)$ and respective marginal PDFs $f_X(x)$ and $f_Y(y)$.
The conditional PDF of $X$ given that $Y = y$ is defined for all $y$ such that $f_Y(y) > 0$ as:
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$
Similarly, the conditional PDF of $Y$ given that $X = x$ is defined for all $x$ such that $f_X(x) > 0$ as:
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
We observe that the conditional PDF is fundamentally a re-scaling of the joint PDF. For a fixed value of $Y$, say $Y = y$, the function $f_{X,Y}(x, y)$ represents a "slice" of the joint PDF. The denominator, $f_Y(y)$, is the normalizing constant that ensures this slice integrates to one, thereby forming a valid probability density function for $X$.
---
Key Concepts
## 1. Properties of a Conditional PDF
A crucial property to remember is that for a fixed value of the conditioning variable, the conditional PDF behaves exactly like any other single-variable PDF.
This implies two conditions:
1. Non-negativity: $f_{X|Y}(x \mid y) \ge 0$ for all $x$.
2. Normalization: $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\, dx = 1$.
To see why the normalization property holds, let us consider the integral:
$$\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\, dx = \int_{-\infty}^{\infty} \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dx$$
Since $f_Y(y)$ is constant with respect to the integration variable $x$, we can write:
$$\int_{-\infty}^{\infty} \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dx = \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$$
By the definition of the marginal PDF, we know that $\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = f_Y(y)$. Substituting this back, we get:
$$\frac{1}{f_Y(y)} \cdot f_Y(y) = 1$$
This confirms that $f_{X|Y}(x \mid y)$ is a valid probability density function for the random variable $X$.
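The normalization property can be checked numerically for any concrete joint density. The sketch below uses a hypothetical density $f(x, y) = x + y$ on the unit square (chosen for illustration, not taken from the text), whose conditional PDF works out to $(x + y)/(y + \tfrac{1}{2})$:

```python
from scipy.integrate import quad

# Hypothetical example: for f(x, y) = x + y on the unit square,
# the marginal is f_Y(y) = y + 1/2, so f_{X|Y}(x|y) = (x + y) / (y + 1/2).
# For any fixed y, this conditional PDF should integrate to 1 over x.
def f_cond(x, y):
    return (x + y) / (y + 0.5)

total, _ = quad(f_cond, 0.0, 1.0, args=(0.3,))  # fix y = 0.3
print(total)  # approximately 1.0
```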
## 2. Conditional Expectation
Once we have the conditional PDF, we can compute various properties of the conditional distribution, such as the conditional expectation. The conditional expectation of $X$ given $Y = y$, denoted $E[X \mid Y = y]$, represents the mean of the distribution of $X$ when $Y$ is known to be $y$:
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X|Y}(x \mid y)\, dx$$
Variables:
- $X$: The random variable whose conditional expectation is being calculated.
- $y$: The given value of the other random variable.
- $f_{X|Y}(x \mid y)$: The conditional PDF of $X$ given $Y = y$.
When to use: To find the expected value of one variable when the outcome of another is fixed. This is foundational for regression analysis.
Worked Example:
Problem:
Let the joint PDF of two random variables $X$ and $Y$ be given by:
$$f_{X,Y}(x, y) = 2 \quad \text{for } 0 < y < x < 1,$$
and $f_{X,Y}(x, y) = 0$ otherwise.
Find the conditional PDF $f_{Y|X}(y \mid x)$ and calculate the conditional expectation $E[Y \mid X = x]$.
Solution:
Step 1: Determine the region of support and find the marginal PDF $f_X(x)$.
The support is a triangular region bounded by $y = 0$, $y = x$, and $x = 1$. For a fixed $x$ in $(0, 1)$, $y$ varies from $0$ to $x$.
$$f_X(x) = \int_0^x 2\, dy = 2x, \quad 0 < x < 1$$
Step 2: Apply the formula for the conditional PDF $f_{Y|X}(y \mid x)$.
The formula is $f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$, provided $f_X(x) > 0$.
$$f_{Y|X}(y \mid x) = \frac{2}{2x} = \frac{1}{x}$$
This is valid for the support region, which is $0 < y < x$ for a given $x$. Thus, the full expression is:
$f_{Y|X}(y \mid x) = \frac{1}{x}$ for $0 < y < x$, and $0$ otherwise.
We can recognize this as the PDF of a Uniform distribution on the interval $(0, x)$.
Step 3: Calculate the conditional expectation $E[Y \mid X = x]$.
We use the formula for conditional expectation, with $f_{Y|X}(y \mid x) = \frac{1}{x}$ for $0 < y < x$.
$$E[Y \mid X = x] = \int_0^x y \cdot \frac{1}{x}\, dy = \frac{1}{x} \cdot \frac{x^2}{2} = \frac{x}{2}$$
Answer: The conditional expectation is $E[Y \mid X = x] = \frac{x}{2}$.
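The worked example can be checked by simulation. This sketch assumes the joint density is $f(x, y) = 2$ on the triangle $0 < y < x < 1$ (so the marginal of $X$ is $2x$), and conditions on a narrow window around $x = 0.8$:

```python
import numpy as np

# Simulation check of E[Y | X = x] = x/2, assuming the joint density is
# f(x, y) = 2 on the triangle 0 < y < x < 1.
rng = np.random.default_rng(3)
x = np.sqrt(rng.uniform(size=1_000_000))  # marginal f_X(x) = 2x, so X = sqrt(U)
y = x * rng.uniform(size=x.size)          # Y | X = x is Uniform(0, x)
mask = np.abs(x - 0.8) < 0.01             # condition on X near 0.8
cond_mean = y[mask].mean()
print(cond_mean)  # close to 0.8 / 2 = 0.4
```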
---
Problem-Solving Strategies
Problems involving conditional PDFs almost always follow a standard procedure. To avoid errors, tackle them systematically:
- Find the Marginal: Before you can find any conditional PDF, you must first calculate the required marginal PDF from the joint PDF. For $f_{X|Y}(x \mid y)$, you need $f_Y(y)$. For $f_{Y|X}(y \mid x)$, you need $f_X(x)$. Pay close attention to the limits of integration, as they often depend on the variables.
- Apply the Formula: Once the marginal PDF is found, simply divide the joint PDF by it. Do not mix up the numerator and denominator. The PDF in the denominator ($f_Y(y)$ in the case of $f_{X|Y}(x \mid y)$) is that of the variable you are conditioning on.
- Define the Support: The conditional PDF is only valid over a specific range. This range is inherited from the joint PDF's support. Clearly state the support for your final conditional PDF expression, e.g., "$f_{Y|X}(y \mid x) = \frac{1}{x}$ for $0 < y < x$".
---
Common Mistakes
- ❌ Incorrect Marginal: Using the wrong marginal PDF in the denominator. For example, using $f_X(x)$ when calculating $f_{X|Y}(x \mid y)$, which requires $f_Y(y)$.
- ❌ Forgetting Variable Limits: When integrating to find the marginal PDF, treating the limits of integration as constants when they actually depend on the other variable. This is common in non-rectangular support regions (e.g., triangles).
- ❌ Ignoring the Support: Providing the formula for the conditional PDF without stating the domain over which it is non-zero.
---
Practice Questions
:::question type="MCQ" question="Let $X$ and $Y$ be continuous random variables with joint PDF $f_{X,Y}(x,y)$ and marginal PDFs $f_X(x)$ and $f_Y(y)$. If $X$ and $Y$ are independent, what is the expression for the conditional PDF $f_{X|Y}(x \mid y)$?" options=["$f_X(x)$", "$f_Y(y)$", "$f_X(x) f_Y(y)$", "Cannot be determined"] answer="$f_X(x)$" hint="Recall the definition of independence for continuous random variables: $f_{X,Y}(x,y) = f_X(x) f_Y(y)$." solution="
Step 1: State the formula for the conditional PDF.
$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$
Step 2: Use the property of independence.
For independent random variables, the joint PDF is the product of the marginal PDFs:
$f_{X,Y}(x,y) = f_X(x) f_Y(y)$
Step 3: Substitute the independence property into the conditional PDF formula.
$f_{X|Y}(x \mid y) = \frac{f_X(x) f_Y(y)}{f_Y(y)}$
Step 4: Simplify the expression.
$f_{X|Y}(x \mid y) = f_X(x)$
This result is intuitive: if the variables are independent, knowing the value of $Y$ provides no information about $X$, so the conditional distribution of $X$ is just its own marginal distribution.
"
:::
:::question type="NAT" question="The joint PDF of random variables $X$ and $Y$ is given by $f_{X,Y}(x,y) = 4xy$ for $0 < x < 1$ and $0 < y < 1$, and $0$ otherwise. Calculate the value of the conditional probability $P(X > 0.5 \mid Y = 0.5)$. (Round to two decimal places)" answer="0.75" hint="First, find the marginal PDF $f_Y(y)$. Then, find the conditional PDF $f_{X|Y}(x \mid y)$ for $y = 0.5$. Finally, integrate this conditional PDF over the appropriate range for $X$." solution="
Step 1: Calculate the marginal PDF $f_Y(y)$.
For $0 < y < 1$:
$f_Y(y) = \int_0^1 4xy\, dx = 4y \left[\frac{x^2}{2}\right]_0^1 = 2y$
Step 2: Find the conditional PDF $f_{X|Y}(x \mid y)$.
$f_{X|Y}(x \mid y) = \frac{4xy}{2y} = 2x, \quad 0 < x < 1$
(Note that in this case, the conditional distribution of $X$ does not depend on $y$; the variables are independent.)
Step 3: Calculate the required conditional probability.
The conditional PDF for any given $y$ is $2x$.
$P(X > 0.5 \mid Y = 0.5) = \int_{0.5}^{1} 2x\, dx = \left[x^2\right]_{0.5}^{1} = 1 - 0.25 = 0.75$
"
:::
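The final integral above is easy to confirm numerically, assuming the joint density is $f(x, y) = 4xy$ on the unit square (so the conditional PDF reduces to $2x$):

```python
from scipy.integrate import quad

# With f(x, y) = 4xy on the unit square (assumed), f_{X|Y}(x|y) = 2x,
# so P(X > 0.5 | Y = 0.5) is the integral of 2x from 0.5 to 1.
p, _ = quad(lambda x: 2 * x, 0.5, 1.0)
print(round(p, 2))  # 0.75
```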
---
Summary
- Core Formula: The conditional PDF of $X$ given $Y = y$ is the ratio of the joint PDF to the marginal PDF of the conditioning variable: $f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$, defined wherever $f_Y(y) > 0$.
- It's a Valid PDF: For any fixed $y$, $f_{X|Y}(x \mid y)$ is a legitimate PDF for the variable $X$. It is non-negative and integrates to 1 with respect to $x$.
- Calculation is Sequential: To find a conditional PDF, you must first find the corresponding marginal PDF by integrating the joint PDF over the other variable.
- Independence Simplifies: If $X$ and $Y$ are independent, the conditional PDF $f_{X|Y}(x \mid y)$ simplifies to the marginal PDF $f_X(x)$, meaning knowledge of $Y$ does not alter the distribution of $X$.
---
What's Next?
This topic is a gateway to several important concepts in probability and its applications. We recommend strengthening your understanding by proceeding to:
- Marginal and Joint Distributions: A solid grasp of how to derive marginals from joints is a prerequisite for all conditional probability problems.
- Conditional Expectation and Variance: Explore how to compute the mean and variance of a variable when the value of another is known. This is the foundation of regression analysis.
- Law of Total Expectation: Learn how to find the overall expectation of a variable by averaging its conditional expectations.
---
Chapter Summary
In our study of continuous random variables, we have moved from the summations used for discrete variables to the integrals that govern continuous space. For success in the GATE examination, a firm grasp of the following foundational concepts is non-negotiable.
- The Probability Density Function (PDF): For a continuous random variable $X$, the PDF, denoted $f(x)$, describes the relative likelihood of the variable taking on a given value. It must satisfy two crucial properties: $f(x) \ge 0$ for all $x$, and its total integral over the real line must be unity, i.e., $\int_{-\infty}^{\infty} f(x)\, dx = 1$. Crucially, the probability at any single point is zero: $P(X = a) = 0$.
- The Cumulative Distribution Function (CDF): The CDF, $F(x) = P(X \le x)$, remains the cornerstone for calculating probabilities. It is the integral of the PDF, $F(x) = \int_{-\infty}^{x} f(t)\, dt$. Conversely, the PDF is the derivative of the CDF, $f(x) = F'(x)$. The probability that $X$ falls within an interval is given by $P(a < X \le b) = F(b) - F(a)$.
- Uniform Distribution: This distribution models a scenario where all outcomes in a finite interval are equally likely. Its PDF is a constant, $f(x) = \frac{1}{b - a}$ for $a \le x \le b$. The mean is the midpoint of the interval, $\frac{a + b}{2}$, and the variance is $\frac{(b - a)^2}{12}$.
- Exponential Distribution: Primarily used to model the time until an event occurs, its key feature is the memoryless property: $P(X > s + t \mid X > s) = P(X > t)$. Its PDF is $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. The mean and standard deviation are $\frac{1}{\lambda}$ and $\frac{1}{\lambda}$, respectively.
- Normal Distribution: The Normal (or Gaussian) distribution, $N(\mu, \sigma^2)$, is the most important continuous distribution, characterized by its mean $\mu$ and variance $\sigma^2$. It is symmetric about its mean.
- The Standard Normal Distribution: Since the Normal PDF cannot be integrated in a closed form, we use the Standard Normal Distribution, $N(0, 1)$. Any normal random variable can be transformed into a standard normal variable using the standardization formula: $Z = \frac{X - \mu}{\sigma}$. This allows us to use standard Z-tables or computational tools to find probabilities.
- Conditional PDF: The concept of conditioning extends to continuous variables. The conditional PDF of $X$ given an event $A$ is defined as $f_{X|A}(x) = \frac{f(x)}{P(A)}$ for $x$ in the event space of $A$, and 0 otherwise. This is essential for problems involving a restricted range of outcomes.
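In practice, these summary formulas can be cross-checked against a statistics library. The sketch below uses `scipy.stats` (note that SciPy parameterizes distributions with `loc`/`scale`: for the uniform, `loc = a` and `scale = b - a`; for the exponential, `scale = 1/λ`):

```python
from scipy import stats

# Uniform on [2, 6]: mean (a + b)/2 = 4, variance (b - a)^2 / 12 = 4/3.
u = stats.uniform(loc=2, scale=4)  # loc = a, scale = b - a
print(u.mean(), u.var())

# Exponential with lambda = 0.5: mean and std are both 1/lambda = 2.
e = stats.expon(scale=2)  # scale = 1 / lambda
print(e.mean(), e.std())

# Standardization: P(X <= 182) for X ~ N(175, 7^2) equals P(Z <= 1).
print(stats.norm(175, 7).cdf(182), stats.norm.cdf(1.0))
```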
---
Chapter Review Questions
:::question type="MCQ" question="The lifetime (in years) of a satellite component follows an exponential distribution with a mean of 8 years. The satellite will be decommissioned after 12 years. If the component has already survived for 4 years, what is the probability that it will not fail before the satellite is decommissioned?" options=["$e^{-1}$","$e^{-1/2}$","$e^{-3/2}$","$1 - e^{-1}$"] answer="A" hint="Recall the fundamental property of the exponential distribution. The past has no bearing on the future probability." solution="
The lifetime $X$ follows an exponential distribution. The mean lifetime is given as $E[X] = \frac{1}{\lambda} = 8$ years. For an exponential distribution, we know that $\lambda = \frac{1}{8}$.
The PDF of the lifetime is $f(x) = \frac{1}{8} e^{-x/8}$ for $x \ge 0$.
We are asked to find the probability that the component will not fail before decommissioning (at 12 years), given that it has already survived for 4 years. This is a conditional probability problem:
$P(X > 12 \mid X > 4)$
The exponential distribution is characterized by its memoryless property, which states that for any $s, t \ge 0$:
$P(X > s + t \mid X > s) = P(X > t)$
In our case, $s = 4$ and $t = 8$, since $12 = 4 + 8$. Therefore, we can write:
$P(X > 12 \mid X > 4) = P(X > 8)$
Now, we calculate $P(X > 8)$. The survival function (the probability of surviving beyond time $t$) for an exponential distribution is $P(X > t) = e^{-\lambda t}$.
Thus, the required probability is $P(X > 8) = e^{-8/8} = e^{-1}$.
"
:::
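The memoryless step can be verified directly from the survival function using `scipy.stats`:

```python
import math
from scipy import stats

# With mean 8 (lambda = 1/8), P(X > 12 | X > 4) = P(X > 12) / P(X > 4)
# should equal P(X > 8) = e^{-1} by the memoryless property.
X = stats.expon(scale=8)   # scale = 1 / lambda = mean
cond = X.sf(12) / X.sf(4)  # sf(t) = P(X > t)
print(cond, X.sf(8), math.exp(-1))
```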
:::question type="NAT" question="The scores of an entrance exam are normally distributed with a mean ($\mu$) of 500 and a standard deviation ($\sigma$) of 100. To be in the top 2.5% of all candidates, what is the minimum integer score a candidate must achieve? (Given that for a standard normal variable $Z$, $P(Z \le 1.96) = 0.975$)" answer="696" hint="The 'top 2.5%' corresponds to the 97.5th percentile. Standardize the variable and use the given Z-score." solution="
Let $X$ be the random variable representing the exam scores. We are given that $X \sim N(500, 100^2)$.
We need to find the score $x$ such that the probability of getting a score greater than $x$ is 2.5%, or 0.025.
This is equivalent to finding the score $x$ such that the probability of getting a score less than or equal to $x$ is $1 - 0.025 = 0.975$.
To solve this, we standardize the random variable to a standard normal variable using the transformation $Z = \frac{X - 500}{100}$, so that $P\left(Z \le \frac{x - 500}{100}\right) = 0.975$.
We are given in the problem statement that $P(Z \le 1.96) = 0.975$. By comparing the two expressions, we can equate the arguments:
$\frac{x - 500}{100} = 1.96$
Now, we solve for $x$:
$x = 500 + 1.96 \times 100 = 696$
The minimum score required is 696. Since the question asks for the minimum integer score, and our result is an integer, the answer is 696.
"
:::
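The same cutoff can be obtained directly from the inverse CDF (percent-point function) in `scipy.stats`, without a Z-table:

```python
from scipy import stats

# The top-2.5% cutoff is the 97.5th percentile of N(500, 100^2).
score = stats.norm(500, 100).ppf(0.975)
print(round(score))  # 696
```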
:::question type="MCQ" question="A continuous random variable $X$ has a probability density function given by $f(x) = \frac{3}{32} x (4 - x)$ for $0 \le x \le 4$, and $0$ otherwise. What is the probability $P(X > E[X])$?" options=["$\frac{1}{2}$","$\frac{1}{4}$","$\frac{3}{8}$","$\frac{5}{8}$"] answer="A" hint="First, calculate the expected value $E[X]$. Then, integrate the PDF from $E[X]$ to the upper bound of the distribution's support." solution="
The problem requires us to first compute the expected value, $E[X]$, and then compute the probability that the random variable exceeds this value.
Step 1: Calculate the Expected Value $E[X]$
The expected value is given by the integral $E[X] = \int_0^4 x \cdot \frac{3}{32} x (4 - x)\, dx$.
We evaluate the integral:
$E[X] = \frac{3}{32} \int_0^4 (4x^2 - x^3)\, dx = \frac{3}{32} \left[ \frac{4x^3}{3} - \frac{x^4}{4} \right]_0^4 = \frac{3}{32} \left( \frac{256}{3} - 64 \right) = \frac{3}{32} \cdot \frac{64}{3} = 2$
So, the expected value is $E[X] = 2$.
Step 2: Calculate $P(X > 2)$
We now need to find $P(X > 2)$. This is calculated by integrating the PDF from 2 to 4.
$P(X > 2) = \frac{3}{32} \int_2^4 (4x - x^2)\, dx = \frac{3}{32} \left[ 2x^2 - \frac{x^3}{3} \right]_2^4 = \frac{3}{32} \left( \frac{32}{3} - \frac{16}{3} \right) = \frac{3}{32} \cdot \frac{16}{3} = \frac{1}{2}$
The probability is $P(X > E[X]) = \frac{1}{2}$. This result is expected, as the given PDF is symmetric about $x = 2$.
"
:::
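Both steps of this solution can be checked by numerical integration. This sketch assumes the density is $f(x) = \frac{3}{32} x (4 - x)$ on $[0, 4]$, consistent with the symmetry about $x = 2$ noted in the solution:

```python
from scipy.integrate import quad

# Assumed density: f(x) = (3/32) x (4 - x) on [0, 4], zero elsewhere.
def f(t):
    return (3 / 32) * t * (4 - t)

mean, _ = quad(lambda t: t * f(t), 0, 4)  # E[X]
prob, _ = quad(f, 2, 4)                   # P(X > E[X])
print(mean, prob)  # approximately 2.0 and 0.5
```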
---
What's Next?
Having completed Continuous Probability Distributions, you have established a firm foundation for related chapters in Probability and Statistics. The tool of integration, which we have used extensively here to analyze single random variables, will now be extended to more complex scenarios.
Key connections:
- Relation to Previous Learning: This chapter is the direct continuous analogue to the Discrete Probability Distributions chapter. We have seen that core concepts like the Cumulative Distribution Function (CDF), expected value, and variance are universal. However, the primary mathematical tool has shifted from summation ($\sum$) for discrete variables to integration ($\int$) for continuous variables.
- Building Blocks for Future Chapters: The concepts mastered here are indispensable for the following topics: