Random Variables
In probability, a Random Variable (RV) is a function that assigns a numerical value to each outcome in a sample space. It lets us move from qualitative outcomes (like "Rain" or "No Rain") to quantitative data that we can feed into a Machine Learning model.
1. What exactly is a Random Variable?
A random variable is not a variable in the algebraic sense (where $x$ stands for a single fixed but unknown number). Instead, it is a function $X: \Omega \to \mathbb{R}$ that maps the sample space $\Omega$ to the real numbers $\mathbb{R}$.
Example: If you flip two coins, the sample space is $\Omega = \{HH, HT, TH, TT\}$. We can define a Random Variable $X$ as the "Number of Heads," so that $X(HH) = 2$, $X(HT) = X(TH) = 1$, and $X(TT) = 0$.
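The two-coin example can be sketched in a few lines of Python to show that a random variable is just a function on outcomes. The names `sample_space`, `number_of_heads`, and `mapping` are illustrative choices, not part of any library.

```python
from itertools import product

# Sample space for two coin flips: every ordered pair of H/T.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

def number_of_heads(outcome: str) -> int:
    """The random variable X: maps one outcome to a real number."""
    return outcome.count("H")

# Apply X to every outcome in the sample space.
mapping = {outcome: number_of_heads(outcome) for outcome in sample_space}
print(mapping)
```

Note that nothing random happens inside `number_of_heads`; the randomness comes from which outcome occurs, not from the mapping itself.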
2. Types of Random Variables
Machine Learning handles two distinct types of data, which correspond to the two types of random variables:
A. Discrete Random Variables
These take on a finite or countably infinite number of distinct values.
- ML Example: The number of clicks on an ad, the number of words in a sentence.
- Function: Uses a Probability Mass Function (PMF), $P(X = x)$.
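As a minimal sketch of a PMF, the snippet below tabulates $P(X = x)$ for the "Number of Heads" variable from the two-coin example. It assumes both coins are fair, so all four outcomes are equally likely; that assumption is not stated in the text above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: fair coins, so each of the four outcomes has probability 1/4.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

# Count how many outcomes map to each value of X (number of heads).
counts = Counter(outcome.count("H") for outcome in outcomes)

# PMF: probability of each distinct value of X, kept exact with Fraction.
pmf = {x: Fraction(count, len(outcomes)) for x, count in counts.items()}

for x in sorted(pmf):
    print(f"P(X = {x}) = {pmf[x]}")
```

A valid PMF assigns a non-negative probability to each value, and the probabilities sum to 1.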
B. Continuous Random Variables
These can take any value within a range or interval.
- ML Example: The sale price of a house, the weight of a person.
- Function: Uses a Probability Density Function (PDF), $f(x)$.
For a continuous variable, the probability of the variable being exactly one specific number (e.g., $P(X = a)$ for any single value $a$) is always $0$. Instead, we calculate the probability over an interval, such as $P(a \le X \le b)$.
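To make the interval idea concrete, here is a small sketch that models a person's weight as a normal distribution. The mean of 70 kg and standard deviation of 10 kg are hypothetical values chosen for illustration, and the choice of a normal model is itself an assumption. The helper names are made up; only `math.erf` and `math.sqrt` come from the standard library.

```python
import math

def normal_cdf(x: float, mean: float, std: float) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def interval_probability(a: float, b: float, mean: float, std: float) -> float:
    """P(a <= X <= b), computed as the difference of two CDF values."""
    return normal_cdf(b, mean, std) - normal_cdf(a, mean, std)

# Hypothetical parameters, for illustration only.
MEAN_KG, STD_KG = 70.0, 10.0

# Probability of a weight between 60 kg and 80 kg (within one standard deviation).
p_interval = interval_probability(60.0, 80.0, MEAN_KG, STD_KG)

# An "interval" of zero width: the probability of exactly 70 kg.
p_point = interval_probability(70.0, 70.0, MEAN_KG, STD_KG)

print(f"P(60 <= X <= 80) = {p_interval:.4f}")
print(f"P(X = 70) = {p_point}")
```

The zero-width interval returns 0, which matches the statement that a continuous variable has no probability mass at any single point.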
3. Describing Distributions
To understand the behavior of a Random Variable, we use three primary descriptors:
| Function | Symbol | Purpose |
|---|---|---|
| PMF / PDF | $P(X = x)$ or $f(x)$ | The probability (or density) at a specific value. |
| CDF | $F(x)$ | The probability that $X$ will be less than or equal to $x$. |
| Expected Value | $E[X]$ | The "long-term average" or center of the distribution. |
The Cumulative Distribution Function (CDF)
The CDF is defined for both discrete and continuous variables:

$$F(x) = P(X \le x)$$
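For the discrete case, a CDF can be sketched by summing PMF values up to a given point. The PMF below is the "Number of Heads" distribution for two coins, assuming the coins are fair; the function name `cdf` is an illustrative choice.

```python
from fractions import Fraction

# PMF for the number of heads in two flips (assumes fair coins).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x: float) -> Fraction:
    """P(X <= x): add up the probability of every value not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

for x in (-1, 0, 1, 1.5, 2):
    print(f"F({x}) = {cdf(x)}")
```

The result is a step function: it starts at 0, jumps at each possible value of the variable, stays flat in between, and reaches 1 at the largest value.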
4. Expected Value and Variance
In Machine Learning, we often want to know the "typical" value of a feature and how much it varies.
Expected Value (Mean)
The weighted average of all possible values.
- Discrete: $E[X] = \sum_{x} x \, P(X = x)$
- Continuous: $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$
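Both forms of the expected value can be sketched in plain Python. The discrete part reuses the two-coin "Number of Heads" PMF (assuming fair coins). The continuous part uses a uniform density on $[0, 2]$, a hypothetical example chosen because its mean is easy to check by hand, and approximates the integral with a midpoint sum. All function names are illustrative.

```python
from fractions import Fraction

# Discrete case: E[X] is the probability-weighted sum of the values.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
discrete_mean = sum(x * p for x, p in pmf.items())

def uniform_pdf(x: float, low: float = 0.0, high: float = 2.0) -> float:
    """Density of a uniform distribution on [low, high] (hypothetical example)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def expected_value_continuous(pdf, low: float, high: float, steps: int = 10_000) -> float:
    """Approximate the integral of x * pdf(x) over [low, high] with a midpoint sum."""
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        midpoint = low + (i + 0.5) * width
        total += midpoint * pdf(midpoint) * width
    return total

continuous_mean = expected_value_continuous(uniform_pdf, 0.0, 2.0)

print(f"Discrete E[X] = {discrete_mean}")
print(f"Continuous E[X] is approximately {continuous_mean:.6f}")
```

The integration limits here are the support of the density rather than $-\infty$ to $\infty$, since the density is zero everywhere else and contributes nothing to the integral.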