The Normal (Gaussian) Distribution
The Normal Distribution, often called the Gaussian Distribution, is the most significant probability distribution in statistics and Machine Learning. It is characterized by its iconic symmetric "bell shape," where most observations cluster around the central peak.
1. The Mathematical Definition
A continuous random variable is said to be normally distributed with mean and variance (denoted as ) if its Probability Density Function (PDF) is:
Key Parameters:
- Mean (): Determines the center of the peak (location).
- Standard Deviation (): Determines the "spread" or width of the bell (scale).
2. The Empirical Rule (68-95-99.7)
One of the most useful properties of the Normal Distribution is that we know exactly how much data falls within specific distances from the mean.
3. The Standard Normal Distribution (Z)
A Standard Normal Distribution is a special case where the mean is 0 and the standard deviation is .
We can convert any normal distribution into a standard one using the Z-score formula. This process is called Standardization, a critical step in ML feature engineering.