Uniform Distribution
The Uniform Distribution is the simplest probability distribution. It describes a scenario where every possible outcome is equally likely to occur. In Machine Learning, it underlies random number generation and is commonly used to set the initial weights of neural networks.
1. Two Flavors of Uniformity
We distinguish two forms of the Uniform distribution, depending on whether the outcomes are countable (Discrete) or measured on a continuous scale (Continuous).
2. Discrete Uniform Distribution
A discrete random variable has a uniform distribution if each of the values in its range has the same probability.
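The Math
If $X$ can take $n$ different values, each value has probability $P(X = x) = \frac{1}{n}$. For the consecutive integers $a, a+1, \dots, b$ (so $n = b - a + 1$):
- Mean ($\mu$): $\mu = \frac{a + b}{2}$
- Variance ($\sigma^2$): $\sigma^2 = \frac{n^2 - 1}{12}$ (for consecutive integers)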
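As a quick sanity check, the sketch below computes the mean and variance of a fair six-sided die directly from the definition and compares them with the closed forms $\frac{a+b}{2}$ and $\frac{n^2-1}{12}$. The helper name `discrete_uniform_stats` is a hypothetical name chosen for this illustration.

```python
from fractions import Fraction

def discrete_uniform_stats(a: int, b: int):
    """Exact PMF value, mean, and variance of a uniform distribution on a, a+1, ..., b."""
    values = range(a, b + 1)
    n = len(values)
    p = Fraction(1, n)                                   # every value has probability 1/n
    mean = sum(p * x for x in values)                    # E[X] from the definition
    variance = sum(p * (x - mean) ** 2 for x in values)  # E[(X - mean)^2]
    return p, mean, variance

p, mean, variance = discrete_uniform_stats(1, 6)         # a fair six-sided die
print(p, mean, variance)                                 # 1/6 7/2 35/12

# The closed forms give the same numbers.
n = 6
assert mean == Fraction(1 + 6, 2)
assert variance == Fraction(n**2 - 1, 12)
```

Using `Fraction` keeps the arithmetic exact, so the comparison with the closed forms does not depend on floating-point rounding.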
3. Continuous Uniform Distribution
A continuous random variable X on the interval [a, b] has a uniform distribution if its probability density is constant across that interval.
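The Math
The Probability Density Function (PDF) is:

$$
f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}
$$

Key Properties:
- Mean ($\mu$): $\mu = \frac{a + b}{2}$ (The midpoint of the interval)
- Variance ($\sigma^2$): $\sigma^2 = \frac{(b - a)^2}{12}$
Because the height is constant ($\frac{1}{b - a}$) and the width is ($b - a$), the total area is always 1. This is why the continuous uniform distribution is often visualized as a perfect rectangle.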
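The rectangle picture can be checked numerically. The sketch below evaluates the density $\frac{1}{b-a}$, multiplies height by width to get the area, and compares the sample mean and variance of simulated draws with $\frac{a+b}{2}$ and $\frac{(b-a)^2}{12}$. The interval $[2, 10]$, the seed, and the function name `uniform_pdf` are arbitrary choices made for this example.

```python
import random
import statistics

def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of U(a, b): constant 1 / (b - a) inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

a, b = 2.0, 10.0                   # hypothetical interval
height = uniform_pdf(5.0, a, b)    # 1 / 8 = 0.125
area = height * (b - a)            # rectangle area: height * width
print(area)                        # 1.0

rng = random.Random(0)             # fixed seed so the run is repeatable
samples = [rng.uniform(a, b) for _ in range(100_000)]

# Sample statistics should land close to the theoretical values.
print(statistics.fmean(samples), (a + b) / 2)
print(statistics.pvariance(samples), (b - a) ** 2 / 12)
```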
4. Why this matters in Machine Learning
A. Weight Initialization
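When we start training a Neural Network, we cannot set all weights to zero: every neuron in a layer would then compute the same output and receive the same gradient update, so the neurons could never learn different features (the symmetry problem). Instead, we often initialize weights using a Uniform Distribution on a small interval centered at zero (e.g., between $-\epsilon$ and $\epsilon$ for a small $\epsilon > 0$) to give each neuron a unique starting point.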
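A minimal sketch of uniform weight initialization, using only the standard library. The function name `init_uniform_weights`, the layer shape, and the limit `0.05` are assumptions made for illustration, not values taken from any particular framework.

```python
import random

def init_uniform_weights(n_in: int, n_out: int, limit: float, seed: int = 0):
    """Return an n_in x n_out weight matrix with entries drawn from U(-limit, limit)."""
    rng = random.Random(seed)      # seeded so the initialization is reproducible
    return [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]

# Hypothetical layer: 4 inputs feeding 3 neurons, weights in [-0.05, 0.05].
weights = init_uniform_weights(n_in=4, n_out=3, limit=0.05)
for row in weights:
    print([round(w, 4) for w in row])
```

In practice, deep learning libraries usually scale the limit with the layer size rather than fixing it by hand. Glorot (Xavier) uniform initialization, for example, uses a limit of $\sqrt{6 / (n_{in} + n_{out})}$.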
B. Random Sampling and Shuffling
When we "shuffle" a dataset before training, we draw a permutation of the rows uniformly at random, so every row has an equal probability of appearing at any given position in the shuffled order. This is a discrete uniform distribution over positions.
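The equal-probability claim can be illustrated empirically. The sketch below shuffles the row indices of a hypothetical five-row dataset many times and records where row 0 ends up; each position should appear in roughly one fifth of the trials. The helper name `shuffled_indices` and the trial count are assumptions for this example.

```python
import random
from collections import Counter

def shuffled_indices(n_rows: int, rng: random.Random) -> list[int]:
    """Return the row indices 0..n_rows-1 in a uniformly random order."""
    indices = list(range(n_rows))
    rng.shuffle(indices)           # in-place shuffle
    return indices

rng = random.Random(0)             # fixed seed so the run is repeatable
n_rows, n_trials = 5, 50_000

# Count the position at which row 0 lands in each shuffle.
position_counts = Counter(
    shuffled_indices(n_rows, rng).index(0) for _ in range(n_trials)
)
for position in range(n_rows):
    print(position, position_counts[position] / n_trials)   # each close to 1/5
```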
C. Data Augmentation
In computer vision, we might rotate an image by a random angle. We typically pick that angle from a continuous uniform distribution, such as $U(-\theta_{\max}, \theta_{\max})$ for a chosen maximum angle $\theta_{\max}$, so that no rotation within the allowed range is favored over another.
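A small sketch of the sampling step only; applying the rotation to an image is left to whichever image library is in use. The function name `sample_rotation_angle` and the 15-degree maximum are hypothetical choices for illustration.

```python
import random

def sample_rotation_angle(max_degrees: float, rng: random.Random) -> float:
    """Draw a rotation angle (in degrees) uniformly from [-max_degrees, max_degrees]."""
    return rng.uniform(-max_degrees, max_degrees)

rng = random.Random(0)             # fixed seed so the run is repeatable
max_degrees = 15.0                 # hypothetical augmentation range

# One independent angle per image in a hypothetical batch of 8 images.
angles = [sample_rotation_angle(max_degrees, rng) for _ in range(8)]
print([round(angle, 1) for angle in angles])
```

Because each angle is drawn independently, two images in the same batch will almost never receive the same rotation.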
D. Hyperparameter Search (Random Search)
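Instead of checking every combination on a fixed grid (Grid Search), Random Search samples each hyperparameter value independently from a distribution, often a uniform one over its allowed range. For the same number of trials, this tries more distinct values of each hyperparameter, so it often finds a good configuration faster when only a few hyperparameters really matter.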
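The sketch below contrasts the two strategies on a toy problem with the same budget of 16 trials. Everything here is assumed for illustration: the hyperparameter names, the $[0, 1]$ ranges, and the `toy_score` function, which was made to depend mostly on `learning_rate`. It is not a benchmark, only a way to see that the grid tests 4 distinct values per hyperparameter while random search tests 16.

```python
import itertools
import random

def toy_score(learning_rate: float, dropout: float) -> float:
    """Hypothetical validation score that depends mostly on learning_rate."""
    return -(learning_rate - 0.37) ** 2 - 0.01 * (dropout - 0.5) ** 2

def grid_search(points_per_axis: int):
    """All combinations of evenly spaced values in [0, 1] for both hyperparameters."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return list(itertools.product(axis, axis))

def random_search(n_trials: int, rng: random.Random):
    """Each trial draws both hyperparameters independently from U(0, 1)."""
    return [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(n_trials)]

rng = random.Random(0)                    # fixed seed so the run is repeatable
grid_trials = grid_search(4)              # 4 x 4 = 16 trials
random_trials = random_search(16, rng)    # 16 trials

# Distinct learning rates actually tried by each strategy.
print(len({lr for lr, _ in grid_trials}), len({lr for lr, _ in random_trials}))

best_grid = max(grid_trials, key=lambda trial: toy_score(*trial))
best_random = max(random_trials, key=lambda trial: toy_score(*trial))
print(best_grid, best_random)
```

Which strategy wins on a given run depends on the seed and the objective; the structural difference is the number of distinct values explored along each axis.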
5. Summary Table
| Feature | Discrete Uniform | Continuous Uniform |
|---|---|---|
| Notation | $X \sim U\{a, b\}$ | $X \sim U(a, b)$ |
| Height | $\frac{1}{n}$ (Probability) | $\frac{1}{b - a}$ (Density) |
| Shape | Set of equal-height dots/bars | A flat rectangle |
| Common Use | Shuffling, Dice, Indices | Weight initialization, Augmentation |
We have now covered the "Big Four" distributions: Normal, Binomial, Poisson, and Uniform. But how do we measure the "distance" between these distributions or the "information" they contain?