
Uniform Distribution

The Uniform Distribution is the simplest probability distribution. It describes a scenario where every possible outcome is equally likely to occur. In Machine Learning, it is the bedrock of random number generation and the initial state of many neural networks.

1. Two Flavors of Uniformity

We distinguish between two forms of the Uniform distribution based on whether the data is countable (Discrete) or measurable (Continuous).

2. Discrete Uniform Distribution

A discrete random variable X has a uniform distribution if each of the n values in its range has the same probability.

The Math

P(X = x) = \frac{1}{n}
  • Mean (\mu): \frac{a + b}{2}, where a and b are the smallest and largest values in the range
  • Variance (\sigma^2): \frac{n^2 - 1}{12} (for consecutive integers)
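
To make this concrete, here is a minimal NumPy sketch (the die example, seed, and sample size are illustrative) that samples a fair six-sided die, a discrete uniform over {1, ..., 6}, and checks the empirical mean and variance against the formulas above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Fair six-sided die: discrete uniform over {1, ..., 6}, so n = 6, a = 1, b = 6
rolls = rng.integers(low=1, high=7, size=100_000)  # `high` is exclusive

print("empirical mean:", rolls.mean())  # theory: (1 + 6) / 2 = 3.5
print("empirical var: ", rolls.var())   # theory: (6**2 - 1) / 12 ≈ 2.9167
```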

3. Continuous Uniform Distribution

A continuous random variable X on the interval [a, b] has a uniform distribution if its probability density is constant across that interval.

The Math

The Probability Density Function (PDF) is:

f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}

Key Properties:

  • Mean (\mu): \frac{a + b}{2} (the midpoint of the interval)
  • Variance (\sigma^2): \frac{(b - a)^2}{12}

The "Rectangle" Distribution

Because the height is constant (1/(b - a)) and the width is (b - a), the total area is always 1. This is why the continuous uniform distribution is often visualized as a perfect rectangle.
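
A quick way to see both formulas and the rectangle shape in code is to draw samples with NumPy and evaluate a hand-written density (the interval [2, 5] and the helper name `uniform_pdf` are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a, b = 2.0, 5.0  # illustrative interval

samples = rng.uniform(low=a, high=b, size=100_000)
print("empirical mean:", samples.mean())  # theory: (a + b) / 2 = 3.5
print("empirical var: ", samples.var())   # theory: (b - a)**2 / 12 = 0.75

def uniform_pdf(x, a, b):
    """Constant height 1/(b - a) inside [a, b], zero outside -- the 'rectangle'."""
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)

print(uniform_pdf(np.array([1.0, 3.0, 6.0]), a, b))  # [0.0, 0.333..., 0.0]
```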

4. Why this matters in Machine Learning

A. Weight Initialization

When we start training a Neural Network, we cannot set all weights to zero (this causes symmetry problems). Instead, we often initialize weights using a Uniform Distribution (e.g., between -0.05 and 0.05) to give each neuron a unique starting point.
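
As a sketch of the idea (the layer shape and the ±0.05 range are illustrative, not a prescription), a uniformly initialized weight matrix in NumPy might look like this:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

fan_in, fan_out = 784, 128  # illustrative layer: 784 inputs -> 128 units

# Every weight is drawn independently from U(-0.05, 0.05), so no two neurons
# start identical -- this breaks the symmetry an all-zeros init would create.
W = rng.uniform(low=-0.05, high=0.05, size=(fan_in, fan_out))
b = np.zeros(fan_out)  # biases can safely start at zero

print(W.min(), W.max())  # both strictly inside (-0.05, 0.05)
```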

B. Random Sampling and Shuffling

When we "shuffle" a dataset before training, we are using a discrete uniform distribution to ensure that every row has an equal probability of appearing in any given position in the batch.

C. Data Augmentation

In computer vision, we might rotate an image by a random angle. We typically pick that angle from a continuous uniform distribution, such as \text{Angle} \sim \mathcal{U}(-20^\circ, 20^\circ), to ensure we aren't biasing the model toward specific rotations.
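
A small sketch of that augmentation step (the helper name and the ±20° bounds are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_rotation_angle(low_deg=-20.0, high_deg=20.0):
    """Draw one rotation angle (in degrees) from U(-20°, 20°)."""
    return rng.uniform(low=low_deg, high=high_deg)

# A fresh, independent angle per image, so no particular rotation is favored.
angles = [random_rotation_angle() for _ in range(5)]
print(angles)
```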

D. Hyperparameter Tuning (Random Search)

Instead of checking every single value (Grid Search), Random Search picks hyperparameter values from a uniform distribution. Statistically, this is often more efficient at finding the optimal "needle in the haystack."
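
Here is a minimal sketch of that sampling step (the hyperparameter names, ranges, and trial count are illustrative); each trial's configuration is drawn from uniform distributions rather than taken from a fixed grid:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_trials = 20
trials = []
for _ in range(n_trials):
    trials.append({
        "learning_rate": rng.uniform(1e-4, 1e-1),           # continuous uniform
        "dropout": rng.uniform(0.0, 0.5),                    # continuous uniform
        "batch_size": int(rng.choice([32, 64, 128, 256])),   # discrete uniform over choices
    })

print(trials[0])
```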

5. Summary Table

| Feature | Discrete Uniform | Continuous Uniform |
| --- | --- | --- |
| Notation | X \sim \mathcal{U}(n) | X \sim \mathcal{U}(a, b) |
| Height | 1/n (probability) | 1/(b - a) (density) |
| Shape | Set of equal-height dots/bars | A flat rectangle |
| Common Use | Shuffling, dice, indices | Weight initialization, augmentation |

We have now covered the "Big Four" distributions: Normal, Binomial, Poisson, and Uniform. But how do we measure the "distance" between these distributions or the "information" they contain?