
Uniform Distribution

The Uniform Distribution is the simplest probability distribution. It describes a scenario where every possible outcome is equally likely to occur. In Machine Learning, it is the bedrock of random number generation and the initial state of many neural networks.

1. Two Flavors of Uniformity​

We distinguish between two versions of the Uniform distribution, depending on whether the outcomes form a countable set of values (Discrete) or a continuous interval (Continuous).

2. Discrete Uniform Distribution​

A discrete random variable X has a uniform distribution if each of the n values in its range has the same probability.

The Math​

P(X = x) = \frac{1}{n}
  • Mean (\mu): \frac{a + b}{2}, where a and b are the smallest and largest values (for consecutive integers)
  • Variance (\sigma^2): \frac{n^2 - 1}{12} (for consecutive integers)
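
As a quick sanity check, here is a minimal Python sketch (assuming NumPy is available) comparing these formulas against a large sample for a fair six-sided die:

```python
import numpy as np

# Discrete uniform over the consecutive integers 1..6 (a fair die), so n = 6.
a, b = 1, 6
n = b - a + 1

theoretical_mean = (a + b) / 2        # 3.5
theoretical_var = (n**2 - 1) / 12     # 35/12 ≈ 2.917

# Draw a large sample; integers() treats the upper bound as exclusive.
rng = np.random.default_rng(0)
sample = rng.integers(a, b + 1, size=100_000)

print(theoretical_mean, sample.mean())  # both ≈ 3.5
print(theoretical_var, sample.var())    # both ≈ 2.917
```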

3. Continuous Uniform Distribution​

A continuous random variable X on the interval [a, b] has a uniform distribution if its probability density is constant across that interval.

The Math​

The Probability Density Function (PDF) is:

f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}

Key Properties:​

  • Mean (\mu): \frac{a + b}{2} (the midpoint of the interval)
  • Variance (\sigma^2): \frac{(b - a)^2}{12}

The "Rectangle" Distribution

Because the height is constant (1/(b - a)) and the width is (b - a), the total area is always 1. This is why the continuous uniform distribution is often visualized as a perfect rectangle.
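
A minimal sketch (again assuming NumPy) that samples from U(a, b) and checks the mean, the variance, and the unit area of the rectangle; the endpoints 2 and 10 are arbitrary:

```python
import numpy as np

a, b = 2.0, 10.0
rng = np.random.default_rng(42)
sample = rng.uniform(a, b, size=200_000)

# Theoretical properties of U(a, b) vs. the empirical sample.
print((a + b) / 2, sample.mean())       # mean: 6.0
print((b - a)**2 / 12, sample.var())    # variance: 64/12 ≈ 5.333

# The "rectangle": constant height 1/(b - a) over width (b - a) gives area 1.
height = 1 / (b - a)
print(height * (b - a))                 # 1.0
```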

4. Why this matters in Machine Learning​

A. Weight Initialization​

When we start training a Neural Network, we cannot set all weights to zero: every neuron would receive identical gradients and learn the same thing (the symmetry problem). Instead, we often initialize weights using a Uniform Distribution (e.g., between -0.05 and 0.05) to give each neuron a unique starting point.
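
For illustration, a minimal NumPy sketch of drawing a dense layer's weights from U(-0.05, 0.05); the layer sizes are placeholders, and deep learning frameworks expose the same idea directly (e.g., torch.nn.init.uniform_ in PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 784 inputs -> 128 hidden units.
n_in, n_out = 784, 128

# Every weight gets a small, distinct starting value, breaking symmetry.
W = rng.uniform(-0.05, 0.05, size=(n_in, n_out))
b = np.zeros(n_out)  # biases can safely start at zero

print(W.shape, W.min(), W.max())  # all values lie within [-0.05, 0.05)
```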

B. Random Sampling and Shuffling​

When we "shuffle" a dataset before training, we are using a discrete uniform distribution to ensure that every row has an equal probability of appearing in any given position in the batch.

C. Data Augmentation​

In computer vision, we might rotate an image by a random angle. We typically pick that angle from a continuous uniform distribution, such as \text{Angle} \sim \mathcal{U}(-20^\circ, 20^\circ), to ensure we aren't biasing the model toward specific rotations.
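
A minimal sketch of drawing such an angle; the image is a placeholder array, and the rotation itself is delegated to an image library (scipy.ndimage.rotate is used here as one option, which is an assumption about your environment):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((64, 64))        # placeholder grayscale image

# Angle ~ U(-20°, 20°): no rotation direction or magnitude is favored.
angle = rng.uniform(-20.0, 20.0)

augmented = ndimage.rotate(image, angle, reshape=False)
print(angle, augmented.shape)
```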

D. Hyperparameter Search (Random Search)

Instead of checking every candidate value on a fixed grid (Grid Search), Random Search picks hyperparameter values from a uniform distribution over each range. Statistically, this is often more efficient at finding the optimal "needle in the haystack."
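
A minimal sketch of random search: each trial draws hyperparameters from uniform distributions and keeps the best score. The learning rate is drawn uniformly in its exponent (a common variation), dropout from a plain uniform range, and the objective function is a stand-in for an actual train-and-validate run:

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(learning_rate, dropout):
    # Stand-in for "train a model and return validation accuracy".
    return -((np.log10(learning_rate) + 2) ** 2) - (dropout - 0.3) ** 2

best_score, best_params = -np.inf, None
for _ in range(50):
    lr = 10 ** rng.uniform(-4, -1)      # uniform over the exponent
    dropout = rng.uniform(0.1, 0.5)     # plain uniform range
    score = objective(lr, dropout)
    if score > best_score:
        best_score, best_params = score, (lr, dropout)

print(best_params, best_score)
```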

5. Summary Table​

Feature     | Discrete Uniform              | Continuous Uniform
Notation    | X \sim \mathcal{U}(n)         | X \sim \mathcal{U}(a, b)
Height      | 1/n (Probability)             | 1/(b - a) (Density)
Shape       | Set of equal-height dots/bars | A flat rectangle
Common Use  | Shuffling, Dice, Indices      | Weight initialization, Augmentation

We have now covered the "Big Four" distributions: Normal, Binomial, Poisson, and Uniform. But how do we measure the "distance" between these distributions or the "information" they contain?