# Bernoulli and Binomial Distributions
In Machine Learning, we often ask "Yes/No" questions: Will a user click this ad? Is this transaction fraudulent? Does the image contain a cat? These binary outcomes are modeled using the Bernoulli and Binomial distributions.
## 1. The Bernoulli Distribution
A Bernoulli Distribution is the simplest discrete distribution. It represents a single trial with exactly two possible outcomes: Success (1) and Failure (0).
### The Math

If $p$ is the probability of success, then $1 - p$ (often denoted as $q$) is the probability of failure.

- Mean ($\mu$): $p$
- Variance ($\sigma^2$): $p(1-p)$
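These moments are easy to check empirically. Below is a minimal sketch (using only the standard library) that simulates many Bernoulli trials and compares the sample mean and variance against the formulas above; the function name `bernoulli_trial` and the choice $p = 0.3$ are illustrative.

```python
import random

def bernoulli_trial(p: float) -> int:
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

# Simulate many trials and compare empirical moments to the theory:
# mean = p, variance = p * (1 - p).
random.seed(42)
p = 0.3
samples = [bernoulli_trial(p) for _ in range(100_000)]

mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)

print(f"empirical mean     {mean:.3f}  (theory: {p})")
print(f"empirical variance {variance:.3f}  (theory: {p * (1 - p):.3f})")
```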
## 2. The Binomial Distribution
The Binomial Distribution is the sum of $n$ independent Bernoulli trials. It tells us the probability of getting exactly $k$ successes in $n$ attempts.
### The 4 Conditions (B.I.N.S.)
For a variable to follow a Binomial distribution, it must meet these criteria:
- Binary: Only two outcomes per trial (Success/Failure).
- Independent: The outcome of one trial doesn't affect the next.
- Number: The number of trials ($n$) is fixed in advance.
- Same: The probability of success ($p$) is the same for every trial.
### The Formula

The Probability Mass Function (PMF) is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Where $\binom{n}{k}$ is the "n-choose-k" combination formula: $\binom{n}{k} = \frac{n!}{k!(n-k)!}$.
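The PMF translates directly into code. A minimal sketch using the standard library's `math.comb` (the illustrative example is 2 heads in 4 fair coin flips):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: probability of exactly 2 heads in 4 fair coin flips.
# C(4, 2) = 6 ways, each with probability 0.5^2 * 0.5^2 = 0.0625.
prob = binomial_pmf(2, 4, 0.5)
print(prob)  # 0.375
```

As a sanity check, the PMF values for $k = 0, \dots, n$ always sum to 1.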
## 3. Visualizing the Trials
If we have $n$ trials, the $2^n$ possible outcomes can be visualized as a tree. The Binomial distribution simply groups these outcomes by the total number of successes.
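This grouping can be made concrete by brute force: a small sketch that enumerates every branch of the tree for $n = 3$ trials and counts how many branches share each success total. The group sizes come out to exactly the binomial coefficients $\binom{n}{k}$.

```python
from itertools import product
from collections import Counter
from math import comb

# Enumerate every branch of the outcome tree for n = 3 trials
# (2**3 = 8 leaves) and group the leaves by total successes.
n = 3
counts = Counter(sum(branch) for branch in product([0, 1], repeat=n))
print(dict(counts))  # branches per success count: {0: 1, 1: 3, 2: 3, 3: 1}

# Each group size is exactly the binomial coefficient C(n, k).
for k in range(n + 1):
    assert counts[k] == comb(n, k)
```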
## 4. Why this matters in Machine Learning
### A. Binary Classification
When you train a Logistic Regression model, you are essentially assuming your target variable follows a Bernoulli distribution. The model outputs the parameter $p$ (the probability of the positive class).
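A minimal sketch of that idea: the model computes a real-valued score $w \cdot x + b$ and squashes it through the sigmoid, yielding a valid Bernoulli parameter $p$. The weights, bias, and input below are hypothetical, chosen only to illustrate the mapping.

```python
from math import exp

def sigmoid(z: float) -> float:
    """Squash a real-valued score into (0, 1) -- a valid Bernoulli p."""
    return 1.0 / (1.0 + exp(-z))

# Hypothetical weights and bias for a two-feature model.
w, b = [0.8, -0.5], 0.1
x = [2.0, 1.0]  # one input example

z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear score: w . x + b
p = sigmoid(z)  # the model's estimate of P(y = 1 | x)
print(f"P(y=1 | x) = {p:.3f}")
```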
### B. Evaluation (A/B Testing)
If you show an ad to $n$ people and $k$ of them click it, you can use the Binomial distribution to calculate a confidence interval for your click-through rate.
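One common way to build that interval is the normal approximation to the Binomial (reasonable when both $np$ and $n(1-p)$ are large). A sketch under that assumption, with a hypothetical campaign of 1,000 impressions and 50 clicks:

```python
from math import sqrt

def ctr_confidence_interval(clicks: int, n: int, z: float = 1.96):
    """95% CI for a click-through rate via the normal approximation
    to the Binomial (valid when n*p and n*(1-p) are both large)."""
    p_hat = clicks / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical campaign: 1,000 impressions, 50 clicks.
low, high = ctr_confidence_interval(50, 1000)
print(f"CTR = 5.0%, 95% CI = [{low:.3%}, {high:.3%}]")
```

For small samples or extreme rates, an exact Binomial (Clopper-Pearson) interval is the safer choice.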
### C. Logistic Loss (Cross-Entropy)
The loss function used in most neural networks for binary targets is derived directly from the likelihood of a Bernoulli distribution. Minimizing this loss is equivalent to finding the $p$ that best fits your binary data.
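The connection can be shown in a few lines: taking the negative log of the Bernoulli likelihood and averaging over examples gives exactly the binary cross-entropy. The predictions below are illustrative values, not real model outputs.

```python
from math import log

def bernoulli_nll(y_true: list, p_pred: list) -> float:
    """Average negative log-likelihood of Bernoulli data,
    i.e. binary cross-entropy:
    L = -(1/N) * sum( y*log(p) + (1-y)*log(1-p) )."""
    n = len(y_true)
    return -sum(y * log(p) + (1 - y) * log(1 - p)
                for y, p in zip(y_true, p_pred)) / n

# Confident, correct predictions give a low loss ...
good = bernoulli_nll([1, 0, 1], [0.9, 0.1, 0.8])
# ... while confident, wrong predictions are heavily penalised.
bad = bernoulli_nll([1, 0, 1], [0.1, 0.9, 0.2])
print(f"good: {good:.3f}, bad: {bad:.3f}")
```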
## 5. Summary Table
| Feature | Bernoulli | Binomial |
|---|---|---|
| Number of Trials | $1$ | $n$ |
| Outcomes | $0$ or $1$ | $0, 1, 2, \dots, n$ |
| Mean | $p$ | $np$ |
| Variance | $p(1-p)$ | $np(1-p)$ |
The Binomial distribution counts successes across a fixed number of trials. But what if we are counting the number of events happening over a fixed interval of time or space? For that, we turn to the Poisson distribution.