Bayes' Theorem
Bayes' Theorem is more than just a formula; it is a philosophy of how to learn. It describes the probability of an event based on prior knowledge of conditions that might be related to the event. In Machine Learning, it is the engine behind Bayesian Inference and the Naive Bayes classifier.
1. The Formula

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Bayes' Theorem allows us to find $P(A \mid B)$ if we already know $P(B \mid A)$, $P(A)$, and $P(B)$.
Breaking Down the Terms
- $P(A \mid B)$ (Posterior): The probability of our hypothesis $A$ after seeing the evidence $B$.
- $P(B \mid A)$ (Likelihood): The probability of the evidence $B$ appearing given that hypothesis $A$ is true.
- $P(A)$ (Prior): Our initial belief about hypothesis $A$ before seeing any evidence.
- $P(B)$ (Evidence/Marginal Likelihood): The total probability of seeing evidence $B$ under all possible hypotheses.
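To make the formula concrete, here is a minimal Python sketch for a binary hypothesis (the function name `bayes` and its arguments are illustrative, not from any library). The Evidence term is expanded with the law of total probability:

```python
def bayes(prior, lik_given_h, lik_given_not_h):
    """Return the posterior P(H|E) for a binary hypothesis H.

    The evidence P(E) is expanded via the law of total probability:
    P(E) = P(E|H) * P(H) + P(E|~H) * P(~H).
    """
    evidence = lik_given_h * prior + lik_given_not_h * (1 - prior)
    return lik_given_h * prior / evidence

print(bayes(0.5, 0.8, 0.2))  # -> 0.8
```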
2. The Logic of Bayesian Updating
Bayesian logic is iterative. Today's Posterior becomes tomorrow's Prior.
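A minimal sketch of this update loop, estimating a coin's bias on a grid of candidate values (the grid approximation and the flip sequence are illustrative assumptions):

```python
import numpy as np

# Grid of candidate coin biases (hypotheses) and a flat prior.
thetas = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(thetas) / len(thetas)

# Observe flips one at a time; each posterior becomes the next prior.
for flip in [1, 1, 0, 1, 1]:           # 1 = heads, 0 = tails
    likelihood = thetas if flip == 1 else (1 - thetas)
    posterior = likelihood * prior
    posterior /= posterior.sum()       # divide by the evidence P(B)
    prior = posterior                  # today's posterior -> tomorrow's prior

print(f"Most probable bias after 5 flips: {thetas[np.argmax(prior)]:.2f}")  # 0.80
```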
3. A Practical Example: Medical Testing
Suppose a disease affects 1% of the population (the Prior). A test for this disease is 99% accurate: it correctly flags 99% of sick patients (the Likelihood) and incorrectly flags 1% of healthy patients. If a patient tests positive, what is the probability they actually have the disease?
Using Bayes' Theorem:

$$P(\text{Disease} \mid \text{Positive}) = \frac{P(\text{Positive} \mid \text{Disease})\, P(\text{Disease})}{P(\text{Positive})} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = \frac{0.0099}{0.0198} = 0.5$$
Even with a 99% accurate test, the probability of having the disease given a positive result is only 50%. This is because the disease is so rare (low Prior) that false positives are as common as true positives: in a population of 10,000, about 99 of the 100 sick patients test positive, but so do about 99 of the 9,900 healthy ones.
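A quick numeric check of this result in Python (variable names are illustrative):

```python
# 1% prevalence; 99% sensitivity; 1% false-positive rate.
prior = 0.01
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.01

# P(Positive) via the law of total probability.
evidence = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
posterior = p_pos_given_disease * prior / evidence
print(f"P(disease | positive) = {posterior:.2f}")  # -> 0.50
```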
4. Bayes' Theorem in Machine Learning
A. Naive Bayes Classifier
Naive Bayes is a popular algorithm for text classification (like spam detection). It assumes that every feature (word) is conditionally independent of every other feature given the class (the "Naive" part) and uses Bayes' Theorem to calculate the probability of a category:

$$P(\text{Category} \mid \text{Words}) \propto P(\text{Category}) \prod_{i} P(\text{word}_i \mid \text{Category})$$
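A minimal sketch using scikit-learn's `MultinomialNB`; the four toy documents and their labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical examples, not a real dataset).
texts = ["win cash now", "limited offer click", "meeting at noon", "see you tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["click to win cash"]))        # likely ['spam']
print(model.predict_proba(["meeting tomorrow"]))   # per-class probabilities
```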
B. Bayesian Neural Networks
Unlike standard neural networks, which have fixed weights, Bayesian Neural Networks represent weights as probability distributions. This allows the model to express uncertainty: it can say, "I think this is a cat, but I'm only 60% sure."
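The following toy sketch conveys the idea without a real BNN library: a single weight is a Normal distribution rather than a point value, and sampling it many times yields a prediction together with an uncertainty estimate (the mean, standard deviation, and input below are arbitrary assumptions, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# A single "Bayesian" weight: a distribution (mean, std), not a fixed number.
w_mean, w_std = 0.8, 0.3

def predict(x, n_samples=1000):
    """Sample weights and return the mean and spread of the predictions."""
    w = rng.normal(w_mean, w_std, size=n_samples)
    preds = w * x
    return preds.mean(), preds.std()

mean, std = predict(2.0)
print(f"prediction = {mean:.2f} +/- {std:.2f}")  # uncertainty comes for free
```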
C. Hyperparameter Optimization
Bayesian Optimization is a strategy used to find the best hyperparameters for a model. It builds a probability model of the objective function and uses it to select the most promising hyperparameters to evaluate next.
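A sketch using the scikit-optimize library, assuming it is installed; the quadratic "loss" stands in for a real, expensive validation run, and the minimum at 0.01 is an arbitrary choice:

```python
from skopt import gp_minimize

# Toy objective: pretend this is a model's validation loss as a
# function of learning rate (expensive to evaluate in practice).
def objective(params):
    lr = params[0]
    return (lr - 0.01) ** 2  # true minimum at lr = 0.01

result = gp_minimize(
    objective,
    dimensions=[(1e-4, 1e-1, "log-uniform")],  # search space for lr
    n_calls=20,
    random_state=0,
)
print(f"best lr: {result.x[0]:.4f}, best loss: {result.fun:.6f}")
```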
5. Summary Table
| Concept | Traditional (Frequentist) | Bayesian |
|---|---|---|
| View of Probability | Long-run frequency of events. | Measure of "degree of belief." |
| Parameters | Fixed, unknown constants. | Random variables with distributions. |
| New Data | Used to refine the estimate. | Used to update the entire belief (Prior $\to$ Posterior). |
Now that we can update our beliefs using Bayes' Theorem, we need to understand how these probabilities are spread across different outcomes. This brings us to Random Variables and Probability Distributions.