Loss Functions: Measuring Error
A Loss Function (also known as a Cost Function) is a method of evaluating how well your algorithm models your dataset. If your predictions are totally off, the loss function will output a higher number. If they're pretty good, it'll output a lower number.
The goal of training a neural network is to use Optimization to find the weights that result in the lowest possible loss.
1. Regression Loss Functions
When you are predicting a continuous value (like a house price or temperature), you need to measure the distance between the predicted number and the actual number.
A. Mean Squared Error (MSE)
MSE is the most common loss function for regression. It squares the difference between prediction and reality, which heavily penalizes large errors.
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
Where:
- $n$ = number of samples
- $y_i$ = actual value
- $\hat{y}_i$ = predicted value
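To make the formula concrete, here is a minimal NumPy sketch of the calculation (the numbers are made up):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5])   # actual values
y_pred = np.array([2.5, 5.0, 4.0])   # predicted values

# MSE: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # ~0.833
```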
B. Mean Absolute Error (MAE)
MAE takes the absolute difference. Unlike MSE, it treats all errors linearly. It is more "robust" to outliers because it doesn't square the large deviations.
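Written out, with the same notation as above:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

A quick NumPy sketch (with made-up numbers) shows the practical difference: a single outlier dominates MSE but only contributes linearly to MAE:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 4.0])
y_pred = np.array([2.5, 5.0, 4.0, 40.0])   # the last prediction is a wild outlier

mse = np.mean((y_true - y_pred) ** 2)      # ~324.6 -- the outlier dominates
mae = np.mean(np.abs(y_true - y_pred))     # ~9.5   -- the outlier adds linearly
print(mse, mae)
```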
2. Classification Loss Functions
When predicting categories, we don't look at "distance"; we look at probability divergence.
A. Binary Cross-Entropy (Log Loss)
Used for binary classification (Yes/No). It measures the performance of a classification model whose output is a probability value between 0 and 1.
$$\text{BCE} = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \,\right]$$
Where:
- $y_i$ = actual label (0 or 1)
- $\hat{y}_i$ = predicted probability of the positive class (1)
- $\log$ = natural logarithm
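The same calculation by hand, in a short NumPy sketch (the probabilities are made up):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1])            # actual labels
y_pred = np.array([0.9, 0.1, 0.8, 0.6])    # predicted probability of class 1

# Binary Cross-Entropy, averaged over the samples
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(bce)  # ~0.236 -- the least confident prediction (0.6) contributes the most
```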
B. Categorical Cross-Entropy
Used for multi-class classification (e.g., Cat vs. Dog vs. Bird). It compares the predicted probability distribution across all classes with the actual one-hot encoded label.
$$\text{CCE} = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)$$
Where:
- $C$ = number of classes
- $y_c$ = actual label (1 for the correct class, 0 otherwise)
- $\hat{y}_c$ = predicted probability for class $c$
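For a single sample, only the probability assigned to the correct class matters, since every other term is multiplied by 0. A small sketch with made-up numbers:

```python
import numpy as np

# 3-class problem (Cat, Dog, Bird); the true class is "Dog"
y_true = np.array([0, 1, 0])          # one-hot label
y_pred = np.array([0.2, 0.7, 0.1])    # softmax output, sums to 1

cce = -np.sum(y_true * np.log(y_pred))
print(cce)  # ~0.357, i.e. -ln(0.7)
```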
3. Which Loss Function to Choose?
Choosing the right loss function depends entirely on your output layer and the problem type:
| Problem Type | Output Layer Activation | Recommended Loss |
|---|---|---|
| Regression | Linear (None) | Mean Squared Error (MSE) |
| Binary Classification | Sigmoid | Binary Cross-Entropy |
| Multi-class Classification | Softmax | Categorical Cross-Entropy |
| Multi-label Classification | Sigmoid (per node) | Binary Cross-Entropy |
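To illustrate the last row of the table: in multi-label classification each output node makes an independent yes/no decision, so each gets its own sigmoid paired with binary cross-entropy. A minimal sketch (the layer sizes and shapes here are made up):

```python
from tensorflow import keras

# Multi-label head: 5 independent yes/no outputs, one sigmoid per node,
# scored with binary cross-entropy applied to each output separately
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(5, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```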
4. Implementation with Keras
```python
# `model` is assumed to be an already-built tf.keras model

# For Regression
model.compile(optimizer='adam', loss='mean_squared_error')

# For Binary Classification (0 or 1)
model.compile(optimizer='adam', loss='binary_crossentropy')

# For Multi-class Classification (one-hot labels)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```
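Putting it together, a minimal end-to-end sketch (the architecture, data, and shapes are all made up; assumes TensorFlow/Keras is installed):

```python
import numpy as np
from tensorflow import keras

# A tiny binary classifier: the sigmoid output pairs with binary cross-entropy
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

X = np.random.rand(200, 4)               # made-up features
y = (X.sum(axis=1) > 2).astype(int)      # made-up 0/1 labels
model.fit(X, y, epochs=10, verbose=0)
print(model.evaluate(X, y, verbose=0))   # final loss: lower is better
```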
5. The Loss Landscape
If we visualize the loss as a function of two weights, it looks like a hilly terrain. Training a model is essentially the process of "walking down the hill" to find the lowest valley (the global minimum).
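As a toy illustration, we can compute such a surface directly for a two-weight linear model (everything here is made up for the sketch):

```python
import numpy as np

# Toy model: y_pred = w1 * x + w2, with an MSE loss surface over (w1, w2)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0                        # data generated with w1=2, w2=1

w1, w2 = np.meshgrid(np.linspace(-1, 5, 100), np.linspace(-2, 4, 100))
loss = ((w1[..., None] * x + w2[..., None] - y) ** 2).mean(axis=-1)

# The lowest valley (global minimum) sits near the generating weights (2, 1)
idx = np.unravel_index(loss.argmin(), loss.shape)
print(w1[idx], w2[idx], loss[idx])
```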
Now that we have a "Loss" score, how do we actually change the weights to make that score smaller?