Loss Functions: Measuring Error

A Loss Function (also known as a Cost Function) is a method of evaluating how well your algorithm models your data. If your predictions are totally off, the loss function outputs a higher number; if they're pretty good, it outputs a lower number.

The goal of training a neural network is to use Optimization to find the weights that result in the lowest possible loss.

1. Regression Loss Functions

When you are predicting a continuous value (like a house price or temperature), you need to measure the distance between the predicted number and the actual number.

A. Mean Squared Error (MSE)

MSE is the most common loss function for regression. It squares the difference between prediction and reality, which heavily penalizes large errors.

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:

  • $n$ = number of samples
  • $y_i$ = actual value
  • $\hat{y}_i$ = predicted value
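
To make the formula concrete, here is a minimal NumPy sketch (the sample values are made up for illustration):

```python
import numpy as np

# Hypothetical actual and predicted values for 4 samples
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE: the average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375
```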

B. Mean Absolute Error (MAE)

MAE takes the absolute difference. Unlike MSE, it treats all errors linearly. It is more "robust" to outliers because it doesn't square the large deviations.
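
For reference, the standard definition, using the same notation as MSE:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

In the NumPy sketch above, this corresponds to replacing the squared difference with `np.mean(np.abs(y_true - y_pred))`.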

2. Classification Loss Functions

When predicting categories, we don't look at "distance"; we look at probability divergence.

A. Binary Cross-Entropy (Log Loss)

Used for binary classification (Yes/No). It measures the performance of a classification model whose output is a probability value between 0 and 1.

$$L = -[y \log(p) + (1 - y) \log(1 - p)]$$

Where:

  • $y$ = actual label (0 or 1)
  • $p$ = predicted probability of the positive class (1)
  • $\log$ = natural logarithm

B. Categorical Cross-Entropy

Used for multi-class classification (e.g., Cat vs. Dog vs. Bird). It compares the predicted probability distribution across all classes with the actual one-hot encoded label.

$$L = - \sum_{i=1}^{C} y_i \log(p_i)$$

Where:

  • $C$ = number of classes
  • $y_i$ = actual label (1 for the correct class, 0 otherwise)
  • $p_i$ = predicted probability for class $i$
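
Again a minimal NumPy sketch (a single made-up sample with three classes):

```python
import numpy as np

# Hypothetical one-hot label and softmax output for C = 3 classes
y = np.array([0.0, 1.0, 0.0])   # true class is class 1
p = np.array([0.2, 0.7, 0.1])   # predicted distribution, sums to 1

# Only the true class term survives the sum
loss = -np.sum(y * np.log(p))
print(loss)  # ~0.357, i.e. -log(0.7)
```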

3. Which Loss Function to Choose?

Choosing the right loss function depends entirely on your output layer and the problem type:

| Problem Type | Output Layer Activation | Recommended Loss |
| --- | --- | --- |
| Regression | Linear (None) | Mean Squared Error (MSE) |
| Binary Classification | Sigmoid | Binary Cross-Entropy |
| Multi-class Classification | Softmax | Categorical Cross-Entropy |
| Multi-label Classification | Sigmoid (per node) | Binary Cross-Entropy |

4. Implementation with Keras

In Keras, the loss is passed to model.compile as a string (or a callable):

```python
# For Regression
model.compile(optimizer='adam', loss='mean_squared_error')

# For Binary Classification (0 or 1)
model.compile(optimizer='adam', loss='binary_crossentropy')

# For Multi-class Classification (one-hot labels)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```
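
To show where these calls fit, here is a sketch of the binary case end to end, assuming TensorFlow 2.x is installed; the layer sizes and random data are placeholders:

```python
import numpy as np
from tensorflow import keras

# Placeholder binary-classification data: 100 samples, 8 features
X = np.random.rand(100, 8)
y = np.random.randint(0, 2, size=(100,))

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),  # sigmoid pairs with binary cross-entropy
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, verbose=0)
```

If your multi-class labels are integer class IDs rather than one-hot vectors, Keras also accepts loss='sparse_categorical_crossentropy'.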

5. The Loss Landscape

If we visualize the loss function relative to two weights, it looks like a hilly terrain. Training a model is essentially the process of "walking down the hill" to find the lowest valley (The Global Minimum).
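
You can see this bowl shape directly by evaluating a toy MSE over a grid of two candidate weights; a minimal sketch, with arbitrary data and grid ranges:

```python
import numpy as np

# Toy data generated by y = 2*x1 + 3*x2, so the valley should sit near (2, 3)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, 3.0])

# Evaluate MSE for every candidate weight pair (w1, w2) on a grid
w1, w2 = np.meshgrid(np.linspace(-5.0, 9.0, 50), np.linspace(-4.0, 10.0, 50))
preds = X @ np.stack([w1.ravel(), w2.ravel()])          # shape (50, 2500)
loss = ((preds - y[:, None]) ** 2).mean(axis=0).reshape(w1.shape)

# The lowest point of the surface lies close to the true weights
idx = np.unravel_index(loss.argmin(), loss.shape)
print(w1[idx], w2[idx])  # approximately 2 and 3
```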

Now that we have a "Loss" score, how do we actually change the weights to make that score smaller?