Loss Functions: Measuring Error

A Loss Function (also known as a Cost Function) is a method of evaluating how well your algorithm models your data. If your predictions are totally off, your loss function will output a higher number. If they're pretty good, it'll output a lower number.

The goal of training a neural network is to use Optimization to find the weights that result in the lowest possible loss.

1. Regression Loss Functions

When you are predicting a continuous value (like a house price or temperature), you need to measure the distance between the predicted number and the actual number.

A. Mean Squared Error (MSE)

MSE is the most common loss function for regression. It squares the difference between prediction and reality, which heavily penalizes large errors.

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Where:

  • n = number of samples
  • y_i = actual value
  • \hat{y}_i = predicted value
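As a minimal sketch of the formula (plain NumPy, with made-up numbers for `y_true` and `y_pred`):

```python
import numpy as np

# Toy data: actual vs. predicted values (illustrative numbers only)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# MSE = mean of squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375
```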

B. Mean Absolute Error (MAE)

MAE takes the absolute difference between prediction and reality:

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|

Unlike MSE, it treats all errors linearly. It is more "robust" to outliers because it doesn't square the large deviations.
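Using the same hypothetical arrays as the MSE sketch above, you can see the robustness difference directly: one large outlier error inflates MSE far more than MAE.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# MAE = mean of absolute differences
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 0.5

# Make the last prediction wildly wrong (an outlier error of 10 units)
y_pred_outlier = np.array([2.5, 0.0, 2.0, 17.0])
print(np.mean((y_true - y_pred_outlier) ** 2))   # MSE jumps to 25.125
print(np.mean(np.abs(y_true - y_pred_outlier)))  # MAE only rises to 2.75
```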

2. Classification Loss Functions

When predicting categories, we don't measure a numeric distance; we measure how far the predicted probability distribution diverges from the true one.

A. Binary Cross-Entropy (Log Loss)

Used for binary classification (Yes/No). It measures the performance of a classification model whose output is a probability value between 0 and 1.

L = -[y \log(p) + (1 - y) \log(1 - p)]

Where:

  • y = actual label (0 or 1)
  • p = predicted probability of the positive class (1)
  • \log = natural logarithm
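A quick numeric sketch (plain NumPy, with illustrative labels and probabilities; the clip guards against log(0)):

```python
import numpy as np

# Toy labels and predicted probabilities (illustrative numbers only)
y = np.array([1.0, 0.0, 1.0, 1.0])
p = np.array([0.9, 0.1, 0.8, 0.3])

# Clip to avoid log(0) for probabilities of exactly 0 or 1
eps = 1e-12
p = np.clip(p, eps, 1 - eps)

# L = -[y*log(p) + (1-y)*log(1-p)], averaged over the batch
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(bce)  # ~0.409
```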

B. Categorical Cross-Entropy

Used for multi-class classification (e.g., Cat vs. Dog vs. Bird). It compares the predicted probability distribution across all classes with the actual one-hot encoded label.

L = - \sum_{i=1}^{C} y_i \log(p_i)

Where:

  • C = number of classes
  • y_i = actual label (1 for the correct class, 0 otherwise)
  • p_i = predicted probability for class i
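A sketch for a single sample with three classes (made-up numbers); because the label is one-hot, only the correct class's predicted probability contributes:

```python
import numpy as np

# One sample, three classes (e.g. Cat, Dog, Bird); correct class is Dog
y = np.array([0.0, 1.0, 0.0])    # one-hot label
p = np.array([0.2, 0.7, 0.1])    # predicted distribution (e.g. softmax output)

# L = -sum(y_i * log(p_i)); only the correct class term survives
cce = -np.sum(y * np.log(p))
print(cce)  # ~0.357 (= -log(0.7))
```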

3. Which Loss Function to Choose?

Choosing the right loss function depends entirely on your output layer and the problem type:

| Problem Type | Output Layer Activation | Recommended Loss |
| --- | --- | --- |
| Regression | Linear (none) | Mean Squared Error (MSE) |
| Binary Classification | Sigmoid | Binary Cross-Entropy |
| Multi-class Classification | Softmax | Categorical Cross-Entropy |
| Multi-label Classification | Sigmoid (per node) | Binary Cross-Entropy |

4. Implementation with Keras

```python
# For Regression
model.compile(optimizer='adam', loss='mean_squared_error')

# For Binary Classification (0 or 1)
model.compile(optimizer='adam', loss='binary_crossentropy')

# For Multi-class Classification (one-hot labels)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```
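As a fuller sketch, here is a minimal binary classifier wired up end to end; the data, layer sizes, and epoch count are placeholders, not a recommendation:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical toy data: 100 samples, 4 features, binary labels
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=(100,))

model = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=(4,)),
    layers.Dense(1, activation='sigmoid'),  # sigmoid pairs with binary cross-entropy
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, verbose=0)
```

If your multi-class labels are integers rather than one-hot vectors, Keras's `sparse_categorical_crossentropy` computes the same loss without requiring the one-hot encoding step.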

5. The Loss Landscape

If we visualize the loss as a function of two weights, it looks like a hilly terrain. Training a model is essentially the process of "walking down the hill" to find the lowest valley (the global minimum).
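To make "walking down the hill" concrete, here is a toy gradient descent on a one-weight landscape, L(w) = (w - 3)^2; purely illustrative, with a hand-picked learning rate:

```python
# Toy loss landscape: L(w) = (w - 3)^2, whose minimum sits at w = 3
def loss(w):
    return (w - 3) ** 2

def grad(w):
    return 2 * (w - 3)  # dL/dw

w = 0.0   # start somewhere up the hill
lr = 0.1  # step size (learning rate)
for step in range(25):
    w -= lr * grad(w)  # step against the gradient, i.e. downhill

print(w, loss(w))  # w ~ 2.99, loss ~ 0.0001
```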

Now that we have a "Loss" score, how do we actually change the weights to make that score smaller?