# Derivatives - The Rate of Change
Calculus is the mathematical foundation for optimization in Machine Learning. Specifically, Derivatives are the primary tool used to train almost every ML model, from Linear Regression to complex Neural Networks, via algorithms like Gradient Descent.
## 1. What is a Derivative?
The derivative of a function measures the instantaneous rate of change of that function. Geometrically, the derivative at any point on a curve is the slope of the tangent line to the curve at that point.
### Formal Definition
The derivative of a function $f(x)$ with respect to $x$ is defined using limits:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

- $f'(x)$ (also written $\frac{df}{dx}$) is the common notation, read as "the derivative of $f$ with respect to $x$."
- The expression $\frac{f(x + h) - f(x)}{h}$ is the slope of the secant line between $x$ and $x + h$.
- Taking the limit as $h$ approaches zero gives the exact slope of the tangent line at $x$.
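To make the definition concrete, here is a small numerical sketch: it computes the secant slope for a sample function and shrinks $h$, watching the result approach the true tangent slope. The function and step sizes below are illustrative choices, not part of the definition.

```python
def secant_slope(f, x, h):
    # Slope of the secant line between x and x + h.
    return (f(x + h) - f(x)) / h

def f(x):
    # Illustrative function; its true derivative is f'(x) = 2x.
    return x ** 2

# As h shrinks, the secant slope approaches the tangent slope f'(3) = 6.
for h in [1.0, 0.1, 0.001, 0.00001]:
    print(f"h = {h:>7}: slope = {secant_slope(f, 3.0, h):.5f}")
```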
## 2. Derivatives in Machine Learning: Optimization
In Machine Learning, we define a Loss Function (or Cost Function) $L(\theta)$ which measures the error of our model, where $\theta$ represents the model's parameters (weights and biases).

The goal of training is to find the parameter values $\theta^*$ that minimize the loss function.
### A. Finding the Minimum
- A function's minimum (or maximum) occurs where the slope is zero.
- The derivative tells us the slope.
- Therefore, by setting the derivative to zero, $\frac{dL}{d\theta} = 0$, and solving, we can find the optimal parameters $\theta^*$.
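As a quick sketch of this approach, one can differentiate symbolically and solve $\frac{dL}{d\theta} = 0$. This uses the `sympy` library, and the quadratic loss below is an invented example, not any particular model's loss:

```python
import sympy as sp

theta = sp.symbols('theta')
L = (theta - 3)**2 + 1                    # illustrative quadratic loss, minimum at theta = 3

dL = sp.diff(L, theta)                    # derivative: 2*theta - 6
optimal = sp.solve(sp.Eq(dL, 0), theta)   # set the derivative to zero and solve
print(dL, optimal)                        # prints: 2*theta - 6 [3]
```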
### B. Gradient Descent
For most complex ML models, the equation $\frac{dL}{d\theta} = 0$ has no closed-form solution we can compute directly. Instead, we use an iterative process called Gradient Descent.
The derivative tells us two things:
- Magnitude: How steep the slope is (how quickly the loss is changing).
- Direction (Sign): Whether moving the parameter $\theta$ in a positive direction will increase or decrease the loss.
In Gradient Descent, we update the parameter in the opposite direction of the derivative (down the slope) to find the minimum:

$$\theta_{\text{new}} = \theta_{\text{old}} - \alpha \frac{dL}{d\theta}$$

- $\alpha$ (alpha) is the learning rate (a small scalar).
- $\frac{dL}{d\theta}$ is the derivative (the slope/gradient).
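A minimal sketch of this update loop, reusing the illustrative quadratic loss from above (the learning rate, starting point, and step count are arbitrary choices):

```python
def dL(theta):
    # Derivative of the illustrative loss L(theta) = (theta - 3)**2.
    return 2 * (theta - 3)

theta = 10.0   # arbitrary starting point
alpha = 0.1    # learning rate

for _ in range(100):
    theta = theta - alpha * dL(theta)   # step against the slope

print(theta)   # approaches the minimum at theta = 3
```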
## 3. Basic Differentiation Rules
You must be familiar with the following rules to understand how derivatives are calculated for model training.
| Rule Name | Function | Derivative | Example |
|---|---|---|---|
| Constant Rule | $f(x) = c$ | $f'(x) = 0$ | $f(x) = 5 \Rightarrow f'(x) = 0$ |
| Power Rule | $f(x) = x^n$ | $f'(x) = nx^{n-1}$ | $f(x) = x^3 \Rightarrow f'(x) = 3x^2$ |
| Constant Multiple | $f(x) = c \cdot g(x)$ | $f'(x) = c \cdot g'(x)$ | $f(x) = 4x^2 \Rightarrow f'(x) = 8x$ |
| Sum/Difference | $f(x) = g(x) \pm h(x)$ | $f'(x) = g'(x) \pm h'(x)$ | $f(x) = x^2 + x \Rightarrow f'(x) = 2x + 1$ |
| Exponential | $f(x) = e^x$ | $f'(x) = e^x$ | $f(x) = e^x \Rightarrow f'(x) = e^x$ |
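Each row of the table can be checked symbolically; here is a short sketch using `sympy`, with sample functions that mirror the table's example column:

```python
import sympy as sp

x = sp.symbols('x')

print(sp.diff(sp.S(5), x))     # Constant Rule: 0
print(sp.diff(x**3, x))        # Power Rule: 3*x**2
print(sp.diff(4 * x**2, x))    # Constant Multiple: 8*x
print(sp.diff(x**2 + x, x))    # Sum Rule: 2*x + 1
print(sp.diff(sp.exp(x), x))   # Exponential: exp(x)
```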
### Example: Quadratic Loss
Linear Regression often uses Mean Squared Error (MSE), which is a quadratic function of the weights $w$.

Let the simplified loss function be $L(w) = w^2 + 2w + 5$. We apply the Sum and Power Rules:

$$\frac{dL}{dw} = 2w + 2$$

If the current weight is $w = 4$, the slope is $2(4) + 2 = 10$ (steep, positive), so Gradient Descent would push $w$ downward, toward the minimum at $w = -1$.
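The same computation as a sketch in code, reusing the illustrative loss above:

```python
def loss_grad(w):
    # Derivative of the illustrative loss L(w) = w**2 + 2*w + 5,
    # found with the Sum and Power Rules: dL/dw = 2*w + 2.
    return 2 * w + 2

w = 4.0
print(loss_grad(w))            # 10.0 -- steep and positive
print(w - 0.1 * loss_grad(w))  # one Gradient Descent step (alpha = 0.1) moves w toward -1
```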
## References and Resources
To solidify your understanding of differentiation, here are some excellent resources:
- Khan Academy - Differential Calculus: Comprehensive video tutorials covering limits, derivatives, and rules. Excellent for visual learners.
- Calculus: Early Transcendentals by James Stewart (or any similar major textbook): Provides rigorous definitions and practice problems.
Most functions in ML depend on more than one parameter (e.g., $L(w, b)$ with a weight $w$ and a bias $b$). To find the slope in these multi-variable spaces, we must use Partial Derivatives.