Logistic Regression
Logistic Regression is the go-to algorithm for binary classification (problems with two possible outcomes). While Linear Regression predicts a continuous number, Logistic Regression predicts the probability that an input belongs to a specific category.
1. The Sigmoid Function
The core difference between linear and logistic regression is the Activation Function. To turn a real-valued number into a probability between 0 and 1, we use the Sigmoid (or Logistic) function.
The Formula:

σ(z) = 1 / (1 + e^(-z))

Where:
- z is the input (a linear combination of features).
- e is Euler's number (approximately 2.718).
Key Properties:
- If z is a large positive number, σ(z) approaches 1.
- If z is a large negative number, σ(z) approaches 0.
- If z = 0, σ(z) = 0.5.
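These properties are easy to check numerically; a minimal sketch of the Sigmoid function using NumPy:

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
print(sigmoid(0))    # exactly 0.5
```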
2. From Linear to Logistic
Logistic Regression starts by calculating a linear combination of inputs, just like Linear Regression:

z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b

It then passes that result through the Sigmoid function to get the probability (P):

P(y = 1 | x) = σ(z) = 1 / (1 + e^(-z))
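Putting the two steps together for a single example (the weights, bias, and inputs below are made-up values for illustration, not the output of any training run):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.2])  # hypothetical learned weights
b = 0.3                    # hypothetical learned bias
x = np.array([2.0, 1.0])   # one input example

z = np.dot(w, x) + b       # linear step: z = w·x + b
p = sigmoid(z)             # logistic step: probability of Class 1
print(p)                   # a probability slightly above 0.5
```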
3. The Decision Boundary
To make a final classification, we apply a threshold (usually 0.5).
- If P ≥ 0.5, classify as Class 1 (e.g., "Spam").
- If P < 0.5, classify as Class 0 (e.g., "Not Spam").
The line (or plane) where the probability is exactly 0.5 is called the Decision Boundary.
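Applying the threshold to an array of predicted probabilities is a one-liner; the probabilities below are example values:

```python
import numpy as np

probs = np.array([0.91, 0.30, 0.62, 0.07])  # example predicted probabilities
labels = (probs >= 0.5).astype(int)         # apply the 0.5 threshold
print(labels)  # [1 0 1 0]
```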
4. Implementation with Scikit-Learn
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 0. Create a toy binary dataset so the example runs end to end
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 1. Initialize the model
# 'liblinear' is a good solver for small datasets
model = LogisticRegression(solver='liblinear')

# 2. Train
model.fit(X_train, y_train)

# 3. Predict Class Labels
y_pred = model.predict(X_test)

# 4. Predict Probabilities
y_probs = model.predict_proba(X_test)[:, 1]  # Probability of being Class 1
```
5. Cost Function: Log Loss
In Linear Regression, we use Mean Squared Error. However, because of the Sigmoid function, MSE would result in a non-convex function that is hard to optimize. Instead, Logistic Regression uses Log Loss (Cross-Entropy).
Log Loss penalizes the model heavily when it is confident about a wrong prediction.
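For a single example with true label y ∈ {0, 1} and predicted probability p, the loss is -(y·log(p) + (1 - y)·log(1 - p)), averaged over all examples. A minimal sketch (the probabilities are illustrative):

```python
import numpy as np

def log_loss(y_true, y_prob):
    """Average cross-entropy between true labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# A confident correct prediction costs little...
print(log_loss([1], [0.95]))  # ~0.05
# ...while a confident wrong prediction is penalized heavily.
print(log_loss([1], [0.05]))  # ~3.0
```

In practice you would use `sklearn.metrics.log_loss` rather than rolling your own, but the hand-written version makes the asymmetry of the penalty easy to see.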
6. Multi-class Classification (One-vs-Rest)
By default, Logistic Regression is binary. To handle multiple classes (e.g., classifying an image as "Cat", "Dog", or "Bird"), a common strategy is One-vs-Rest (OvR): train one binary classifier per class and pick the class whose classifier is most confident. (Scikit-Learn's liblinear solver uses OvR; most other solvers fit a single multinomial model instead.)
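A sketch of explicit One-vs-Rest on a 3-class toy problem, using Scikit-Learn's `OneVsRestClassifier` wrapper (which fits one binary `LogisticRegression` per class):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy 3-class dataset
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(solver='liblinear'))
ovr.fit(X, y)

print(len(ovr.estimators_))  # one binary classifier per class -> 3
print(ovr.predict(X[:5]))    # predicted class labels for the first 5 rows
```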
7. Pros and Cons
| Advantages | Disadvantages |
|---|---|
| Highly interpretable (you can see feature weights). | Assumes a linear relationship between features and log-odds. |
| Fast to train and predict. | Easily outperformed by more complex models (like Random Forests). |
| Provides probabilities, not just hard labels. | Can struggle with highly non-linear data. |
References for More Details
- Scikit-Learn Logistic Regression Documentation: understanding regularization parameters like C (the inverse of regularization strength).
- StatQuest's video on Logistic Regression: an excellent visual explanation of the concepts.
Logistic Regression is a "linear" classifier. What if your data is organized like a flowchart?