
Logistic Regression

Logistic Regression is the go-to algorithm for binary classification (problems with two possible outcomes). While Linear Regression predicts a continuous number, Logistic Regression predicts the probability that an input belongs to a specific category.

1. The Sigmoid Function

The core difference between linear and logistic regression is the Activation Function. To turn a real-valued number into a probability between 0 and 1, we use the Sigmoid (or Logistic) function.

The Formula:

\sigma(z) = \frac{1}{1 + e^{-z}}

Where:

  • z is the input (a linear combination of features).
  • e is Euler's number (approximately 2.71828).

Key Properties (checked numerically in the sketch below):

  • If z is a large positive number, σ(z) approaches 1.
  • If z is a large negative number, σ(z) approaches 0.
  • If z = 0, σ(z) = 0.5.
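
To see these properties concretely, here is a minimal NumPy sketch of the Sigmoid (the input values are arbitrary, chosen only for illustration):

import numpy as np

def sigmoid(z):
    # Map any real number to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10))   # ~0.99995 -> large positive z approaches 1
print(sigmoid(-10))  # ~0.00005 -> large negative z approaches 0
print(sigmoid(0))    # 0.5      -> z = 0 maps exactly to 0.5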

2. From Linear to Logistic

Logistic Regression starts by calculating a linear combination of inputs, just like Linear Regression:

z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots

It then passes that result through the Sigmoid function to get the probability p:

p = \sigma(z)
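
As a quick worked example, here is a sketch with made-up coefficients (not fitted to any real data) showing the two steps back to back:

import numpy as np

beta_0, beta_1 = -3.0, 2.0        # hypothetical coefficients, for illustration only
x1 = 2.5                          # a single feature value

z = beta_0 + beta_1 * x1          # linear step: -3 + 2 * 2.5 = 2.0
p = 1.0 / (1.0 + np.exp(-z))      # sigmoid step
print(p)                          # ~0.88 -> about 88% probability of Class 1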

3. The Decision Boundary

To make a final classification, we apply a threshold (usually 0.5).

  • If p ≥ 0.5, classify as Class 1 (e.g., "Spam").
  • If p < 0.5, classify as Class 0 (e.g., "Not Spam").

The line (or plane) where the probability is exactly 0.5 is called the Decision Boundary.
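
A minimal sketch of applying the threshold to a vector of predicted probabilities (the probabilities here are made up for illustration):

import numpy as np

probs = np.array([0.91, 0.12, 0.50, 0.38])   # hypothetical predicted probabilities
labels = (probs >= 0.5).astype(int)          # 1 = "Spam", 0 = "Not Spam"
print(labels)                                # [1 0 1 0]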

4. Implementation with Scikit-Learn

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 0. Load an example binary dataset and split it into train/test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1. Initialize the model
# 'liblinear' is a good solver for small datasets
model = LogisticRegression(solver='liblinear')

# 2. Train
model.fit(X_train, y_train)

# 3. Predict class labels
y_pred = model.predict(X_test)

# 4. Predict probabilities
y_probs = model.predict_proba(X_test)[:, 1]  # Probability of being Class 1
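
As a quick sanity check, a sketch that reuses the variables from the block above to compare the hard predictions and the probabilities against the test labels:

from sklearn.metrics import accuracy_score, log_loss

print(accuracy_score(y_test, y_pred))   # fraction of correct hard predictions
print(log_loss(y_test, y_probs))        # penalizes confident wrong probabilities (see next section)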

5. Cost Function: Log Loss

In Linear Regression, we use Mean Squared Error (MSE) as the cost function. However, because the predictions pass through the Sigmoid function, MSE would produce a non-convex cost function that is hard to optimize with gradient descent. Instead, Logistic Regression uses Log Loss (also called Binary Cross-Entropy).

Log Loss penalizes the model heavily when it is confident about a wrong prediction.

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(\hat{y}^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]
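
As a sketch, the same formula can be written directly in NumPy (clipping the probabilities to avoid log(0); the inputs are hypothetical):

import numpy as np

def binary_log_loss(y_true, y_prob, eps=1e-15):
    # Binary cross-entropy, matching the formula above
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.6])      # hypothetical predicted probabilities
print(binary_log_loss(y_true, y_prob))  # ~0.28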

6. Multi-class Classification (One-vs-Rest)

By default, Logistic Regression is binary. Scikit-Learn handles multiple classes (e.g., classifying an image as "Cat", "Dog", or "Bird") automatically: with the 'liblinear' solver it uses the One-vs-Rest (OvR) strategy, training one binary classifier per class, while other solvers can instead fit a single multinomial model.
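
Here is a minimal sketch that makes the OvR wrapping explicit, using the built-in Iris dataset (3 classes) purely for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)        # 3 classes: setosa, versicolor, virginica

# One-vs-Rest: fit one binary logistic regression per class
ovr = OneVsRestClassifier(LogisticRegression(solver='liblinear'))
ovr.fit(X, y)

print(len(ovr.estimators_))              # 3 -> one binary classifier per class
print(ovr.predict(X[:5]))                # predicted class labels for the first 5 rows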

7. Pros and Cons

Advantages:

  • Highly interpretable (you can see feature weights).
  • Fast to train and predict.
  • Provides probabilities, not just hard labels.

Disadvantages:

  • Assumes a linear relationship between features and log-odds.
  • Easily outperformed by more complex models (like Random Forests).
  • Can struggle with highly non-linear data.

References for More Details

  • Scikit-Learn Logistic Regression Documentation: Understanding regularization parameters like C (inverse of regularization strength).
  • StatQuest's video on Logistic Regression: an excellent visual explanation of the core concepts.


Logistic Regression is a "linear" classifier. What if your data is organized like a flowchart?