Lasso Regression (L1 Regularization)
Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a type of linear regression that uses L1 Regularization.
While standard Linear Regression minimizes only the prediction error, Lasso adds a penalty proportional to the sum of the absolute values of the coefficients. This pushes the model to be not only accurate but also as simple as possible.
1. The Mathematical Objective
Lasso minimizes the following cost function:

$$J(\beta) = \text{MSE} + \alpha \sum_{j=1}^{p} |\beta_j|$$

Where:
- MSE (Mean Squared Error): Keeps the model accurate.
- $\alpha$ (Alpha): The tuning parameter that controls the strength of the penalty.
- $|\beta_j|$: The absolute values of the coefficients, summed to form the L1 penalty.
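To make the objective concrete, here is a minimal sketch (plain NumPy, toy data of my own choosing) that evaluates the cost above for a given coefficient vector. Note that scikit-learn's own Lasso objective scales the squared-error term by $1/(2n)$, so its alpha values are not directly comparable to this formula.

```python
import numpy as np

def lasso_cost(X, y, beta, alpha):
    """Cost from the formula above: MSE plus the alpha-weighted L1 norm of beta."""
    residuals = y - X @ beta
    mse = np.mean(residuals ** 2)
    l1_penalty = alpha * np.sum(np.abs(beta))
    return mse + l1_penalty

# Toy data: 5 samples, 2 features; y is exactly x1 + x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.0, 3.0, 7.0, 7.0, 10.0])

# A larger alpha makes the same coefficients more expensive
beta = np.array([1.0, 1.0])
print(lasso_cost(X, y, beta, alpha=0.0))  # 0.0: pure MSE, perfect fit
print(lasso_cost(X, y, beta, alpha=0.5))  # 1.0: MSE + 0.5 * (|1| + |1|)
```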
2. Feature Selection: The Power of Zero
The most significant difference between Lasso and its sibling, Ridge Regression, is that Lasso can shrink coefficients exactly to zero.
When a coefficient becomes zero, that feature is effectively removed from the model. This makes Lasso an excellent tool for:
- Automated Feature Selection: Identifying the most important variables in a dataset with hundreds of features.
- Model Interpretability: Creating "sparse" models that are easier for humans to understand.
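To see the zeroing happen, here is a minimal sketch on synthetic data (`make_regression` and the alpha value are illustrative choices, not part of the text above):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 100 samples, 20 features, but only 5 carry real signal
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)

# Uninformative features are driven exactly to zero and drop out of the model
selected = np.flatnonzero(lasso.coef_)
print(f"Kept {selected.size} of {X.shape[1]} features:", selected)
```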
3. Choosing the Alpha ($\alpha$) Parameter
- If $\alpha = 0$: The penalty is removed, and the result is standard Ordinary Least Squares (OLS).
- As $\alpha$ increases: More coefficients are pushed to zero, leading to a simpler, more biased model.
- If $\alpha$ is too high: All coefficients become zero, and the model predicts only the mean (Underfitting).
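A quick way to watch this behaviour is to sweep alpha and count the zeroed coefficients. A sketch on synthetic data (the alpha grid here is an arbitrary illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Larger alpha -> more coefficients at exactly zero -> simpler, more biased model
for alpha in (0.01, 0.1, 1.0, 10.0, 100.0):
    model = Lasso(alpha=alpha, max_iter=50_000).fit(X, y)
    print(f"alpha={alpha:>6}: {np.sum(model.coef_ == 0)}/20 zero coefficients")
```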
4. Implementation with Scikit-Learn
```python
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# 1. Scaling is strongly recommended: the L1 penalty is sensitive to feature scale
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 2. Initialize and train; 'alpha' is the regularization strength
lasso = Lasso(alpha=0.1)
lasso.fit(X_train_scaled, y_train)

# 3. Check which features were selected (non-zero coefficients)
importance = pd.Series(lasso.coef_, index=feature_names)
print(importance[importance != 0])
```
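One step the snippet above leaves implicit: any validation or test data must go through the same fitted scaler via `transform`, never a fresh `fit_transform`. A minimal sketch, assuming an `X_test` split exists:

```python
# Reuse the training-set scaler; refitting on test data would leak information
X_test_scaled = scaler.transform(X_test)
y_pred = lasso.predict(X_test_scaled)
```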
5. Lasso vs. Ridge
| Feature | Ridge (L2) | Lasso (L1) |
|---|---|---|
| Penalty | Square of coefficients | Absolute value of coefficients |
| Coefficients | Shrink towards zero, but never reach it | Can shrink exactly to zero |
| Use Case | When most features are useful | When you have many "noisy" or useless features |
| Model Type | Dense (all features kept) | Sparse (some features removed) |
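The dense-versus-sparse row of the table is easy to verify empirically. A sketch fitting both models on the same synthetic data (all parameters are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Same data and penalty strength; only the penalty type differs
ridge_coef = Ridge(alpha=1.0).fit(X, y).coef_
lasso_coef = Lasso(alpha=1.0).fit(X, y).coef_

print("Ridge zero coefficients:", np.sum(ridge_coef == 0))  # typically 0 (dense)
print("Lasso zero coefficients:", np.sum(lasso_coef == 0))  # several (sparse)
```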
6. Limitations of Lasso
- Correlated Features: If two features are highly correlated, Lasso tends to arbitrarily keep one and zero out the other; which one survives can change with small perturbations of the data, making the selection unstable (see the sketch after this list).
- Sample Size: If $n < p$ (fewer samples than features), Lasso can select at most $n$ features.
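The correlated-feature instability is easy to reproduce. A sketch with two nearly identical columns (all values here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=200)

# Both columns carry the same signal, yet Lasso typically keeps only one
print(Lasso(alpha=0.1).fit(X, y).coef_)
```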
References for More Details
- Scikit-Learn Lasso Documentation: see also LassoCV, which automatically finds the best Alpha using cross-validation.
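Following that pointer, a minimal LassoCV sketch, reusing the `X_train`/`y_train` names from section 4 (`cv=5` is an arbitrary choice):

```python
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X_train)

# LassoCV tries a grid of alphas internally and keeps the best by cross-validation
model = LassoCV(cv=5, random_state=0).fit(X_scaled, y_train)
print("Best alpha:", model.alpha_)
```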