Polynomial Regression: Beyond Straight Lines

Polynomial Regression is a form of regression analysis in which the relationship between the independent variable $x$ and the dependent variable $y$ is modelled as an $n^{th}$-degree polynomial in $x$.

While it fits a non-linear model to the data, as a statistical estimation problem it is still considered linear, because the regression function is linear in the unknown parameters ($\beta$) that are estimated from the data.

1. Why use Polynomial Regression?

Linear regression assumes a straight-line relationship between the feature and the target. However, real-world data often follows curves, such as:

  • Growth Rates: Biological growth or interest rates.
  • Physics: The path of a projectile or the relationship between speed and braking distance.
  • Economics: Diminishing returns on investment.

2. The Mathematical Equation

In a simple linear model, we have:

$y = \beta_0 + \beta_1 x$

In Polynomial Regression, we add higher-degree terms of the same feature:

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \ldots + \beta_n x^n + \epsilon$

Where:

  • $y$: The dependent variable (Target).
  • $x$: The independent variable (Feature).
  • $\beta_0$: The Intercept.
  • $\beta_1, \beta_2, \ldots, \beta_n$: The Coefficients for each polynomial term.
  • $\epsilon$: The error term (Residual).

By treating $x^2, x^3, \ldots$ as distinct features, we allow the model to "bend" to fit the data points.
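This substitution also makes clear why the estimation problem stays linear: if each power of $x$ is relabelled as its own feature, $z_1 = x, z_2 = x^2, \ldots, z_n = x^n$, the model is just a multiple linear regression in the new features:

$y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \ldots + \beta_n z_n + \epsilon$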

3. The Danger of Degree: Overfitting

Choosing the right degree ($n$) is the most critical part of Polynomial Regression; a degree-selection sketch follows the list below:

  • Underfitting (Degree 1): A straight line that fails to capture the curve in the data.
  • Optimal Fit (Degree 2 or 3): A smooth curve that captures the general trend.
  • Overfitting (Degree 10+): A wiggly line that passes through every single data point but fails to predict new data because it has captured the noise instead of the signal.
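A common way to choose the degree in practice is to compare cross-validated error for several candidate degrees and keep the smallest one that generalises well. A minimal sketch of this idea with scikit-learn, assuming training arrays X and y are already available (for example, the synthetic parabola generated in the next section):

Choosing the Degree with Cross-Validation
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Compare degree 1 (likely underfit) up to degree 10 (likely overfit)
for degree in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Mean R^2 across 5 folds; higher is better
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"degree={degree}: mean CV R^2 = {scores.mean():.3f}")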

4. Implementation with Scikit-Learn

In Scikit-Learn, we perform Polynomial Regression by using a transformer (PolynomialFeatures) to generate the higher-degree features and then passing them to a standard LinearRegression model.

Polynomial Regression with Scikit-Learn
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
import numpy as np

# 1. Generate example data (a noisy parabola; synthetic data for illustration)
rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.normal(0, 0.5, size=100)

# 2. Create a pipeline that:
# a) Generates polynomial terms (x^2)
# b) Fits a linear regression to those terms
degree = 2
poly_model = make_pipeline(PolynomialFeatures(degree), LinearRegression())

# 3. Train the model
poly_model.fit(X, y)

# 4. Predict
y_pred = poly_model.predict(X)
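
After fitting, the learned intercept and coefficients can be inspected through the pipeline's steps (make_pipeline names each step after its lowercased class name):

# Inspect the fitted linear model inside the pipeline
linreg = poly_model.named_steps["linearregression"]
print("Intercept:", linreg.intercept_)
print("Coefficients:", linreg.coef_)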

5. Feature Scaling is Mandatory

When you square or cube features, the range of values expands drastically.

  • If $x = 10$, then $x^2 = 100$ and $x^3 = 1000$.
  • If $x = 1000$, then $x^2 = 10^6$ and $x^3 = 10^9$.

Because of this explosive growth, you should always scale your features (e.g., using StandardScaler), typically after generating the polynomial terms, to prevent numerical instability.
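
The easiest way to do this is inside the same pipeline, so the scaler is fitted only on the training data. A minimal sketch that standardises the generated polynomial terms before the linear fit, reusing the X and y from the earlier example:

Polynomial Regression with Feature Scaling
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Expand to polynomial terms, standardise them, then fit the linear model
scaled_poly_model = make_pipeline(
    PolynomialFeatures(degree=2),
    StandardScaler(),
    LinearRegression(),
)
scaled_poly_model.fit(X, y)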

6. Pros and Cons

Advantages:

  • Can model complex, non-linear relationships.
  • A broad range of curve shapes can be captured.
  • Fits into the standard linear regression framework.

Disadvantages:

  • Extremely sensitive to outliers.
  • High risk of overfitting if the degree is too high.
  • Becomes computationally expensive with many features.

Polynomial models can easily become too complex and overfit. How do we keep the model's weights in check?