Polynomial Regression Explained

Polynomial regression is a simple machine learning technique that can be surprisingly powerful. This post explains what polynomial regression is and how to implement it in Python using scikit-learn.

This post is a continuation of linear regression explained and multiple linear regression explained. If you are not familiar with linear regression for a single variable or multiple linear regression with multiple variables, please read those posts then come back (they’re not too long).

Basics of polynomial regression

The goal of polynomial regression is to fit a curve to data with a non-linear relationship between the independent variables and the output variable.

Such as with the image below:

[Figure: a curve fitted to data with a non-linear relationship]

When to use it

Polynomial regression can be used when the independent variables (the factors you are using to predict with) each have a non-linear relationship with the output variable (what you want to predict).

So, the equation between an independent variable (an X value) and the output variable (the Y value) takes a form such as Y = θ0 + θ1X1 + θ2X1^2.

How it works

With linear regression for a single variable, our goal was to find the optimum values for θ0 and θ1 in the equation Y = θ0 + θ1X1 that allow us to fit the best possible line through the data.

With multiple linear regression, our goal was to find optimum values for θ0, θ1, …, θn in the equation
Y = θ0 + θ1X1 + θ2X2 + … + θnXn
where n is the number of feature variables.

With polynomial regression, there is a non-linear relationship between the independent variable and the output variable.

If we use the plain linear regression equation on the data in the following graph:

[Figure: scatter plot of data with a non-linear relationship]

We might end up with something that looks like this:

[Figure: a straight line fitted to the non-linear data]

Instead, we want to fit a curve that looks more like this:

[Figure: a curve fitted to the non-linear data]

So, we take the linear regression equation and add a polynomial feature at the end, giving us the equation
Y = θ0 + θ1X1 + θ2X1^2
This allows us to fit a non-linear curve to the data.
If we set X2 = X1^2, we get the following equation:
Y = θ0 + θ1X1 + θ2X2
which is just like the multiple linear regression equation. Now we can find the optimal values for θ using the same approach we used for multiple linear regression.
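To make the substitution concrete, here is a minimal NumPy sketch (the variable names are just for illustration) of building the squared feature by hand and stacking it next to the original one:

import numpy as np

X1 = np.linspace(0, 5, 51).reshape(-1, 1)   # original feature
X2 = X1 ** 2                                # the new feature, X2 = X1^2
X_design = np.hstack([X1, X2])              # treat the pair as two ordinary features

# From here, any multiple linear regression routine can be applied to X_design.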

So, we set the initial weights to 0 and then use gradient descent to optimize them, in the same way we did for multiple linear regression.
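As a rough illustration of that step, here is a minimal batch gradient descent sketch for a design matrix such as X_design above; the learning rate and iteration count are arbitrary values chosen for the example.

import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Fit theta for Y = theta0 + theta1*X[:,0] + ... via batch gradient descent."""
    m = len(y)
    X_b = np.hstack([np.ones((m, 1)), X])   # prepend a column of 1s for theta0
    theta = np.zeros((X_b.shape[1], 1))     # initial weights set to 0
    for _ in range(n_iters):
        gradients = (2 / m) * X_b.T @ (X_b @ theta - y)   # gradient of the MSE cost
        theta -= alpha * gradients
    return theta

# Usage (once targets y are available): theta = gradient_descent(X_design, y)
# Scaling the features first helps gradient descent converge; see the section below.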

What about multiple features?

It is possible to do polynomial regression when there are multiple features. Say you have two features, X1 and X2, and you find that they each have a non-linear relationship with the output variable. Then you could use the equation Y = θ0 + θ1X1 + θ2X2 + θ3X1^2 + θ4X2^2 + θ5X1X2 and substitute X3 = X1^2, X4 = X2^2 and X5 = X1X2, giving you the equation Y = θ0 + θ1X1 + θ2X2 + θ3X3 + θ4X4 + θ5X5.

The θ5X1X2 term is included because it captures the interaction between X1 and X2; this is something that regular linear regression does not model, and it can result in a more powerful model. However, adding these terms can also lead to overfitting. By default, PolynomialFeatures from the scikit-learn library adds all combinations of features up to the specified degree.
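To see which columns PolynomialFeatures generates for two features, here is a small sketch (get_feature_names_out is available in scikit-learn 1.0 and later, and the printed names are roughly what you should see):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_two = np.array([[2.0, 3.0],
                  [4.0, 5.0]])              # two features: X1 and X2

poly = PolynomialFeatures(degree=2)
X_expanded = poly.fit_transform(X_two)

print(poly.get_feature_names_out())
# roughly: ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']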

Is feature scaling necessary?

Yes, feature scaling is necessary for the same reasons discussed in the previous posts. In this case it is even more important: adding polynomial features widens the differences in scale between features (for example, if X1 ranges from 0 to 5, then X1^2 ranges from 0 to 25), which makes gradient descent take even longer to converge.
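One convenient way to handle this in scikit-learn is to chain the steps in a Pipeline so the scaling is applied after the polynomial features are added. This is only a sketch; note also that LinearRegression itself solves the least-squares problem in closed form, so scaling matters most when you fit with a gradient-based estimator such as SGDRegressor.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

model = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),     # scale the expanded features
    ("reg", LinearRegression()),
])

# model.fit(X, Y) then fits the whole chain in one call.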

Code example

Below is an example of how to implement polynomial regression in Python using scikit-learn.

import numpy as np
import matplotlib.pyplot as plt

X_values = np.arange(0, 5.1, 0.1).reshape(-1, 1)               # 51 values from 0 to 5
Y_values = 4 + 5 * (X_values**2) + 15 * np.random.rand(51, 1)  # quadratic signal plus random noise

plt.scatter(X_values, Y_values)
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.show()

Below is a plain linear fit so that we have something to compare to.

from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(X_values, Y_values)

reg.intercept_, reg.coef_

(array([-9.9391395]), array([[25.66326965]]))

This represents the equation Y_hat = -9.9391395 + 25.66326965*X1 (your exact numbers will differ slightly because the noise is generated randomly).

plt.scatter(X_values, Y_values)
plt.plot(X_values, reg.predict(X_values), color="r")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.show()

Now, we will create the polynomial version.

from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2)
X_poly = poly_features.fit_transform(X_values)
reg2 = LinearRegression().fit(X_poly,Y_values)

First, it is necessary to add the squared feature to X_values, which is what the PolynomialFeatures and fit_transform lines above do. After adding the squared feature, we can fit a regular linear model to the data.
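To see what fit_transform produced, you can print the first few rows of X_poly; roughly, it should look like this (a bias column of 1s, then X1, then X1^2):

print(X_poly[:3])
# roughly:
# [[1.   0.   0.  ]
#  [1.   0.1  0.01]
#  [1.   0.2  0.04]]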

reg2.intercept_, reg2.coef_

(array([9.64161002]), array([[0. , 1.68684166, 4.7952856 ]]))

This represents the following equation:
Y_hat = 9.64161002 + 0*X1^0 + 1.68684166*X1^1 + 4.7952856*X1^2
The 0 coefficient belongs to the bias column (the column of 1s) that PolynomialFeatures includes by default; since LinearRegression fits its own intercept, that column's weight comes out as 0.
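If you prefer coefficients that line up directly with X1 and X1^2, you can drop that redundant constant column with include_bias=False. This is an optional variant, and the new variable names below are just to avoid overwriting the ones above:

poly_nb = PolynomialFeatures(degree=2, include_bias=False)
X_poly_nb = poly_nb.fit_transform(X_values)    # columns: X1, X1^2 (no constant column)
reg_nb = LinearRegression().fit(X_poly_nb, Y_values)

reg_nb.intercept_, reg_nb.coef_   # now only two coefficients, for X1 and X1^2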

plt.scatter(X_values, Y_values)
plt.plot(X_values, reg2.predict(X_poly), color="r")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.show()

As you can see, the polynomial version fits the data much better.
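If you want to put a number on "much better", one option is to compare the mean squared error of the two models with sklearn.metrics; the polynomial model should come out far lower.

from sklearn.metrics import mean_squared_error

mse_linear = mean_squared_error(Y_values, reg.predict(X_values))   # straight-line fit
mse_poly = mean_squared_error(Y_values, reg2.predict(X_poly))      # quadratic fit
print(mse_linear, mse_poly)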

Sources

Andrew Ng’s Machine Learning course on Coursera
Aurélien Géron’s Hands-On Machine Learning with Scikit-Learn and TensorFlow
https://scikit-learn.org