As you fit regression models, you might need to make a choice between linear and nonlinear regression models. The field of statistics can be weird. Despite their names, both forms of regression can fit curvature in your data. So, how do you choose? In this blog post, I show you how to choose between linear and nonlinear regression models.
This blog post covers the basics of choosing between linear and nonlinear regression models. However, I’ve written more in-depth posts about some of the relevant issues. Please follow the links as needed.
First off, let’s cover a few basics. If the two types of regression models are not named based on their ability to fit curves, what is the difference between them?
In a nutshell, linear models must follow one very particular form:
The form is linear in the parameters because all terms are either the constant or a parameter multiplied by an independent variable (IV). A linear regression equation simply sums the terms. While the model must be linear in the parameters, you can raise an independent variable by an exponent to fit a curve. For instance, you can include a squared or cubed term.
Nonlinear regression models are anything that doesn’t follow this one form.
While both types of models can fit curvature, nonlinear regression is much more flexible in the shapes of the curves that it can fit. After all, the sky is the limit when it comes to the possible forms of nonlinear models. See the related post below for more details.
Guidelines for Choosing Between Linear and Nonlinear Regression
The general guideline is to use linear regression first to determine whether it can fit the particular type of curve in your data. If you can’t obtain an adequate fit using linear regression, that’s when you might need to choose nonlinear regression.
Linear regression is easier to use, simpler to interpret, and you obtain more statistics that help you assess the model. While linear regression can model curves, it is relatively restricted in the shapes of the curves that it can fit. Sometimes it can’t fit the specific curve in your data.
Nonlinear regression can fit many more types of curves, but it can require more effort both to find the best fit and to interpret the role of the independent variables. Additionally, R-squared is not valid for nonlinear regression, and it is impossible to calculate p-values for the parameter estimates.
Linear and Nonlinear Regression Examples
Let’s fit an example dataset using both linear and nonlinear regression. With these regression examples, I’ll show you how to determine whether linear regression provides an unbiased fit and then how to fit a nonlinear regression model to the same data. Our goal is to develop an unbiased model. These data are freely available from the NIST and pertain to the relationship between density and electron mobility. Download the CSV data file to try it yourself: ElectronMobility.
Example of a linear regression model
First, I’ll attempt to fit the curve using a linear model. Because there is only one independent variable, I can use a fitted line plot. This plot is handy because you can graph the estimated relationship along with the data. In this model, I use a cubed term to fit the curvature.
The fitted relationship in the graph follows the data fairly close and produces a high R-squared of 98.5%. Those sound great, but look more closely and you’ll notice that various places along the regression line consistently under and over-predict the observed values. This model is biased, and it illustrates a point that I make in my post about R-squared. By themselves, high R-squared values don’t necessarily indicate that you have a good model.
Because we have only one independent variable, we can plot the relationship on the fitted line plot. However, when you have more than one independent variable, you can’t use a fitted line plot and you’ll need to rely on residual plots to check the regression assumptions. For our data, the residual plots display the nonrandom patterns very clearly. You want to see random residuals.
Our linear regression model can’t adequately fit the curve in the data. There’s nothing more we can do with linear regression. Consequently, it’s time to try nonlinear regression.
Example of a nonlinear regression model
Now, let’s fit the same data but using nonlinear regression. As I mentioned earlier, nonlinear regression can be harder to perform. The fact that you can fit nonlinear models with virtually an infinite number of functional forms is both its strength and downside.
The main positive is that nonlinear regression provides the most flexible curve-fitting functionality. The downside is that it can take considerable effort to choose the nonlinear function that creates the best fit for the particular shape of the curve. Unlike linear regression, you also need to supply starting values for the nonlinear algorithm. Some datasets can require substantial effort to find acceptable starting values. For instance, some starting values can cause the algorithm to fail to converge on a solution or to converge on an incorrect solution. It’s for these reasons that I always recommend fitting linear models first.
Our example dataset is one that the NIST uses to illustrate a hard-to-fit nonlinear relationship. So, it’s no surprise that the linear model was insufficient. Because this blog post focuses on the basics of choosing between linear and nonlinear models, I’m not going to cover how the researchers chose the optimal functional form of the nonlinear model. Instead, I’ll jump to the solution and not show all the work to get there, much like a cooking show! I want you to see how the following nonlinear model compares to the linear model based on the best solution.
For our data, a rational function provides the best nonlinear fit. A rational function is the ratio of two polynomial functions. For electron mobility, the model is:
Y = (B1 + B2*x + B3*x^2 + B4*x^3) / (1 + B5*x + B6*x^2 + B7*x^3)
The equation for the nonlinear regression analysis is too long for the fitted line plot:
Electron Mobility = (1288.14 + 1491.08 * Density Ln + 583.238 * Density Ln^2 + 75.4167 * Density Ln^3) / (1 + 0.966295 * Density Ln + 0.397973 * Density Ln^2 + 0.0497273 * Density Ln^3)
Comparing the Regression Models and Making a Choice
In the fitted line plot, the nonlinear relationship follows the data almost exactly. The residual plot displays the randomness that we want to see for an unbiased model. R-squared does not appear because it is invalid for nonlinear regression. However, we can compare the standard error of the regression (S) for the two models. You want S to be smaller because it indicates that the data points are closer to the fitted line. For the linear model, S is 72.5 while for the nonlinear model it is 13.7. The nonlinear model provides a better fit because it is both unbiased and produces smaller residuals.
Nonlinear regression is a powerful alternative to linear regression but there are a few drawbacks. Fortunately, it’s not difficult to try linear regression first.
For more information about fitting curves with both linear and nonlinear regression, and comparing the results, read my post: Curve Fitting Using Linear and Nonlinear Regression. There are numerous other types of regression analysis that you can use. Read my post to learn how to choose the correct type of regression for your data.