As you fit regression models, you might need to make a choice between linear and nonlinear regression models. The field of statistics can be weird. Despite their names, both forms of regression can fit curvature in your data. So, how do you choose? In this blog post, I show you how to choose between linear and nonlinear regression models.

This blog post covers the basics of choosing between linear and nonlinear regression models. However, I’ve written more in-depth posts about some of the relevant issues. Please follow the links as needed.

First off, let’s cover a few basics. If the two types of regression models are not named based on their ability to fit curves, what is the difference between them?

In a nutshell, linear models must follow one very particular form:

Dependent variable = constant + parameter * IV + … + parameter * IV

The form is linear in the parameters because all terms are either the constant or a parameter multiplied by an independent variable (IV). A linear regression equation simply sums the terms. While the model must be linear in the parameters, you can raise an independent variable by an exponent to fit a curve. For instance, you can include a squared or cubed term.
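To make "linear in the parameters" concrete, here's a minimal sketch using made-up data (the numbers are purely illustrative). Even with a cubed term, the model is still a sum of parameter-times-column terms, so ordinary least squares handles it:

```python
import numpy as np

# Hypothetical data with curvature (purely illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 50)
y = 1.5 + 2.0 * x - 0.8 * x**3 + rng.normal(0, 0.1, x.size)

# The model y = b0 + b1*x + b2*x^3 is linear in the parameters:
# each column of X is multiplied by one parameter, and the terms are summed.
X = np.column_stack([np.ones_like(x), x, x**3])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)  # close to the true values [1.5, 2.0, -0.8]
```

The same idea extends to any polynomial term: squaring or cubing the independent variable adds a column, not a new functional form.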

Nonlinear regression models are anything that doesn’t follow this one form.

While both types of models can fit curvature, nonlinear regression is much more flexible in the shapes of the curves that it can fit. After all, the sky is the limit when it comes to the possible forms of nonlinear models. See the related post below for more details.

**Related posts**: The Difference Between Linear and Nonlinear Regression Models and When Should I Use Regression Analysis?

## Guidelines for Choosing Between Linear and Nonlinear Regression

The general guideline is to use linear regression first to determine whether it can fit the particular type of curve in your data. If you can’t obtain an adequate fit using linear regression, that’s when you might need to choose nonlinear regression.

Linear regression is easier to use, simpler to interpret, and you obtain more statistics that help you assess the model. While linear regression can model curves, it is relatively restricted in the shapes of the curves that it can fit. Sometimes it can’t fit the specific curve in your data.

Nonlinear regression can fit many more types of curves, but it can require more effort both to find the best fit and to interpret the role of the independent variables. Additionally, R-squared is not valid for nonlinear regression, and you can't calculate exact p-values for the parameter estimates — software reports, at best, approximate ones.

## Linear and Nonlinear Regression Examples

Let’s fit an example dataset using both linear and nonlinear regression. With these regression examples, I’ll show you how to determine whether linear regression provides an unbiased fit and then how to fit a nonlinear regression model to the same data. Our goal is to develop an unbiased model. These data are freely available from NIST and pertain to the relationship between density and electron mobility. Download the CSV data file to try it yourself: ElectronMobility.

### Example of a linear regression model

First, I’ll attempt to fit the curve using a linear model. Because there is only one independent variable, I can use a fitted line plot. This plot is handy because you can graph the estimated relationship along with the data. In this model, I use a cubed term to fit the curvature.

The fitted relationship in the graph follows the data fairly closely and produces a high R-squared of 98.5%. That sounds great, but look more closely and you’ll notice that the regression line consistently under- and over-predicts the observed values at various places along the curve. This model is biased, and it illustrates a point that I make in my post about R-squared. By themselves, high R-squared values don’t necessarily indicate that you have a good model.

Because we have only one independent variable, we can plot the relationship on the fitted line plot. However, when you have more than one independent variable, you can’t use a fitted line plot and you’ll need to rely on residual plots to check the regression assumptions. For our data, the residual plots display the nonrandom patterns very clearly. You want to see random residuals.

Our linear regression model can’t adequately fit the curve in the data. There’s nothing more we can do with linear regression. Consequently, it’s time to try nonlinear regression.

**Related post**: Seven Classical Assumptions of OLS Linear Regression

### Example of a nonlinear regression model

Now, let’s fit the same data using nonlinear regression. As I mentioned earlier, nonlinear regression can be harder to perform. The fact that you can fit nonlinear models with a virtually infinite number of functional forms is both their strength and their downside.

The main positive is that nonlinear regression provides the most flexible curve-fitting functionality. The downside is that it can take considerable effort to choose the nonlinear function that creates the best fit for the particular shape of the curve. Unlike with linear regression, you also need to supply starting values for the nonlinear algorithm. Some datasets can require substantial effort to find acceptable starting values. For instance, some starting values can cause the algorithm to fail to converge on a solution or to converge on an incorrect solution. It’s for these reasons that I always recommend fitting linear models first.

Our example dataset is one that NIST uses to illustrate a hard-to-fit nonlinear relationship, so it’s no surprise that the linear model was insufficient. Because this blog post focuses on the basics of choosing between linear and nonlinear models, I’m not going to cover how the researchers chose the optimal functional form of the nonlinear model. Instead, I’ll jump straight to the best solution and skip the work it took to get there, much like a cooking show! I want you to see how the resulting nonlinear model compares to the linear model.

For our data, a rational function provides the best nonlinear fit. A rational function is the ratio of two polynomial functions. For electron mobility, the model is:

Y = (B1 + B2*x + B3*x^2 + B4*x^3) / (1 + B5*x + B6*x^2 + B7*x^3)
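As a sketch of what fitting this form looks like in code, here's SciPy's `curve_fit` applied to data simulated from the coefficients quoted later in the post (I don't use the actual NIST file here, and I deliberately start the algorithm near the reported estimates — as noted above, poor starting values can keep it from converging):

```python
import numpy as np
from scipy.optimize import curve_fit

def rational(x, b1, b2, b3, b4, b5, b6, b7):
    """Ratio of two cubic polynomials."""
    num = b1 + b2 * x + b3 * x**2 + b4 * x**3
    den = 1 + b5 * x + b6 * x**2 + b7 * x**3
    return num / den

# Simulate noisy data from the coefficients quoted later in the post.
beta = [1288.14, 1491.08, 583.238, 75.4167, 0.966295, 0.397973, 0.0497273]
rng = np.random.default_rng(42)
x = np.linspace(-3.0, 2.0, 40)  # x plays the role of ln(density)
y = rational(x, *beta) + rng.normal(0, 2.0, x.size)

# Starting values matter in nonlinear regression; here we start near the truth.
popt, pcov = curve_fit(rational, x, y, p0=beta, maxfev=20000)
```

Try re-running with `p0` set far from `beta` to see for yourself how sensitive the algorithm is to starting values.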

The equation for the nonlinear regression analysis is too long for the fitted line plot:

Electron Mobility = (1288.14 + 1491.08 * Density Ln + 583.238 * Density Ln^2 + 75.4167 * Density Ln^3) / (1 + 0.966295 * Density Ln + 0.397973 * Density Ln^2 + 0.0497273 * Density Ln^3)

## Comparing the Regression Models and Making a Choice

In the fitted line plot, the nonlinear relationship follows the data almost exactly. The residual plot displays the randomness that we want to see for an unbiased model. R-squared does not appear because it is invalid for nonlinear regression. However, we can compare the standard error of the regression (S) for the two models. You want S to be smaller because it indicates that the data points are closer to the fitted line. For the linear model, S is 72.5 while for the nonlinear model it is 13.7. The nonlinear model provides a better fit because it is both unbiased and produces smaller residuals.
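The standard error of the regression is easy to compute from the residuals: S = sqrt(SSE / (n − p)), where p is the number of estimated parameters. Here's a small illustrative helper (not tied to any particular statistics package):

```python
import numpy as np

def regression_s(residuals, n_params):
    """Standard error of the regression: sqrt(SSE / (n - p))."""
    residuals = np.asarray(residuals, dtype=float)
    sse = np.sum(residuals**2)
    dof = residuals.size - n_params
    return np.sqrt(sse / dof)

# Toy check: four residuals of magnitude 3, two estimated parameters.
print(regression_s([3.0, -3.0, 3.0, -3.0], 2))  # sqrt(36 / 2) ≈ 4.243
```

Because S is in the same units as the dependent variable, it's comparable across linear and nonlinear fits — which is exactly why it's the statistic to use here, where R-squared is off the table.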

Nonlinear regression is a powerful alternative to linear regression but there are a few drawbacks. Fortunately, it’s not difficult to try linear regression first.

For more information about fitting curves with both linear and nonlinear regression, and comparing the results, read my post: Curve Fitting Using Linear and Nonlinear Regression. There are numerous other types of regression analysis that you can use. Read my post to learn how to choose the correct type of regression for your data.

If you’re learning regression, check out my Regression Tutorial!

chris says

This is a great explanation, thanks!

Jim Frost says

Hi Chris, I’m glad it was helpful!

Fariba Heidarian says

Hi, thank you, this is really useful. I have a question: in my case I have 5 variables, one of which is the dependent variable. I don’t know which relationships might exist between the variables. Actually, that is what I am looking for.

Which independent variables should I choose for curve fitting with the response variable?

Jim Frost says

Hi Fariba,

Unfortunately, there is no way to determine which types of relationships exist without taking a very close look at the data. A great place to start is with graphs. Use a scatterplot to graph the relationship between the dependent variable and each independent variable. You can look for curvature in these graphs. These graphs help you determine which variables have curved relationships and the type of curve. Based on the scatterplots, you can fit a model accordingly. It might well take some trial-and-error.

For more information about this topic, read my post about fitting curves, which shows you how to fit specific types of curves.

After you fit a model, be sure to check the residual plots. These plots help you determine whether you’re fitting the curves correctly.

Best of luck!

Patrik Silva says

Hi Jim,

In the following equation:

Mobility = 1243 + 412.3 Ln(Density)

Do we still interpret the B1 coefficient (412.3) as usual, or do we need to take the exponential of B1 to convert it back to the original units, to make it meaningful, and then interpret it?

My second question is related to the following situation:

Is it OK to use independent variables that are scaled as a rate (%)? For example, suppose we are modeling crimes in a particular city, and we have 2 independent variables in the same model, say population and the number of unemployed people, both in absolute terms. I am thinking that if I use both in the same model we might have high correlation between population and unemployed people (causing a multicollinearity problem), since zones with more population tend to also have more unemployed people. Is it better to use the unemployment rate as an independent variable than the absolute number of unemployed people?

And then, the third question:

Here, I would like you to think about the geography of the study area, because the size of a zone (area in km²) is normally related to the population of that zone.

How could area of the zone be useful for this model?

I don’t know if I was clear in my question.

Thank you in advance,

Patrik Silva

Jim Frost says

Hi again Patrik,

Excellent questions!

Regarding the electron mobility equation, I should first note that the equation you refer to is just a portion of the linear equation that the model uses. The full equation is a cubic model (i.e., it also includes the squared and cubed terms). And, the linear model didn’t fit the data as well as the nonlinear model.

So, with that in mind, the linear model does use the natural log, but only on the independent variable side of things. Consequently, you’d need to take the log of the value of the independent variable but the value that the equation calculates is in the natural units for electron mobility. So, no back transformation is necessary for this model. And, yes, the coefficient applies to the transformed value of the density value rather than the natural units because that variable has been transformed. For models with transformed values, it’s important to note specifically which values have been transformed and which ones have not!
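To put a number on that interpretation, here's a quick check using the partial equation Patrik quoted, Mobility = 1243 + 412.3 ln(Density). Because only the independent variable is logged, the prediction is already in mobility's natural units, and the coefficient means that multiplying density by e (about 2.718) adds about 412.3 units of mobility (the density values below are arbitrary):

```python
import numpy as np

def mobility(density):
    # Partial linear equation from the discussion; only the IV is logged,
    # so the result is already in the natural units of mobility.
    return 1243 + 412.3 * np.log(density)

# Multiplying density by e raises the prediction by exactly the coefficient.
print(mobility(2.0 * np.e) - mobility(2.0))  # ≈ 412.3
```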

For your second question, yes, you’d likely have a correlation between those two variables — but those variables might cause other problems as well. In this situation, you might try including the unemployment rate as a way of including both variables (rather than the raw numbers for both). I actually talk about this approach in more detail in my post about heteroscedasticity in regression models. Heteroscedasticity refers to residuals that have unequal variance. Cross-sectional studies like yours are more likely to produce heteroscedasticity if variables like these have a wide range of values. And, as you mention, there is also the possibility for these independent variables to be correlated (multicollinearity), which causes its own problems!

I don’t know enough about the subject area to make a recommendation about zones. It sounds like it might be important. I’d recommend seeing how others in the field have included this information in their models.

I hope this helps!

Patrik Silva says

Thank you again Jim,

I was waiting for you, this is a very good feedback.

It helps a lot, at least now I can continue working in my data.

Patrik

nourhane houssam says

Hi Jim, I have a question: what if the dependent variable is a ratio? Which model or method would be applicable? The dependent variable is the secondary school enrollment ratio, regressed on five independent variables: worker remittances, GDP, government expenditure on education, the skilled migration rate, and the squared skilled migration rate. Waiting for your reply and help, and thank you in advance.

carlo debeerst says

Hi, thanks for the helpful website.

I have some questions on nonlinear regression. My goal is to build a model with 10 possible predictor variables (continuous) and one outcome variable (continuous). Some predictors have a possible quadratic effect suggested by the literature.

My goal is to develop the multiple regression model that fits the data best. This means the model will have both linear and quadratic predictors in it.

- The first step I took was to make a new variable for each possible quadratic predictor by taking its square. (So each quadratic predictor now has 2 predictor variables.)

- As I use SPSS, I used Analyze > Regression > Linear and put all the linear and quadratic terms in. I used the “backward” and “forward” functions, so that resulted in only significant predictors in the model for predicting the outcome variable.

- My question: some of the predictors have only the quadratic term, and others have both the linear and quadratic terms. What is the difference between a predictor with only a significant quadratic term versus a predictor with both significant linear and quadratic terms in the equation?

- Is this a good approach, or is it better to force both the linear and quadratic terms for the predictors with possible quadratic effects using the “Enter” function in the same SPSS menu?

Thanks, and my apologies for my bad English.

Jim Frost says

Hi Carlo,

A quick note about terminology before I get to your questions. In statistics, “linear” and “nonlinear” have a very specialized meaning when it comes to regression models. Both can fit curves; the difference is the functional form of the model. The type of model that you are referring to is technically a linear model that uses polynomials (quadratics) to model curvature. For more information about this issue, see: The Difference between Linear and Nonlinear Models.

The linear term and the quadratic term collectively describe the shape and the orientation of the curve. Sometimes you need both terms to adequately describe the curvature while other times you just need one of them. If the linear term isn’t significant, there is nothing wrong with excluding it from the model. However, there is a tradition of including all lower-order terms that comprise a higher-order term. This approach is known as specifying a hierarchical model.

However, regression models can be non-hierarchical. Generally, you can exclude the lower-order terms when they are not significant, unless theory suggests that you include them. Models that contain too many terms can be relatively imprecise and decrease the model’s capability to predict new observations.

This is an area where statisticians probably disagree and you can make the case either way. Be sure to check your residual plots to be sure that you don’t see patterns when you remove the linear terms. You can also check the standard error of the regression to see if including the extra terms that are not significant reduces the precision of your model.
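As a rough illustration of this point — in Python rather than SPSS, with data simulated so that the true relationship genuinely has no linear term — you can compare the standard error of the regression for a quadratic-only model against the hierarchical model that also includes the linear term:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 80)
y = 4.0 - 1.2 * x**2 + rng.normal(0, 0.2, x.size)  # true curve: no linear term

def fit_s(X, y):
    """Least-squares fit; return the standard error of the regression."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return np.sqrt(np.sum(resid**2) / (y.size - X.shape[1]))

s_quadratic_only = fit_s(np.column_stack([np.ones_like(x), x**2]), y)
s_hierarchical = fit_s(np.column_stack([np.ones_like(x), x, x**2]), y)
print(s_quadratic_only, s_hierarchical)  # nearly identical here
```

When the linear term genuinely contributes, dropping it inflates S and leaves visible patterns in the residual plots — which is exactly what those checks are for.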