How to Choose Between Linear and Nonlinear Regression

By Jim Frost 32 Comments

As you fit regression models, you might need to make a choice between linear and nonlinear regression models. The field of statistics can be weird. Despite their names, both forms of regression can fit curvature in your data. So, how do you choose? In this blog post, I show you how to choose between linear and nonlinear regression models.

This blog post covers the basics of choosing between linear and nonlinear regression models. However, I’ve written more in-depth posts about some of the relevant issues. Please follow the links as needed.

First off, let’s cover a few basics. If the two types of regression models are not named based on their ability to fit curves, what is the difference between them?

In a nutshell, linear models must follow one very particular form:

Dependent variable = constant + parameter * IV + … + parameter * IV

The form is linear in the parameters because all terms are either the constant or a parameter multiplied by an independent variable (IV). A linear regression equation simply sums the terms. While the model must be linear in the parameters, you can raise an independent variable to an exponent to fit a curve. For instance, you can include a squared or cubed term.
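A minimal Python sketch makes “linear in the parameters” concrete. It assumes NumPy is available, and the data are hypothetical and noise-free. The cubic model curves in the IV, but every column of the design matrix is just the IV raised to a power. Because the parameters only multiply those known columns, a single ordinary least squares solve recovers them. No starting values or iteration are needed.

```python
import numpy as np

def fit_polynomial_ols(x, y, degree):
    """Fit y = b0 + b1*x + ... + bd*x**d by ordinary least squares.

    The curve bends in x, but the parameters b0..bd only multiply
    known columns, so the model is linear in the parameters.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x**2, ..., x**degree
    X = np.column_stack([x**k for k in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data generated exactly from a cubic
x = np.linspace(-3, 2, 25)
y = 5 + 2 * x - 1.5 * x**2 + 0.5 * x**3
coef = fit_polynomial_ols(x, y, degree=3)
print(np.round(coef, 6))  # constant first, then the x, x**2, x**3 terms
```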

Nonlinear regression models are anything that doesn’t follow this one form.

While both types of models can fit curvature, nonlinear regression is much more flexible in the shapes of the curves that it can fit. After all, the sky is the limit when it comes to the possible forms of nonlinear models.

Guidelines for Choosing Between Linear and Nonlinear Regression

The general guideline is to use linear regression first to determine whether it can fit the particular type of curve in your data. If you can’t obtain an adequate fit using linear regression, that’s when you might need to choose nonlinear regression.

Linear regression is easier to use, simpler to interpret, and you obtain more statistics that help you assess the model. While linear regression can model curves, it is relatively restricted in the shapes of the curves that it can fit. Sometimes it can’t fit the specific curve in your data.

Nonlinear regression can fit many more types of curves, but it can require more effort both to find the best fit and to interpret the role of the independent variables. Additionally, R-squared is not valid for nonlinear regression, and p-values for the parameter estimates are generally unavailable or, at best, approximate.

Linear and Nonlinear Regression Examples

Let’s fit an example dataset using both linear and nonlinear regression. With these regression examples, I’ll show you how to determine whether linear regression provides an unbiased fit and then how to fit a nonlinear regression model to the same data. Our goal is to develop an unbiased model. These data are freely available from the NIST and pertain to the relationship between density and electron mobility. Download the CSV data file to try it yourself: ElectronMobility.

Example of a linear regression model

First, I’ll attempt to fit the curve using a linear model. Because there is only one independent variable, I can use a fitted line plot. This plot is handy because you can graph the estimated relationship along with the data. In this model, I use a cubed term to fit the curvature.

The fitted relationship in the graph follows the data fairly closely and produces a high R-squared of 98.5%. That sounds great, but look more closely and you’ll notice that various places along the regression line consistently under- and over-predict the observed values. This model is biased, and it illustrates a point that I make in my post about R-squared: by themselves, high R-squared values don’t necessarily indicate that you have a good model.
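A high R-squared can easily coexist with bias. The following hypothetical sketch reproduces this with made-up data and assumes NumPy is available. It fits a straight line to points that lie exactly on the curve y = exp(x). R-squared comes out high, yet the residual signs fall into long runs. The line under-predicts at both ends and over-predicts in the middle.

```python
import numpy as np

def r_squared(y, fitted):
    """R-squared = 1 - SSE / SST."""
    y = np.asarray(y, dtype=float)
    sse = np.sum((y - fitted) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

# Hypothetical curved data with no noise at all: y = exp(x)
x = np.linspace(0, 2, 21)
y = np.exp(x)

# Straight-line fit; np.polyfit returns the highest power first
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

print(f"R-squared: {r_squared(y, fitted):.3f}")
# Residual signs in x order: a block of +, a block of -, a block of +
print("".join("+" if r > 0 else "-" for r in resid))
```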

Because we have only one independent variable, we can plot the relationship on the fitted line plot. However, when you have more than one independent variable, you can’t use a fitted line plot and you’ll need to rely on residual plots to check the regression assumptions. For our data, the residual plots display the nonrandom patterns very clearly. You want to see random residuals.
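Residual plots remain the primary check. If you want a numeric complement, one option is the Wald-Wolfowitz runs test on the residual signs. The post itself does not use this test. Too few runs means that positive and negative residuals are clustered rather than random. Below is a minimal standard-library sketch with hypothetical residuals. It assumes the residuals are ordered by the independent variable or by the fitted value.

```python
import math

def runs_test(residuals):
    """Wald-Wolfowitz runs test on residual signs.

    `residuals` must be ordered by the independent variable (or by
    fitted value). Returns (runs, expected_runs, z); a strongly
    negative z means too few sign changes, i.e., clustered residuals.
    """
    signs = [r > 0 for r in residuals if r != 0]  # drop exact zeros
    n1 = sum(signs)            # number of positive residuals
    n2 = len(signs) - n1       # number of negative residuals
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("need both positive and negative residuals")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    if variance <= 0:
        raise ValueError("too few residuals for the test")
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical residuals ordered by x: + block, - block, + block
patterned = [2, 3, 1, 2, -1, -2, -3, -2, -1, -2, 1, 2, 3, 2]
runs, expected, z = runs_test(patterned)
print(runs, round(expected, 2), round(z, 2))
```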

Our linear regression model can’t adequately fit the curve in the data. There’s nothing more we can do with linear regression. Consequently, it’s time to try nonlinear regression.

Example of a nonlinear regression model

Now, let’s fit the same data but using nonlinear regression. As I mentioned earlier, nonlinear regression can be harder to perform. The fact that you can fit nonlinear models with a virtually infinite number of functional forms is both its strength and its downside.

The main positive is that nonlinear regression provides the most flexible curve-fitting functionality. The downside is that it can take considerable effort to choose the nonlinear function that creates the best fit for the particular shape of the curve. Unlike linear regression, you also need to supply starting values for the nonlinear algorithm. Some datasets can require substantial effort to find acceptable starting values. For instance, some starting values can cause the algorithm to fail to converge on a solution or to converge on an incorrect solution. It’s for these reasons that I always recommend fitting linear models first.
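The following sketch shows why starting values matter. It uses a bare-bones Gauss-Newton iteration with step halving, written with NumPy. This is one common nonlinear least squares algorithm, not necessarily the one any particular package uses. The model y = a·exp(−b·x) and the noise-free data are hypothetical. Unlike the OLS solve above, the answer comes from iterating away from a starting guess, and the routine reports whether it converged.

```python
import numpy as np

def gauss_newton(model, jacobian, x, y, start, max_iter=200, tol=1e-8):
    """Nonlinear least squares by Gauss-Newton with step halving.

    Returns (estimates, converged). The result depends on `start`:
    there is no closed-form solution to fall back on.
    """
    p = np.asarray(start, dtype=float)
    sse = np.sum((y - model(x, p)) ** 2)
    for _ in range(max_iter):
        resid = y - model(x, p)
        J = jacobian(x, p)                      # n x k matrix of df/dp
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        if np.max(np.abs(step)) < tol:          # step is negligible: done
            return p, True
        # Halve the step until the sum of squares actually drops
        for _ in range(50):
            trial = p + step
            trial_sse = np.sum((y - model(x, trial)) ** 2)
            if trial_sse < sse:
                break
            step = step / 2
        else:
            return p, False                     # stuck: no downhill step
        p, sse = trial, trial_sse
    return p, False                             # ran out of iterations

# Hypothetical model: y = a * exp(-b * x)
def model(x, p):
    a, b = p
    return a * np.exp(-b * x)

def jacobian(x, p):
    a, b = p
    e = np.exp(-b * x)
    return np.column_stack([e, -a * x * e])     # df/da, df/db

x = np.linspace(0, 8, 17)
y = 10 * np.exp(-0.5 * x)                       # noise-free; true (a, b) = (10, 0.5)

estimates, converged = gauss_newton(model, jacobian, x, y, start=[8.0, 0.3])
print(estimates, converged)
```

Changing `start` is an easy way to explore how sensitive the iteration is. A poor choice can slow it down or leave it stuck, which is the practical cost described above.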

Our example dataset is one that the NIST uses to illustrate a hard-to-fit nonlinear relationship. So, it’s no surprise that the linear model was insufficient. Because this blog post focuses on the basics of choosing between linear and nonlinear models, I’m not going to cover how the researchers chose the optimal functional form of the nonlinear model. Instead, I’ll jump to the solution and not show all the work to get there, much like a cooking show! I want you to see how the following nonlinear model compares to the linear model based on the best solution.

For our data, a rational function provides the best nonlinear fit. A rational function is the ratio of two polynomial functions. For electron mobility, the model is:

Y = (B1 + B2*x + B3*x^2 + B4*x^3) / (1 + B5*x + B6*x^2 + B7*x^3)

The equation for the nonlinear regression analysis is too long for the fitted line plot:

Electron Mobility = (1288.14 + 1491.08 * Density Ln + 583.238 * Density Ln^2 + 75.4167 * Density Ln^3) / (1 + 0.966295 * Density Ln + 0.397973 * Density Ln^2 + 0.0497273 * Density Ln^3)
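To make the rational function concrete, here is a small Python sketch that evaluates the equation above using the rounded coefficients as printed. The input is Density Ln, the natural log of density, not density itself. At Density Ln = 0, every term except B1 vanishes, which provides an easy sanity check.

```python
# Rounded coefficients from the fitted equation above, in order B1..B7
B = (1288.14, 1491.08, 583.238, 75.4167, 0.966295, 0.397973, 0.0497273)

def electron_mobility(density_ln, b=B):
    """Rational function: a cubic polynomial over (1 + a cubic polynomial).

    density_ln is the natural log of density, matching "Density Ln".
    """
    b1, b2, b3, b4, b5, b6, b7 = b
    x = density_ln
    numerator = b1 + b2 * x + b3 * x**2 + b4 * x**3
    denominator = 1 + b5 * x + b6 * x**2 + b7 * x**3
    if denominator == 0:
        raise ZeroDivisionError("denominator is zero at this x")
    return numerator / denominator

print(electron_mobility(0.0))   # only B1 survives at x = 0
```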

Comparing the Regression Models and Making a Choice

In the fitted line plot, the nonlinear relationship follows the data almost exactly. The residual plot displays the randomness that we want to see for an unbiased model. R-squared does not appear because it is invalid for nonlinear regression. However, we can compare the standard error of the regression (S) for the two models. You want S to be smaller because it indicates that the data points are closer to the fitted line. For the linear model, S is 72.5 while for the nonlinear model it is 13.7. The nonlinear model provides a better fit because it is both unbiased and produces smaller residuals.
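S uses the residuals and the residual degrees of freedom. It is expressed in the units of the dependent variable, which is why it works for comparing linear and nonlinear fits of the same data. Here is a minimal plain-Python sketch. For the two models in this post, the parameter count would be 4 for the cubic linear model (a constant plus three terms) and 7 for the rational function (B1 through B7). The numbers below are made up for illustration.

```python
import math

def standard_error_of_regression(observed, fitted, n_params):
    """S = sqrt(SSE / (n - p)), in the units of the dependent variable.

    n_params counts every estimated parameter, including the constant.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must be the same length")
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    return math.sqrt(sse / df)

# Hypothetical observed and fitted values from a 2-parameter model
observed = [10, 12, 15, 19, 24]
fitted = [10.5, 11.5, 15.2, 18.6, 24.1]
print(round(standard_error_of_regression(observed, fitted, n_params=2), 3))
```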

Nonlinear regression is a powerful alternative to linear regression, but it has a few drawbacks. Fortunately, it’s not difficult to try linear regression first.

For more information about fitting curves with both linear and nonlinear regression, and comparing the results, read my post: Curve Fitting Using Linear and Nonlinear Regression. There are numerous other types of regression analysis that you can use. Read my post to learn how to choose the correct type of regression for your data.

If you’re learning regression, check out my Regression Tutorial!

Comments

David McCall says

July 3, 2021 at 8:41 pm

Hi Jim,
Thank you very much for this post. I’m new to this and I am seeking some guidance if possible. I have two sets of data plotted together. Let’s say one spectrum represents the measured data and the other one represents the simulated data.

So,
X= Simulated;
Y = Measured;

Now, I want to minimize the difference between them. How do I implement nonlinear least squares for them? I’m working with MATLAB and there are built-in functions, but I’m really not sure how to plug them in.

Thank you very much for your help in advance.

Mert says

January 13, 2021 at 9:31 am

Hi Jim, just wanted to thank you for your very intuitive explanation. Guys like you make our student life much easier. Really appreciate your post. Keep going.

Best,
Mert

- Jim Frost says
  
  January 14, 2021 at 2:06 pm
  
  Hi Mert,
  
  Thanks so much for writing! I’m so glad to hear that my blog has been helpful! 🙂
  
Chio M says

August 21, 2020 at 12:47 pm

Hi Jim,
Just reading your post as I am exploring the possibility of nonlinear regression in my model.
However, I do have a question about your example. Isn’t the nonlinear model in your example overfitting the data? I mean, as far as I was aware, we don’t want such an exact match either, because if we get new data, the model won’t be able to predict new values effectively.

Thanks
Chio M

- Jim Frost says
  
  August 24, 2020 at 1:01 am
  
  Hi Chio,
  
  That’s a very astute observation! And, I’m glad you’re asking questions like that. Throughout my posts, I warn people against results that are too good to be true, overfitting, etc.
  
  In this case, these data are from a study about electron mobility. This is one of the datasets that the National Institute of Standards and Technology (NIST) uses to evaluate how well statistical software can fit nonlinear models. As you can see, NIST has a certified model and values for these data.
  
  The reason the model fits so well is that the scientists were measuring physical properties in a well-controlled, low-noise environment. That’s one scenario where it’s not surprising to obtain a very, very good fit.
  
Bryan says

September 18, 2019 at 3:52 pm

Suppose I have data at three time points. Linear models don’t look pretty. A second-order polynomial looks prettier, but I have no theoretical basis for imposing a second-order polynomial on the data. How do I justify using the second order polynomial beyond “it fits better”, when there’s nothing in the biochemistry to explain “it fits better”?

Tess nachor says

June 11, 2019 at 2:11 am

Hi sir,

My question is related to this post. Did you mean here that if the predictors are log-transformed (Ln), no back transformation is needed? Also, I would like to ask: when you use WLS regression, can you use log-transformed predictors, or is there no need to log transform? Thank you.

Tess

Mosisa says

December 28, 2018 at 7:01 am

It is really useful. You explain things in a simple and understandable way. Thank you very much. Now I’ve got the concept.

McPhee says

December 15, 2018 at 8:05 am

Hi Jim,

I’d like to ask whether I can use OLS estimation when my dependent variable is measured as a percentage, from 0% to 100%. I’m confused about whether it should be estimated with linear or nonlinear regression.

Excellent work on this blog, by the way!!!

McPhee

Bridget says

October 25, 2018 at 10:23 am

I have an R-squared of 0.694, but further analysis has revealed that the relationship between the IV and DV is inverted U-shaped. Why is R-squared still relatively high?

- Jim Frost says
  
  October 25, 2018 at 10:47 am
  
  Hi Bridget,
  
  Even though a straight line fit isn’t the best, it is apparently still explaining some of the variance in your DV. For example, suppose that you have an inverted U shape but that as X increases, there is an overall tendency for Y to also increase. There’s an inverted U in there, but it is a slanted U. A fitted line with a positive slope will explain some of that relationship. However, the predictions will be biased because you’re fitting a U with a line!
  
  Additionally, the shape of the U makes a difference. The flatter the U, the better a straight line will fit it. You might not have a really sharp U shape.
  
  It sounds like you need to include a quadratic term. That’ll increase your R-squared!
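  Here is a quick illustration with hypothetical, noise-free data, assuming NumPy is available. Take a slanted inverted U that rises overall. A straight line still explains part of the variance, and adding a quadratic term captures the bend. In OLS, adding a term never lowers R-squared on the same data, so check the residual plots and S as well.

```python
import numpy as np

def r_squared(y, fitted):
    return 1 - np.sum((y - fitted) ** 2) / np.sum((y - np.mean(y)) ** 2)

# Hypothetical slanted inverted U: peaks at x = 6.25, ends higher than it starts
x = np.linspace(0, 10, 41)
y = 2 + 1.5 * x - 0.12 * x**2

results = {}
for degree in (1, 2):                        # straight line, then add x**2
    coefs = np.polyfit(x, y, degree)
    results[degree] = r_squared(y, np.polyval(coefs, x))

print({d: round(r2, 3) for d, r2 in results.items()})
```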
  
Sina says

October 22, 2018 at 11:56 am

Thank you Jim,

So the linear regression in most statistical tools, such as SPSS and Minitab, is actually linear least squares regression!
Now I know I should stick with the nonlinear model, which gave much better predictions with respect to my specific loss value. This was very helpful.

Sina

- Jim Frost says
  
  October 22, 2018 at 1:37 pm
  
  Yes, that’s the most common type by far. Stats packages will offer some variations, such as weighted least squares (WLS), but OLS is the most common. It does sound like nonlinear regression is probably better for your case. Best of luck with your analysis!
  
Sina says

October 22, 2018 at 10:02 am

Hi Jim,

Thank you for your comprehensive and useful blog,

I have a question that I couldn’t find the answer to in your posts.
Consider a multiple linear regression model for solubility prediction. I ran this model with nonlinear regression in SPSS. By default, nonlinear regression tries to minimize the “sum of squared residuals” loss, but I want to define my own loss value (also called a cost function) for the model to minimize. I did this successfully in the nonlinear regression model, but I couldn’t in the linear one.
Is it possible to define a loss value in a multiple linear regression model?
I couldn’t understand how the error minimization algorithm works in multiple linear regression compared to the algorithm in nonlinear regression (converging to a minimum loss value).

Thank you in advance,
Sina

- Jim Frost says
  
  October 22, 2018 at 11:29 am
  
  Hi Sina,
  
  The most common type of linear model by far is ordinary least squares (OLS). By definition, OLS uses a specific method that minimizes the sum of the squared residuals. There might be a different type of linear model that does what you need it to, but I’m unfamiliar with it. If there is, I suspect it is a very specialized analysis and you might need to contact a statistician to help you perform this analysis. Sorry I couldn’t be of more help.
  
Darren says

October 11, 2018 at 10:33 pm

Hi Jim,

We have data that will fit a hyperbola if the effect comes from a true biological interaction, whilst it should fit a straight line if the interaction is just due to natural collision. As with most biological data, it is never that straightforward, and you could fit either model to the data. Do you know if it is possible, using something like GraphPad Prism, SPSS, or even Excel, to ask whether a nonlinear or a linear regression analysis best fits the data set? Basically, taking the human bias out of the equation.

Thanks
Darren

- Jim Frost says
  
  October 12, 2018 at 2:53 pm
  
  Hi Darren, you mention interaction in your comment, but it sounds like you’re asking about modeling curvature, so I’ll talk about curve fitting.
  
  I’m pretty sure that I read that JMP has a feature that helps fit the correct function, although I’m not 100% sure about that. I don’t know about the other software packages.
  
  However, this issue can be a tricky one. It’s not always the best practice to remove humans from the equation. Any algorithm will be going by quantitative measures alone without regard to the underlying subject area knowledge. It’s often possible to find a model that fits better than it should in the real world. Algorithms can find these too-good-to-be-true models with ease by using powerful computers to fit many different models.
  
  My advice is that if you use these model selection procedures, use them only as a starting point in the early stages of model building. And, carefully examine the models they produce and see if they match theory.
  
  As for the specific question of linear vs nonlinear regression, and evaluating the fit of different models, read my post about Curve Fitting Using Linear and Nonlinear Regression.
  
  In that post, I take a dataset with a difficult curve to fit and work through different approaches to fit the curve and how to evaluate the fit. Ultimately, this requires choosing between a linear and nonlinear model. It doesn’t take the human out of the picture but it does break it down to a few items that aren’t overly subjective.
  
  Best of luck with your analysis!
  
Syeda Arooj Kazmi says

September 28, 2018 at 6:41 am

Hi Jim, I have a question. Could you please help me answer this? If, in your model, one data set is linear and the other set is nonlinear, how can you handle the linearity assumption?

- Jim Frost says
  
  September 28, 2018 at 10:24 am
  
  Hi Syeda,
  
  I’m not entirely clear what you’re asking. Are you saying that in one model and using one dataset, you have an independent variable that has a nonlinear relationship with the dependent variable and another independent variable that has a linear relationship with the dependent variable? I’ll assume that’s what you mean rather than different data sets. (If you mean something else, please let me know!)
  
  As I see it, you have two basic choices. You can use a data transformation to be able to include the nonlinear relationship in a linear model. Or, you can use nonlinear regression and specify the nonlinear relationship for one independent variable and a linear relationship for the other independent variable. In nonlinear regression, you can still specify linear relationships.
  
  I hope this helps!
  
adlah says

September 1, 2018 at 5:25 am

Hi
I have a very low R-squared. I’ve tried many times to fix it, with no result. What should I do?

- Jim Frost says
  
  September 2, 2018 at 12:56 am
  
  Hi Adlah,
  
  Read my post about low R-squared values. It’s not necessarily a bad thing. That post should answer your questions. If you have any more questions after reading it, post a comment there.
  
sreenivasulu kandakuru says

August 27, 2018 at 12:31 pm

Great explanation.

carlo debeerst says

July 3, 2018 at 5:14 am

Hi, thanks for the helpful website.
I have some questions about nonlinear regression. My goal is to make a model with 10 possible predictor variables (continuous) and one outcome variable (continuous). Some predictors have a possible quadratic effect suggested by the literature.

My goal is to develop the multiple regression that fits the data best. This means that the model has both linear and quadratic predictors in it.
- The first step I took was to make a new variable for each possible quadratic predictor by squaring it. (So each possible quadratic predictor now has 2 terms.)

- As I use SPSS, I went to Analyze > Regression > Linear and put all the linear and quadratic terms in. I used the “backward” and “forward” methods, which left only significant predictors in the model to predict the OV.
- My question: some of the predictors have only the quadratic term, and others have both the linear and quadratic terms. What is the difference between a predictor with a significant quadratic term versus a predictor with both a significant linear and quadratic term in the equation?

- Is this a good approach, or is it better to force both the linear and quadratic terms for the predictors with a possible quadratic effect by using the “Enter” method in the same SPSS menu?

Thanks, and my apologies for my bad English.

- Jim Frost says
  
  July 3, 2018 at 2:57 pm
  
  Hi Carlo,
  
  A quick note about terminology before I get to your questions. In statistics, linear and nonlinear have a very specialized meaning when it comes to regression models. Both can fit curves. The difference is the functional form of the model. The type of model that you are referring to is technically a linear model that uses polynomials (quadratics) to model curvature. For more information about this issue, see: The Difference between Linear and Nonlinear Models.
  
  The linear term and the quadratic term collectively describe the shape and the orientation of the curve. Sometimes you need both terms to adequately describe the curvature while other times you just need one of them. If the linear term isn’t significant, there is nothing wrong with excluding it from the model. However, there is a tradition of including all lower-order terms that comprise a higher-order term. This approach is known as specifying a hierarchical model.
  
  However, regression models can be non-hierarchical. Generally, you can exclude the lower-order terms when they are not significant, unless theory suggests that you include them. Models that contain too many terms can be relatively imprecise and decrease the model’s capability to predict new observations.
  
  This is an area where statisticians probably disagree, and you can make the case either way. Check your residual plots to be sure that you don’t see patterns when you remove the linear terms. You can also check the standard error of the regression to see whether including the extra terms that are not significant reduces the precision of your model.
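  Here is one way to see how the linear and quadratic terms work together, sketched with made-up coefficients. For a single predictor with both terms, y = b0 + b1*x + b2*x², the curve turns at x = −b1 / (2·b2). Dropping the linear term sets b1 to 0, which pins the turning point at x = 0 on that predictor’s scale. So whether a quadratic-only term makes sense depends partly on where zero falls for that predictor, and centering the predictor moves that zero point.

```python
def turning_point(b1, b2):
    """x where y = b0 + b1*x + b2*x**2 peaks (b2 < 0) or bottoms out (b2 > 0)."""
    if b2 == 0:
        raise ValueError("b2 is zero, so there is no curvature")
    return -b1 / (2 * b2)

# Hypothetical coefficients for an inverted U
print(turning_point(b1=3.0, b2=-0.5))   # linear + quadratic terms: peak at 3.0
print(turning_point(b1=0.0, b2=-0.5))   # quadratic term only: peak forced to 0.0
```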
  
nourhane houssam says

April 2, 2018 at 7:22 pm

Hi Jim, I have a question. If the dependent variable is a ratio, which model or method could be applicable? My dependent variable is the secondary school enrollment ratio, regressed on five independent variables: worker remittances, GDP, government expenditure on education, the skilled migration rate, and the squared skilled migration rate. I’m waiting for your reply and help, and thank you in advance.

Patrik Silva says

April 1, 2018 at 1:09 am

Thank you again Jim,

I was waiting for your reply; this is very good feedback.

It helps a lot. At least now I can continue working on my data.

Patrik

Patrik Silva says

March 31, 2018 at 3:26 am

Hi Jim,

In the following equation:

Mobility = 1243 + 412.3 Ln(Density)

Do we still interpret the B1 coefficient (412.3) as usual, or do we need to take the exponential of B1 to convert it back to the original units, to make it meaningful, and then interpret it?

My second question is related to the following situation:

Is it OK to use data scaled as a rate (%) as independent variables? For example, suppose we are modeling crime in a particular city and have 2 independent variables in the same model, say population and the number of unemployed people, both as absolute counts. I am thinking that if I use both in the same model, there might be a high correlation between population and unemployed people (causing a multicollinearity problem), since zones with more population also tend to have more unemployed people. Is using the unemployment rate as an independent variable better than using the absolute number of unemployed people?

And then, the third question:

Here, I would like you to think about the geography of the study area, because the size of each zone (area in km²) is also normally related to the zone’s population.

How could the area of the zone be useful for this model?

I don’t know if I was clear in my question.

Thank you in advance,

Patrik Silva

- Jim Frost says
  
  March 31, 2018 at 3:56 pm
  
  Hi again Patrik,
  
  Excellent questions!
  
  Regarding the electron mobility equation, I should first note that the equation you refer to is just a portion of the linear equation that the model uses. The full equation is a cubic model (i.e., it also includes the squared and cubed terms). And, the linear model didn’t fit the data as well as the nonlinear model.
  
  So, with that in mind, the linear model does use the natural log, but only on the independent variable side of things. Consequently, you’d need to take the log of the independent variable’s value, but the value that the equation calculates is in the natural units for electron mobility. So, no back transformation is necessary for this model. And, yes, the coefficient applies to the log-transformed density rather than to density in its natural units, because that variable has been transformed. For models with transformed values, it’s important to note specifically which values have been transformed and which ones have not!
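  Here is a small worked illustration of that interpretation. It treats the two-term equation from the question on its own, which, as noted above, is only part of the fitted cubic model, so it is for illustration only. Only the predictor is logged, so the prediction is already in mobility units. Multiplying density by a factor k changes the predicted mobility by 412.3 × ln(k).

```python
import math

B0, B1 = 1243, 412.3   # from Mobility = 1243 + 412.3 Ln(Density)

def predicted_mobility(density):
    """Prediction is in mobility units; only the input is logged."""
    return B0 + B1 * math.log(density)

def change_for_density_factor(k):
    """Change in predicted mobility when density is multiplied by k."""
    return B1 * math.log(k)

print(round(change_for_density_factor(2), 1))        # doubling density
print(round(change_for_density_factor(math.e), 1))   # one unit of Ln(Density)
```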
  
  For your second question, yes, you’d likely have a correlation between those two variables, but those variables might cause other problems as well. In this situation, you might try including unemployment rate as a way of including both variables (rather than the raw numbers for both). I actually talk about this approach in more detail in my post about heteroscedasticity in regression models. Heteroscedasticity refers to residuals that have unequal variance. Cross-sectional studies like yours are more likely to produce heteroscedasticity if variables like these have a wide range of values. And, as you mention, there is also the possibility for these independent variables to be correlated (multicollinearity), which causes its own problems!
  
  I don’t know enough about the subject area to make a recommendation about zones. It sounds like it might be important. I’d recommend seeing how others in the area have included this information in their models.
  
  I hope this helps!
  
Fariba Heidarian says

March 27, 2018 at 8:11 am

Hi, thank you, it is really useful. I have a question. In my case, I have 5 variables, one of which is the dependent variable. I don’t know what relationships might exist between the variables; actually, that is what I am looking for.
Which independent variable should I choose, together with the response variable, for curve fitting?

- Jim Frost says
  
  March 28, 2018 at 3:21 pm
  
  Hi Fariba,
  
  Unfortunately, there is no way to determine which types of relationships exist without taking a very close look at the data. A great place to start is with graphs. Use a scatterplot to graph the relationship between the dependent variable and each independent variable. You can look for curvature in these graphs. These graphs help you determine which variables have curved relationships and the type of curve. Based on the scatterplots, you can fit a model accordingly. It might well take some trial-and-error.
  
  For more information about this topic, read my post about fitting curves, which shows you how to fit specific types of curves.
  
  After you fit a model, be sure to check the residual plots. These plots help you determine whether you’re fitting the curves correctly.
  
  Best of luck!
  
chris says

March 5, 2018 at 8:14 am

This is a great explanation, thanks!

- Jim Frost says
  
  March 5, 2018 at 9:58 am
  
  Hi Chris, I’m glad it was helpful!
  