Nonlinear regression analysis cannot calculate P values for the independent variables in your model. Why not? And, what do you use instead? Those are the topics of this blog post.

Nonlinear regression is an excellent statistical analysis when you need the maximum flexibility for fitting curves in your data. However, just like there are sound reasons for no R-squared values in nonlinear regression, there are valid reasons for why there are no P values for the coefficient estimates.

## Why Are P Values Possible in Linear Regression?

The question above is probably not one that you’ve asked.

P values for the independent variables in linear regression are a valuable statistical tool that seems quite natural. In linear regression, a P value indicates whether the relationship between an independent variable and the dependent variable is statistically significant while controlling for the other variables in the model. For more information, read my post about interpreting P values and regression coefficients.

However, you need to understand why P values are possible in linear regression before you can figure out why they are impossible to calculate for nonlinear regression.

The key point to understand is that a linear regression model is a very restricted form of a model. In a linear regression equation, all terms are either the constant or a parameter multiplied by an independent variable (IV). Then, you build the equation by only adding the terms together. These rules limit the form to just one type:

Dependent variable = constant + parameter * IV + … + parameter * IV

Because of these restrictions, you end up with a consistent form that makes it possible to create a single hypothesis test that is appropriate for all parameter estimates in all linear regression models. Regardless of what an independent variable measures, if the parameter is zero, the value of that term equals zero (0 * IV = 0). This condition indicates that the independent variable has no relationship with the dependent variable because it literally adds nothing to the dependent variable in the equation.

Given the consistent form, the following hypothesis test is valid for all terms in all linear regression models. βᵢ represents the parameter value for an independent variable.

- H₀: βᵢ = 0
- Hₐ: βᵢ ≠ 0

The P value for each term measures the amount of evidence against the null hypothesis that the parameter (coefficient) equals zero. If the P value is less than your significance level, reject the null and conclude that the parameter does not equal zero. Changes in the independent variable are related to changes in the dependent variable.
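To make the test concrete, here is a minimal Python sketch of how a P value for each term can be computed in a linear model. The data are simulated and the variable names (`x1`, `x2`, `y`) are hypothetical; it assumes NumPy and SciPy are available and uses the ordinary least-squares t-test.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 + rng.normal(size=n)

# Design matrix: constant + parameter * IV + parameter * IV
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimates of the parameters
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and standard errors of the estimates
residuals = y - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t statistic and two-sided P value for H0: beta_i = 0
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

for name, b, p in zip(["constant", "x1", "x2"], beta, p_values):
    print(f"{name}: estimate={b:.3f}, P={p:.4f}")
```

The same null value (zero) is used for every term, which is only possible because every term has the form parameter * IV.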

## Why Are P Values Incalculable in Nonlinear Regression?

Conversely, nonlinear regression models can take a virtually infinite number of forms. There are almost no restrictions on how you can use parameters in a nonlinear regression equation. On the positive side, this freedom gives nonlinear regression the most flexible curve-fitting abilities.

However, because there is an incredibly diverse array of potential model forms, it’s impossible to devise a single hypothesis test for all parameters. Instead, the null hypothesis value of each parameter depends on the nonlinear function, the parameter’s location in it, and the research question.

What can you use instead of P values? You’ll need to use your knowledge of both the research area and the nonlinear function to identify the parameter value that corresponds to the null hypothesis. Then, assess the parameter estimates, and particularly the confidence interval of the estimate, to determine whether the variable is statistically significant. If the confidence interval of the estimate excludes the null value, you can conclude that the parameter is statistically significant.
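As a minimal sketch of this approach in Python, the example below fits an asymptotic curve with SciPy's `curve_fit` and checks whether the 95% confidence interval for one parameter excludes a null value. The model, the simulated data, the starting values, and the null value of 8.0 are all hypothetical choices for illustration. In practice, the null value comes from your subject-area knowledge, and the intervals shown are the usual approximate ones based on the estimated covariance matrix.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def asymptotic(x, a, b):
    # Hypothetical model: rises toward the asymptote b at a rate set by a
    return b * (1.0 - np.exp(-x / a))

# Simulated data with true values a = 4 and b = 10
rng = np.random.default_rng(1)
x = np.linspace(0.5, 20.0, 60)
y = asymptotic(x, 4.0, 10.0) + rng.normal(scale=0.5, size=x.size)

# Fit the curve; pcov is the estimated covariance of the parameters
popt, pcov = curve_fit(asymptotic, x, y, p0=[2.0, 9.0])
se = np.sqrt(np.diag(pcov))

# Approximate 95% confidence intervals for a and b
dof = x.size - len(popt)
t_crit = stats.t.ppf(0.975, dof)
ci_low = popt - t_crit * se
ci_high = popt + t_crit * se

# Assumed null value for the asymptote b, chosen for illustration only
null_b = 8.0
b_significant = not (ci_low[1] <= null_b <= ci_high[1])

print(f"b = {popt[1]:.2f}, 95% CI = ({ci_low[1]:.2f}, {ci_high[1]:.2f})")
print("CI excludes the null value:", b_significant)
```

Note that zero is not a sensible null value for `a` in this model, because the function is undefined when `a` equals zero. That is one reason a single universal test does not work here.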

For examples of nonlinear functions, see my post about the differences between linear and nonlinear regression.

To learn about when to use nonlinear regression, read the following:

George says

Jim, thanks so much for your help, this has been really useful and really cleared it up for me!

I suppose 2*SE is a good approximation of the prediction interval, but there’s no way of getting an exact prediction interval for a nonlinear model?

Thanks

George says

Hi Jim,

Thanks for your reply, it’s much appreciated and a great help. I’m not sure an inverse term or log transformation is quite what I’m looking for, as I’m using the Levenberg-Marquardt algorithm to fit my curve. I’m really just looking to test how well this particular curve fits and represents my data.

So I think I’m sticking with nonlinear regression, and I have my standard error for this regression. I’m then using 2*SE for the 95% prediction interval. I’m a little confused however over the difference between a prediction interval (PI), a confidence interval (CI), and the confidence interval of the prediction (NB I’ve read your 3 short description posts for these!). Can you perhaps give me an example of how to derive CI’s, please? I’m assuming this will involve the use of ‘n’ (number of data points/sample size), in a similar vein to p-values? I’m also unsure if I’d be expecting to get a CI for each data point along the curve (i.e. vertical error bars), a mean for the whole curve (i.e. shaded region above and below the regression curve, similar to the PI), or just CI values for the two regression coefficients derived from the equation (a & b)? In python, my equation looks something like this: b * (1 - np.exp((x * -1.0) / a))

Thanks!

Jim Frost says

Hi George,

I just presented those other models as possibilities you might want to try because they can fit curves that approach an asymptote. If you read the curve fitting post, I initially thought the nonlinear model was going to provide the best fit for the example. However, it turns out one of the models with an inverse term provides the best fit for that particular dataset.

Just be aware that 2*SE is just an approximation for the prediction interval. It’s often very close but it’s an approximation.

The difference between prediction intervals and confidence intervals of the prediction is based on what you want to create a CI around.

If you want to create a CI around an individual new observation, that’s a prediction interval. PIs indicate the likely range for an individual. You specify the values of the IVs and want to know where the next single observation with those settings is likely to occur.

A confidence interval of a prediction creates a range around the mean. When you enter IV values into a regression equation, the fitted value you obtain is the mean value of the population that has those settings/values. Like any mean, you can create a CI around it that tells you the range the population mean is likely to fall within. Importantly, these CIs don’t tell you where an individual is likely to fall.

So the difference is between where an individual observation with particular IV values is likely to fall versus where the mean of a population with those IV values is likely to be. Obviously, predicting the value for an individual observation is more difficult than predicting the mean for that population. Consequently, the PI will always be wider than the CI of the prediction.
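As a rough illustration of that difference for a nonlinear fit, here is a minimal Python sketch. It assumes the asymptotic curve from your question, simulated data, and a delta-method (linearization) approximation for the variance of the fitted mean, so treat both intervals as approximate rather than exact. The data, starting values, and variable names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def model(x, a, b):
    return b * (1.0 - np.exp(-x / a))

def jacobian(x, a, b):
    # Partial derivatives of the model with respect to a and b
    e = np.exp(-x / a)
    d_a = -b * e * x / a**2
    d_b = 1.0 - e
    return np.column_stack([d_a, d_b])

# Simulated data (assumption: true a = 4, b = 10, noise SD = 0.5)
rng = np.random.default_rng(2)
x = np.linspace(0.5, 20.0, 60)
y = model(x, 4.0, 10.0) + rng.normal(scale=0.5, size=x.size)

popt, pcov = curve_fit(model, x, y, p0=[2.0, 9.0])
dof = x.size - len(popt)
s2 = np.sum((y - model(x, *popt)) ** 2) / dof  # squared standard error of the regression
t_crit = stats.t.ppf(0.975, dof)

# IV values at which we want intervals
x_new = np.array([2.0, 5.0, 15.0])
fit = model(x_new, *popt)

# Delta method: approximate variance of the fitted mean at each x_new
J = jacobian(x_new, *popt)
var_mean = np.einsum("ij,jk,ik->i", J, pcov, J)

ci_half = t_crit * np.sqrt(var_mean)       # CI of the prediction (mean)
pi_half = t_crit * np.sqrt(var_mean + s2)  # prediction interval (individual)

for xv, f, c, p in zip(x_new, fit, ci_half, pi_half):
    print(f"x={xv}: fit={f:.2f}, CI=±{c:.2f}, PI=±{p:.2f}")
```

The prediction interval adds the residual variance to the variance of the fitted mean, which is why it always comes out wider.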

I hope that answers your question!

George says

Hi Jim,

Great website! This has become my go-to guide for all things stats, as it’s always so well explained. I have a question I’m hoping you can help with.

I have some data onto which I’ve fitted an asymptotic regression curve (concave). Since this is nonlinear I can’t use r-squared for the quality of fit, or p-value for the significance. Instead I’m using standard error of the regression for the quality of the fit, but I’m unsure what I can use to test the significance, if not the p-value?

Any help would be great!

Thanks

Jim Frost says

Hi George,

I’m so glad my site has been helpful.

In some instances, you can use an inverse term or log transformations in a linear model to fit an asymptotic curve. I show an example in my post about curve fitting techniques. I actually use an asymptotic curve as the example and try to fit it using a variety of techniques, including linear with an inverse term, log transformation, and nonlinear regression. I obviously don’t know which method will be best for your data, but there are some possibilities to try. If one of the linear models fits the data adequately, you’d obtain p-values. Although, note that log transformations can prevent you from comparing goodness-of-fit measures, such as R-squared.

If you need to use nonlinear regression, you do lose valuable statistics such as the p-values and R-squared. Instead, you can use confidence intervals of the coefficient estimates to replace p-values. Look to see if the CIs exclude meaningful values. You’ll need to determine the meaningful values, which could be based on other research, subject area knowledge, and theory. And, use the standard error of the regression instead of R-squared to assess overall model fit. I show how to use the standard error in the curve fitting post.
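For illustration, here is a minimal Python sketch of computing the standard error of the regression (S) for a nonlinear fit and for a linear model with an inverse term, so the two can be compared on the same data. The simulated data, the starting values, and the helper name `standard_error_of_regression` are hypothetical, and which model wins will depend on your own data.

```python
import numpy as np
from scipy.optimize import curve_fit

def standard_error_of_regression(y, fitted, n_params):
    # S = sqrt(SSE / (n - number of estimated parameters))
    sse = np.sum((y - fitted) ** 2)
    return np.sqrt(sse / (len(y) - n_params))

def asymptotic(x, a, b):
    return b * (1.0 - np.exp(-x / a))

# Simulated data (assumption: generated from the asymptotic curve itself)
rng = np.random.default_rng(3)
x = np.linspace(0.5, 20.0, 60)
y = asymptotic(x, 4.0, 10.0) + rng.normal(scale=0.5, size=x.size)

# Nonlinear fit with two parameters (a and b)
popt, _ = curve_fit(asymptotic, x, y, p0=[2.0, 9.0])
s_nonlinear = standard_error_of_regression(y, asymptotic(x, *popt), 2)

# Linear model with an inverse term: y = c0 + c1 * (1 / x)
X = np.column_stack([np.ones_like(x), 1.0 / x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
s_inverse = standard_error_of_regression(y, X @ coef, 2)

print(f"S (nonlinear):    {s_nonlinear:.3f}")
print(f"S (inverse term): {s_inverse:.3f}")
```

S is in the units of the dependent variable, and smaller values indicate that the data points fall closer to the fitted curve.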

I hope that helps!

Maitreya says

Hi Jim,

I am working on nonlinear regression. I have estimated the parameters but I am a little confused about their significance. Please can I have your email so that I can send you the results.

Denis Norkin says

Thanks a lot for the great explanation,

I have a question though. For a given data set I’m using a polynomial fit, and I want to iterate between orders to determine the best order to fit my data without overfitting it.

What should be the predictor to evaluate it if not P or R^2?

Jim Frost says

Hi Denis,

A polynomial fit is not nonlinear regression! See my post about the difference between linear and nonlinear regression to learn why. So, using p-values and R-squared (adjusted or predicted) is just fine!

I typically use the p-value for the highest-order term. Usually, I only consider up to the quadratic term. Using a cubic term is extremely rare and you’d need good theoretical reasons to explain the multiple bends. Otherwise, you risk artificially bending the line just to fit the points better (overfitting the model).

You could also use adjusted R-squared. You would not use regular R-squared because the higher-order polynomials will always improve the fit by some degree. That’s the general knock against R-squared.

Also, I show an example of how predicted R-squared is very good in this exact scenario of a polynomial in my post about adjusted and predicted R-squared.
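Here is a minimal Python sketch of comparing polynomial orders with regular and adjusted R-squared. The data are simulated from a quadratic relationship, and the helper name `r_squared_stats` is hypothetical. Note how regular R-squared never decreases as the order goes up, while adjusted R-squared penalizes the extra term.

```python
import numpy as np

def r_squared_stats(y, fitted, n_terms):
    """Return (R-squared, adjusted R-squared); n_terms excludes the constant."""
    n = len(y)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_terms - 1)
    return r2, adj

# Simulated data (assumption: the true relationship is quadratic)
rng = np.random.default_rng(4)
x = np.linspace(-3.0, 3.0, 50)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=1.0, size=x.size)

# Fit linear, quadratic, and cubic polynomials
results = {}
for order in (1, 2, 3):
    coefs = np.polyfit(x, y, order)
    fitted = np.polyval(coefs, x)
    results[order] = r_squared_stats(y, fitted, order)

for order, (r2, adj) in results.items():
    print(f"order {order}: R-squared={r2:.4f}, adjusted R-squared={adj:.4f}")
```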

Volki says

This is a very good explanation. Thank you!

Jenny Alfaro says

Thank you, for your quick answer.

I’ll try your method and I’ll use the SE to compare.

Thanks.

Jenny

Jenny Alfaro says

Hi Jim.

I want to thank you for this post because it’s very useful, but I have a question.

I’m working in rStudio with an interesting dataset (It consists of 26 different studies) on cancer, so I’m using non-linear regressions of 3 very popular population models (Exponential, Logistic and Gompertz) to explain tumor growth.

Although the post is very clear and resolved many of my questions, it raised two more. I need to compare the nonlinear regressions that best fit my data, taking into account whether the data are comparable. That is, 5 studies evaluated at 20 days were explained by the nonlinear regression of the Gompertz model, but I want to know if these regressions differ from each other or not. What kind of test can I use to compare these results and know if they differ?

I tried to be flexible and used GAM, which partially explains my results and tried to run an Anova, however, my data does not fit to the necessary assumptions and I had to discard that idea.

How valid is it to judge whether the 5 studies explained by Gompertz differ by checking whether their confidence intervals overlap?

Jim Frost says

Hi Jenny,

I typically use the standard error of the regression to compare nonlinear models. Just like there are no p-values in nonlinear regression, R-squared values are also not valid for nonlinear models. To see how I use S to compare a nonlinear model to a linear model, read my post about curve fitting. You can use the same process to compare nonlinear models.

Unfortunately, I’ve never had to compare the CIs from a nonlinear model in the manner that you describe, so I’m not sure about that. Nonlinear regression is more complex than linear regression. And, you’re performing a complex comparison between nonlinear models. You might need to consult with a statistician who can really dedicate the time to understanding your study and the very specific statistical needs it has.

Marisol Riddell says

Thanks Jim, but now I’m more confused. So my textbook says on page 271 of Introduction to Econometrics by Stock and Watson, that the adjusted R^2 can be used to compare the Log-Lin and Log-Log models, but not Lin-Log to Log-Log, so the point is that the dependent variable must be the same, but it’s okay if it’s nonlinear… can you please help me clarify?

Jim Frost says

Hi Marisol,

So, I’m not completely sure why your book is saying that but I might not be fully clear on what it is saying.

It sounds like you’re talking about semi-log models and log-log models and both of those are ways to incorporate nonlinear relationships into the linear model framework. R-squared and F-tests should be valid in these contexts. But, again, I might be missing some detail that your textbook raises.

I do talk about these types of models a bit in posts about fitting curvature and another about log-log plots and models.

Marisol Riddell says

Hi Jim,

Love this website. It has helped with several questions I had. One question I can’t seem to find a straight answer to anywhere is this:

I know I can not use r^2 to determine fit for a nonlinear model… but the F-test is measured with adjusted r^2. Can I use an F test to determine fit of a nonlinear model?

Thanks!

Jim Frost says

Hi Marisol,

That’s a great question. Alas, F-tests are only for linear models, and you can’t use them for nonlinear models. By nonlinear, I’m referring to the statistical sense of the word. You can use F-tests for linear models that use polynomials (and other methods) to model curvature.

Personally, I like to use the standard error of the regression to determine how well a nonlinear model fits the data. You can also assess the residual plots and look for the same things you would when using linear regression.

I hope this helps!

Sameer Kesava says

Hi Jim,

Thank you very much for the blogs, they are very helpful. I have a question with regards to nonlinear data fitting. I have scientific data to which I am fitting a linear combination of functions, such as Lorentz and Gaussian, to model a system’s physical properties. The software that measures the data is also used to analyze the data and utilizes the Levenberg-Marquardt algorithm to fit the starting models (with starting parameters and range for parameters). After data fitting, it spits out values for parameters with their error estimates and the Mean Squared Error for the fit. Importantly, it also reports the correlation matrix for the parameters but no p-values. Some of the parameters have a high correlation (close to 1 or -1) while some have low. The model representing the physical phenomenon can sometimes have as many as 30 parameters. How do I test if some of the parameters are redundant? Even if the correlation is high between, let’s say, parameters x and y, sometimes both are necessary and sometimes one can be fixed while the other is varied. Most important, is there an accepted threshold value for the correlation coefficient below which we can say these parameters are independent? The most important aspect of my analysis is to obtain models representing physical phenomenon which have independent parameters. I apologize for such a long question. I will be eagerly waiting for your reply.

Thank you

Sameer

Vikash says

Nice sir…

Jim Frost says

Thank you, Vikash!

Dwarkesh says

Hello Jim,

Thanks for posting concepts regularly.

I have got a question… When you say that your model is non linear I guess you mean it is nonlinear in parameters, or estimate, because for model which are nonlinear in variable you can transform variable to have a linear relation.

Now the topic mentioned above is for nonlinear in parameter, I guess. And can we use the p-value approach in nonlinear-in-variable models? Or does it hold for both kinds of nonlinear models?

Keep posting great ideas.

Thanks,

Dwarkesh

Jim Frost says

Hi Dwarkesh, yes, I mean a model that is nonlinear in the parameters. That’s a great point and I’ll go back and make it more clear in the post . . . that I’m referring to nonlinear in the parameters. Thank you! And, you’re correct, you can model curvature using a linear model (polynomial, logs, etc) and p-values are quite appropriate there.

Psych n Stats Tutor (@psychnstats) says

Contrasting linear and non-linear was helpful, thanks. Am used to working only with linear

Jim Frost says

You’re very welcome. I’m glad that you found it helpful! I find that it’s easy to forget that linear regression is a very specific special case.

Vishal says

Hi, Jim,

I came across your website on Facebook and I find your articles well organized and easy to understand. I have been meaning to ask this question: why do we have the standard P value as 0.05? Why is it not 0.10 or 0.5 for that matter? When I do ask someone this question, I receive vague answers like, it is as it is, or this was what has been followed for many years.

Jim Frost says

Hi Vishal, first, thanks for the compliments! That makes my day! You’re actually asking about the significance levels rather than P values. The experimenter chooses the significance level, but the P values are calculated by the hypothesis test.

Part of the reason we commonly use 0.05 is because of tradition. In fact, both 0.01 and 0.05 were set in place long ago and have persisted over the years. There is some logic behind these values. The significance level is the probability of obtaining a false positive when the null hypothesis is actually correct. So, you know that you want a low value. 10% and 50% error rates were deemed too high. You could also go very low, such as 0.001 and only have a 0.1% error rate. However, lower significance levels also reduce the power of the study. This reduction means that if you use a very low significance level, you might miss some real effects. So, 0.01 and 0.05 were seen as good trade-offs between avoiding false positives while not reducing power too much.

However, once they were set in place almost a century ago, they largely persisted without question. That is changing right now. Both Bayesian analysis and simulation studies have shown that if you obtain a p-value that is near 0.05, it’s not very strong evidence because it results in a higher than expected error rate. Consequently, there is currently a push to lower the standard significance level from 0.05 to 0.005. We’ll have to see if the proposed standard is adopted or whether we stick with the traditional values!
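The idea that the significance level is the false-positive rate when the null hypothesis is true can be checked with a small simulation. The Python sketch below is only an illustration under assumed settings (10,000 hypothetical studies of 30 observations each, all drawn from a population where the null is true). It runs a one-sample t-test on each study and counts how often the P value falls below several significance levels.

```python
import numpy as np
from scipy import stats

# Simulate many studies in which the null hypothesis (mean = 0) is true
rng = np.random.default_rng(5)
n_studies, n_obs = 10_000, 30
samples = rng.normal(loc=0.0, scale=1.0, size=(n_studies, n_obs))

# One-sample t-test for each study (one row per study)
p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

# Share of studies that would be declared significant at each level
rates = {}
for alpha in (0.05, 0.01, 0.005):
    rates[alpha] = float(np.mean(p_values < alpha))
    print(f"alpha={alpha}: false-positive rate={rates[alpha]:.4f}")
```

Each observed rate should land close to its significance level, which is the sense in which a lower level buys fewer false positives.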

Vishal says

Hi, Jim,

I appreciate this detailed explanation. It’s starting to make sense after reading it over and over. I’m happy with your answer. Thank you!

I will recommend your page to my friends. It is like no other!

Jim Frost says

Hi Vishal, you’re very welcome! Thank you for recommending my website. I really appreciate that!