Curve Fitting using Linear and Nonlinear Regression

By Jim Frost 46 Comments

In regression analysis, curve fitting is the process of specifying the model that provides the best fit to the specific curves in your dataset. Curved relationships between variables are not as straightforward to fit and interpret as linear relationships.

For linear relationships, as you increase the independent variable by one unit, the mean of the dependent variable always changes by the same specific amount. This relationship holds true regardless of where you are in the observation space.

Unfortunately, the real world isn’t always nice and neat like this. Sometimes your data have curved relationships between variables. In a curved relationship, the change in the dependent variable associated with a one-unit shift in the independent variable varies based on the location in the observation space. In other words, the effect of the independent variable is not a constant value.
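To make the contrast concrete, here is a small Python sketch (NumPy assumed) that computes the change in the mean response for each one-unit step in X. The coefficients 2, 3, and -0.5 are made-up illustration values, not estimates from any dataset in this post.

```python
import numpy as np

def unit_changes(f, xs):
    """Change in f(x) for each one-unit step from x to x + 1."""
    xs = np.asarray(xs, dtype=float)
    return f(xs + 1) - f(xs)

# Hypothetical coefficients chosen only for illustration.
def linear(x):
    return 2 + 3 * x

def quadratic(x):
    return 2 + 3 * x - 0.5 * x**2

xs = [0, 1, 2, 3]
linear_steps = unit_changes(linear, xs)        # the same change at every x
quadratic_steps = unit_changes(quadratic, xs)  # the change depends on x

print("linear:   ", linear_steps)
print("quadratic:", quadratic_steps)
```

For the linear function every step adds 3. For the quadratic one the step equals 2.5 - x, so the effect of X shrinks and eventually reverses as X grows.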

Read my post where I discuss how to interpret regression coefficients for both linear and curvilinear relationships to see this in action.

In this post, I cover various curve fitting methods using both linear regression and nonlinear regression. I’ll also show you how to determine which model provides the best fit.

Why You Need to Fit Curves in a Regression Model

The fitted line plot below illustrates the problem of using a linear relationship to fit a curved relationship. The R-squared is high, but the model is clearly inadequate. You need to do curve fitting!

When you have one independent variable, it’s easy to see the curvature using a fitted line plot. However, with multiple regression, curved relationships are not always so apparent. For these cases, residual plots are a key indicator for whether your model adequately captures curved relationships.

If you see a pattern in the residual plots, your model doesn’t provide an adequate fit for the data. A common reason is that your model doesn’t correctly fit the curvature. Plotting the residuals against each of your independent variables can help you locate the curved relationship.
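Plotting residuals against each independent variable can be imitated without a plotting library. Average the residuals at each value of a predictor and look for a systematic shape. This is a minimal NumPy sketch on made-up grid data. The names x1 and x2 and the true relationship are assumptions for illustration, and the model deliberately omits the squared x2 term.

```python
import numpy as np

# Hypothetical data on a balanced grid: y depends linearly on x1
# but has curvature (a squared term) in x2.
levels = np.linspace(-1, 1, 10)
g1, g2 = np.meshgrid(levels, levels, indexing="ij")
x1, x2 = g1.ravel(), g2.ravel()
y = 2 * x1 + x2**2

# Misspecified model: straight-line terms only (constant, x1, x2).
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

def mean_resid_by(predictor, resid):
    """Average residual at each distinct predictor value (a text 'residual plot')."""
    values = np.unique(predictor)
    return values, np.array([resid[predictor == v].mean() for v in values])

for name, pred in [("x1", x1), ("x2", x2)]:
    vals, means = mean_resid_by(pred, resid)
    print(name, np.round(means, 3))
```

The averages for x1 sit at zero. The averages for x2 are positive at both ends and negative in the middle. That U shape points to the missing curvature in x2.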

In other cases, you might need to depend on subject-area knowledge to do curve fitting. Previous experience or research can tell you that the effect of one variable on another varies based on the value of the independent variable. Perhaps there’s a limit, threshold, or point of diminishing returns where the relationship changes?

To compare curve fitting methods, I’ll fit models to the curve in the fitted line plot above because it is not an easy fit. Let’s assume that these data are from a physical process with very precise measurements. We need to produce accurate predictions of the output for any specified input. You can download the CSV dataset for these examples: CurveFittingExample.

Curve Fitting using Polynomial Terms in Linear Regression

Despite its name, you can fit curves using linear regression. The most common method is to include polynomial terms in the linear model. Polynomial terms are independent variables that you raise to a power, such as squared or cubed terms. Learn more about linear regression.

To determine the correct polynomial term to include, simply count the number of bends in the line. Take the number of bends in your curve and add one for the model order that you need. For example, quadratic terms model one bend while cubic terms model two. In practice, cubic terms are very rare, and I’ve never seen quartic terms or higher. When you use polynomial terms, consider standardizing your continuous independent variables.
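Here is a minimal NumPy sketch of two points in that paragraph. The model order is the number of bends plus one. Standardizing the input also reduces the correlation between X and X², which is the multicollinearity that squared terms introduce. The function name polynomial_design and the data are hypothetical.

```python
import numpy as np

def polynomial_design(x, bends):
    """Standardize x, then build columns 1, z, ..., z**(bends + 1).

    Order = number of bends + 1 (one bend -> quadratic, two -> cubic).
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    order = bends + 1
    return np.column_stack([z**k for k in range(order + 1)]), z

# Hypothetical input range far from zero, where x and x**2 are nearly collinear.
x = np.linspace(10, 20, 51)
X, z = polynomial_design(x, bends=1)

raw_corr = np.corrcoef(x, x**2)[0, 1]
std_corr = np.corrcoef(z, z**2)[0, 1]
print(f"corr(x, x^2) = {raw_corr:.4f}, corr(z, z^2) = {std_corr:.4f}")

# Least-squares fit of a quadratic (one bend) to illustrative data.
y = 5 + 0.8 * x - 0.02 * x**2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
```

These inputs are spaced symmetrically around their mean, so the correlation drops to essentially zero. With real data, standardizing usually lowers it sharply without eliminating it.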

[Figure: example curves for linear, quadratic, and cubic models]

Use my free online Linear Regression Calculator! It analyzes the relationship between two variables using simple linear, quadratic, or cubic models. It also graphs the data with the best fit line, displays the regression equation, and provides key model statistics.

Our data have one bend. Let’s fit a linear model with a quadratic term.

The R-squared has increased, but the regression line doesn’t quite fit correctly. The fitted line over- and under-predicts the data at different points along the curve. The high R-squared reinforces the point I make in my post about how to interpret R-squared: high R-squared values don’t always represent good models, and you need to check the residual plots!
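This small sketch shows how a high R-squared can coexist with a biased fit. It fits a quadratic to a made-up, noise-free curve that levels off, then counts runs of same-signed residuals. The function 20 - 15·exp(-0.5x) is an assumption for illustration and is not the post’s dataset.

```python
import numpy as np

# Hypothetical noise-free curve that rises toward a ceiling of 20.
x = np.linspace(1, 10, 30)
y = 20 - 15 * np.exp(-0.5 * x)

# Fit y = b0 + b1*x + b2*x^2 by least squares.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Count runs of same-signed residuals in x order. Random scatter gives
# roughly n/2 runs; a handful of long runs means systematic misfit.
signs = resid > 0
runs = 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))

print(f"R-squared = {r_squared:.4f}, sign runs = {runs} of {len(x)} points")
```

R-squared comes out high, but the residuals fall into only a few long stretches above or below zero. That is the over- and under-prediction pattern described above.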

Let’s try other models.

For a mathematical refresher on these concepts, learn more about using the Quadratic Formula and read Polynomials Explained.

Curve Fitting using Reciprocal Terms in Linear Regression

When your dependent variable descends to a floor or ascends to a ceiling (i.e., approaches an asymptote), you can try curve fitting using a reciprocal of an independent variable (1/X). Use a reciprocal term when the effect of an independent variable decreases as its value increases.

The value of this term decreases as the independent variable (X) increases because it is in the denominator. In other words, as X increases, the effect of this term decreases, and the slope flattens. X cannot equal zero for this type of model because you can’t divide by zero.

For our data, the increases in Output flatten out as the Input increases. There appears to be an asymptote near 20. Let’s try curve fitting with a reciprocal term. In the data set, I created a column for 1/Input (InvInput). I fit a model with a linear reciprocal term (top) and another with a quadratic reciprocal term (bottom).

For our example dataset, the quadratic reciprocal model provides a much better fit to the curvature. The plots change the x-axis scale to 1/Input, which makes it difficult to see the natural curve in the data.

To show the natural scale of the data, I created the scatterplot below using the regression equations. Clearly, the green data points are closer to the quadratic line.

On the fitted line plots, the quadratic reciprocal model has a higher R-squared value (good) and a lower S-value (good) than the quadratic model. It also doesn’t display biased fitted values. This model provides the best fit to the data so far!
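Here is a minimal sketch of the reciprocal approach. It builds a 1/Input column, then fits one model with 1/Input alone and one that adds (1/Input)². The data are invented to rise toward 20, and the helper name fit is hypothetical. S is the standard error of the regression used for the model comparison later in this post.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares; returns coefficients and S = sqrt(SSE / (n - p))."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape
    return coef, np.sqrt(resid @ resid / (n - p))

# Hypothetical Input/Output values that level off near 20 (not the post's CSV).
inp = np.linspace(1, 10, 25)
out = 20 - 12 / inp + 3 / inp**2

inv = 1 / inp  # the "InvInput" column; Input must never be 0
ones = np.ones_like(inv)

coef_lin, s_lin = fit(np.column_stack([ones, inv]), out)
coef_quad, s_quad = fit(np.column_stack([ones, inv, inv**2]), out)
print(f"S linear reciprocal = {s_lin:.4f}, S quadratic reciprocal = {s_quad:.2e}")

# Predictions on the natural Input scale, as in a scatterplot of the fitted equation.
grid = np.array([2.0, 5.0, 8.0])
pred_quad = coef_quad[0] + coef_quad[1] / grid + coef_quad[2] / grid**2
```

The invented data were generated from a quadratic reciprocal equation, so that model recovers it almost exactly here. Real measurements would leave some residual error.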

Curve Fitting with Log Functions in Linear Regression

A log transformation allows linear models to fit curves that are otherwise possible only with nonlinear regression.

For instance, you can express the nonlinear function:

$$Y = e^{B_0} X_1^{B_1} X_2^{B_2}$$

In the linear form:

$$\ln Y = B_0 + B_1 \ln X_1 + B_2 \ln X_2$$

Your model can take logs of both sides of the equation, which is the double-log form shown above. Alternatively, you can use a semi-log form, where you take the log of only one side. If you take logs on the independent-variable side of the model, you can transform all of the variables or only a subset of them.
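Here is a minimal NumPy sketch of the double-log idea. It generates data from Y = e^B0 · X1^B1 · X2^B2, takes logs, and fits the linear form by ordinary least squares. The parameter values (0.5, 0.7, -0.3) and the grid of X values are hypothetical.

```python
import numpy as np

# Hypothetical true parameters for Y = e^B0 * X1^B1 * X2^B2.
B0, B1, B2 = 0.5, 0.7, -0.3

# Positive predictor values on a grid (logs require X > 0 and Y > 0).
g1, g2 = np.meshgrid(np.arange(1.0, 6.0), np.arange(1.0, 5.0))
x1, x2 = g1.ravel(), g2.ravel()
y = np.exp(B0) * x1**B1 * x2**B2

# Double-log form: ln Y = B0 + B1 ln X1 + B2 ln X2, fit by ordinary least squares.
X = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print("estimated B0, B1, B2:", np.round(coef, 6))

# Back-transform to predict on the original scale.
y_hat = np.exp(X @ coef)
```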

Using log transformations is a powerful method to fit curves. There are too many possibilities to cover them all. Choosing between a double-log and a semi-log model depends on your data and subject area. If you use this approach, you’ll need to do some investigation.

Let’s apply this to our example curve. A semi-log model can fit curves that flatten as the independent variable increases. Let’s see how a semi-log model fits our data!

In the fitted line plot below, I transformed the independent variable.

Like the first quadratic model we fit, the semi-log model provides a biased fit to the data points. Additionally, the S and R-squared values are very similar to that model. The model with the quadratic reciprocal term continues to provide the best fit.
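A minimal sketch of the semi-log form, which transforms only the independent variable, is shown below. The curve Y = 8 + 5 ln(X) is invented for illustration. It shows that the fitted effect of a one-unit change, b1/X, shrinks as X grows.

```python
import numpy as np

# Hypothetical curve that flattens as X grows: Y = 8 + 5 ln(X).
x = np.linspace(1, 20, 40)
y = 8 + 5 * np.log(x)

# Semi-log model: transform only the independent variable.
X = np.column_stack([np.ones_like(x), np.log(x)])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

# The slope dY/dX = b1 / X shrinks as X increases, which is why the curve flattens.
for xv in (1, 5, 20):
    print(f"X = {xv:>2}: slope = {b1 / xv:.3f}")
```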

So far, we’ve performed curve fitting using only linear models. Let’s switch gears and try a nonlinear regression model.

Curve Fitting with Nonlinear Regression

Nonlinear regression is a very powerful alternative to linear regression. It provides more flexibility in fitting curves because you can choose from a broad range of nonlinear functions. In fact, there are so many possible functions that the trick becomes finding the function that best fits the particular curve in your data.

Most statistical software packages that perform nonlinear regression have a catalog of nonlinear functions. You can use that to help pick the function. Further, because nonlinear regression uses an iterative algorithm to find the best solution, you might need to provide the starting values for all of the parameters in the function.

Our data approach an asymptote, which helps us choose a nonlinear function from the catalog below.

The diagram in the catalog helps us determine the starting values. Theta1 is the asymptote. For our data, that’s near 20. Based on the shape of our curve, Theta2 and Theta3 must both be greater than 0.

Consequently, I’ll use the following starting values for the parameters:

Theta1: 20
Theta2: 1
Theta3: 1
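Statistical software runs the iterative search internally. As a rough illustration of how starting values feed that search, here is a compact Levenberg-Marquardt loop in NumPy. This is a swapped-in, generic algorithm, not necessarily what any particular package uses. The post doesn’t name the catalog function, so the sketch assumes the common asymptotic form Theta1 - Theta2·exp(-Theta3·X). The data are made up. Only the starting values (20, 1, 1) come from the text.

```python
import numpy as np

# ASSUMPTION: the catalog function is not named in the post, so this sketch uses
# Y = theta1 - theta2 * exp(-theta3 * X), which rises toward the asymptote theta1
# when theta2 > 0 and theta3 > 0.
def model(x, t):
    return t[0] - t[1] * np.exp(-t[2] * x)

def jacobian(x, t):
    e = np.exp(-t[2] * x)
    # Partial derivatives of the model with respect to theta1, theta2, theta3.
    return np.column_stack([np.ones_like(x), -e, t[1] * x * e])

def levenberg_marquardt(x, y, start, max_iter=500):
    """Minimize the sum of squared residuals, beginning at the starting values."""
    theta = np.asarray(start, dtype=float)
    resid = y - model(x, theta)
    sse = resid @ resid
    lam = 1e-3  # damping: small -> Gauss-Newton step, large -> short gradient step
    for _ in range(max_iter):
        J = jacobian(x, theta)
        A = J.T @ J
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ resid)
        trial = theta + step
        with np.errstate(over="ignore", invalid="ignore"):
            trial_resid = y - model(x, trial)
            trial_sse = trial_resid @ trial_resid
        if trial_sse < sse:  # accept the step and trust the linearization more
            theta, resid, sse = trial, trial_resid, trial_sse
            lam /= 10
            if np.linalg.norm(step) < 1e-10 * (np.linalg.norm(theta) + 1e-10):
                break
        else:  # reject the step and damp harder
            lam *= 10
            if lam > 1e12:
                break
    return theta, sse

# Hypothetical noise-free data generated from theta = (20, 15, 0.5).
x = np.linspace(1, 10, 30)
y = model(x, [20, 15, 0.5])

theta, sse = levenberg_marquardt(x, y, start=[20, 1, 1])
s = np.sqrt(sse / (len(x) - 3))  # standard error of the regression
print("estimates:", np.round(theta, 4), " S:", s)
```

Poor starting values can send an iterative search toward a bad solution or stop it early. That is why reading rough values off the catalog diagram is worth the effort.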

The fitted line plot below displays the nonlinear regression model.

The nonlinear model provides an excellent, unbiased fit to the data. Let’s compare models and determine which one fits our curve the best.

Comparing the Curve-Fitting Effectiveness of the Different Models

R-squared is not valid for nonlinear regression. So, you can’t use that statistic to assess the goodness-of-fit for this model. However, the standard error of the regression (S) is valid for both linear and nonlinear models and serves as a great way to compare fits between these types of models. A small standard error of the regression indicates that the data points are closer to the fitted values.
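S is the square root of the mean squared error, so it is expressed in the units of the dependent variable. The formula below assumes n observations and p estimated parameters. In a linear model, p counts the constant and the coefficients. In a nonlinear model, it counts the thetas.

$$S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p}}$$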

Model	R-squared (%)	S	Unbiased
Reciprocal – Quadratic	99.9	0.134828	Yes
Nonlinear	N/A	0.179746	Yes
Quadratic	99.0	0.518387	No
Semi-Log	98.6	0.565293	No
Reciprocal – Linear	90.4	1.49655	No
Linear	84.0	1.93253	No

We have two models at the top that are equally good at producing accurate and unbiased predictions. These two models are the linear model that uses the quadratic reciprocal term and the nonlinear model.

The standard error of the regression for the nonlinear model (0.179746) is almost as low as the S for the reciprocal model (0.134828). The difference between them is so small that you can use either. However, with the linear model, you also obtain p-values for the independent variables (not shown) and R-squared.

For reporting purposes, these extra statistics can be handy. However, if the nonlinear model had provided a much better fit, we’d want to go with it even without those statistics. Learn why you can’t obtain P values for the variables in a nonlinear model.

Closing Thoughts

Curve fitting isn’t that difficult. There are various methods you can use that provide great flexibility to fit almost any type of curve. Further, identifying the best model involves assessing only a few statistics and the residual plots.

Setting up your study and collecting the data is a time-intensive process. It’s definitely worth the effort to find the model that provides the best fit.

Any time you are specifying a model, you need to let subject-area knowledge and theory guide you. Additionally, some study areas might have standard practices and functions for modeling the data.

Here’s one final caution. You’d like a great fit, but you don’t want to overfit your regression model. An overfit model is too complex; it begins to model the random error, and it falsely inflates the R-squared. Adjusted R-squared and predicted R-squared are tools that can help you avoid this problem.
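As a rough sketch of those two tools, the function below computes R-squared, adjusted R-squared, and predicted R-squared for an ordinary least squares fit. Predicted R-squared comes from the PRESS statistic. The function name fit_summary and the simulated straight-line data are hypothetical. This way of computing predicted R-squared applies to linear models only.

```python
import numpy as np

def fit_summary(X, y):
    """OLS fit statistics. X must include a column of ones for the constant."""
    n, p = X.shape  # p counts the constant as a parameter
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = resid @ resid
    sst = np.sum((y - y.mean())**2)

    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)

    # Predicted R-squared via PRESS: each residual is inflated by its leverage h_ii,
    # which gives the leave-one-out prediction error without refitting n times.
    hat_diag = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    press = np.sum((resid / (1 - hat_diag))**2)
    pred_r2 = 1 - press / sst
    return r2, adj_r2, pred_r2

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 1 + 2 * x + rng.normal(scale=0.2, size=x.size)  # hypothetical straight-line data

for degree in (1, 3, 8):
    X = np.column_stack([x**k for k in range(degree + 1)])
    r2, adj, pred = fit_summary(X, y)
    print(f"degree {degree}: R2={r2:.3f}  adj R2={adj:.3f}  pred R2={pred:.3f}")
```

On straight-line data, extra polynomial terms can only nudge R-squared upward. Adjusted R-squared penalizes each added term, and predicted R-squared typically falls behind. A large gap between R-squared and predicted R-squared is a warning sign of overfitting.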

Learn how to choose the correct regression model!

If you’re learning regression, check out my Regression Tutorial!

Note: I wrote a different version of this post that appeared elsewhere. I’ve completely rewritten and updated it for my blog site.

Comments

MUHAMMAD ARIF says

March 23, 2024 at 12:01 pm

My question regarding R^2 for nonlinear regression is: if I calculate Pearson’s correlation between the dependent variable and the predicted values of the dependent variable from fitting a nonlinear model and square it, does it become the coefficient of determination for the nonlinear regression model, which I can compare with the R^2 of a linear model? Please comment.

- Jim Frost says
  
  March 23, 2024 at 5:41 pm
  
  Hi Muhammad,
  
  There are several issues relating to that practice.
  
  First off, I need to clarify the difference between nonlinear in a regression model versus a curvilinear line. A nonlinear regression model doesn’t just mean the regression line curves. It has to curve in specific ways. Linear models can fit some curvature using polynomials. To learn more, read linear vs. nonlinear models.
  
  So, let’s start by focusing on truly nonlinear models. R-squared is invalid for nonlinear models. It’s not appropriate to use with them. So, that would be one reason not to use your approach if you’re using a truly nonlinear model in a technical sense.
  
  Now, let’s suppose you have a curved relationship but you model it with a linear model using a polynomial. Despite the curvature, R-squared is valid in this context because you’re using a linear model. However, your method for calculating R-squared is not valid. Pearson’s correlation applies only to straight line data without curvature. If your fitted line is straight, your method would be fine. (One clarification: Find the correlation between the independent and dependent variables, don’t use the predicted values.) But, you’re saying that you’re fitting a curved relationship, so Pearson’s correlation isn’t valid in the first place. Hence, squaring it to get R-squared also isn’t valid.
  
  I’m not sure which context applies to your situation, but either way it’s not a good approach.
  
Jerry says

November 3, 2023 at 5:59 pm

Hi Jim,

You write excellent posts which expand my ability to do regressions. In the table above showing the different models, what makes a particular model “biased” or “unbiased”? It seems that both R-squared and S for some of them are similar to the top two models. And, for another point I’d like clarification on, is “S” the standard error of the regression (that is, the mean distance of values from the regression line) and is it in units of the dependent variable or percentage units of the dependent variable? Thanks.

- Jim Frost says
  
  November 3, 2023 at 9:14 pm
  
  Hi Jerry,
  
  In this context, unbiased means that the model doesn’t systematically over- or under-predict at various ranges of values. You want the entire range to fall randomly above and below the fitted line. The easiest way to see this is in a residual plot where you look at the residuals vs. fitted values. You should see that random spread around zero for the entire range of fitted values. No patterns.
  
  In this post, all the models that I indicate as biased in the table have portions along the fitted line where they systematically over- and under-predict. You can see that in the graph for each model throughout this post.
  
  You’re absolutely correct that the biased and unbiased models can have similar R-squared and S values because those statistics don’t evaluate bias. You can have high values of R-squared (or, equivalently, low values of S) and still have a biased model. And you can have low R-squared (high S) with unbiased models. So, those statistics don’t relate to bias.
  
  I write about this aspect in more detail in several posts. In my post about Interpreting R-squared, I show why/how you can obtain high values even for biased models. And I talk about what to look for in residual plots in my post about Residual Plots.
  
Umi Pollmann says

June 18, 2022 at 9:00 am

Dear Jim, if I would use curve fitting using reciprocal terms in linear regression, are the p-value and R-squared valid unlike with a nonlinear regression? Thank you in advance

- Jim Frost says
  
  June 20, 2022 at 5:05 pm
  
  Hi Umi,
  
  Nonlinear regression has a very specific definition in statistics that actually doesn’t relate to curvature vs. straight line. Yeah, I know, statisticians and the names they come up with!
  
  If you use reciprocal terms, it’s still linear regression and p-values and the R-squared are still valid–unless you include something else that is truly nonlinear. Read my post about the difference between linear and nonlinear regression for more details!
  
Pete Allen says

January 23, 2022 at 12:57 pm

Hi,

How many points would you need before you could reliably identify a nonlinear trend?

Charles says

September 4, 2021 at 2:09 pm

Hi Jim,

Thank you so much for this clear and really helpful blog!

I only have some simple questions since I’m new to statistics.

I have a power curve fit model: y=0.5M^0.5 (where M is a physical parameter).
Since it’s a kind of power function (nonlinear), is it linear regression? And can I use R^2 to explain the goodness of fit? From your blog, I suppose I can use a log fit to transform it into a linear regression?

Sorry for the stupid questions, and I’m looking forward to your reply.

Regards,
Charles

Yujin says

July 12, 2021 at 6:41 am

Great tutorial. It’s interesting to me that I can use linear regression for curve fitting.
I thought I needed to learn nonlinear regression.

- Jim Frost says
  
  July 12, 2021 at 5:27 pm
  
  Hi Yujin,
  
  The naming can be confusing! Nonlinear regression can fit a wider variety of curve shapes, but linear regression can often fit your curve. I always recommend starting with linear regression, because it’s easier, and seeing whether that works for your data.
  
Julian says

May 20, 2021 at 5:57 am

Hello Jim! Very useful information.

I fitted two quadratic models to explain plant growth under two different treatments over a period of time. Rather than just describing the trend, I would like to compare them and see whether plants grown under the two treatments differed statistically.

Any comment would be appreciated.

Krishnan says

May 4, 2021 at 6:25 am

Can I do multivariable nonlinear regression in SPSS or Excel?

- Jim Frost says
  
  May 4, 2021 at 2:33 pm
  
  Hi Krishnan,
  
  Read my post where I cover how to do regression analysis in Excel. You can fit curvilinear relationships using things like polynomial terms.
  
Patrick says

April 20, 2021 at 7:00 am

Hi Jim – Excellent article. What is the statistics package you are using for non-linear regression? Is it an add-in to Excel?

- Jim Frost says
  
  April 20, 2021 at 2:33 pm
  
  Hi Patrick,
  
  I’m using Minitab to do all the analyses in this article, including the nonlinear regression.
  
Bart Zehren says

September 21, 2020 at 12:11 pm

I wish to select a curve fitting model for data from a set of survey responses on pricing. Without giving way too much detail, I’ll simply say I have four pairs of X, Y coordinates, each coordinate being itself a measure of central tendency.

The data will conform to variations of an inverted U shape on a X, Y graph for which one wants to find the value of X (+/-) to maximize Y. The actual shape of the inverted U will vary across studies – sometimes very regular and balanced (i.e., mirror-imaged) on both sides; other times irregular or nonsymmetric, left to right. The shape is not a bug, it’s the whole point of doing the research. We want to discover and model real world shapes of that inverted U to find its peak (and the +/- error around it).

In all successful instances, the four data pairs will be such that the curve reaches its highest Y value somewhere between the 2nd and 3rd pairs of X, Y coordinates. One then uses the equation (if satisfactory) to compute the X (+/-) that corresponds to the maximum value of Y (+/-). My question: Would a nonlinear regression model, e.g., quadratic, be best, or some other specialized type of curve fitting model? Or is it more complicated, with perhaps multiple answers depending on…?

Olga says

April 17, 2020 at 1:58 pm

Hi Jim,

I have a question regarding the value of the std. error of regression when using the double-log linear model (i.e., both independent and dependent variables are transformed). Surely, because the error has the same units as the dependent variable, it has to be converted back by exponentiation to compare it with other models’ results.
In my case this (10^err) doesn’t work: it gives me a value about 2 orders of magnitude lower than expected from a visual comparison of the models (my y-values are in the range of thousands). The double-log model is best according to the R-squared, but not so much better that it should have such a minute error…

Thanks for any suggestions and keep up the good work!

- Jim Frost says
  
  April 18, 2020 at 1:29 am
  
  Hi Olga,
  
  You can’t compare R-squared values between models that transform the DV and models that don’t. When you transform the DV, the goodness-of-fit statistics apply only to the transformed values. For double-log models vs untransformed, check the residual plots to determine which model best satisfies the least squares assumptions. Particularly look for patterns that indicate you’re not fitting curvature correctly and check for heteroskedasticity in the untransformed models. If you can’t get a good fit using the untransformed DV, then you might need to use a transformation. In that case, you’re basing the decision on the inability to fit the data adequately as determined by assessing the residual plots rather than goodness-of-fit measures.
  
  I hope this helps!
  
Soroush says

March 26, 2020 at 5:19 am

Hello Jim,
Does using a linear relationship to fit a curved relationship always cause errors to be heteroscedastic, nonnormal, and correlated simultaneously? If so, when encountering such problems (heteroscedastic, nonnormal, or correlated errors), how can I tell whether they are a sign of using the wrong functional form (a linear relationship to fit a curved relationship) or of something else?

- Jim Frost says
  
  March 26, 2020 at 5:18 pm
  
  Hi Soroush,
  
  No, using linear models to fit curvature doesn’t necessarily cause those problems. In fact, I use a polynomial in a model that uses body mass index (BMI) to predict body fat percentage and, because the model fits the data well, it doesn’t have those problems. It all depends on how well your model fits the data. Sometimes linear models can adequately fit the curvature and there are no problems. Other times it can’t.
  
  There are various ways to assess the functional form of your model. I’ve written about a bunch of them and rather than retyping them here, please go read my post about model specification for an overview with links to more details.
  
  I hope this helps!
  
Myriam says

December 10, 2019 at 9:20 pm

Hello!

Thanks for your help with this blog.

I would like to know whether it is possible to compare two curves from two datasets instead of a curve with a nonlinear regression.
For example, suppose you have two sub-datasets, A and B, and you want to know whether A and B come from the same data. Do you have a test that will tell you whether the curves of A and B match?

- Jim Frost says
  
  December 11, 2019 at 4:18 pm
  
  Hi Myriam,
  
  I don’t know of a test for nonlinear regression. That’s assuming you’re using the statistically correct definition for nonlinear (not just fitting a curve but the form of the model itself is nonlinear). Given that you can’t obtain p-values out of the box for nonlinear parameter estimates, I doubt there is such a test “out of the box.” A statistician might be able to devise a custom test for particular functions. That’s my hunch, but I haven’t investigated that question specifically.
  
  However, if you’re using linear regression to model curves, such as with polynomial terms, you’re in luck. You just need to combine the two datasets into one and create a categorical variable that identifies the original dataset. Then include the appropriate interaction terms. I discuss the process in my post about statistically comparing regression lines. I don’t discuss curvature in it, but you’d just need to include interaction terms for the terms that model the curvature in addition to the other interactions I mention.
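Here is a minimal sketch of that combined-dataset approach for quadratic curves. It stacks datasets A and B, adds a 0/1 indicator, and includes interaction terms for both X and X². The data are invented. The sketch only estimates the differences. Regression software would supply the p-values used to test whether those differences are significant.

```python
import numpy as np

# Hypothetical datasets A and B, each following a quadratic curve.
x = np.linspace(0, 5, 12)
y_a = 1.0 + 2.0 * x - 0.30 * x**2
y_b = 1.5 + 2.0 * x - 0.20 * x**2

# Stack them and add an indicator: 0 for dataset A, 1 for dataset B.
x_all = np.concatenate([x, x])
y_all = np.concatenate([y_a, y_b])
g = np.concatenate([np.zeros_like(x), np.ones_like(x)])

# Columns: constant, X, X^2, group, group*X, group*X^2.
X = np.column_stack([np.ones_like(x_all), x_all, x_all**2, g, g * x_all, g * x_all**2])
coef, *_ = np.linalg.lstsq(X, y_all, rcond=None)

# The last three coefficients estimate how B's constant, linear, and squared
# terms differ from A's.
print("differences (B - A):", np.round(coef[3:], 6))
```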
  
Kiran S says

August 13, 2019 at 6:04 am

Hi Jim,
I’m Kiran. I want to fit a regression curve for data that contain four independent variables and one dependent variable. Of the four independent variables, two are nonlinear and two are linear. I want to fit a curve to this data. Please help me.

Shaz says

November 15, 2018 at 5:58 am

Hi Jim,

I am trying to get my head around one thing. My dependent variable has lots of zeros. The residuals will potentially be non-normal in this case. But how can the zeros pose a challenge to the non-linearity of the relationship? I am confusing the concepts of a non-linear relationship and non-normal errors in the context of a highly skewed dependent variable with lots of zeros.

Moreover, how can log transformation correct for non-linearity and non-normality here?

- Jim Frost says
  
  November 15, 2018 at 11:46 am
  
  Hi Shaz,
  
  This is a fairly complicated problem that affects some subject areas more than others. Unfortunately, I don’t have any first-hand knowledge of dealing with it, which limits how much I can help.
  
  Typically, this type of problem goes beyond using transformation to resolve it.
  
  If you are dealing with count data, you might look into zero inflated models. I discuss those a bit in my post about choosing the correct type of regression analysis. You’ll find that in the count data section at the end.
  
  Another method I’ve heard a bit about is to separate your dataset into two datasets. One dataset indicates the presence of whatever you’re measuring. The other contains the amount. You create separate models for each: model the presence dataset using logistic regression and the other with ordinary regression. Then, you merge the models. That might or might not work for your data.
  
  This issue is something that will probably take a bit of research on your part. What I write above is really the extent of my knowledge. I’m sure there are also a variety of subject specific variations on this issue as well.
  
  I hope this helps to at least point you in the right direction!
  
cristina says

October 27, 2018 at 7:58 am

How could I fit a nonlinear data set to a linear function?

- Jim Frost says
  
  October 27, 2018 at 4:32 pm
  
  Hi Cristina,
  
  In the first portion of this post, I show you a variety of ways that you can fit curves using a linear model.
  
Al says

October 7, 2018 at 2:06 pm

Hi Jim,
Why does a linear regression model with an x and an x-square term not have high multicollinearity automatically? The correlation between x and x-squared should be very high.

Loading...

Reply
- Jim Frost says
  
  October 7, 2018 at 6:58 pm
  
  Hi Al,
  
  Yes, you’re completely correct–and squared terms do cause very high multicollinearity. If you check the VIFs (that measure multicollinearity), you’ll find very high values. Fortunately, there is an easy solution to fix multicollinearity caused by these types of terms. Read my post about multicollinearity for more information!
  
Al says

October 7, 2018 at 2:02 pm

Hi Jim

Albert says

August 22, 2018 at 1:56 pm

Hi Jim, Thank you for this thorough explanation!

Xie Chang says

August 22, 2018 at 10:09 am

Hello Jim,

I have one question regarding multiple regression. Actually I’m trying to find the energy (E) of an object using the mass (M) and the shape factor (s) multiplied by the velocity (V) as independent variables:

$$E = \beta + \beta_1 M + \beta_2 (sV)^2$$

In this case, I’m using Excel (Data Analysis: Regression option) to find β, β1, and β2. The best fit (highest R^2) is obtained when the term (sV) is squared. In this case, is it still multiple linear regression, or is it multiple nonlinear regression because one of the terms is squared?

Thanks,

Xie

- Jim Frost says
  
  August 24, 2018 at 2:39 am
  
  Hi Xie,
  
  It is still linear regression analysis. To learn why, read my post about the difference between linear and nonlinear regression.
  
  Have you tried including the sV term (not squared) as well?
  
  Best of luck with your analysis!
  
Adriano says

July 20, 2018 at 7:43 am

Hi Jim.
Thanks for all the straight-to-the-point information.
I have a rather not-so-simple question, and I’m hoping for as simple an explanation as possible.
I have 10 predictors which affect a specific beer consumption, like: price, trade penetration, advertising, temperature etc.
What is a procedure of fitting a nonlinear regression with more predictors?
Thanks

- Jim Frost says
  
  July 20, 2018 at 4:07 pm
  
  Hi Adriano,
  
  First off, we need to clarify whether you mean a true nonlinear model or a linear model that uses polynomials to fit curvature. There are huge differences between the two types. In fact, I’ve never heard of a true nonlinear model that has 10 predictors. One seems to be the most common case. So, I’m going to assume that you actually mean a linear model that uses polynomials and/or data transformation. To be sure about this, you should read my post, The Differences between Linear and Nonlinear Models. You’ll be able to tell the difference and know what type of model you’re using.
  
  As for fitting a model with 10 predictors and potential curvature, choosing a model to fit your data is known as model specification. You should read my post about it: Model Specification: Choosing the Correct Regression Model. This post goes over all the different statistical and non-statistical methods for choosing the best model. In addition to that information, given that you are particularly interested in modeling curvature, you should graph the individual relationships between each predictor and the response. This process will help you visually assess curvature and help you include the correct polynomial terms–or possibly use other methods to fit the curve. You should also think about the potential curvature from a theoretical basis. These are always important tasks to perform, but more so because you’re specifically concerned about curvature.
  
  One final warning. Because you have 10 predictors and possible polynomials, you need to worry about overfitting your model. You need a certain number of observations per term in your model or you risk obtaining invalid, misleading results. Read my post about overfitting for more information.
  
  I hope this helps!
  
bob says

June 27, 2018 at 9:22 am

Hi, thanks for your helpful webpage! I’m running some statistical analyses in SPSS to check for both linear and nonlinear effects in a multiple linear regression (about 10 predictor variables and one outcome variable, all continuous). My goal is to see whether I can arrive at a better model for predicting the outcome variable if I check for possible nonlinear effects. I took the following steps. Is this a good approach?
-made a linear model with only the significant predictors (Analyze, Regression, Linear, using the "backward" and "forward" methods)

-made an extra variable for each predictor that the literature suggests may have a quadratic effect, by squaring it (so I did a transformation)

-put the squared variables in the full model and checked whether they are significant
Thanks

- Jim Frost says
  
  June 27, 2018 at 2:59 pm
  
  Hi Bob,
  
  A quick terminology issue before we get to your question. Linear and nonlinear have very specific meanings in statistics that refer to the form of the model and not whether the line is curved. I know that’s confusing! That’s why I wrote an entire post about that issue–The Difference Between Linear and Nonlinear Regression. In statistical terms, your model with squared variables is a linear model even though it will fit a curve!
  
  Your general process sounds correct, although I have a few suggestions. For one thing, be sure to assess the residual plots for the model without the squared variables. If there is curvature that you need to fit, you’ll often see it in the residual plots. And those plots are a great way to verify that you’re fitting any curvature adequately.
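To illustrate what a residual pattern from missed curvature can look like, here is a minimal sketch, assuming Python with NumPy and simulated quadratic data. It fits a straight line only and, instead of drawing a residual plot, averages the residuals over the low, middle, and high thirds of X.

```python
import numpy as np

# Hypothetical curved data: y follows a quadratic in x plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 90)
y = 1 + 0.5 * x ** 2 + rng.normal(0, 1, x.size)

# Fit a straight line only (no curvature terms). np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# An adequate model leaves residuals scattered around zero across the whole range.
# Averaging them over the low, middle, and high thirds of x exposes a systematic pattern.
labels = ["low x", "middle x", "high x"]
chunk_means = {lab: chunk.mean() for lab, chunk in zip(labels, np.array_split(residuals, 3))}
for lab, m in chunk_means.items():
    print(f"{lab}: mean residual = {m:+.2f}")
```

For this simulated data, the means come out positive at both ends and negative in the middle, the numeric counterpart of the U-shaped pattern you would see in a plot of residuals versus X.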
  
  When you include the squared terms, check their p-values to see if they’re significant. That can help you determine whether those terms are good additions to the model.
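Checking the squared term’s p-value can be sketched as follows, assuming Python with NumPy and SciPy and simulated data; in SPSS you would read the same kind of value from the Coefficients table. As in Bob’s second step, the squared predictor simply enters the model as another column. The helper `ols_pvalues` is hypothetical and computes standard OLS t-tests by hand.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """OLS coefficients with two-sided t-test p-values (usual OLS assumptions)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)               # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance matrix
    t_stats = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)
    return beta, p_values

# Hypothetical data with a genuine quadratic effect in x.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 150)
y = 5 + 1.2 * x - 0.15 * x ** 2 + rng.normal(0, 1, x.size)

# The squared term is just another column: intercept, x, and x^2.
X = np.column_stack([np.ones_like(x), x, x ** 2])
beta, p_values = ols_pvalues(X, y)
print(f"coefficient on x^2 = {beta[2]:.3f}, p-value = {p_values[2]:.2g}")
```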
  
  Finally, it looks like you’re using a stepwise procedure to select your model. Just be aware that research shows that stepwise procedures generally only get you close to the best model but not exactly to it. Read my post about Stepwise Regression for more information. Stepwise chooses the final model based strictly on statistical significance. To specify the correct model, you typically need to use subject-area knowledge and theory to guide you along with the statistical measures. Read my post about Model Specification for more about this!
  
  I hope this helps!
  
Wisley Wan says

May 26, 2018 at 5:39 am

Hi Jim, Please ignore my previous message – I’ve found it!

- Jim Frost says
  
  May 28, 2018 at 2:24 am
  
  Hi Wisley,
  
  I’m glad that you find the blog to be helpful. That means a lot to me! I’m also glad that you were able to try the example out yourself!
  
Wisley Wan says

May 26, 2018 at 5:34 am

Hi Jim,

Thank you very much for the blog. It is very clear and helps me understand the issue better.

I tried the polynomial linear regressions in Excel (with the IV standardized), but strangely, the intercept is 0 while the other coefficients are both correct, and the R-squared dropped to only 7%. I have checked but couldn’t find where it went wrong. Could you please give me some tips?

Thanks!

Patrik Silva says

March 29, 2018 at 1:15 pm

It helped for sure!

Thank you, Jim, for your prompt answer. I understood it very well.

I can see all the care that you give us here.

Thank you for sharing your valuable (and, I imagine, scarce) time with me. Thank you very much.

Patrik Silva says

March 29, 2018 at 11:24 am

Hi Jim, this post is definitely wonderful because it provides the foundation of regression analysis. I always thought that linear regression was the one where the relationship looks linear (a straight line), just as you said we might think.

Very, very clear! However, I have some questions:

1) How do we convert back to the original units of the data, and how can we interpret the coefficients after using transformations and polynomial terms?

2) When we transform the data, for example with a reciprocal transformation, the curve looks inverted. That’s OK, because that is what we want. But how can we interpret the graph? I think we lose the ability to explain the graph because it is not in readable units. For example, if the line has a positive slope, we say that as X increases, Y tends to increase as well. But with the reciprocal, everything is inverted, and I get lost at this point.

I hope you understand my question and can clarify it for me.

Thank you!

Patrik Silva

- Jim Frost says
  
  March 29, 2018 at 12:10 pm
  
  Hi Patrik,
  
  You’re right, the names of the analyses (linear and nonlinear regression) really give the wrong impression about when you should use each one!
  
  On to your questions.
  
  For converting the transformed data back to the original units, you can do the calculations yourself. The precise calculations depend on the nature of the transformations that you’ve used. However, most statistical software should be able to back-convert the values for you, so I’d check that first. One thing to note: if you use polynomials to fit curvature, you don’t need to back-transform anything. All the units use their original scale. For example, suppose your model is y = 2 + 2X + X^2. If X = 2, then y = 2 + 4 + 4 = 10. No transformation is necessary! However, it does make the relationship between X and Y more complex to understand because the change in Y for each one-unit increase in X depends on the value of X.
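The polynomial example above can be checked directly. This short Python sketch evaluates the reply’s model, y = 2 + 2X + X^2, in its original units and shows why the relationship is harder to describe: the increase in y for a one-unit step in X grows as X grows.

```python
def y_hat(x):
    """The reply's example model: y = 2 + 2X + X^2, all in original units."""
    return 2 + 2 * x + x ** 2

print(y_hat(2))  # 10, matching the hand calculation (2 + 4 + 4)

# The effect of a one-unit increase in X depends on where X starts: it equals 2X + 3.
for x in range(4):
    print(f"X {x} -> {x + 1}: y increases by {y_hat(x + 1) - y_hat(x)}")
# Increases printed: 3, 5, 7, 9
```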
  
  In general, most statistical software can produce main effects plots that incorporate all the transformations. These plots display the relationship between an independent variable and the dependent variable while incorporating transformations and polynomials. If the relationship is curved, you’ll see it in these graphs. Looking at the graph helps you characterize the nature of the relationship, which brings me to your second question.
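The computation behind a plot of this kind can be sketched roughly as follows, assuming Python with NumPy, a hypothetical fitted model y = b0 + b1*x1 + b2*x1^2 + b3*x2 with made-up coefficients, and the assumption that the other predictor is held at its mean (individual packages may handle the other predictors differently). The key point is that the squared term is computed inside the prediction, so the plotted curve is already in natural units.

```python
import numpy as np

# Hypothetical fitted coefficients for y = b0 + b1*x1 + b2*x1^2 + b3*x2.
coefs = np.array([4.0, 3.0, -0.25, 1.5])

def main_effect_curve(coefs, x1_grid, x2_hold):
    """Predicted y across x1 with x2 held fixed; the squared term is built internally."""
    design = np.column_stack([
        np.ones_like(x1_grid),
        x1_grid,
        x1_grid ** 2,                       # polynomial term applied automatically
        np.full_like(x1_grid, x2_hold),     # other predictor held constant
    ])
    return design @ coefs

x2_observed = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical x2 values
x1_grid = np.linspace(0, 12, 121)
curve = main_effect_curve(coefs, x1_grid, x2_observed.mean())
print(f"peak at x1 = {x1_grid[curve.argmax()]:.1f}")
```

Plotting `curve` against `x1_grid` would show a curve that rises and then falls, with its peak at x1 = 6 for these made-up coefficients.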
  
  For the model that uses the reciprocal, I had to create the Linear vs Quadratic Reciprocal Model comparison graph by hand because the software couldn’t do that for reciprocal variables. However, once I created the graph, I could use it to describe the relationship because it’s all in natural units at that point.
  
  The way that I’d characterize the quadratic reciprocal relationship is that as input increases, the output also increases. Initially, the output increases at a very high rate, but as input increases, the rate at which output increases slows down as it asymptotically approaches ~20. For example, looking at the quadratic curve in that graph, you can see that increasing X by 1 unit corresponds to different changes in Y. If your input is at 1 and you increase it to 2, the output increases by quite a bit, from ~6 to ~10. However, if your input is at 10 and you increase it to 11, the output barely increases at all; it stays right around 19 for both settings. Our model incorporates all of that mathematically!
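The diminishing increments described above follow from the form of a quadratic reciprocal model, y = b0 + b1/X + b2/X^2. The coefficients in this Python sketch are hypothetical, chosen only so that y starts at 6 when X = 1 and levels off toward 20; they are not the estimates from the post’s model.

```python
def quad_reciprocal(x, b0=20.0, b1=-16.0, b2=2.0):
    """Quadratic reciprocal model y = b0 + b1/x + b2/x^2 (coefficients are hypothetical)."""
    return b0 + b1 / x + b2 / x ** 2

for start in (1, 10):
    gain = quad_reciprocal(start + 1) - quad_reciprocal(start)
    print(f"x {start} -> {start + 1}: y changes by {gain:.2f}")

# As x grows, 1/x and 1/x^2 shrink toward 0, so y levels off toward b0 (20 here).
print(f"y at x = 1000: {quad_reciprocal(1000):.2f}")
```

With these made-up values, moving X from 1 to 2 raises y by 6.50, while moving from 10 to 11 raises it by only about 0.14, even though the model is evaluated entirely in natural units.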
  
  You raise an important point. While these transformations help you fit curves that are present in your data, they can obscure the reality behind the relationships. You need to transform the numbers back to their natural units and use graphs to understand the relationships. Fortunately, statistical software can automate that process. One advantage of nonlinear models is that you don’t transform the data; instead, you specify a model that fits the data without transformation. However, the relationships, coefficients, etc., can be just as hard to understand! As an example, just look at the nonlinear model in this post and you’ll see that its equation is cryptic! The data aren’t transformed, but the equation is no easier to understand. Again, use the graph to better understand the nature of the relationship.
  
  I hope this helps!
  
Ahmed says

March 21, 2018 at 1:11 pm

Hi

I have 5 variables with 3 levels and 1 variable with 2 levels. Based on that, I designed 18 mixes and tested one response at different ages (4 periods).
Now I have 8 columns: 6 for the variables (x1, x2, …, x6), one for the age, and finally the response.
I selected the Regression > Fit Regression Model option and obtained the full ANOVA table and regression model.
The problem is that the main effects plot shows a straight-line relationship between these variables and the response, and I don’t know how to change it to a curve.
Someone told me that I need to divide the response by the square root of the age.

Please help me; I appreciate it.

- Jim Frost says
  
  March 21, 2018 at 2:02 pm
  
  Hi Ahmed, you need to fit a model that can handle the curvature, such as by including polynomial terms (e.g., X^2). Based on the analysis names, it sounds like you’re using Minitab. If so, include your variables on the main dialog box, then click Model, and there you can include the higher-order terms (polynomials and interactions). Then, when you create a main effect plot, it should display any curvature that is present.
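One design detail is worth checking before adding squared terms: a squared term can only be estimated for a variable measured at three or more distinct levels. With two levels, X^2 is an exact linear combination of the intercept and X, so it adds no information. The Python sketch below, using NumPy and levels chosen only for illustration, shows this through the rank of the design matrix. In a design like the one described, with five 3-level variables and one 2-level variable, squared terms would be possible for the five but not for the 2-level variable.

```python
import numpy as np

def quadratic_design(levels):
    """Design matrix with intercept, linear, and squared columns for one variable."""
    x = np.asarray(levels, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2])

designs = {
    "3 coded levels": [-1, 0, 1] * 6,   # 18 runs
    "2 coded levels": [-1, 1] * 9,      # 18 runs
    "2 raw levels": [10, 20] * 9,       # coding does not change the result
}
ranks = {}
for name, levels in designs.items():
    X = quadratic_design(levels)
    ranks[name] = np.linalg.matrix_rank(X)
    print(f"{name}: rank {ranks[name]} of {X.shape[1]} columns")
```

A rank below the number of columns means the squared term is aliased with the other terms and cannot be estimated.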
  
  It sounds like either you’re not fitting those polynomial terms or, if you did, maybe curvature isn’t present? How do the residual plots look?
  
Josey says

December 31, 2017 at 7:43 pm

I’m curious. I use non-linear regression to model the progression of prostate cancer. The independent variable is PSA (prostate-specific antigen), a product of healthy and cancerous prostate cells. A half-life regression model works well for predicting whether the cancer is growing or, conversely, whether the cancer treatment is working. In the former case, cell numbers are doubling. In the latter case, cell numbers are halving.

Is it possible to model both at the same time with the same data? In other words, is it possible to estimate how many cells are doubling in number because they are resistant to treatment and how many are halving in number because the treatment is effective?
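The single-process model described in the first paragraph can be sketched as an exponential curve, PSA(t) = PSA0 * 2^(k*t), where a positive k means doubling with doubling time 1/k and a negative k means halving with half-life -1/k. This is an assumed model form for illustration, fit with SciPy’s `curve_fit` to hypothetical, noise-free measurements; the function name `psa_model` is also hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def psa_model(t, psa0, k):
    """Assumed model: PSA doubles (k > 0) or halves (k < 0); k is doublings per unit time."""
    return psa0 * 2.0 ** (k * t)

# Hypothetical, noise-free measurements: PSA doubling every 6 months.
t = np.array([0.0, 3.0, 6.0, 9.0, 12.0])     # months
psa = 4.0 * 2.0 ** (t / 6.0)

# Start from the first measurement and a zero rate (no change) as a neutral guess.
params, _ = curve_fit(psa_model, t, psa, p0=(psa[0], 0.0))
psa0_hat, k_hat = params
if k_hat > 0:
    print(f"doubling time = {1 / k_hat:.2f} months")
else:
    print(f"half-life = {-1 / k_hat:.2f} months")
```

Fitting a single rate like this describes the net trend only; it does not by itself separate a growing subpopulation from a shrinking one.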
