P-values and coefficients in regression analysis work together to tell you which relationships in your model are statistically significant and the nature of those relationships. The coefficients describe the mathematical relationship between each independent variable and the dependent variable. The p-values for the coefficients indicate whether these relationships are statistically significant.

After fitting a regression model, check the residual plots first to be sure that you have unbiased estimates. After that, it’s time to interpret the statistical output. Linear regression analysis can produce a lot of results, which I’ll help you navigate. In this post, I cover interpreting the p-values and coefficients for the independent variables.

**Related post**: When Should I Use Regression Analysis?

## Interpreting P-Values for Variables in a Regression Model

Regression analysis is a form of inferential statistics. The p-values help determine whether the relationships that you observe in your sample also exist in the larger population. The p-value for each independent variable tests the null hypothesis that the variable has no correlation with the dependent variable (that is, that its coefficient equals zero). If there is no correlation, there is no association between the changes in the independent variable and the shifts in the dependent variable. In other words, there is insufficient evidence to conclude that there is an effect at the population level.

If the p-value for a variable is less than your significance level, your sample data provide enough evidence to reject the null hypothesis for the entire population. Your data favor the hypothesis that there *is* a non-zero correlation. Changes in the independent variable *are* associated with changes in the response at the population level. This variable is statistically significant and probably a worthwhile addition to your regression model.

On the other hand, a p-value that is greater than the significance level indicates that there is insufficient evidence in your sample to conclude that a non-zero correlation exists.

The regression output example below shows that the South and North predictor variables are statistically significant because their p-values equal 0.000. On the other hand, East is not statistically significant because its p-value (0.092) is greater than the usual significance level of 0.05.

It is standard practice to use the coefficient p-values to decide whether to include variables in the final model. For the results above, we would consider removing East. Keeping variables that are not statistically significant can reduce the model’s precision.
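To make the mechanics concrete, here is a minimal sketch of how a coefficient's p-value can be computed for a simple regression with one predictor. It uses only the Python standard library, and the data sets and function names (`slope_test`, `two_sided_p`) are made up for illustration. They are not the data or the software behind the output discussed in this post. The test divides the estimated slope by its standard error to get a t-value, then looks up the two-sided tail area of a t distribution with n - 2 degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value, 2 * P(T > |t|), via Simpson's rule on [0, |t|]."""
    b = abs(t_stat)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)

def slope_test(x, y):
    """Return (slope, standard error, t-value, p-value) for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2  # two estimated parameters: intercept and slope
    std_err = math.sqrt(sse / df / sxx)
    t_value = slope / std_err
    return slope, std_err, t_value, two_sided_p(t_value, df)

# Made-up data: one response with a clear trend, one without.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_trend = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
y_flat = [5, 3, 6, 4, 5, 3, 6, 5]

for label, y in (("trend", y_trend), ("flat", y_flat)):
    slope, std_err, t_value, p_value = slope_test(x, y)
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{label}: slope={slope:.3f}, t={t_value:.2f}, p={p_value:.4f} ({verdict})")
```

Statistical software performs the same kind of calculation for every coefficient in a multiple regression, using the residual degrees of freedom of the full model. The numerical integration here is only meant for small samples like this one.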

**Related post**: F-test of overall significance in regression

## Interpreting Regression Coefficients for Linear Relationships

The sign of a regression coefficient tells you whether there is a positive or negative correlation between each independent variable and the dependent variable. A positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase. A negative coefficient suggests that as the independent variable increases, the dependent variable tends to decrease.

The coefficient value signifies how much the mean of the dependent variable changes given a one-unit shift in the independent variable while holding other variables in the model constant. This property of holding the other variables constant is crucial because it allows you to assess the effect of each variable in isolation from the others.
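The idea of holding the other variables constant can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not output from any model in this post: the data are synthetic and noise-free, generated from a hypothetical equation y = 2 + 3*x1 - 1.5*x2, and `fit_ols` is a small helper written for this sketch that solves the least-squares normal equations in plain Python.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Build X'X and X'y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]

# Synthetic predictors (x1, x2); they are related to each other but not perfectly.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
# Hypothetical, noise-free response used only for this illustration.
y = [2 + 3 * x1 - 1.5 * x2 for x1, x2 in rows]

intercept, b1, b2 = fit_ols(rows, y)

def predict(x1, x2):
    return intercept + b1 * x1 + b2 * x2

# Raise x1 by one unit while x2 stays fixed at 3:
# the predicted mean changes by exactly the x1 coefficient.
change = predict(5, 3) - predict(4, 3)
print(f"b1 = {b1:.3f}, change in prediction = {change:.3f}")

# Leaving x2 out of the model gives a different slope for x1,
# because x1 then also absorbs part of the x2 effect.
simple = fit_ols([(x1,) for x1, _ in rows], y)
print(f"slope for x1 when x2 is omitted = {simple[1]:.3f}")
```

The comparison at the end shows why the "holding constant" property matters: the coefficient for a variable describes its relationship with the response after accounting for the other variables that are in the model.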

The coefficients in your statistical output are estimates of the actual population parameters. To obtain unbiased coefficient estimates that have the minimum variance, and to be able to trust the p-values, your model must satisfy the seven classical assumptions of OLS linear regression.

## Graphical Representation of Regression Coefficients

A simple way to grasp regression coefficients is to picture them as linear slopes. The fitted line plot illustrates this by graphing the relationship between a person’s height (IV) and weight (DV). The numeric output and the graph display information from the same model.

The height coefficient in the regression equation is 106.5. This coefficient represents the mean increase of weight in kilograms for every additional one meter in height. If your height increases by 1 meter, the average weight increases by 106.5 kilograms.

The regression line on the graph visually displays the same information. If you move to the right along the x-axis by one meter, the line increases by 106.5 kilograms. Keep in mind that it is only safe to interpret regression results within the observation space of your data. In this case, the height and weight data were collected from middle-school girls and range from 1.3 m to 1.7 m. Consequently, we can’t shift along the line by a full meter for these data.
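Because a full one-meter shift falls outside the observed range, it can help to scale the coefficient to smaller, realistic changes. The sketch below does that arithmetic using the 106.5 coefficient and the 1.3 m to 1.7 m range quoted above. The function names are hypothetical helpers for this illustration.

```python
HEIGHT_COEF_KG_PER_M = 106.5          # slope from the regression equation in this post
OBSERVED_RANGE_M = (1.3, 1.7)         # range of heights in the sample

def mean_weight_change(delta_height_m):
    """Change in mean weight (kg) for a given change in height (m)."""
    return HEIGHT_COEF_KG_PER_M * delta_height_m

def within_observed_range(height_m):
    """True if a height lies inside the range the model was fit on."""
    low, high = OBSERVED_RANGE_M
    return low <= height_m <= high

for delta in (0.01, 0.05, 0.10):
    print(f"+{delta:.2f} m in height -> +{mean_weight_change(delta):.3f} kg in mean weight")

# A shift from 1.4 m to 1.5 m stays inside the data; 1.4 m to 2.4 m does not.
print(within_observed_range(1.5), within_observed_range(2.4))
```

A one-centimeter increase in height is associated with roughly a 1.065 kg increase in mean weight, which is the same slope expressed over a distance that the data can support.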

Let’s suppose that the regression line was flat, which corresponds to a coefficient of zero. For this scenario, the mean weight wouldn’t change no matter how far along the line you move. That’s why a near-zero coefficient suggests there is no effect, and you’d see a high (insignificant) p-value to go along with it.

The plot really brings this to life. However, plots can display only results from simple regression—one predictor and the response. For multiple linear regression, the interpretation remains the same.

## Use Polynomial Terms to Model Curvature in Linear Models

The previous linear relationship is relatively straightforward to understand. A linear relationship indicates that the change remains the same throughout the regression line. Now, let’s move on to interpreting the coefficients for a curvilinear relationship, where the effect depends on your location on the curve. The interpretation of the coefficients for a curvilinear relationship is less intuitive than for a linear relationship.

As a refresher, in linear regression, you can use polynomial terms to model curves in your data. It is important to keep in mind that we’re still using linear regression to model curvature rather than nonlinear regression. That’s why I refer to curvilinear relationships in this post rather than nonlinear relationships. Nonlinear has a very specialized meaning in statistics. To learn about this distinction, read my post: The Difference between Linear and Nonlinear Regression Models.

This regression example uses a quadratic (squared) term to model curvature in the data set. You can see that the p-values are statistically significant for both the linear and quadratic terms. But, what the heck do the coefficients mean?

## Graphing the Data for Regression with Polynomial Terms

Graphing the data really helps you visualize the curvature and understand the regression model.

The chart shows how the effect of machine setting on mean energy usage depends on where you are on the regression curve. On the x-axis, if you begin with a setting of 12 and increase it by 1, energy consumption should decrease. On the other hand, if you start at 25 and increase the setting by 1, you should see energy usage increase. Near a setting of 20, you wouldn’t expect much change.

Regression analysis that uses polynomials to model curvature can make interpreting the results trickier. Unlike a linear relationship, the effect of the independent variable changes based on its value. Looking at the coefficients won’t make the picture any clearer. Instead, graph the data to truly understand the relationship. Expert knowledge of the study area can also help you make sense of the results.
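A short calculation shows why the individual coefficients are hard to read in a quadratic model. For a fitted equation of the form y = b0 + b1*x + b2*x^2, the effect of a small change in x is the slope b1 + 2*b2*x, which depends on x itself. The coefficients below are hypothetical values chosen so that the curve bottoms out at a setting of 20. They are not the estimates from the energy model in this post.

```python
# Hypothetical coefficients for illustration only (not from the post's output).
B0, B1, B2 = 100.0, -8.0, 0.2

def mean_energy(setting):
    """Fitted mean response for the quadratic model."""
    return B0 + B1 * setting + B2 * setting ** 2

def slope_at(setting):
    """Instantaneous effect of the setting: derivative b1 + 2*b2*x."""
    return B1 + 2 * B2 * setting

def change_for_one_unit(setting):
    """Change in the fitted mean when the setting increases by 1."""
    return mean_energy(setting + 1) - mean_energy(setting)

for s in (12, 20, 25):
    print(f"setting {s}: slope = {slope_at(s):+.2f}, "
          f"one-unit change = {change_for_one_unit(s):+.2f}")

# The curve turns where the slope is zero.
turning_point = -B1 / (2 * B2)
print(f"turning point at setting {turning_point:.1f}")
```

With these made-up values, a one-unit increase lowers the fitted mean when starting at 12, raises it when starting at 25, and barely changes it near 20. That matches the pattern described for the chart, and it is why neither b1 nor b2 alone describes the effect.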

**Related post**: Curve Fitting using Linear and Nonlinear Regression

## Regression Coefficients and Relationships Between Variables

Regression analysis is all about determining how changes in the independent variables are associated with changes in the dependent variable. Coefficients tell you about these changes and p-values tell you if these coefficients are significantly different from zero.

All of the effects in this post have been main effects, which are the direct relationships between an independent variable and a dependent variable. However, sometimes the relationship between an IV and a DV changes based on another variable. This condition is an interaction effect. Learn more about these effects in my post: Understanding Interaction Effects in Statistics.

In this post, I didn’t cover the constant term. Be sure to read my post about how to interpret the constant!

The statistics I cover in the post tell you how to interpret the regression equation, but they don’t tell you how well your model fits the data. For that, you should also assess R-squared.

If you’re learning regression, check out my Regression Tutorial!

Toby says

Great blog with detailed explanation! It helps clear my doubts for p-value.

Thank you Jim! and Happy new year! 😀

Jim Frost says

Thank you, Toby! And, I’m very happy you found the blog to be helpful! Happy new year to you too!!

Javed Iqbal says

Thanks Jim for the nice explanation. This regression seems to violate one of the model assumptions, namely homoskedasticity. Log transformation should work here.

Jim Frost says

Hi Javed, thanks for your comment. The residuals for this model are homoscedastic, or very close to it. Their variances are fairly equal across the entire range. The variance might appear to be lower in the very low end of the range, but there are also fewer observations in that region, which can make the dispersion appear to be smaller. At any rate, it is close enough. To see how a true case of heteroscedasticity appears, along with multiple methods for correcting it, read my post about heteroscedasticity. By the way, I explain in that post why I always recommend trying other methods of addressing this problem before using a transformation.

ADIL HUSSAIN RESHI says

Really fabulous ..it cleared all my doubts about p- value

Jim Frost says

Hi Adil, Thanks! I’m so glad to hear that it was helpful!

Rali says

Hi Mr. Jim

Thanks for the helpful blog

all the best

Jim Frost says

Hi Rali, you’re very welcome! I’m glad it was helpful!

Ayush says

This is really one of the best websites I have come across for DATA SCIENCE… Great effort put up by Sir Jim…

Jim Frost says

Thank you, Ayush!

Rajasekar says

I am currently working on a multiple regression model, where i have 4 x variable and all my variable are not statistically significant. I know when this happen i can reject null hypothesis but like to know what might be the wrong , do i need to add some more x variable in this case.Also the R Square =0.109842937

Adjusted R Square =0.034084889

MN says

Thank you very much for the wonderful elaboration. Amazing!!

Jim Frost says

You’re very welcome, MN! I’m glad it’s helpful!

Hanan Shteingart says

The following claim is not true if the features are correlated, what’s known as multicollinearity: “The sign of a regression coefficient tells you whether there is a positive or negative correlation between each independent variable the dependent variable”. In fact, a feature could have a positive correlation with the target yet a negative coefficient, and vice versa.

Jim Frost says

Hi Hanan,

You raise a good point. The interpretation that I present, including the portion that you quote, is accurate when your model doesn’t contain a severe problem. However, if your model does contain a severe problem, it can produce unreliable results, which includes the possibility that the coefficients don’t accurately describe the relationship between the independent variables and the dependent variable. The problem isn’t with how to interpret coefficients, but rather with a condition in the model that causes it to produce coefficients that you can’t trust.

As you point out, multicollinearity can produce unreliable, erratic coefficients. In some cases, the sign of the coefficient can even be incorrect. However, the sign switch doesn’t necessarily have to happen when your model has multicollinearity. I write more about multicollinearity, including switched signs, in this post: Multicollinearity in Regression Analysis: Problems, Detection, and Solutions.

By the way, there are a number of other potential problems that can cause your model to produce results that you can’t trust. Multicollinearity is just scratching the surface. These problems include an incorrectly specified model, overfitting the model, heteroscedasticity, and data mining, among others. I spend quite a bit of time talking about these problems, how they can invalidate your results, and what you can do to address them.

I hope this helps!

eric says

Thank you very much for the explanation Jim!

If the p-value is under the significant level, this would indicate that there is enough evidence to reject the null hypothesis. The null hypothesis being here that there is no correlation between 2 variables (in a single linear regression).

Here is my first question: how do we decide how to set the significant level? Is it purely arbitrary?

My second question is: since the coefficient of correlation varies between -1 and 1, it is tempting to conclude that there is a significant correlation (positive or negative) between 2 variables if the coefficient of correlation is close to -1 or 1 and that there is no correlation when the coefficient of correlation is close to 0. However, I think this assumption is false but can’t get the intuition to understand why.

Could you help me about those questions?

Many thanks for your time and your attention

Best regards

Eric

Hrishikesh Geed says

Thanks for the explanation Jim!!

I have one doubt, how do you calculate the p-value corresponding to each coefficient?

How do you decide the standard deviation,and the sample mean for calculating the z value for each coefficient?

Thanks

Hrishi