P-values and coefficients in regression analysis work together to tell you which relationships in your model are statistically significant and the nature of those relationships. The coefficients describe the mathematical relationship between each independent variable and the dependent variable. The p-values for the coefficients indicate whether these relationships are statistically significant.
After fitting a regression model, check the residual plots first to be sure that you have unbiased estimates. After that, it’s time to interpret the statistical output. Linear regression analysis can produce a lot of results, which I’ll help you navigate. In this post, I cover interpreting the p-values and coefficients for the independent variables.
Interpreting P-Values for Variables in a Regression Model
Regression analysis is a form of inferential statistics. The p-values help determine whether the relationships that you observe in your sample also exist in the larger population. The p-value for each independent variable tests the null hypothesis that the variable has no correlation with the dependent variable, meaning its coefficient in the population equals zero. If there is no correlation, changes in the independent variable are not associated with changes in the dependent variable. In other words, the null hypothesis states that there is no effect at the population level.
If the p-value for a variable is less than your significance level, your sample data provide enough evidence to reject the null hypothesis for the entire population. Your data favor the hypothesis that there is a non-zero correlation. Changes in the independent variable are associated with changes in the dependent variable at the population level. This variable is statistically significant and probably a worthwhile addition to your regression model.
On the other hand, a p-value that is greater than the significance level indicates that there is insufficient evidence in your sample to conclude that a non-zero correlation exists.
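Behind the scenes, the p-value for a coefficient usually comes from a t-test. You divide the estimate by its standard error and compare the result with a t distribution that has n - p degrees of freedom, where p is the number of estimated parameters, including the constant. The minimal sketch below uses hypothetical estimates and standard errors. To stay within Python's standard library, it swaps in a large-sample normal approximation to the t distribution. With small samples, use the exact t distribution instead (for example, `scipy.stats.t.sf`).

```python
import math

def coefficient_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value for the null hypothesis that the coefficient is zero.

    Assumption: a large-sample normal approximation to the t distribution.
    Regression software uses t with n - p degrees of freedom instead.
    """
    t_stat = estimate / std_error
    # For a standard normal Z, P(|Z| >= |t|) equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2))

def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis when the p-value is below the significance level."""
    return p_value < alpha

# Hypothetical estimates and standard errors, not taken from any real output.
for name, est, se in [("X1", 4.2, 1.1), ("X2", 0.9, 0.8)]:
    p = coefficient_p_value(est, se)
    print(f"{name}: t = {est / se:.2f}, p = {p:.4f}, significant: {is_significant(p)}")
```

X1 has a large t-value relative to its standard error, so its p-value falls below 0.05. X2's estimate is only about one standard error from zero, so its p-value does not.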
The regression output example below shows that the South and North predictor variables are statistically significant because their p-values are reported as 0.000 (that is, less than 0.0005). On the other hand, East is not statistically significant because its p-value (0.092) is greater than the usual significance level of 0.05.
It is common practice to use the coefficient p-values to decide whether to include variables in the final model. For the results above, we would consider removing East. Keeping variables that are not statistically significant can reduce the model’s precision.
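The screening step described above can be sketched as a simple filter. The p-values come from the example output, with 0.000 entered as 0.0 because the software rounds anything below 0.0005 to that value. The function name and the 0.05 default are illustrative choices. A flagged variable is only a candidate for removal, not an automatic drop.

```python
def removal_candidates(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return the predictors whose p-values are at or above alpha.

    Hypothetical helper: it only flags variables. Subject-area knowledge
    should decide whether a flagged variable actually leaves the model.
    """
    return [name for name, p in p_values.items() if p >= alpha]

# P-values from the example output; 0.000 means "below 0.0005".
example = {"South": 0.0, "North": 0.0, "East": 0.092}
print(removal_candidates(example))
```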
Interpreting Regression Coefficients for Linear Relationships
The sign of a regression coefficient tells you whether there is a positive or negative correlation between each independent variable and the dependent variable. A positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase. A negative coefficient suggests that as the independent variable increases, the mean of the dependent variable tends to decrease.
The coefficient value signifies how much the mean of the dependent variable changes given a one-unit shift in the independent variable while holding other variables in the model constant. This property of holding the other variables constant is crucial because it allows you to assess the effect of each variable in isolation from the others.
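To see what "holding other variables constant" means numerically, the sketch below fits a multiple regression with NumPy to simulated data. The true model, the correlation between the predictors, and the random seed are all made-up assumptions chosen for illustration. Raising x1 by one unit while x2 stays fixed moves the predicted mean by exactly the x1 coefficient. A simple regression on x1 alone gives a different slope because it does not hold x2 constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # x2 is correlated with x1
# Hypothetical true model: mean(y) = 1 + 3*x1 - 2*x2
y = 1 + 3 * x1 - 2 * x2 + rng.normal(scale=0.5, size=n)

# Multiple regression: constant, x1, and x2 estimated together by least squares.
X = np.column_stack([np.ones(n), x1, x2])
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(v1: float, v2: float) -> float:
    """Predicted mean of y from the fitted multiple regression."""
    return b0 + b1 * v1 + b2 * v2

# Raise x1 by one unit while holding x2 fixed: the prediction moves by b1.
change = predict(2.0, 0.7) - predict(1.0, 0.7)

# Simple regression on x1 alone does not hold x2 constant, so its slope
# also absorbs the part of x2's effect that moves together with x1.
b1_simple, _ = np.polyfit(x1, y, 1)

print(f"b1 (x2 held constant) = {b1:.3f}")
print(f"change in prediction  = {change:.3f}")
print(f"slope of y on x1 only = {b1_simple:.3f}")
```

With these simulated data, the multiple-regression coefficient lands near the true value of 3. The x1-only slope lands near 2 because x1 partly stands in for the omitted x2.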
The coefficients in your statistical output are estimates of the actual population parameters. To obtain unbiased coefficient estimates that have the minimum variance, and to be able to trust the p-values, your model must satisfy the seven classical assumptions of OLS linear regression.
Statisticians consider regression coefficients to be an unstandardized effect size because they indicate the strength of the relationship between variables using values that retain the natural units of the dependent variable. Effect sizes help you understand how important the findings are in a practical sense. To learn more about unstandardized and standardized effect sizes, read my post about Effect Sizes in Statistics.
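For comparison, an unstandardized coefficient can be converted to a standardized one by multiplying it by the ratio of standard deviations, s_x / s_y. The result expresses the effect in standard-deviation units instead of natural units. The sketch below uses made-up data. In simple regression, the standardized slope equals the Pearson correlation, which the code checks.

```python
import numpy as np

# Made-up illustrative data, each variable in its own natural units.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.9, 5.1, 7.2, 8.8, 11.1, 13.0])

# Unstandardized slope: change in y (its units) per one unit of x.
slope, intercept = np.polyfit(x, y, 1)
# Standardized slope: change in y (in SDs) per one SD of x.
std_slope = slope * x.std(ddof=1) / y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]

print(f"unstandardized slope = {slope:.3f}")
print(f"standardized slope   = {std_slope:.3f} (correlation r = {r:.3f})")
```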
Graphical Representation of Regression Coefficients
A simple way to grasp regression coefficients is to picture them as linear slopes. The fitted line plot illustrates this by graphing the relationship between a person’s height (IV) and weight (DV). The numeric output and the graph display information from the same model.
The height coefficient in the regression equation is 106.5. This coefficient represents the mean increase of weight in kilograms for every additional one meter in height. If your height increases by 1 meter, the average weight increases by 106.5 kilograms.
The regression line on the graph visually displays the same information. If you move to the right along the x-axis by one meter, the line increases by 106.5 kilograms. Keep in mind that it is only safe to interpret regression results within the observation space of your data. In this case, the height and weight data were collected from middle-school girls and range from 1.3 m to 1.7 m. Consequently, we can’t shift along the line by a full meter for these data.
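When you compare two heights, the intercept cancels out, so the slope alone gives the change in mean weight for any height difference. The sketch below takes the 106.5 coefficient and the 1.3 to 1.7 m range from the example. The helper name and its range check are illustrative. A 0.1 m increase stays inside the observed range and corresponds to about 10.65 kg.

```python
SLOPE_KG_PER_M = 106.5          # height coefficient from the example output
HEIGHT_RANGE_M = (1.3, 1.7)     # observation space of the middle-school data

def mean_weight_change(h_from: float, h_to: float) -> float:
    """Change in mean weight (kg) when height moves from h_from to h_to (m).

    Hypothetical helper: only the slope is needed because the intercept
    cancels, and it refuses to extrapolate beyond the observed heights.
    """
    lo, hi = HEIGHT_RANGE_M
    for h in (h_from, h_to):
        if not lo <= h <= hi:
            raise ValueError(f"height {h} m is outside the observed range {lo}-{hi} m")
    return SLOPE_KG_PER_M * (h_to - h_from)

print(round(mean_weight_change(1.4, 1.5), 2))
```

Asking for a full one-meter step, such as from 1.3 m to 2.3 m, raises an error instead of extrapolating.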
Let’s suppose that the regression line were flat, which corresponds to a coefficient of zero. In this scenario, the mean weight wouldn’t change no matter how far along the line you move. That’s why a coefficient near zero suggests there is no effect, and you’d typically see a high (not significant) p-value along with it.
The plot really brings this to life. However, a fitted line plot can display results only from simple regression, which has one predictor and the response. For multiple linear regression, the interpretation remains the same.
Contour plots can graph two independent variables and the dependent variable. For more information, read my post Contour Plots: Using, Examples, and Interpreting.
Use Polynomial Terms to Model Curvature in Linear Models
The previous linear relationship is relatively straightforward to understand. In a linear relationship, a one-unit change in the independent variable produces the same change in the mean of the dependent variable everywhere along the regression line. Now, let’s move on to interpreting the coefficients for a curvilinear relationship, where the effect depends on your location on the curve. The coefficients for a curvilinear relationship are less intuitive to interpret than those for a linear relationship.
As a refresher, in linear regression, you can use polynomial terms to model curves in your data. It is important to keep in mind that we’re still using linear regression to model curvature rather than nonlinear regression. That’s why I refer to curvilinear relationships in this post rather than nonlinear relationships. Nonlinear has a very specialized meaning in statistics. To learn about this distinction, read my post: The Difference between Linear and Nonlinear Regression Models.
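A quadratic model still counts as linear regression because it is linear in the coefficients. The squared term is just another column in the design matrix, and ordinary least squares estimates all the coefficients at once. Here is a minimal sketch with NumPy. It uses made-up data that follow an exact quadratic so the recovered coefficients are easy to check. These values are not the article's energy data.

```python
import numpy as np

# Made-up settings and responses that follow y = 230 - 20x + 0.5x^2 exactly.
x = np.arange(10.0, 31.0)
y = 230 - 20 * x + 0.5 * x**2

# Columns: constant, linear term, squared term. The model is still linear
# in the coefficients b0, b1, b2, so ordinary least squares applies unchanged.
X = np.column_stack([np.ones_like(x), x, x**2])
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```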
This regression example uses a quadratic (squared) term to model curvature in the data set. You can see that both the linear and quadratic terms are statistically significant. But what the heck do the coefficients mean?
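One way to see how the two coefficients work together is to differentiate the fitted equation. For a model with a constant, a linear term, and a squared term, the slope at any value x is b_1 + 2b_2x, so the slope itself changes as x changes. The curve bottoms out (when b_2 > 0) or peaks (when b_2 < 0) where that slope equals zero:

```latex
\hat{y} = b_0 + b_1 x + b_2 x^2
\qquad
\frac{d\hat{y}}{dx} = b_1 + 2 b_2 x
\qquad
x_{\text{turn}} = -\frac{b_1}{2 b_2}
```

Neither coefficient on its own describes the effect of a one-unit change in x.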
Graphing the Data for Regression with Polynomial Terms
Graphing the data really helps you visualize the curvature and understand the regression model.
The chart shows how the effect of machine setting on mean energy usage depends on where you are on the regression curve. On the x-axis, if you begin with a setting of 12 and increase it by 1, mean energy consumption should decrease. On the other hand, if you start at 25 and increase the setting by 1, you should see mean energy usage increase. Near a setting of 20, you wouldn’t expect much change.
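For a quadratic model, the instantaneous slope at setting x is b_1 + 2·b_2·x, so the same pattern can be computed directly. The sketch below uses hypothetical coefficients (b_1 = -20, b_2 = 0.5). They were chosen only because they produce a curve with its minimum at a setting of 20. They are not the estimates from the example output.

```python
# Hypothetical quadratic coefficients (not the article's estimates),
# chosen so that the fitted curve bottoms out at a setting of 20.
B1, B2 = -20.0, 0.5

def local_slope(setting: float) -> float:
    """Instantaneous change in mean energy use per unit of machine setting."""
    return B1 + 2 * B2 * setting

turning_point = -B1 / (2 * B2)
for s in (12, 20, 25):
    print(f"setting {s}: slope {local_slope(s):+.1f}")
print(f"turning point at setting {turning_point:.1f}")
```

The slope is negative at 12, zero at 20, and positive at 25, which matches the pattern described for the chart.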
Regression analysis that uses polynomials to model curvature can make interpreting the results trickier. Unlike a linear relationship, the effect of the independent variable changes based on its value. Looking at the coefficients won’t make the picture any clearer. Instead, graph the data to truly understand the relationship. Expert knowledge of the study area can also help you make sense of the results.
Related post: Curve Fitting using Linear and Nonlinear Regression
Regression Coefficients and Relationships Between Variables
Regression analysis is all about determining how changes in the independent variables are associated with changes in the dependent variable. Coefficients tell you about these changes and p-values tell you if these coefficients are significantly different from zero.
All of the effects in this post have been main effects, which are the direct relationships between an independent variable and a dependent variable. However, sometimes the relationship between an IV and a DV changes based on the value of another variable. This condition is an interaction effect. Learn more about these effects in my post: Understanding Interaction Effects in Statistics.
In this post, I didn’t cover the constant term. Be sure to read my post about how to interpret the constant!
If you’re learning regression and like the approach I use in my blog, check out my eBook!
Note: I wrote a different version of this post that appeared elsewhere. I’ve completely rewritten and updated it for my blog site.