How do you compare regression lines statistically? Imagine you are studying the relationship between height and weight and want to determine whether this relationship differs between basketball players and non-basketball players. You can graph the two regression lines to see if they look different. However, you should perform hypothesis tests to determine whether the visible differences are statistically significant. In this blog post, I show you how to determine whether the differences between coefficients and constants in different regression models are statistically significant.
Suppose we estimate the relationship between X and Y under two different conditions, processes, contexts, or other qualitative change. We want to determine whether the difference affects the relationship between X and Y. Fortunately, these statistical tests are easy to perform.
For the regression examples in this post, I use an input variable and an output variable for a fictional process. Our goal is to determine whether the relationship between these two variables changes between two conditions. First, I’ll show you how to determine whether the constants are different. Then, we’ll assess whether the coefficients are different.
Related post: When Should I Use Regression Analysis?
Hypothesis Tests for Comparing Regression Constants
When the constant (y intercept) differs between regression equations, the regression lines are shifted up or down on the y-axis. The scatterplot below shows how the output for Condition B is consistently higher than Condition A for any given Input. These two models have different constants. We’ll use a hypothesis test to determine whether this vertical shift is statistically significant.
Related post: How Hypothesis Tests Work
To test the difference between the constants, we need to combine the two datasets into one. Then, create a categorical variable that identifies the condition for each observation. Our dataset contains the three variables of Input, Condition, and Output. All we need to do now is to fit the model!
Interpreting the results
The regression equation table displays the two constants, which differ by 10 units. We will determine whether this difference is statistically significant.
Next, check the coefficients table in the statistical output.
For Input, the p-value for the coefficient is 0.000. This value indicates that the relationship between the two variables is statistically significant. The positive coefficient indicates that as Input increases, so does Output, which matches the scatterplot above.
To perform a hypothesis test on the difference between the constants, we need to assess the Condition variable. The Condition coefficient is 10, which is the vertical difference between the two models. The p-value for Condition is 0.000. This value indicates that the difference between the two constants is statistically significant. In other words, the sample evidence is strong enough to reject the null hypothesis that the population difference equals zero (i.e., no difference).
The hypothesis test supports the conclusion that the constants are different.
Hypothesis Tests for Comparing Regression Coefficients
Let’s move on to testing the difference between regression coefficients. When the coefficients are different, a one-unit change in an independent variable is related to varying changes in the mean of the dependent variable.
The scatterplot below displays two Input/Output models. It appears that Condition B has a steeper line than Condition A. Our goal is to determine whether the difference between these slopes is statistically significant. In other words, does Condition affect the relationship between Input and Output?
Performing this hypothesis test might seem complex, but it is straightforward. To start, we’ll use the same approach for testing the constants. We need to combine both datasets into one and create a categorical Condition variable. Here is the CSV data file for this example: TestSlopes.
We need to determine whether the relationship between Input and Output depends on Condition. In statistics, when the relationship between two variables depends on another variable, it is called an interaction effect. Consequently, to perform a hypothesis test on the difference between regression coefficients, we just need to include the proper interaction term in the model! In this case, we’ll include the interaction term for Input*Condition.
I fit the regression model with Input (continuous independent variable), Condition (main effect), and Input *Condition (interaction effect). This model produces the following results.
Interpreting the results
The p-value for Input is 0.000, which indicates that the relationship between Input and Output is statistically significant.
Next, look at Condition. This term is the main effect that tests for the difference between the constants. The coefficient indicates that the difference between the constants is -2.36, but the p-value is only 0.093. The lack of statistical significance indicates that we can’t conclude the constants are different.
Now, let’s move on to the interaction term (Input*Condition). The coefficient of 0.469 represents the difference between the coefficient for Condition A and Condition B. The p-value of 0.000 indicates that this difference is statistically significant. We can reject the null hypothesis that the difference is zero. In other words, we can conclude that Condition affects the relationship between Input and Output.
The regression equation table below shows both models. Thanks to the hypothesis tests that we performed, we know that the constants are not significantly different, but the Input coefficients are significantly different.
By including a categorical variable in regression models, it’s simple to perform hypothesis tests to determine whether the differences between constants and coefficients are statistically significant. These tests are beneficial when you can see differences between models and you want to support your observations with p-values.