Interaction effects occur when the effect of one variable depends on the value of another variable. Interaction effects are common in regression analysis, ANOVA, and designed experiments. In this blog post, I explain interaction effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your model.

In any study, whether it’s a taste test or a manufacturing process, many variables can affect the outcome. Changing these variables can affect the outcome directly. For instance, changing the food condiment in a taste test can affect the overall enjoyment. Analysts use models to assess this kind of relationship between each independent variable and the dependent variable. This kind of effect is called a main effect. However, it can be a mistake to assess only main effects.

In more complex study areas, the independent variables might interact with each other. Interaction effects indicate that a third variable influences the relationship between an independent and dependent variable. This type of effect makes the model more complex, but if the real world behaves this way, it is critical to incorporate it in your model. For example, the relationship between condiments and enjoyment probably depends on the type of food—as we’ll see in this post!

## Example of Interaction Effects with Categorical Independent Variables

I think of interaction effects as an “it depends” effect. You’ll see why! Let’s start with an intuitive example to help you understand these effects conceptually.

Imagine that we are conducting a taste test to determine which food condiment produces the highest enjoyment. We’ll perform a two-way ANOVA where our dependent variable is Enjoyment. Our two independent variables are both categorical variables: Food and Condiment.

Our ANOVA model with the interaction term is:

Enjoyment = Food + Condiment + Food*Condiment

To keep things simple, we’ll include only two foods (ice cream and hot dogs) and two condiments (chocolate sauce and mustard) in our analysis.

Given the specifics of the example, an interaction effect would not be surprising. If someone asks you, “Do you prefer mustard or chocolate sauce on your food?” you will undoubtedly respond, “It depends on the type of food!” That’s the “it depends” nature of an interaction effect. You cannot answer the question without knowing more about the other variable in the interaction term, which is the type of food in our example!

That’s the concept. Now, I’ll show you how to include an interaction term in your model and how to interpret the results.

## How to Interpret Interaction Effects

Let’s perform our analysis. Most statistical software packages allow you to add interaction terms to a model. Download the CSV data file to try it yourself: Interactions_Categorical.

The p-values in the output below tell us that the interaction effect (Food*Condiment) is statistically significant. Consequently, we know that the satisfaction you derive from the condiment *depends* on the type of food.

But, how do we interpret the interaction effect and truly understand what the data are saying? The best way to understand these effects is with a special type of graph—an interaction plot. This type of plot displays the fitted values of the dependent variable on the y-axis while the x-axis shows the values of the first independent variable. Meanwhile, the various lines represent values of the second independent variable.

On an interaction plot, parallel lines indicate that there is no interaction effect while different slopes suggest that one might be present. Below is the plot for Food*Condiment.

The crossed lines on the graph suggest that there is an interaction effect, which the significant p-value for the Food*Condiment term confirms. The graph shows that enjoyment levels are higher for chocolate sauce when the food is ice cream. Conversely, enjoyment levels are higher for mustard when the food is a hot dog. If you put mustard on ice cream or chocolate sauce on hot dogs, you won’t be happy!

Which condiment is best? It depends on the type of food, and we’ve used statistics to demonstrate this effect.
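To see the numbers behind an interaction plot, here is a minimal Python sketch. The scores are hypothetical values invented for illustration; they are not the contents of the Interactions_Categorical file, and names such as `scores` and `cell_means` are assumptions of this sketch. It computes the cell means that the plot would display and checks whether the preferred condiment flips between foods.

```python
from statistics import mean

# Hypothetical enjoyment scores (0-100), invented for illustration only.
# These are NOT the values from the Interactions_Categorical data file.
scores = {
    ("Ice Cream", "Chocolate Sauce"): [92, 95, 90, 94],
    ("Ice Cream", "Mustard"): [60, 63, 58, 62],
    ("Hot Dog", "Chocolate Sauce"): [66, 64, 67, 63],
    ("Hot Dog", "Mustard"): [89, 91, 88, 92],
}

# Cell means are the points an interaction plot would draw.
cell_means = {cell: mean(values) for cell, values in scores.items()}

# Gap between the two condiment lines at each food.
gap_ice_cream = (
    cell_means[("Ice Cream", "Chocolate Sauce")] - cell_means[("Ice Cream", "Mustard")]
)
gap_hot_dog = (
    cell_means[("Hot Dog", "Chocolate Sauce")] - cell_means[("Hot Dog", "Mustard")]
)

# Parallel lines would keep the same gap; a sign change means the lines cross.
lines_cross = gap_ice_cream * gap_hot_dog < 0

for (food, condiment), m in cell_means.items():
    print(f"{food:<10} {condiment:<16} mean enjoyment = {m:.2f}")
print(f"Chocolate minus mustard on ice cream: {gap_ice_cream:+.2f}")
print(f"Chocolate minus mustard on hot dogs:  {gap_hot_dog:+.2f}")
print("Lines cross:", lines_cross)
```

A sign change in the gap is only a description of the sample. Whether the pattern is statistically significant still comes from the test of the Food*Condiment term.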

## Overlooking Interaction Effects is Dangerous!

When you have statistically significant interaction effects, you can’t interpret the main effects without considering the interactions. In the previous example, you can’t answer the question about which condiment is better without knowing the type of food. Again, “it depends.”

Suppose we want to maximize satisfaction by choosing the best food and the best condiment. However, imagine that we forgot to include the interaction effect and assessed only the main effects. We’ll make our decision based on the main effects plots below.

Based on these plots, we’d choose hot dogs with chocolate sauce because they each produce higher enjoyment. That’s not a good choice despite what the main effects show! When you have statistically significant interactions, you cannot interpret the main effect without considering the interaction effects.

Given the intentionally intuitive nature of our silly example, the consequence of disregarding the interaction effect is evident at a passing glance. However, that is not always the case, as you’ll see in the next example.
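The trap can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the cell means below are hypothetical numbers chosen to mimic the pattern in this example, not the values from the data file. It averages over the other factor to get each main effect, picks the "best" level of each factor separately, and compares that pick with the best actual combination.

```python
# Hypothetical cell means (0-100 enjoyment), invented for illustration only.
cell_means = {
    ("Ice Cream", "Chocolate Sauce"): 93.0,
    ("Ice Cream", "Mustard"): 61.0,
    ("Hot Dog", "Chocolate Sauce"): 65.0,
    ("Hot Dog", "Mustard"): 90.0,
}
foods = ["Ice Cream", "Hot Dog"]
condiments = ["Chocolate Sauce", "Mustard"]

# Main effect means: average each factor level over the levels of the other factor.
food_main = {
    f: sum(cell_means[(f, c)] for c in condiments) / len(condiments) for f in foods
}
condiment_main = {
    c: sum(cell_means[(f, c)] for f in foods) / len(foods) for c in condiments
}

# Choosing each factor separately, as a main-effects-only analysis would.
main_effects_pick = (
    max(food_main, key=food_main.get),
    max(condiment_main, key=condiment_main.get),
)

# Choosing the best combination directly, which respects the interaction.
best_cell = max(cell_means, key=cell_means.get)

print("Main effects pick:", main_effects_pick, "->", cell_means[main_effects_pick])
print("Best combination: ", best_cell, "->", cell_means[best_cell])
```

With these made-up numbers, the main effects point to a combination that scores far below the best cell, which is the same mistake the main effects plots invite.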

## Example of an Interaction Effect with Continuous Independent Variables

For our next example, we’ll assess continuous independent variables in a regression model for a manufacturing process. The independent variables (processing time, temperature, and pressure) affect the dependent variable (product strength). Here’s the CSV data file if you want to try it yourself: Interactions_Continuous.

In the regression model, I’ll include temperature*pressure as an interaction effect. The results are below.

As you can see, the interaction term is statistically significant. But, how do you interpret the interaction coefficient in the regression equation? You could try entering values into the regression equation and piece things together. However, it is much easier to use interaction plots!

**Related post**: How to Interpret Regression Coefficients and Their P-values for Main Effects

In the graph above, the variables are continuous rather than categorical. To produce the plot, the statistical software chooses a high value and a low value for pressure and enters them into the equation along with the range of values for temperature.

As you can see, the relationship between temperature and strength changes direction based on the pressure. For high pressures, there is a positive relationship between temperature and strength while for low pressures it is a negative relationship. By including the interaction term in the model, you can capture relationships that change based on the value of another variable.

If you want to maximize product strength and someone asks you if the process should use a high or low temperature, you’d have to respond, “It depends.” In this case, it depends on the pressure. You cannot answer the question about temperature without knowing the pressure value.
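One way to read the interaction coefficient is to note that, in a model containing Temperature, Pressure, and Temperature*Pressure, the slope for temperature is no longer a single number. It equals the temperature coefficient plus the interaction coefficient times the pressure. The sketch below uses hypothetical coefficients and pressure values (assumptions for illustration, not the fitted values from this regression) to show how the slope can change sign.

```python
# Hypothetical coefficients, NOT the fitted values from the regression in this post.
b_temperature = -1.5   # coefficient on Temperature
b_interaction = 0.02   # coefficient on Temperature*Pressure


def temperature_slope(pressure):
    """Change in predicted strength per one-unit increase in temperature
    at the given pressure: b_temperature + b_interaction * pressure."""
    return b_temperature + b_interaction * pressure


# Hypothetical low and high pressure settings.
low_pressure, high_pressure = 50.0, 100.0
print("Slope at low pressure: ", temperature_slope(low_pressure))
print("Slope at high pressure:", temperature_slope(high_pressure))

# Pressure at which the temperature slope is zero and the relationship flips direction.
pressure_where_slope_flips = -b_temperature / b_interaction
print("Slope changes sign near pressure =", pressure_where_slope_flips)
```

This is the arithmetic an interaction plot performs for you when it draws one line per pressure value.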

## Important Considerations for Interaction Effects

While the plots help you interpret the interaction effects, use a hypothesis test to determine whether the effect is statistically significant. Plots can display non-parallel lines that represent random sampling error rather than an actual effect. P-values and hypothesis tests help you sort out the real effects from the noise.
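A small simulation illustrates why the plot alone is not enough. In this sketch, the population has no interaction at all (the hypothetical cell means are purely additive), yet small samples routinely produce visibly non-parallel lines. The cell means, standard deviation, sample size, and the threshold of 5 points are all assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical population cell means with NO interaction:
# moving from B1 to B2 adds 10 at both levels of A.
true_means = {("A1", "B1"): 50, ("A1", "B2"): 60, ("A2", "B1"): 55, ("A2", "B2"): 65}


def sample_interaction_contrast(n_per_cell=5, sd=10):
    """Draw a small sample per cell and return the sample 'difference of
    differences', which would be 0 if the plotted lines were exactly parallel."""
    m = {
        cell: mean([random.gauss(mu, sd) for _ in range(n_per_cell)])
        for cell, mu in true_means.items()
    }
    return (m[("A2", "B2")] - m[("A2", "B1")]) - (m[("A1", "B2")] - m[("A1", "B1")])


contrasts = [sample_interaction_contrast() for _ in range(1000)]

# Share of samples whose lines differ in slope by more than 5 points.
share_large = sum(abs(c) > 5 for c in contrasts) / len(contrasts)

print(f"Average contrast across samples: {mean(contrasts):.2f}")
print(f"Share of samples with |contrast| > 5: {share_large:.2f}")
```

The contrasts average out near zero, as they should, but many individual samples would still draw lines that look far from parallel.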

The examples in this post are two-way interactions because there are two independent variables in each term (Food*Condiment and Temperature*Pressure). It’s equally valid to interpret these effects in two ways. For example, the relationship between:

- Enjoyment and Condiment depends on Food.
- Enjoyment and Food depends on Condiment.

You can have higher-order interactions. For example, a three-way interaction has three variables in the term, such as Food*Condiment*X. In this case, the relationship between Enjoyment and Condiment depends on both Food and X. This type of effect is challenging to interpret, so analysts use them infrequently in practice. However, in some models, they might be necessary to provide an adequate fit.

Finally, when you have interaction effects that are statistically significant, do not attempt to interpret the main effects without considering the interaction effects. As the examples show, you will draw the wrong conclusions!

If you’re learning regression, check out my Regression Tutorial!

Neha says

Thank you for amazing posts. the way you express concepts is matchless.

Jim Frost says

You’re very welcome! I’m glad they’re helpful!

lijiancai says

Can I ask which software you used? I use SPSS, but my results were not the same as yours.

Jim Frost says

Hi, I’m using Minitab statistical software. I’m not sure why the results would be different. A couple of possibilities come to mind. Minitab presents the fitted means rather than raw data means; I’m not sure which values SPSS presents. Minitab doesn’t fit the model using standardized variables, which SPSS might. I don’t have SPSS myself, otherwise I’d try it out. I do have confidence that Minitab is calculating the correct values. There must be some methodology difference.

Mona says

What does it mean when I have a significant interaction effect only when I omit the main effects of the independent variables (by choosing the interaction effect in “MODEL” in SPSS)? Is it “legal” to report the interaction effect without reporting the main effects?

Jim Frost says

Hi Mona,

That is a bit tricky.

If you had one model where the main effects are not significant, but the interaction effects are significant, that is perfectly fine.

However, it sounds like in your case you have to decide between the main effects or the interaction effects. Models where the statistical significance of terms changes based on the specific terms in the model are always difficult cases. This problem often occurs in (but is not limited to) cases where you have multicollinearity, so you might check on that.

This type of decision always comes down to subject area knowledge. Use your expertise, theory, other studies, etc. to determine what course of action is correct. It might be OK to do what you suggest. On the other hand, perhaps including the main effects is the correct route.

Jim

Apple says

What is the command for a continuous by continuous variables interaction plot in Stata?

Thanks

Jim Frost says

Hi, I’ve never used Stata myself, but I’ve seen people use “twoway contour” to plot two-way interaction effects in Stata. Might be a good place to start!

Sol says

Hi Jim, thank you very much for your post. My question is how do you interpret an insignificant interaction of a categorical and a continuous variable, when the main effects for both variables are significant? For the sake of simplicity if our logit equation is as follows Friendliness = α + βAge + βDog + βAge*Dog. Where Friendliness and Dog are coded as dummy variables that take the values of either 1 or 0 depending on the case. So if all but the interaction term, βAge*Dog, is significant, does that mean the probability of a dog being friendly is independent of its age?

Jim Frost says

If the Age variable is significant, then you know that friendliness is associated with age, and dog is as well if that variable is significant. A significant interaction effect indicates that the effect of one variable on the dependent variable depends on the value of another variable. In your example, let’s assume that the interaction effect was significant. This tells you that the relationship between age and friendliness changes based on the value of the dog variable. In that case, it’s not a fixed relationship or effect size. (It’s also valid to say that the relationship between dog and friendliness changes based on the value of age.)

Now, in your case, the interaction effect is not significant but the two main effects are significant. This tells you that there is a relationship between age and friendliness and a relationship between dog and friendliness. However, the exact nature of those relationships does not change based on the value of the other variable. Those two variables affect the probability of observing the event in the outcome variable, but one independent variable doesn’t affect the relationship between the other independent variable and the dependent variable.

The fact that you have one categorical variable and a continuous variable makes it easier to picture. Picture a different regression line for each level of the categorical variable. These fitted lines display the relationship between the continuous independent variable and the response for each level of dog. A significant interaction effect indicates that the differences between those slopes are statistically significant. An insignificant interaction effect indicates that there is insufficient evidence to conclude that the slopes are different. I actually show an example of this situation (though not with a logistic model) that should help.

I hope that makes it more clear!

Luka says

Hello,

I am interested how to read for interaction effect if we just have a table of observations, for example

| A | B | C |
|---|---|---|
| 2 | 4 | 7 |
| 4 | 7 | 8 |
| 6 | 9 | 13 |

In the lecture I attended this was explained as “differences between differences” but I didn’t get what this refers to.

Thanks

Jim Frost says

Hi Luka, it’s impossible for me to interpret those observations because I don’t know the relationships between the variables and there are far too few observations.

In general, you can think of an interaction effect as an “it depends” effect as I describe in this blog post. Suppose you have two independent variables X1 and X2 and the dependent variable Y. If the relationship between X1 and Y changes based on the value of X2, that’s an interaction effect. The size of the X1 effect depends on the value of X2. Read through the post to see how this works in action. The value of the interaction term for each observation is the product of X1 and X2 (X1*X2).

An effect is the difference in the mean value of Y for different values of X. So, if the interaction effect is significant, you know that the differences of Y based on X will vary based on some other variable. I think that’s what your instructor meant by the differences between differences. I tend to think of it more as the relationship between X1 and Y depends on the value of X2. If you plot a fitted line for X1 and Y, you can think of it as the slope of the line changes based on X2. There’s a link in this blog post to another blog post that shows how that works.
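Here is a minimal sketch of the “differences between differences” idea for two independent variables coded 0/1. The four cell means are hypothetical numbers made up for illustration (they are not taken from the table in the question). The sketch computes the X1 effect at each value of X2, takes the difference of those two differences, and shows that this quantity is the coefficient on the product term X1*X2 in a model that reproduces the cell means.

```python
# Hypothetical mean of Y in each cell of a 2x2 layout, keyed by (x1, x2).
cell_mean = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 12.0, (1, 1): 22.0}

# Effect of X1 (difference in mean Y) at each value of X2.
effect_x1_when_x2_is_0 = cell_mean[(1, 0)] - cell_mean[(0, 0)]
effect_x1_when_x2_is_1 = cell_mean[(1, 1)] - cell_mean[(0, 1)]

# The interaction is the difference between those two differences.
difference_between_differences = effect_x1_when_x2_is_1 - effect_x1_when_x2_is_0

# Coefficients of Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2) that match the cell means.
b0 = cell_mean[(0, 0)]
b1 = cell_mean[(1, 0)] - b0
b2 = cell_mean[(0, 1)] - b0
b3 = difference_between_differences


def predict(x1, x2):
    interaction_term = x1 * x2  # the product of the two variables
    return b0 + b1 * x1 + b2 * x2 + b3 * interaction_term


print("X1 effect when X2 = 0:", effect_x1_when_x2_is_0)
print("X1 effect when X2 = 1:", effect_x1_when_x2_is_1)
print("Difference between differences (b3):", b3)
```

If the two differences were equal, `b3` would be zero and the X1 effect would not depend on X2.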

I hope this helps!

Syahmi says

Your explanation is really great! Thank you so much. I totally will recommend you to my friends

Jim Frost says

You’re very welcome! Thank you for recommending me to your friends!

Luka says

Thanks for help, I appreciate it!

Yeasin says

Great work Jim! People get very vague idea whenever they look at google to learn the basic about interaction in statistics. Your writing is a must see and excellent work that demonstrated the basic of interaction. Thanks heaps.

Jim Frost says

Hi Yeasin, thank you! That means a lot to me!

Tanikan says

Hi Jim,

Thanks for the valuable tutorial.

I have 2 questions as follows:

1. In more complex study areas, the independent variables might interact with each other. What do you mean by complex area? Is it social science?

2. I have run Mancova and observed that results of two-way = interaction. I found that SPSS does not run post-hoc. Can I use the t-test after that?

My model is factorial design (2 levels of X1, 2 levels of X2, and 2 levels of X3) on Y.

I report in paper for two-way and three way interaction on below. Is it ok?

Two-way interaction

Among the X2 level 1 group, the mean of Y among subjects who viewed X3 level 2 (adjusted M = xxx, SE =xxx) is significantly higher than those who viewed X3 level 1 (adjusted M = xxx, SE = xxx) with t(xx) = xx, p < xx.

three-way interaction

Among the subjects who viewed the X3 level 2, the mean of Y of the subjects who expressed X1 level 2 (adjusted M = xxx, SE = xxx) is significantly greater than those who expressed X1 level 1 (adjusted M = xxx, SE = xxx) for those who had X2 level 1 [t(xx) = xxx, p < xxx].

Thank you in advance

Jim Frost says

Hi Tanikan,

Thanks for the great questions!

Regarding more and less complex study areas, in the context of this post, I’m simply referring to subject matter where only main effects are statistically significant as being simpler, and to subject areas where interaction effects are significant as more complex. I’m calling them more complex because the relationship between X and Y is not constant. Instead, that relationship depends on at least one other variable. It’s just not as simple.

I would not use t-tests for that purpose. I’m surprised if SPSS can’t perform post-hoc tests when there are interaction effects–but I use other statistical software more frequently. With your factorial design, there will be multiple groups based on the interactions of your factors. As you compare more groups, the need for controlling the family/joint/simultaneous error rate becomes even more important. Without controlling for that joint error rate, the probability that at least one of the many comparisons will be a false positive increases. T-tests don’t control that joint error rate. It’s important to use a post hoc test.
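To give a rough sense of how quickly the joint error rate grows, here is a back-of-the-envelope sketch. It assumes every pair of the 2 x 2 x 2 = 8 groups is compared at a significance level of 0.05 and treats the comparisons as independent, which pairwise comparisons are not, so the result is only an approximation. The Bonferroni threshold at the end is one simple correction shown for scale; it is not necessarily the post hoc method your software uses.

```python
from math import comb

alpha = 0.05                       # per-comparison significance level (assumed)
groups = 2 * 2 * 2                 # cells in a 2 x 2 x 2 factorial design
comparisons = comb(groups, 2)      # number of pairwise comparisons among the cells

# Approximate chance of at least one false positive if the comparisons
# were independent and no true differences existed.
familywise_error = 1 - (1 - alpha) ** comparisons

# Bonferroni-style per-comparison threshold that caps the joint rate at alpha.
bonferroni_alpha = alpha / comparisons

print("Pairwise comparisons:", comparisons)
print(f"Approximate joint error rate with uncorrected t-tests: {familywise_error:.2f}")
print(f"Bonferroni per-comparison threshold: {bonferroni_alpha:.4f}")
```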

At least for the two-way interaction effects, I highly recommend using an interaction plot (as shown in this post) to accompany your findings. I find that those graphs are particularly helpful in clarifying the results. Of course, that graph doesn’t tell you which specific differences between groups are statistically significant. The post hoc tests for those groups will identify the significant differences.

I hope this helps!

Alicia says

Hi, Jim!

I have a sort of somehow interaction-related question, but I didn’t know where to post it, so this entry seemed the most adequate to me.

I work with R and I would like to use an ANCOVA to evaluate the effect of a factor (age, for example, with two levels, adult and subadult) in the regression of body length (log transformed, logLCC) and weight (log transformed, logweight). This regression measures body condition of an individual (higher weights at same lengths indicate a better condition, that is, sort of “fluffiness”).

So, when I run the analysis:

aov(logweight~logLCC*age)

I obtain a significant interaction between logLCC:age (p=0.0068). I understand this means that slopes for each age class are not parallel. However, the factor age alone is not significant (p=0.2059).

What does this mean? How is it interpreted?

I have tried deleting the interaction from the model, but it loses a lot of explicative power (p=0.0068). So, what should I do? I am quite lost with this issue…

Thank you so much in advance,

Alicia

Jim Frost says

Hi Alicia!

First, before I get into the interaction effect, a comment about the model in general. I don’t know if you’re analyzing human weight or not. But, I’ve modeled Percent Body Fat and BMI. While I was doing that, I had to decide whether to use Height, Weight, and Height*Weight as the independent variables and interaction effect, or to use body mass index (BMI). I found that both models fit equally well, but I went ahead with BMI because I could graph it. I did have to include a polynomial term because the relationship was curvilinear. I notice that you’re using a log transformation. That might well be just fine and necessary. But, I found that I didn’t need to go that route. Just some food for thought. You can read about this BMI and %body fat model.

Ok, so on to your interaction effect. It’s not problematic at all that the main effect for age is not significant. In fact, when you have a significant interaction you shouldn’t try to interpret the main effect alone anyway. Now, if it had been significant and you wanted to determine the entire effect of age, you would’ve had to assess both the main effect and the interaction effect together. Now, you just need to assess the interaction effect alone. But, it’s always easiest to interpret interaction effects with graphs, as I do in this blog post.

In the post, I show examples of interaction plots with two factors and another with two continuous variables. However, you can certainly create an interaction plot for a factor * continuous variable. For your model, this type of graph will display two lines–one for each level of the age factor. Because you already know the interaction term is significant, the difference between the two slopes is statistically significant. (If the main effect had been significant, the interaction plot would have included it in the calculations as well–but it is fine that it’s not significant.)
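As a rough illustration of what that plot represents, the sketch below fits a separate least-squares line for each level of a two-level factor and compares the slopes. The data are hypothetical numbers invented for this example (not your lizard measurements), and `fit_line` is a helper defined here, not a library function. With a 0/1 coded factor, the difference between the two slopes is the same quantity as the factor * continuous interaction coefficient in the combined model.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x for one group."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


# Hypothetical data: the same x values measured in two groups.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_by_level = {
    "adult": [2.1, 3.9, 6.2, 7.8, 10.0],
    "subadult": [3.0, 3.6, 3.9, 4.6, 5.1],
}

# One fitted line per factor level, as drawn on the interaction plot.
fits = {level: fit_line(x, y) for level, y in y_by_level.items()}
for level, (slope, intercept) in fits.items():
    print(f"{level:<9} slope = {slope:.2f}, intercept = {intercept:.2f}")

# Difference in slopes: what the interaction term estimates.
slope_difference = fits["adult"][0] - fits["subadult"][0]
print(f"Difference in slopes: {slope_difference:.2f}")
```

The sketch only describes the sample. The p-value for the interaction term is what tells you whether a slope difference of that size is statistically significant.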

It sounds like you should leave the interaction effect in the model. Some analysts will also include the main effects in the model when they are included in a significant interaction effect even if the main effect is not significant by itself (e.g., age). I could go either way on that issue myself. Just be sure that the interaction makes theoretical/common sense for your study area. But, I don’t see any reason for concern. The insignificant main effect is not a problem.

I hope this helps!

Alicia says

Hi Jim,

first of all… thank you very much for your early response!

And after that… I am so sorry! I forgot to explain that I work with lizards, not with humans. My measurement of body length (logLCC) corresponds to the log-transformed Snout-Vent Length (logSVL, whose acronym in Spanish, my mother tongue, is LCC; I forgot to translate it!). The relationship between these two variables tends to be linear.

So, in these animals, the regression of logSVL and logweight is a common and standardized method to assess body condition. Residuals from this regression are used to assess body condition; if they’re positive the animal is more “chubby” (better condition) and, if they’re negative, the animal is more “skinny” (worse condition). The aim of my ANCOVA is to compare the effect of age on this regression.

Anyway, following your advice I created an interaction plot which displays two lines, one for each level of the age factor. The two lines cross in a certain middle point, diverging prior and after that point. Thanks to your detailed answer, I understand that this means that age interacts somehow with body length (what sounds logical, as lizards grow together with aging), but I still don’t know how to interpret this in relation to body condition (regression).

Thanks again for your detailed, kind and early response!

Jim Frost says

You’re very welcome! And, subject area knowledge and standards definitely should guide your model building. I always enjoy learning how others use these types of analysis. And, that’s interesting actually using the residuals to assess a specimen’s health!

If you can, and are willing, post the interaction plot, I can take a stab at interpreting it. (I know I can post images in these comments but I’m not sure about other users.) Basically, the relationship between body length and weight depends on the age factor. Or, stated another way, you can’t use body length to predict weight until you know the age.

Alicia says

Hi, Jim!

Thank you again for your willingness! Unfortunately, I can’t /don’t know how to post the plot in the comments… If you are willing, you can contact me by email so I can send it to you, plus the results of the regression or whatever information that could be helpful.

Thank you!

Shruti says

Hi Jim,

Thanks for your explanation! It was really useful. I have a couple of follow-up questions. Let’s suppose a situation with 2 regression models, both of which have the exact same variables, except the second model has an additional interaction term between two variables already in the first model.

1. Now comparing the 2 regression equations, why do coefficients of other variables (apart from the interaction term and the 2 variables used to create the interaction term) change?

2. How do we compare and interpret the change in coefficients of variables which were used to create the interaction term in the first and second models?

Let me know in case it’s better for me to explain with an example here.

Thanks!

Jim Frost says

Hi Shruti,

I think I understand your questions.

1) Any time you add new terms in the model, the coefficients can change. Some of this occurs because the new term accounts for some of the variance that was previously accounted for by the other terms, which causes their coefficients to change. So, some change is normal. The changes can tend to be larger and more erratic when the model terms are correlated. The interaction term is clearly correlated with the variables that are included in the interaction. When you include an interaction term, you can help avoid this by standardizing your continuous variables.
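The correlation between a variable and its interaction term can be checked directly. The sketch below uses hypothetical data and centers the variables (subtracts their means), which is the part of standardizing that matters here because correlation is unaffected by rescaling. The `pearson` helper is defined in the sketch. The data were deliberately constructed to be symmetric so the drop is dramatic; with real data the reduction is usually smaller.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical continuous predictors, invented for illustration.
x = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
z = [4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0, 8.0, 10.0]

# Interaction term built from the raw variables.
raw_product = [a * b for a, b in zip(x, z)]
r_raw = pearson(x, raw_product)

# Interaction term built from the centered variables.
mean_x, mean_z = sum(x) / len(x), sum(z) / len(z)
x_centered = [a - mean_x for a in x]
z_centered = [b - mean_z for b in z]
centered_product = [a * b for a, b in zip(x_centered, z_centered)]
r_centered = pearson(x_centered, centered_product)

print(f"Correlation of X with X*Z, raw variables:      {r_raw:.3f}")
print(f"Correlation of X with X*Z, centered variables: {r_centered:.3f}")
```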

2) I have heard about cases where analysts try to interpret the changes in coefficients when you add new terms. My take on this is that the changes are not very informative. Let’s assume that your interaction term is a valuable addition to the model. In that case, you can conclude the model without the interaction term is not as good a model and its coefficient estimates might well be biased. Consequently, I wouldn’t attribute much meaning to the change in coefficient values other than that your new model with the interaction term is likely to be better.

However, one caveat, I believe there are fields that do place value in understanding those changes. I’m not sure that I agree, but if your field is one that has this practice, you should probably check with an expert.

I hope I covered everything!

Susanne says

Hello Jim!

Thanks for making such clear posts. I tutor students with stats and it’s really tough to find good easy to follow material that EVERYONE can get. So to stumble on such a clear explanation is a breath of fresh air 😀

Now I recently saw in one of my students’ PowerPoints that they are taught they have to redo the ANOVA analysis without the interaction if the interaction is not significant. Maybe I’ve always missed something but I have never heard of this before. Does this sound familiar to you and if so can you explain to me why this is?

thanks!

Susanne

Jim Frost says

Hi Susanne, thanks so much for your kind words. They mean a lot to me–especially coming from a stats tutor!

I have always heard that you should not include the interaction term when it is not significant. The reason is that when you include insignificant terms in your model, it can reduce the precision of the estimates. Generally, you want to leave as many degrees of freedom for the error as you can.

Courtney Barrs says

Hi Jim,

Thank you for this post, I found it incredibly helpful.

I am having trouble interpreting my own results of a two-way repeated ANOVA and was wondering if you could help me out.

Participants were exposed to two different videos, controlled with a counter balance. Video 1 consisted of a comedy sketch, while video 2 was of a nature documentary. Every 2 mins the participants had to indicate on a likert scale how Bored they felt at the time. For the analysis I averaged the boredom score over the first and second half of the video.

IV1: Video (Comedy vs Nature)

IV2: Time (Time 1 vs Time 2)

DV: Boredom score

My analysis output reveals a significant main effect of video p<.000, and non significant effect for time p=.192. However I have an effect of interaction for video*time, p<.000.

How would you go about interpreting these results?

Thanks in advance!

Jim Frost says

Hi Courtney,

I’m happy to hear that you found this post helpful!

The first thing that I’d recommend is graphing your results using an interaction plot like I do in this post. That’s the easiest way to understand interactions. It’s great that you’ve done the ANOVA test because you already know that whatever pattern you see in the plot is, in fact, statistically significant. Given the significance, I can conclude the lines on your plot won’t be parallel.

For your results, you can state them one of two ways. Both ways are equally valid from a statistical standpoint. However, one way might make more sense than the other given your study area or what you’re trying to emphasize.

1) The relationship between Video and Boredom depends on Time. Or:

2) The relationship between Time and Boredom depends on Video.

For the sake of illustration, let’s go with #2. You might be able to draw the conclusion along the lines of: As subjects progress from time 1 to time 2, the average boredom score increases more slowly for those who watch comedy compared to those who watch a nature documentary. Of course, you’d have to adapt the wording to match your actual results. That’s the type of conclusion that you can draw, and you’re able to say that it is statistically significant given the p-value for the interaction term.

Given that the interaction term is significant, you don’t need to interpret the main effects terms at all. And, it’s no problem that one of the main effects is not significant.

I hope this helps!

Courtney says

Hi Jim,

Thank you so much for your quick and helpful response, it really means a lot!

This is what initially confused me when it came to interpreting my results, looking at my interaction graph there was no cross over. Both conditions are more or less parallel with one another, the gradient between time 1 and time 2 for comedy is almost 0. However, there is quite the drop for the nature video in the boredom rating at time 2.

Because the interaction graph does not cross over, does this mean that only in the Nature video does the boredom decrease significantly at Time 2? Will I need to conduct a t-test to check this?

Many thanks!

Courtney

Courtney Barrs says

Hi Jim,

Thank you for such a quick and helpful response!

Graphing the interaction effect is actually what confused me when it came to interpreting my results. The conditions are actually parallel to one another, there is no cross over. The gradient for the comedy condition is almost zero, whereas, there is a dramatic drop in rating of boredom between time 1 and time 2 for the nature video.

With this in mind does the interpretation then mean: A difference in boredom is found across time depending on condition. Therefore, only if you are watching the nature video will you become significantly more bored at time 2. Will I need to conduct a t-test to confirm this?

Many thanks!

Courtney

Jim Frost says

Hi Courtney,

You bet! 🙂

Technically, a significant interaction effect means that the difference in slopes is statistically significant. The lines don’t actually have to cross on your graph–just have different slopes. Well, having different slopes means that the lines must cross at some point theoretically even if that point isn’t displayed on your graph.
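As a quick illustration with made-up numbers, two fitted lines with different slopes always intersect somewhere, but that point can fall well outside the range shown on the graph. The intercepts, slopes, and plotted range below are hypothetical values chosen for this sketch.

```python
# Hypothetical fitted lines: y = intercept + slope * x.
intercept_1, slope_1 = 3.0, 0.0    # a nearly flat line
intercept_2, slope_2 = 5.0, -0.5   # a line that drops over time

# Solve intercept_1 + slope_1 * x = intercept_2 + slope_2 * x for x.
x_cross = (intercept_2 - intercept_1) / (slope_1 - slope_2)

# Hypothetical range of x actually shown on the plot (time 1 to time 2).
plot_min, plot_max = 1.0, 2.0
crosses_inside_plot = plot_min <= x_cross <= plot_max

print("Lines cross at x =", x_cross)
print("Crossing visible on the plot:", crosses_inside_plot)
```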

As for the interpretation, the zero slope for comedy indicates that as time passes, there is no tendency to become more or less bored. However, for nature videos, as time passes, there is a tendency to become more bored. (I’m assuming that the drop in rating that you mention corresponds to “becomes more bored”.) This difference in tendencies is statistically significant. The significant interaction indicates that the relationship between the passage of time and boredom depends on the type of video the subjects watch.

Again, an interaction effect is an “it depends” effect. Do the subjects become more bored over time? It depends on what type of video they watch! You can’t answer that question without knowing which video they watch.

So, the interaction tells you that the difference in slopes is statistically significant, which is different from whether the differences between group means are statistically significant. To identify the specific differences between group means that are statistically significant, you’ll need to perform a post hoc test, such as Tukey’s test. These tests control the joint error rate because as you increase the number of group comparisons, the chance of a Type I error (false positive) increases if you don’t control it. I don’t have a blog post on this topic yet but plan to write one.

The interaction term tells you that the relationship changes while the post hoc test tells you whether the difference between specific group means is statistically significant.
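To see why the joint error rate matters, the rough sketch below counts the pairwise comparisons among a set of groups and approximates the chance of at least one false positive when each comparison is run at the 0.05 level with no correction. It assumes the comparisons are independent, which is not strictly true for comparisons among the same groups, so treat the result as a ballpark figure only. The function name is hypothetical.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of at least one false positive across all
    pairwise comparisons, assuming the comparisons are independent."""
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, 1 - (1 - alpha) ** n_comparisons

for k in (2, 4, 6):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups -> {m} comparisons, familywise error rate ~ {fwer:.3f}")
```

With four groups there are already six comparisons, and the uncorrected error rate is roughly 26% rather than 5%. Post hoc procedures such as Tukey’s test are designed to hold it near the nominal level.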

Saheeda says

This is one of the best explanations I have read to explain ‘interactions’. Thanks!

Jim Frost says

Thanks so much, Saheeda! Your kind words mean a lot to me! I’m glad it was helpful.

Bill says

Hello. Jim. Thank for your great article.

Sorry in advance for my English. Moreover, my understanding for SPSS and stat is quite limited so some question might be silly.

I’m doing a 4×5 factorial ANOVA. One of the tests has a significant interaction effect, but I don’t know exactly which method I should use to interpret it. Some told me that I need to do a simple main effects test, and some told me that a post hoc test is enough, so I’m quite confused.

In another test, the graph shows some crossed lines (because there are a lot of levels of the IV), but the sig. value is 0.069, so the interaction effect is not significant, right? However, I’ve read that if the lines cross, an interaction exists. So how should I summarize this?

I’m willing to send the information for you if u need.

Thank you.

Bill

Jim Frost says

Hi Bill,

You have some excellent questions!

When you have a significant interaction effect, you know you can’t interpret the main effects without considering the interaction effects. As I show in the post, interaction effects are an “it depends” effect. The interpretation for one factor depends on the value of another factor. If you don’t assess the interaction effect, you might end up putting ketchup on your ice cream!

Assessing the post hoc test results can be fine by itself as long as you include the interaction term in the post hoc test. Taking that approach, you’ll see the groupings based on the interaction term and know which groups are significantly different from each other. I also like to graph the interaction plots (as I do in this post) because they provide a great graphical overview of the interaction effect.

There’s an important point about graphs. They can be very valuable in helping you understand your results. However, they are not a statistical test for your data. An interaction plot can show non-parallel lines even when the interaction effect is not significant. When you work with sample data, there’s always the chance that sample error can produce patterns in your sample that don’t exist in the population. Statistical tests help you distinguish between real effects and sample error. These tests indicate when you have sufficient evidence to conclude that an effect exists in the population.

When you have crossed lines in an interaction plot but the test results are not statistically significant, it tells you that you don’t have enough evidence to conclude that the interaction effect actually exists in the population. Basically, the graph says that the effect exists in the sample data but the statistical test says that you don’t have enough evidence to conclude that it also exists in the population. If you were to collect another random sample from the same population, it would not be surprising if that pattern went away!
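A small simulation illustrates how often sample error alone can produce crossed lines. This is only a sketch under stated assumptions: a 2×2 design in which every cell has the same true mean (so there is no interaction and no main effect in the population), normally distributed data, and 5 observations per cell. The mean of 50, standard deviation of 10, and sample size are arbitrary choices, not values from any study discussed here.

```python
import random
import statistics

def lines_cross(n_per_cell, rng):
    """Draw one 2x2 sample from a population with NO interaction
    (every cell has the same true mean) and report whether the two
    lines on the interaction plot would cross."""
    def cell_mean():
        return statistics.fmean([rng.gauss(50, 10) for _ in range(n_per_cell)])

    # Gap between the two lines at each level of the x-axis factor.
    gap_left = cell_mean() - cell_mean()
    gap_right = cell_mean() - cell_mean()
    return gap_left * gap_right < 0  # opposite signs -> the lines cross

rng = random.Random(1)
n_sims = 2000
crossings = sum(lines_cross(5, rng) for _ in range(n_sims))
fraction_crossed = crossings / n_sims
print(f"Lines crossed in {fraction_crossed:.0%} of samples")
```

Under these assumptions, the lines cross in about half of the samples even though the population has no interaction at all, which is why the plot alone cannot stand in for the statistical test.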

I hope this helps!

Bill says

Thanks for your help. I really appreciate.

Might need your help again after I finished the post hoc.

Hope you okay with that. Haha.

Again, THANK YOU.

Sincere,

Bill

Hakim says

Thanks Jim, your explanation is very easy to follow. By the way, I have a model, i.e., growth = average years of schooling + political stability + average years of schooling*political stability. The Stata output gives positive individual coefficients while the interaction coefficient is negative. Unfortunately, I have been asked by the reviewer to explain why the interaction sign is negative. Is there any statistical or theoretical explanation, please?

Jim Frost says

Hi Hakim, it’s difficult to interpret the coefficients for interaction terms directly. However, I can tell you that there is nothing at all odd about having a negative sign for an interaction term. Interaction terms modify the main effects. Sometimes it adds to them while other times it subtracts. It all depends on the nature of what you’re studying.

I’d suggest creating interaction plots, like I do in this post, because they’re much easier to understand than the interaction coefficients. Look through the plots to see whether they make sense given your understanding of the subject-area. These plots are a graphical representation of the interaction terms. Therefore, if the plots make sense, your model is good to go. If they don’t, then you need to figure out what happened. I think the reviewers will find the plots easier to understand than the coefficient.
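As a rough numerical companion to the plots, the sketch below shows how a negative interaction coefficient modifies a positive main effect. The coefficients are hypothetical (they are not Hakim's estimates) and the stability scale is invented for illustration. In a model with a product term, the effect of one more year of schooling is the schooling coefficient plus the interaction coefficient times the stability value.

```python
# Hypothetical coefficients (NOT the commenter's estimates): a positive
# schooling coefficient and a negative interaction coefficient.
b_school = 0.80        # coefficient on average years of schooling
b_interaction = -0.10  # coefficient on schooling * political stability

def schooling_effect(stability):
    """Effect on growth of one more year of schooling at a given stability level."""
    return b_school + b_interaction * stability

for stability in (0, 4, 8, 12):
    effect = schooling_effect(stability)
    print(f"stability={stability:>2}: effect of schooling = {effect:+.2f}")
```

With these made-up values, the effect of schooling shrinks as stability rises and eventually changes sign. Whether a pattern like that is plausible is a subject-area question, which is why checking the interaction plot against theory is the useful step.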

I hope this helps!

Hakim says

Thanks Jim for your quick response and comprehensive explanation..

Ting-Chun Chen says

Hi Jim,

May I ask what reference about interaction effect do you suggest to study?

I want to know more about interaction effect in clinical trial.

Thank you.

Sincere,

Ting-Chun

Jim Frost says

Hi Ting-Chun, most any textbook about regression analysis, ANOVA, or linear models in general will explain interaction effects. My preferred source is Applied Linear Regression Models. That’s a huge textbook of 1400 pages, but that’s why I like it! I don’t have a reference specifically for interaction effects, but would recommend something that discusses linear models in all of its aspects.

I hope this helps!

Jim

Ting-Chun Chen says

Thanks for your help and your quick response. I really appreciate.

Again, THANK YOU.

Sincere,

Ting-Chun

demmie says

how does interaction affect my study statistically

Jim Frost says

Hi Demmie, this is the correct post for finding that answer. Read through it and you’ll find the answer you’re looking for. If you have a more specific question, please don’t hesitate to ask!

Anoop says

Hello jim,

What if want to know 1. How does Icecream and hotdog affect enjoyment by itself

2. How does icecream and hotdog affect enjoyment when condiments are included?

In this case, isn’t both the main effect and interaction are equally important for a researcher?

Jim Frost says

Hi Anoop,

Great questions! You can see how ice cream and hot dog affect enjoyment by themselves by looking at the main effects plot. This plot shows that the enjoyment level each food produces is approximately equal.

Yes, understanding main effects like these is important. However, when there are significant interaction effects, you know that the main effects don’t tell the full story. In this case, the main effect for, say, hot dog doesn’t describe the full effect on enjoyment. The interaction term includes hot dog, so you know that some of hot dog’s effect is also in the interaction. If you ignore that, you risk misinterpreting the results. As I point out in the blog, if you go only by main effects, you’ll choose a hot dog . . . with chocolate sauce. You’d pick the chocolate sauce because its main effect is larger than mustard’s main effect.

To see how ice cream and hot dogs affect enjoyment when you include the interaction effect, just look at the interaction plot. The four points on that plot show the mean enjoyment for all four possible combinations of hot dog/ice cream with chocolate sauce/mustard. It displays the total effects of main effects plus interaction effects. For example, the interaction plot shows that for hot dogs with mustard, the mean enjoyment is about 90 units (the top-left red dot in the graph). Alternatively, you could enter values into the equation to obtain the same values.

I’d agree that understanding both main effects and interaction effects is important. My point is that when you have significant interaction effects, don’t look at only the main effects because that can lead you astray!

Satu says

Hi Jim!

Thank you very much for your blog site, you explain things well and understandable, thank you for that!!

I would still like to make sure that I understand correctly what you said before. I am running a repeated measures ANOVA and I am struggling with interpretations of interactions. So, is it the case that if the interaction effect is not significant, then you should not interpret the multivariate comparisons between groups? I have a model with 5 groups and I am trying to see if there are any differences between them in the change of the X variable between two time points. The multivariate tests show that the change is different in one of the groups (the plot also shows that), but the overall interaction effect is not significant. So what would be the right way to interpret the results? Just say that there was no significant interaction, i.e., the change was similar in all groups, or say that one group was different but the interaction effect was not significant (for some reason)?

Thank you already for your answer!

Satu

Michela says

Hi Jim,

This blog post is so useful, thank you very much! However, I still fail to interpret one of my statistics outputs. I carried out a two-way mixed ANOVA and entered these data:

– between-subject variable is two therapy techniques (MD and RT)

– within-subject variable (Time with 3 levels: pre, mid and post)

– dependent variable was well-being scores.

I ran the analysis and found that for the between-subject variable there was no significant difference between the well-being scores for MD and RT therapies. However, when looking at my within-subject variable, the table stated that there was a significant main effect of Time on well-being scores but no significant Time*Therapy interaction.

Am I right in inferring that the significant main effect of time basically states that over time, well-being scores improved, independent of the therapy technique? Can I then conclude that RT and MD both improved well-being in general and that neither is better than the other? Or is that wrong? One of my hypotheses states that MD and RT will have a positive effect on well-being scores.

Thank you so much for taking time to read this and helping me !!

Michela

Jim Frost says

Hi Michela,

Your interpretation sounds correct. The data you have suggests that as time passes, well being increases. You don’t have evidence that either treatment works better than the other. Often you’d include a control group (no treatment). It’s possible that there is only a time effect and no treatment effect. A control group would help you establish whether it was the passage of time and/or the treatments.

In other words, right now it could be that both treatments are equally effective. But, it’s also possible that neither treatment is effective and it’s only the passage of time–as the saying goes, time heals all wounds!

Michela says

Hi Jim,

Thanks for your reply. Yes, that was one of the problems pointed out in my dissertation: it did not have a control group to compare against :/ Alongside time constraints, the sample size was already so small that it was difficult to get enough people to make 3 separate groups :/ So am I wrong to accept the hypothesis that both RT and MD have a positive effect on well-being levels? Or do I have to reject that because I did not have a control group?

Kind Regards,

Michela

Jim Frost says

Hi Michela,

Unfortunately, it is hard to draw any conclusions about the treatments. It’s possible that both had the same positive effect on well being. However, it’s also possible that neither had an effect and instead it was entirely the passage of time. I definitely understand how it is hard to work with a small sample size!

If other researchers have studied the same treatments, you can evaluate their results. That might lend support towards your notion. But, that’s a tenuous connection without a control group.

Best wishes to you!

Jim

Anoop says

Hi Jim,

I have a significant interaction (0.004) between supplement use and physical activity. The nonusers had a hazard ratio of 0.61 (0.46-0.80) (lower risk), whereas users had an HR of 1.40 (0.85-2.3) (higher risk). My question is: although it looks like a qualitative interaction (opposite in direction), since the users’ CI crosses the margin of no difference, how do you interpret it? Can we say users had a higher hazard when combined with PA?

Thank you

Jim Frost says

HI Anoop,

I can’t interpret the main effect of supplement use without understanding the interaction effect. Can you share the hazard ratios for your interaction? In other words, the ratios for the following groups: user/high activity, user/low activity, non-user/high activity, and non-user/low activity.

I don’t know how you recorded activity, so those groups are just an example. Then we can see how to interpret it!

Thanks!

Jim

Anoop says

Hey Jim,

Not sure why ur posting doesn’t show. But it shows in my email.

This trial is looking at whether physical activity vs. control can reduce physical disability. We are looking at users vs. nonusers of a certain supplement in the trial. The interaction was significant (p=.003).

| | PA | C | HR |
|---|---|---|---|
| Users | 7.1 | 6.1 | 1.40 (.85 – 2.3) |
| Nonusers | 5.4 | 10.2 | 0.61 (.46 – 0.80) |

How do you interpret this result?

Thank you so much. Also you should start a youtube page. We need more people like you in this world 🙂

Jim Frost says

Hi again Anoop,

I checked and I see my comment showing up under yours. I think it might be a browser caching thing that is causing you not to see my reply on the blog post. Refresh might do the trick.

At any rate, this example will also show the importance of several other concerns in statistics–namely understanding the subject area, the variables involved, and statistical vs. practical significance. So, with that said, let’s take a look at your results!

I’m not sure what the dependent variable is, but I’ll assume that higher values represent a greater degree of disability. If that’s not the case, you got really strange results! In the interaction table you provided, I see three group means that are roughly equal and one that stands out. I’m not sure if the differences between any of those three group means (5.4, 6.1, and 7.1) are statistically significant. You can perform a post hoc analysis in ANOVA to check this (I plan to write a blog post about that at some point). Even if they are significant, you have to ask yourself if those differences are practically significant given your knowledge of the subject area and the dependent variable. I don’t know the answer to that.

And, then there is the one group mean (10.2) that is noticeably different than the other three groups. To me, it looks like subjects in the control group who don’t use the supplement have particularly bad results. And, the other three groups might represent a better outcome. Again, use your subject-area knowledge to determine how meaningful this result is in a practical sense.

If that’s the case, it suggests to me that subjects have better outcomes as long as they use the supplement and/or engage in physical activity. In other words, the worst case is to not do either the activity or use the supplement. If you do one or both of physical activity and supplement usage, you seem to be better off in an approximately equal manner. And, again, I don’t know if the differences between the other three outcomes are statistically significant and practically significant. In other words, those differences could just represent random sample error and/or not be meaningful in a practical sense.

I hope this clarifies things! And, yes, I do plan to start a YouTube channel at some point. I need to finish a book that I’m working on first though!

Take care,

Jim

Anoop says

Thank you for the long post Jim!

I used a Cox regression model and the results are hazard ratios. The trial is physical activity vs control. And we are doing a subgroup analysis with the supplement.

The above table shows for Users the CI is 1.40 ( .85 to 2.3) and not significant.

For nonusers, the HR shows 0.61 ( 0.46-.80) and significant.

And the interaction between these two is significant. My question is isn’t this an example of qualitative interaction where the direction is opposite for users vs non-users. Like if you plot the forest plot, the lines are on 2 sides of no difference line.?

Jim Frost says

Hi Anoop,

The interesting thing about statistics is that the analyses are used in a wide array of fields. Often, these fields develop their own terminology for things. In this case, I wasn’t familiar with the term qualitative interaction, but it seems to be used in medical research. I’ve read that a qualitative interaction occurs when one treatment is better for some subsets of patients while another treatment is better for another subset of patients. It sounds like a qualitative interaction occurs when there is a change in direction of treatment effects. A non-crossover interaction applies to situations where there is a change in magnitude among subsets but not of direction.

So, I learned something new about how different fields apply different terminology to statistical concepts!

I’m not sure why you’d have only two hazard ratios when you know that the interaction effect is significant? Right there you know that you can’t interpret the main effect for supplement usage without knowing the physical activity level. It seems like you’d need 4 hazard ratios.

As for whether this classifies as a qualitative interaction given the definition above, you’ll first have to determine whether those differences between the three groups I identified before are both statistically significant and practically significant. If the answers to both questions are yes, then it would seem to be a qualitative interaction. However, if either answer is no, then I don’t think it would. And, I’m going by your dependent variable. If you want to answer that using hazard ratios, you’d need four of them as I indicate above. You can’t answer that question with only two ratios.

I hope this helps!

Mei says

Hi Sir. Thank you for this wonderful post as this is very helpful. But I still can’t seem to understand or interpret my interaction plot. My main effects are significant and my interaction effect are also significant but then looking at the regression coefficient (result from SPSS), moderator(IV2) is a negative significant predictor of DV but looking at my interaction plot, they are both positive significant predictor? I’m not sure if you get it because I am also having difficulty explaining the situation because I am just a beginner when it comes to psychological statistics. Thank you in advance, Sir!

Jim Frost says

Hi Mei, I don’t understand your scenario completely. However, there is nothing wrong with having positive coefficients for main effects and negative coefficients for interaction effects. When you have significant interaction effects, then the total effect is the main effect plus interaction effect. In some cases, the interaction effect adds to the main effect but sometimes it subtracts from it. It’s ok either way. I find that assessing the interaction plots is the easiest way to interpret the results when you have significant interaction effects.

Habtamu Tolera says

I have 20 binary or categorical IVs and one binary DV. My question is: should I check collinearity first and then run the bivariate analysis, or the other way around? Help me, please.

Mei says

Thank you for the reply, Sir. I will do my best to interpret the interaction plot. 🙂

Erick Turner says

Jim, like many others here, I love your intuitive explanation.

I thought it would be a good exercise to replicate what you did in your example. (I’m using Stata, and I understand you don’t use that, but the results should still be the same.) Unfortunately, I’m having trouble replicating your results and I don’t know why. Using values of 0 and 1 for each of the IVs, I’m getting significant results for both of them and for the interaction variable, while you got NS results for one of the IVs.

I’ll paste the output below.

. regress enjoyment food_01 condiment_01 food_cond

| Source | SS | df | MS |
|---|---|---|---|
| Model | 15974.9475 | 3 | 5324.98248 |
| Residual | 1905.09733 | 76 | 25.0670701 |
| Total | 17880.0448 | 79 | 226.329681 |

Number of obs = 80, F(3, 76) = 212.43, Prob > F = 0.0000, R-squared = 0.8935, Adj R-squared = 0.8892, Root MSE = 5.0067

| enjoyment | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. Interval] |
|---|---|---|---|---|---|
| food_01 | -28.29677 | 1.583258 | -17.87 | 0.000 | -31.45011 to -25.14344 |
| condiment_01 | -24.28908 | 1.583258 | -15.34 | 0.000 | -27.44241 to -21.13574 |
| food_cond | 56.02826 | 2.239065 | 25.02 | 0.000 | 51.56877 to 60.48774 |
| _cons | 89.60569 | 1.119533 | 80.04 | 0.000 | 87.37594 to 91.83543 |

Any clue as to what I’m doing wrong?

Jim Frost says

Hi Erick, offhand I don’t know what could have happened. As you say, the results should be the same. I’ll take a closer look and see if I can figure anything out.

Michela says

Dear Jim,

I found your blog while trying to find an answer to a reviewer comment to a paper I submitted.

So now I am looking for answers.

One of my hypotheses was on a moderated mediation model.

Considering the moderation I have (measured as continuous variables):

X=job demands

M (moderator)= team identification

Y= workplace bullying

The fact is that when I looked at the results the effect of X on Y is positive; the effect of M on Y is negative but my problem is that I have the interaction term (X*M) that is positive, while I (and especially the reviewer) was expecting a negative effect.

The graph makes sense to me (and partly the reviewer) but he/she is expecting that I am giving him/her some explanation about this positive interaction effect.

I hope you could help me in explaining me why and explain that to the reviewer!

Jim Frost says

Hi Michela,

I seem to have been encountering this question frequently as of late! The answer is that the coefficient for an interaction term really doesn’t mean much by itself. After all, the interaction term is a product of multiple variables in the model and the coefficient. Depending on the combination of variable values and the coefficient, a positive coefficient can actually represent a negative effect (i.e., if the product of the variable values is negative). Additionally, the overall combined effect of the main effect and interaction effect can be negative. It might be that the interaction effect just makes it a little less negative than it would’ve been otherwise. The interaction term is basically an adjustment to the main effects.

Also, realize that there is a bit of arbitrariness in the coefficient sign and value for the interaction effect when you use categorical variables. Linear models need to create indicator variables (0s and 1s) to represent the levels of the categorical variable. Then, the model leaves out the indicator variable for one level to avoid perfect multicollinearity. Suppose you have group A and group B. If the model includes the indicator variable for group A, then 1s represent group A and 0 represents not group A. Or, it could include the indicator variable for group B, then 1s represent group B and 0 represents not group B. If you have only two groups A and B, then the 1s and 0s are entirely flipped depending on which indicator variable the model includes. You can include either indicator variable and the overall results would be the same. However, the coefficient value will change including conceivably the sign! You can try changing which categorical level the software leaves out of the model, which doesn’t change the overall interpretation/significance of the results but can make the interpretation more intuitive.
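The arbitrariness of the sign can be checked with a few lines of arithmetic. The sketch below uses hypothetical cell means for a 2×2 design and computes the indicator-variable (0/1) coefficients twice, once with group A as the omitted baseline and once with group B. The numbers and the helper name `dummy_coefficients` are made up for illustration.

```python
# Hypothetical cell means for a 2x2 design (illustrative values only).
means = {("A", "low"): 10.0, ("A", "high"): 14.0,
         ("B", "low"): 12.0, ("B", "high"): 22.0}

def dummy_coefficients(ref_group, ref_level):
    """Coefficients of a 0/1 indicator model that reproduces the four cell
    means, with (ref_group, ref_level) as the omitted baseline levels."""
    other_group = "B" if ref_group == "A" else "A"
    other_level = "high" if ref_level == "low" else "low"
    intercept = means[(ref_group, ref_level)]
    b_group = means[(other_group, ref_level)] - intercept
    b_level = means[(ref_group, other_level)] - intercept
    b_inter = means[(other_group, other_level)] - intercept - b_group - b_level
    return {"intercept": intercept, "group": b_group,
            "level": b_level, "interaction": b_inter}

print(dummy_coefficients("A", "low"))  # baseline: group A
print(dummy_coefficients("B", "low"))  # baseline: group B
```

Both sets of coefficients describe exactly the same four cell means, yet the interaction coefficient is +6 with A as the baseline and -6 with B as the baseline. The fitted values and the interaction plot do not change.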

Finally, it’s really hard to gain much meaning from an interaction coefficient itself for all the reasons above. However, you can see the effect of this term in the interaction plot. As long as the interaction plot makes sense theoretically, I wouldn’t worry much about the specific sign or value of the coefficient. I’d only be worried if the interaction plots didn’t make sense.

I hope this helps!

Erick Turner says

Mystery solved! It wasn’t an issue of the difference in software but rather in the type of model. I had asked Stata to run a regression model and got output that didn’t match up. However, when I ask Stata to run ANOVA (including the interaction term), I got output that matched yours. For other Stata users, the syntax to use is “anova enjoyment food condiment food#condiment”.

Jim Frost says

Hi Erick, thanks so much for the update! I had rerun the analysis to be sure that I hadn’t made a mistake, but it produced the same results in the blog post. I guess this goes to show how crucial it is to know exactly what your statistical software is doing! I still wonder what produced the difference between the regression and the ANOVA model, because they both use the same math underneath the hood. In other words, what is different between Stata’s regression and ANOVA model?

Erick Turner says

However, I’m still puzzled as to why I got such different output when I transformed the data to 0/1 dummy variables, created an interaction variable, and then ran regression.

Erick Turner says

I see our replies crossed in cyberspace and that we are similarly puzzled. I’m assuming you ran an ANOVA routine and that it gives you regression output automatically. Just out of curiosity, what if you were to convert your variables to 0/1 and ask your software to just run regression?

Jim Frost says

I used regression analysis in Minitab and it automatically creates the indicator variables behind the scenes. So, I just told it to fit the model. Depending on which level of each categorical variable the software leaves out, you’d expect different numeric results (although, they’d tell the same story). You wouldn’t expect differences in what is and is not significant though. I wonder if STATA possibly uses sequential SS for one of its analyses? Minitab by default uses adjusted SS. Using Seq. SS could change which variables are significant. I was going to test that but haven’t tried yet.
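One possible explanation for the regress/anova discrepancy, which nobody in this thread confirmed, is the coding rather than the software. With 0/1 indicator variables and a product term, the coefficient on food_01 is the difference between foods when condiment_01 = 0 (a simple effect). It is not the food difference averaged over both condiments, which is what a two-way ANOVA main effect describes. The sketch below uses only the coefficients from the Stata output posted above. Weighting the two condiments equally is an assumption, and the variable names are illustrative.

```python
# Coefficients copied from the Stata `regress` output posted in this thread.
b_cons, b_food, b_cond, b_int = 89.60569, -28.29677, -24.28908, 56.02826

def cell_mean(food, condiment):
    """Fitted mean enjoyment for one food/condiment combination (0/1 codes)."""
    return b_cons + b_food * food + b_cond * condiment + b_int * food * condiment

# What the food_01 coefficient measures: the food difference at condiment_01 = 0.
simple_food_effect = cell_mean(1, 0) - cell_mean(0, 0)

# Food difference averaged over both condiments (the ANOVA-style main effect).
main_food_effect = (
    (cell_mean(1, 0) + cell_mean(1, 1)) / 2
    - (cell_mean(0, 0) + cell_mean(0, 1)) / 2
)

print(f"simple effect of food at condiment_01 = 0: {simple_food_effect:.2f}")
print(f"food effect averaged over condiments:      {main_food_effect:.2f}")
```

The simple effect is about -28.3, while the averaged effect is about -0.3. That would fit with food_01 being highly significant in the 0/1 regression while the food main effect is not significant in the ANOVA, without either program being wrong.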

Dan Mark says

First of all, thank you for the clear explanation. It is hard sometimes to find someone who can explain it in plain English!

Secondly, I still face an issue what to put on my axis in my research. I saw in your explanation that you put the dependent variable, the interaction term and one independent variable on the axis. My question is why you did not put both the independent variables that are in the interaction term, and the interaction term on the axis.

Already many thanks!

Jim Frost says

Hi Dan,

Thanks so much. I work really hard to find the simplest way to explain these concepts yet staying accurate!

Graphing relationships for multiple regression can get tricky. The problem is that the typical two-dimensional graph has only two-axes. So, you have to figure out the best way to arrange these two axes to produce a meaningful graph. This isn’t a problem for simple regression where you have one dependent variable and one independent variable. You can graph those two variables easily on fitted line plots. You have as many variables as you have axes.

Once you get to multiple regression, you will have more variables (one DV, at least 2 IVs, and possibly other terms such as interaction terms) than axes. You definitely want to include the dependent variable on an axis (typically the y-axis) because that is the variable you are creating the model for. Then, you can include one IV on the X-axis. At this point, you’ve used up your available axes! The solution is to use separate lines to represent another variable (as shown in the legend). That’s how you get the two IVs into the graph that you need for a two-way interaction. Then you just assess the patterns of those lines.

Instead, if I had put an IV on both X and Y-axes, the graph would not display the value of the DV. The whole point of regression/ANOVA is to explore the relationships with the DV. Consequently, the DV has to be on the graph.

I hope this helps clarify the graphs! The interaction plots I show in this post are the standard form for two-way interactions.

Marlie Greeff says

Dear Jim

Your blog is amazing! Makes everything more understandable for someone with no stats background! Thank you!

Jim Frost says

Hi Marlie, thanks so much for your nice comment. It means a lot to me because that’s my goal for the blog! I’m glad it’s been helpful for you.

Joe R says

Hi Jim,

Thanks for this blog post, really appreciate your efforts to break things down in a simple, intuitive and visual way.

I am a bit confused by the continuous variable example (regarding interactions), specifically your interpretation.

I used your linear model, plotting the coefficients in Excel and manually calculating the Strength for several points of ‘test’ data.

In the article you write – “For high pressures, there is a positive relationship between temperature and strength while for low pressures it is a negative relationship”.

This is what your interaction plot also shows, but plugging actual values in (see below) to the equation – using your coefficients outline above – proves that this is not true.

Test Data

| Temperature | Pressure | Time | Temperature*Pressure | Predicted Strength Values | Difference |
|---|---|---|---|---|---|
| 95 | 81 | 32 | 7695 | 3,891 | |
| 115 | 81 | 32 | 9315 | 4,258 | 367 |
| 95 | 63 | 32 | 5985 | 3,477 | |
| 115 | 63 | 32 | 7245 | 3,800 | 323 |

As you can see, for 2 ‘sets’ of data above, each with a low (63) and high (81) pressure setting, Predicted Strength increases as Temperature Increases.

Am i missing something?

Joe

Jim Frost says

Hi Joe,

I can’t quite tell from your comment how you set up your data. So, I’m unable to figure out how things are not working correctly. However, I can assure you that when you plug the values in the equation, the fitted values behave according to the interpretation (i.e., that the relationship changes direction for low and high values of pressure).

To illustrate how this works, I put together an Excel spreadsheet. In the spreadsheet, there are two tables–one for low pressure and the other for high pressure. Both tables contain the same values for Temperature and Time. However, each table uses a different value for Pressure. The low pressure table uses 63.68 while the high pressure table uses 81.1. I then take these values and plug them into the equation in the Strength column to calculate the fitted values for strength.

As you can see from the numbers in the tables and associated graphs, there is a negative relationship between Strength and Temperature when you use a low pressure but a positive relationship when you use a high pressure.
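The change in direction can also be read off the model algebraically. As a sketch, assume the model has the form below, where the symbols $\beta_0, \dots, \beta_4$ are placeholders rather than the fitted values from the post:

$$
\text{Strength} = \beta_0 + \beta_1\,\text{Temperature} + \beta_2\,\text{Pressure} + \beta_3\,\text{Time} + \beta_4\,(\text{Temperature} \times \text{Pressure})
$$

Holding Pressure and Time fixed, the slope for Temperature is

$$
\frac{\partial\,\text{Strength}}{\partial\,\text{Temperature}} = \beta_1 + \beta_4\,\text{Pressure}
$$

Provided $\beta_4 \neq 0$, this slope is zero at $\text{Pressure} = -\beta_1 / \beta_4$ and has opposite signs on either side of that value. If both pressure settings used in a check fall on the same side of that point, the temperature relationship will have the same direction at both.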

You can find all of this in my spreadsheet with the calculations for the continuous interaction. The two graphs below are also in the spreadsheet.

I hope this helps clarify how this works!

Michela says

Thanks for this, very helpful!

I hope the reviewer will be satisfied as well 🙂

Marieke says

Hi Jim,

I am working on a model which includes an interaction variable. Pro-immigration attitude = educational level + employment (dummy) + educational level * employment . When including the interaction variable, the employment variable becomes insignificant (p=0.83). I was wondering how to interpret this?

Jim Frost says

Hi Marieke,

There are several ways to look at this issue. The first is explaining how the dummy variable goes from being significant to insignificant. When you fit the model without the interaction effect, the model was forced to try to include that effect in with the variables that were included in the model. Apparently, it apportioned enough of the explained variance to the employment variable to make it significant. However, after you added the interaction effect, the model could more appropriately assign the explained variance to that term. Your example illustrates how leaving important terms out of the model (such as the interaction effect) can bias the terms that you do include in the model (the employment dummy variable).

Now, on to the interpretation itself! It’s easiest to picture your results as if you are comparing the constant and slope between two different regression lines: one for the unemployed and the other for the employed. Hypothetically speaking, if the employment dummy variable had been significant, you’d have a case where the constant would tell you the average pro-immigration attitude for someone who is unemployed (the zero value for the dummy variable) and has no education. You could then add the coefficient for the dummy variable to the constant and you’d know the average pro-immigration attitude for someone who is employed (the 1 value for the dummy variable) and has no education. In other words, you would have sufficient evidence to conclude that there are two different y-intercepts for the two regression lines. However, because your actual p-value for the dummy variable is not significant, you have insufficient evidence to conclude that the y-intercepts for these two lines are different.

On the other hand, because the interaction term is significant, you have sufficient evidence to conclude that the slope of the line for the employed is different from the slope of the line for the unemployed.

I’ve written a post about these ideas, which includes graphs to make it easier to understand. Read my post about comparing regression lines.
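As a rough sketch of the two-lines picture, the snippet below turns the four coefficients of a model like this into an intercept and slope for each group. The coefficient values and variable names are hypothetical and are not Marieke's actual estimates.

```python
# Hypothetical coefficients for:
# attitude = CONSTANT + B_EDUCATION*education + B_EMPLOYED*employed
#            + B_INTERACTION*education*employed
CONSTANT = 2.0
B_EDUCATION = 0.30
B_EMPLOYED = 0.05      # shift in the y-intercept for the employed group
B_INTERACTION = 0.20   # shift in the education slope for the employed group


def line_for_group(employed):
    """Return (intercept, slope) of the regression line for employed = 0 or 1."""
    intercept = CONSTANT + B_EMPLOYED * employed
    slope = B_EDUCATION + B_INTERACTION * employed
    return intercept, slope


for employed in (0, 1):
    intercept, slope = line_for_group(employed)
    group = "employed" if employed else "unemployed"
    print(f"{group}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The dummy coefficient only moves the intercept, while the interaction coefficient only moves the slope. That is why the two p-values answer different questions about the two lines.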

I hope this helps!

sarim mohd says

Hi Jim,

Thanks for the wonderful and simple tutorial.

I have a panel dataset that consists of 146 companies for 7 years. My dependent variable is Profit and Independent variables are Board Size, Number of meetings, board dividend decision, CEO duality (it is a dummy variable, 1 if the CEO is also the chairman, 0 otherwise).

Results for non-parametric test indicated that the size of the board is significantly different for firms with CEO duality and for firms with non-duality.

Therefore, after testing for the main effect, I want to test if such differences in the board size of firms with CEO duality and firms with non-duality is getting reflected in the performance. For this purpose I introduced an interaction effect:

Profitability = Board size*Duality + number of meetings + board dividend decision

So, if my interaction is significant (positively), can I interpret it as “the firms with CEO duality are performing better than the firms with non-duality”? Does the coefficient on the interaction tell how the coefficient changes when we go from duality to non-duality?

Also, is the interaction creating any linearity problem for my estimations?

Am I right in doing so?

I hope my question is understandable.

Jim Frost says

Hi Sarim,

Unlike main effects, you typically don’t interpret the coefficients of the interaction effects. Yes, it is possible to plug in values for the variables in the interaction term and then multiply them by the coefficient, and repeat that multiple times, to see what values come out. However, it’s much easier to use interaction plots, as I do in this blog post. Those plots essentially plug a variety of values into the equation to show you what the interaction effect means. It’s just a whole lot easier to understand using those plots.

I don’t have enough information to tell you what the interaction means for your case specifically. There’s no way I could say what a positive interaction coefficient represents. But, here is what it means generally. Keep in mind that an interaction effect is basically an “it depends” effect as I describe in this post. In your case:

If the interaction term is significant, you know that the effect of board size on profitability depends on CEO duality. In other words, you can’t know the effect of board size on profitability without also knowing the CEO status. Think of a scatter plot with profitability on the Y-axis and board size on the X-axis. You have two lines on this plot. One line is for Dual CEOs and the other is for non-Dual CEOs. When the interaction term is significant, you have sufficient evidence to conclude that the slopes of those two lines are significantly different. The specific interpretation depends on the exact nature of those two lines. Maybe the two slopes are in opposite directions (positive and negative) or maybe one is just steeper than the other in the same direction. That’s what you’ll see on the interaction plot and you can interpret the results accordingly.

If the interaction term is not significant, the effect of board size on profitability does NOT depend on CEO duality. You don’t need to know CEO status in order to understand the predicted effect of board size on profitability. On the graph that I describe, you cannot conclude that the slopes of the two lines are different.

As for correlation among your independent variables, yes, multicollinearity can be a problem when you include interaction terms. If you had an interaction term with two continuous variables, I’d recommend standardizing them, but it might not make much of a difference for your interaction between a continuous variable and a binary variable. If you want to read about that, I’ve written a post about standardizing the variables in a regression model that you can read.

I hope that helps!

Katie says

Hi Jim,

I am interpreting a model with the fixed effects of: diet injection diet*injection group. The P-value for diet*injection is P = 0.09 which would be a tendency. My question is if this is a tendency but not below 0.05 is it appropriate to leave the interaction in the model? When discussing my results is it appropriate to only describe the interaction or the fixed effects of diet and injection?

Jim Frost says

Hi Katie,

This is a tricky question to answer in general because it really depends on the specific context of your study.

First off, I hesitate to call any effect with a p-value of 0.09 a tendency. A p-value around 0.05 really isn’t that strong of evidence by itself. For more information about that aspect, read my post about interpreting p-values. Towards the end of that post I talk about the strength of evidence associated with different p-values.

As for leaving it in the model or taking it out, there are multiple things to consider. You should review the literature, similar studies, etc. and see what results they have found. Let theoretical considerations guide you during the model specification process. If there are any strong theoretical, practical, or literature-related reasons for either including or excluding the interaction term, take those to heart. Model specification shouldn’t be only by the numbers. I write about this process in my post about specifying the correct model. The part about letting theory guide you is towards the end.

And, one final thought. There is a school of thought that says that if you have doubts about whether you should include or exclude a variable or term, it’s better to include it. If you exclude an important term, you risk introducing bias into your model–which means you might not be able to trust the rest of the results. Adding unnecessary terms can reduce the precision and power of your model, but at least you wouldn’t be biasing the other terms. I’d fit the model with and without the interaction term and see if and how the other terms change.

If the coefficients and/or p-values of the other terms change enough to change the overall interpretation of the model, then you have to really think about which model is better, and that probably takes you back to the theoretical underpinnings I mention above. If they don’t change noticeably, then whether you include or exclude the interaction term depends on your assessment of the importance of that interaction term specifically in the context of your subject area. And, again, that takes you back to theory, other studies, etc., but it’s not as broad of a question to grapple with compared to the previous case where the rest of the model changes.
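To make the "fit it both ways and compare" idea concrete, here is a minimal sketch using only the Python standard library. The data are noise-free values built from a hypothetical model (y = 10 + 1*x + 0*d + 2*x*d), and the small `ols` helper is a hand-written least-squares solver for illustration, not a routine from any statistics package.

```python
def ols(columns, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Balanced, noise-free data: x = 0..4 within each group d = 0 and d = 1.
x = [0, 1, 2, 3, 4] * 2
d = [0] * 5 + [1] * 5
# Hypothetical true model: the main effect of d is 0, the interaction is 2.
y = [10 + 1 * xi + 0 * di + 2 * xi * di for xi, di in zip(x, d)]

ones = [1] * len(y)
xd = [xi * di for xi, di in zip(x, d)]

with_interaction = ols([ones, x, d, xd], y)   # constant, x, d, x*d
without_interaction = ols([ones, x, d], y)    # constant, x, d

print("with interaction:   ", [round(b, 3) for b in with_interaction])
print("without interaction:", [round(b, 3) for b in without_interaction])
```

In this constructed example the true coefficient on d is 0, yet the model without the interaction term reports about 4 for it, because that model has to absorb the omitted interaction into the terms it does contain. Real data add noise, so the shift will rarely be this clean.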

That’s why the correct answer depends on your specific study area, but hopefully that gives you some ideas to consider.

Sir Yiadus says

Please, suppose that you include an interaction term and all the other variables, including the interaction term, become insignificant even though they were significant before introducing the interaction term. What does that mean?

Jim Frost says

Hi, it sounds like the model might be splitting the explanatory power of each term between the main effects and the interaction effects and the result is that there isn’t enough explanatory power for any individual term to be significant by itself. If that’s the case, you might need a larger sample size. Is the overall model significant?

Also, whenever you include interaction terms you’re introducing multicollinearity into the model (correlation among the independent variables). You might gain some power by standardizing your continuous predictors. Read my post about standardizing your variables for more information about how it helps with multicollinearity.
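The multicollinearity point can be illustrated with a small sketch. It assumes a balanced grid of hypothetical predictor values and compares the correlation between a predictor and the product term before and after standardizing. The helper functions are written out by hand so the example needs only the standard library.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    ss_a = sum((ai - ma) ** 2 for ai in a)
    ss_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(ss_a * ss_b)


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]


# Hypothetical balanced grid: every x value paired with every z value.
x = [xi for xi in range(1, 6) for _ in (10, 20, 30)]
z = [zi for _ in range(1, 6) for zi in (10, 20, 30)]

raw_product = [xi * zi for xi, zi in zip(x, z)]
raw_r = pearson(x, raw_product)

xs, zs = standardize(x), standardize(z)
std_product = [xi * zi for xi, zi in zip(xs, zs)]
std_r = pearson(xs, std_product)

print("correlation of x with x*z, raw:         ", round(raw_r, 3))
print("correlation of x with x*z, standardized:", round(std_r, 3))
```

Because this grid is perfectly balanced, standardizing removes essentially all of the correlation between the predictor and the product term. With real data the correlation usually shrinks rather than vanishing.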

Those would be my top 2 thoughts. You should also review the literature, your theories, etc. and hypothesize the results that you think you should obtain, and then back track from there to look for potential issues. After all, insignificant results might not be a problem if that’s the right answer. And, you should at least consider that possibility.

But, the fact that they’re significant without the interaction term and that the significance goes away when you add the interaction term makes me think there is something more going on.

Jessy Grootveld says

Hi Jim,

I have a question about interpreting output of the MANCOVA.

I myself am conducting research to see whether people’s tech-savviness perceptions have an effect on the effect that assignments to an experimental condition had on peoples brand attitude, purchase intention, and product liking.

In the MANCOVA, my supervisor told me to add the Conditions_All variable as a main effect to the customized model, and Conditions_All*Tech-savviness_perceptions as an interaction effect.

I got the following output:

Conditions_All p = .013

Conditions_All*Tech-savviness perceptions p = .011

How do I interpret these p-values? What does the significance of the first p-value on Conditions_All tell me? And how is that related to the significance of the interaction effect of Conditions_All and Tech-savviness perceptions?

Thank you in advance for your help.

Kind regards,

Jessy Grootveld

Jim Frost says

Hi Jessy,

Your output indicates that both the main effect and interaction effects are statistically significant assuming that you’re using a significance level of 0.05.

The main effect for Conditions_All is the portion of the effect that is independently explained by that variable. If you know the value of Conditions_All, then you know that portion of its effect without needing to know anything else about the other variables in the model.

However, because the interaction effect is also statistically significant and that term includes Conditions_All, you know that the main effect is only a portion of the total effect. Some of Conditions_All’s effect is included in the interaction term. However, to understand this portion of the effect, you need to know the value of the other variable (Tech-savviness).

To understand the complete effect of Conditions_All, you need to sum the main effects (the portion that is independent from the other variables in the model) and the interaction effect (the portion that depends on the other variable).

I hope this helps!

Sir Yiadus says

Thank you very much. I am grateful

Redina says

Hello Jim,

I am a master’s student and I have included interaction terms in my thesis. The problem is that the main effects are significant and the interaction term is insignificant. Moreover, the interaction term has an opposite sign to what was expected. The problem is that I have a very theoretical part that supports that there actually is an interaction term between my variables. What might be an answer to this?

Thank you in advance for your help,

Redina

Jim Frost says

Hi Redina,

There are a couple of things you should realize about your results.

The first thing is that insignificant results do not necessarily suggest that an effect doesn’t exist in the population. Keep in mind that you fail to reject the null hypothesis, which is very different than accepting the null hypothesis.

For your study, your results aren’t necessarily suggesting that the interaction effect doesn’t exist in the population. Instead, you have insufficient evidence in your sample to conclude that the interaction effect exists in the population. That’s very different even though it might sound the same. Remember that you can’t prove a negative. Consequently, your results don’t necessarily contradict theory.

In other words, the interaction effect may well exist in the population but for some reason your sample and analysis failed to detect it. I can think of four key reasons offhand.

1) The sample size is too small to detect the effect.

2) The sample variability is high enough to reduce the power of the test. If the variability is inherent in the population (rather than say measurement error or some other variability that you can reduce), then increasing the sample size is the easiest way to address this problem.

3) Sampling error by chance produced a fluky sample that doesn’t exhibit this effect. This would be a Type II error where you fail to reject a null hypothesis that is false. It happens.

4) There was some issue in your design that caused the experimental conditions to not match the conditions for which the theory applies.

I think exploring those options, and possibly others, would be helpful, and probably useful discussion for your thesis.

As for the sign being the opposite of what you expected, I have a couple of thoughts. For one thing, you don’t typically interpret the signs and coefficients for interaction terms. Given the way the values in interaction terms are multiplied, the signs and coefficients often are not intuitive to interpret. Instead, use graphs to understand the interaction effects and see if those make theoretical sense.

Additionally, because your interaction term is not significant, you have insufficient evidence to conclude that the coefficient is different from zero. So, you cannot say that the coefficient is negative for the population. In other words, the CI for the interaction effect includes zero along with both positive and negative values. I hope that makes sense. Again, the CI is not ruling out the possibility that the coefficient could be positive, which is what you expect. But, you don’t have enough evidence to support concluding that it is either positive or negative.
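A quick sketch shows what a confidence interval that spans zero looks like. The coefficient and standard error below are hypothetical, and the interval uses the large-sample critical value of 1.96. Statistical software typically uses a t critical value based on the residual degrees of freedom, so treat this as an approximation.

```python
def confidence_interval(coefficient, standard_error, critical_value=1.96):
    """Approximate 95% CI using a normal critical value (an assumption)."""
    margin = critical_value * standard_error
    return coefficient - margin, coefficient + margin


# Hypothetical interaction estimate with an unexpected negative sign.
coefficient = -0.8
standard_error = 0.6

low, high = confidence_interval(coefficient, standard_error)
includes_zero = low <= 0 <= high

print(f"approximate 95% CI: ({low:.3f}, {high:.3f})")
print("interval includes zero:", includes_zero)
```

When the interval contains zero, the data are consistent with a negative coefficient, no effect, and a positive coefficient, so the sign of the point estimate is not something to lean on.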

I hope this helps!

Redina says

Thank you a lot! I’m grateful.

Naman says

Hi Jim,

Thank you for that super useful explanation. I am doing my thesis and have a few questions. I would be grateful if you can answer these within 24 hrs as my thesis is due in 2 days.

I am doing a time series cross section fixed effects regression. The theory on the topic suggests an interaction between main independent variable (N- dummy variable) and S(continuous). I have included them in an interaction in one of the models. I also have another interaction between main independent variable (N- dummy variable) and A(continuous variable). I have also included them in an interaction in a separate model.

However, I also need a main model in which these interactions are not there, so that I can get the exact impact of the scheme N, my question is do I include the independent and control variables S and A in that main model ? If yes, won’t the thesis defense committee ask me why do you have N in interaction with S and A in one model each and not in interaction in the main model?

The previous studies would have different analysis with analysing the impact of the interactions and they would have some kind of main model with a few different IV’s without any interactions.

I have to include S and A in the main model because they are the control variables but I don’t know if I should include their interaction terms in that main model as well or not. Won’t that be too much ?

Thanks so much in advance,

Naman

Jim Frost says

Hi Naman,

I think I understand your analysis, and I have a couple of thoughts.

One, I don’t understand why you want to produce separate models that leave out significant effects. When you omit an important effect, you risk biasing your model. Why not present one final model that represents the best possible model that describes all of the significant effects? Separate models with only some of the significant effects in each don’t seem like a good idea.

Two, you want to gain the exact impact of N. However, you won’t gain this by removing the interaction terms. In fact, you’d be specifically removing some of N’s effect by doing that.

Both the main effect and interaction effect for N are significant. The main effect is the independent effect of N. That is the portion of N’s effect that does not depend on the other variables in the model. However, because the interaction term is significant, you know that some of N’s effect does depend on the other variables in the model. So, some of N’s effect is independent of those other variables while some of it depends on those other variables. That’s why both the main effect and interaction effect are significant.

By excluding the interaction you are excluding some of N’s effect. Is this important? Well, reread this post and see how trying to interpret the main effects without factoring in the interaction effects can lead you to the wrong conclusions. You might end up putting mustard on your ice cream sundae! When you have significant interaction effects, it’s crucial that you don’t attempt to interpret the main effects in isolation.

Consequently, I would include the interaction effects in your main model. The results might not seem as clean and clear cut, but they are more accurate. They reflect the true nature of the study area.

I hope this helps!

Victoria says

Hi Jim

This page is very helpful. I was wondering about a particular scenario I have with my data. I have a predictor that is positively correlated with an outcome in a bivariate correlation. In a linear regression model including a control variable, the predictor is no longer significant. However, when I explore interactions between the control variable and the predictor in a regression model, both the interaction term and the predictor by itself are significant.

My first question is – can I “trust” the model with the interaction term (model 2), even though in the model without the interaction term (model 1) the predictor was not significant?

I should add that the interaction is theoretically sound (which is why I explored it in the first place).

My second question is – what if the same scenario occurs for predictors that were not even correlated with the outcome in initial exploratory bivariate correlations? I am wondering if I should even be entering these into a model in the first place. However, again, I am looking at these particular predictors because there is a theory that says they should relate to the outcome, and again, the interaction can be explained by the theory.

Thank you very much for your time and sorry if my query is a bit confusing!

Victoria (UK)

Jim Frost says

Hi Victoria,

I’m glad you found this helpful! I think I understand your question. And, it reminds me that I need to write a blog post about omitted variable bias and trying to model an outcome with too few explanatory variables!

I think part of the confusion is the difference between how pairwise correlations and multiple regression model the relationships between variables. Pairwise correlations only assess whether a pair of variables are correlated. They do not account for any other variables. Multiple regression accounts for all variables that you include in the model and holds them constant while evaluating the relationship between each independent variable and the dependent variable. Because multiple regression factors in a lot more information than pairwise correlation, the results can differ.

This issue is particularly problematic when there is a correlation structure amongst the independent variables themselves. When you leave out important variables from the analysis, this correlation structure can either strengthen or weaken the observed relationship between a pair of variables. This is known as omitted variable bias. This can happen in regression analysis when you leave an important variable out of the model. It can also happen in pairwise correlation because that procedure only assesses two variables at a time and can leave out important variables. I think this might explain why you observe different results between pairwise correlation and your multiple regression analysis. Check for a correlation between your control variable and predictor. If there is one, it probably at least partly explains what is going on.
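Here is a minimal sketch of how a pairwise relationship can disappear once a correlated control variable enters the model. The six data points are hypothetical and noise-free: the outcome depends only on the control, and the predictor merely tracks the control. The two-predictor coefficients use the standard closed-form solution on centered data.

```python
def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def sum_products(a, b):
    """Sum of products of two already-centered lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical data: the predictor is correlated with the control variable.
control = [1, 2, 3, 4, 5, 6]
predictor = [1, 3, 2, 5, 4, 6]
outcome = [2 * c for c in control]  # outcome depends only on the control

p, c, y = centered(predictor), centered(control), centered(outcome)

# Pairwise view: slope of outcome on the predictor alone.
bivariate_slope = sum_products(p, y) / sum_products(p, p)

# Multiple regression view: closed-form coefficients for two predictors.
det = sum_products(p, p) * sum_products(c, c) - sum_products(p, c) ** 2
b_predictor = (sum_products(p, y) * sum_products(c, c)
               - sum_products(c, y) * sum_products(p, c)) / det
b_control = (sum_products(c, y) * sum_products(p, p)
             - sum_products(p, y) * sum_products(p, c)) / det

print("slope using the predictor alone:", round(bivariate_slope, 3))
print("predictor coefficient with control in the model:", round(b_predictor, 3))
print("control coefficient:", round(b_control, 3))
```

On its own, the predictor appears to have a strong positive relationship with the outcome. Once the control is held constant, its coefficient drops to zero in this constructed example, which is the pattern omitted variable bias can produce.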

As for whether you can trust the significant interaction term, given that it fits theory and that it is significant after you add the other variables, I’d lean towards saying that yes, you can trust it. However, as is always the case in statistics, there are caveats. One, I of course don’t know what you’re studying, so it’s hard to give any blanket advice. You should be sure that you have a sufficient number of observations to support your model. With two independent variables and an interaction term, you’d need around 30 observations. If you have notably fewer, you might be overfitting your model, which can produce unreliable results. Also, be sure to check those residual plots because that can help you avoid an underspecified model. And, as discussed earlier, if you leave out any important variables from your regression model, it can bias the variables and interaction terms in your model.

Regarding the other variables that don’t appear to have any correlation with the outcome variable, you can certainly consider adding them to the model to see what happens. Although, if you’re adding them just to check, it’s a form of data mining that can lead to its own problems of chance correlations. You can also check the pairwise correlations between all of these potential predictors. Again, if they are correlated with predictors, that correlation structure can bias their apparent correlation with the outcome variable. If they are correlated with any of the predictors in the model or with the response, there’s some evidence that you should include them. Ideally you should have a theoretical reason to include them as well.

I’d also recommend reading my post about regression model specification because it covers a lot of these topics.

I hope this helps!

ahmed says

Thank you for astonishing posts.

From understanding to statistics, it can explain the following cases

1) The factors under study are significant and the interaction is not significant?

This is because the main factors have separated effects from each other. That means that factor A has an effect on the character under study ( Ex. Root Yield) separate from the effect of factor B. The meaning of the interaction is not significant, under different levels of factor A that factor B gives the same results. (As a hypothetical example and not true).

Nitrogen fertilizer is used at different rates and potassium fertilizer at other rates. For example, the effect of nitrogen fertilization increases the yield by increasing the concentration of nitrogen and potassium reduces the yield. At each nitrogen concentration, the different levels of potassium reduce the yield and vice versa at each concentration of potassium, the different levels of nitrogen increase the yield

2) The factors under study are insignificant and the interaction is significant?

This means that the factors under study had the different influences for each level from other factor. For example levels of nitrogen and varieties of plants, under each level of nitrogen arrangement of varieties of plants is different. For example, at the high concentration the order of the varieties is ABC,

ACB for medium concentration and CAB for low concentration

What do you think of this interpretation?

With complement

Prof. Dr. Ahmed Ebieda

Jim Frost says

Hi Ahmed, thank you for your kind words about my posts! I really appreciate that!

Yes, your interpretations sound correct to me. I’d just add another case where both the main effects and interaction effects are significant. In that case, some proportion of the effects are separate or independent from the other factor while some proportion depends on the value of the other factor.

Fergal says

Hi Jim,

I have found both your initial piece on interaction effects, and the forum section to be extremely helpful.

Just looking to bounce something off you very quickly please.

I’m completing my MSc dissertation and for my stat analysis, I’ve carried out 2 (Gender: Male & Female) x 2 (Status: Middle & Low) between-between ANOVA.

For all my 5 dependent variables, there have been either main effects of Gender or Status, however there have been no interaction effects.

My 3 main questions are:

1. Although there was no main interaction effect, is it still possible to run a post hoc test (using a Bonferroni correction on Gender*Status) and report on some of the findings if they come up as significant?

Otherwise, all I’ll be reporting on is the main effect(s) (**as below) which I’m conscious may leave my analysis rather shallow…

2. In William J. Vincent’s ‘Statistics in Kinesiology’, he states that if either the main effects or interaction are significant, then further analysis is appropriate. He advocates conducting ‘a simple ANOVA’ across Gender at each of the levels of status and vice-versa.

Firstly, excuse my ignorance, I’m not exactly sure what’s meant by ‘simple ANOVA’ or how to do one, and apparently Jamovi (my stat analysis software), doesn’t have the facility to conduct one as of yet.

The question, can I just go straight into my post hoc tests instead of conducting the simple ANOVA as from what I gather, they’re basically running the same ??

3. I’m planning on reporting the results of my 2 x 2 ANOVA as: mean ± standard deviation, and the p values (significance accepted at p<.05). Is this acceptable/sufficient or is it best practice to include the f value as well?

A rough example of what I'm on about is something like this:

**

Figure …. shows the ……. Standard Scores. There was a main effect of Gender (p=0.009), whereas no Status effect was detected (p=0.108). There was no interaction effect between Gender and Status (p=0.0.669). Females scored significantly better than males in the ….. test (7.62±2.13 vs. 6.66±2.21, p=0.009), whereas the Low and Middle group scores were statistically unchanged at 6.83±2.07 to 7.44±2.31 (p=0.108) respectively. These standard scores equate to a 4.8% difference between females and males, and 3.05% difference between Middle and Low group participants.

(graphs will be included)

Does this seem sufficient or should/can I dig further into the Gender main effect?

The post hoc tests (Gender*Status) are what will enable me to do that, if it's a thing you deem them acceptable to conduct.

Once again, this whole page has been of huge help to me. Thanks very much in advance for your time and apologies if the query is rather confusing.

Regards,

Fergal.

ahmed says

Hi Jim

Thanks a lot for your fast reply and your explanations.

But, I have the simple question?

Can I write recommendation for all three cases (Factors=significant & Interaction Not , Factors=significant & Interaction Not and Factors=Not significant & Interaction Significant) or some of them it can’t recommend?.

Please, explain by examples for each case (This is one example from my results )

My example:

2 Factors (3 levels of nitrogen & 3 levels of Potassium)

Increasing Nitrogen and Potassium increase the root yield

( also in case one factor increase root yield and other decrease it)

For each case what is the recommendation?

Because some friends said: if interaction is not significant, there is no recommendation.

I think this is not true?

Please, what is your opinion?

Jim Frost says

Hi Ahmed, yes, when there is an interaction, you can make a recommendation. You just need the additional information. I explain this process in this post. For example, in the food and condiment example, to make a recommendation to maximize your enjoyment, you can make a condiment recommendation, but you need to know what the food is first. That’s how interactions work. Apply that approach to your example. It helps if you graph the interactions as I do.

Aidan says

Hi Jim,

Firstly, I can’t believe I have only found this site today – it’s awesome, thanks!

I’m trying to interpret some results and having read your blog, can you please tell me if i’m correct in my understanding regarding main effects and interactions?

I’ve performed a 2-way mixed-model ANOVA (intervention x time) to assess the effects of three interventions on the primary outcomes (weight-loss).

There was a significant main effect for weight-loss but when I perform post-hoc analysis, there is no significant result.

My understanding of this is that, over time, weight-loss was significant as an entire group however, no one intervention was better than the other?

Any input from anyone would be welcomed!

Thanks

Jim Frost says

Hi Aidan, I’m glad you’ve found my website to be helpful!

Which main effect was significant? Was the interaction effect significant?

Sometimes the hypothesis test results can differ from the post-hoc analysis results. Usually that happens when the results are borderline significant. However, I can’t suggest an interpretation without knowing the other details.

Tom says

Hi Jim: thank you for this post. I am working on a couple of hypotheses to test both direct and interaction effects…results are a bit more nuanced than examples above, so I would be interested in your advice…I am using PLA-SEM…direct effect of X on Y (Beta = 0.19) is not significant (t statistic greater than 1.96). Nevertheless I still have to run second hypothesis to determine if a third variable moderates relationship between X and Y. When adding the interaction term, R2 did increase on Y. however, interaction effect was also not significant. it seems I fail to reject null hypothesis. This being said I am shaky on how I would interpret this, for the results were not as anticipated…it is exploratory research, if that matters…thoughts? Tom

Tom says

Rather t statistic less than 1.96…my mistake

Jim Frost says

Hi Tom,

It sounds like neither your main effects nor interaction effect are significant? Is that the case?

If so, you need to remember that failing to reject the null does not mean that the effect doesn’t exist in the population. Instead, your sample provided insufficient evidence to conclude that the effect exists in the population. There’s a difference. It’s possible that the effects do exist but, for a number of possible reasons, your hypothesis test failed to detect them.

These potential reasons include random chance causing the sample to underestimate the effect, the effect size being too small to detect, too much variability in the sample that obscures the effect, or a sample size that is too small. If the effect exists but you fail to reject the null hypothesis, it is known in statistics as a Type II error. For more information about this error, read my post Types of Errors in Hypothesis Testing.
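A small simulation can make the sample-size point concrete. This is only a sketch under simplifying assumptions: a one-sample z-test with a known standard deviation, a hypothetical true effect of 0.3, and made-up sample sizes. It is not the analysis from the study discussed above.

```python
import math
import random


def rejection_rate(effect, n, sims=2000, sd=1.0, seed=1):
    """Share of simulated samples in which a two-sided z-test rejects the null.

    The null hypothesis is a mean of zero; the true mean is `effect`.
    Assumes a known standard deviation and a 0.05 significance level.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        sample = [rng.gauss(effect, sd) for _ in range(n)]
        z = (sum(sample) / n) / (sd / math.sqrt(n))
        if abs(z) > 1.96:
            rejections += 1
    return rejections / sims


true_effect = 0.3  # hypothetical effect that really exists
power_small = rejection_rate(true_effect, n=10)
power_large = rejection_rate(true_effect, n=100)

print("share of samples detecting the effect, n = 10: ", power_small)
print("share of samples detecting the effect, n = 100:", power_large)
print("Type II error rate, n = 10: ", round(1 - power_small, 3))
print("Type II error rate, n = 100:", round(1 - power_large, 3))
```

The effect is real in every simulated sample, yet the small samples miss it most of the time while the larger samples usually detect it. Every miss is a Type II error.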

I hope this helps!

Tom says

Thank you, Jim…this is very helpful…of course I was hoping for a better outcome…but I am guessing the predictor variable is not quite nuanced enough to produce a noticeable effect…thanks again..and I will definitely check the source material you provided…tom

Joost Huybregts says

Dear Jim,

First of all, I would like to say how helpful this website is. Your explanations are really clear!

I have a question regarding the interpretation of an interaction variable.

The interaction consists of two continuous variables, but one has been transformed to its natural logarithm.

How do I interpret its coefficient with respect to the dependent variable?

Thanks for your time!

Joost

Jim Frost says

Hi Joost,

Thanks so much! And, I’m glad my website has been helpful!

One of the tricky things about data transformations is that it makes interpretation much more complex. It also makes the entire fit of the model less intuitive to understand. That’s why I always recommend that transformations are the last option in terms of data manipulation. When you do need to transform your data, you’ll often need to perform a back transformation to understand the results. That’s probably what you’ll need to do. Some statistical software will do this for you very easily.

For some specific transformations, you can make some interpretations without the back transformations, and one of those is the natural log. I talk about this in my post about log-log plots. That’s not exactly your situation where you’re looking at an interaction effect. Interaction effects can be tricky to understand to begin with, but more so when a transformation is involved. Typically, you don’t interpret the coefficient of interaction terms directly, but particularly not when the data are transformed. Again, you will probably need to back transform your results and then graph those to understand the interaction.
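As a rough sketch of why the interaction coefficient is hard to read by itself, suppose the model has an untransformed dependent variable Y, one untransformed predictor X, and one log-transformed predictor Z. These symbols and this exact model form are assumptions for illustration, not Joost's actual model.

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X + \beta_2 \ln(Z) + \beta_3\, X \ln(Z) + \varepsilon \\
\frac{\partial Y}{\partial X} &= \beta_1 + \beta_3 \ln(Z) \\
\frac{\partial Y}{\partial \ln(Z)} &= \beta_2 + \beta_3 X
\end{aligned}
```

Under those assumptions, the effect of X changes with the level of ln(Z), and a 1% increase in Z shifts Y by roughly (β2 + β3 X)/100, which itself depends on X. That dependence is why graphing the fitted values is usually easier than reading β3 directly.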

I hope this helps!

SIKANDAR ABDUL QADIR says

Hello Jim,

Thank you for providing such a useful resource.

I am using SPSS for my thesis, which is related to entrepreneurship and export.

I am using ordinal regression for the analysis, but I am unable to understand how to put the interaction in the model. There are two options when you use ordinal regression, Scale and Location. Which one should I use?

I used Location. For the interaction, I have 2 variables, one having 2 answers (Starting Phase and Operating Phase) and the other having 3 answers (Low, Medium, High), so the total number of interaction terms will be 6. But of those six terms, I am getting only 2 numbers. For the others, it says the parameter is set to zero because it is redundant. Why is it like this? Can you please explain?

Thanks,

Sikandar

Victoria says

Thank you for your reply. It was very helpful indeed. Long live this helpful site!

Best wishes

Victoria

ahmed says

Hi Jim

Thanks a lot.

Pakistan Journal of Agricultural science (https://www.pakjas.com.pk/) indicates the following in its ‘Instructions to Authors’:

“12. Statistical models with factorial structure must normally conform to the principle that factorial interaction effects of a given order should not be included unless all lower order effects and main effects contained within those interaction effects are also included. Similarly, models with polynomial factor effects of a given degree should normally include all corresponding polynomial factor effects of a lower degree (e.g. a factor with a quadratic effect should also have a linear effect).

13. Main effects should be explained/ exploited only if interaction involving them is not significant. Otherwise the significant interaction should be explored further and focus should be on the interaction effects only.”

Regarding point 13, it says the main effect is not necessary if the interaction is significant.

What is your opinion about this information?

Jim Frost says

Hi Ahmed,

Regarding #12, that’s referred to as a hierarchical model when you keep all of the lower-order terms that comprise a higher-order term–whether that’s an interaction term or a polynomial. Retaining the hierarchical structure is the traditional statistical advice. However, it’s not absolutely necessary. In fact, if you have main effects and other lower-order terms that are not significant but you include them in the model anyway, it can reduce the precision of your estimates. Depending on the number of nonsignificant terms you’re keeping, it’s not always good to include them. However, when you include polynomials and interaction terms, you’re introducing multicollinearity into your model, which has its own negative consequences. You can address this type of multicollinearity by standardizing the continuous predictors, which produces a regression equation in coded units. The software can convert it back to uncoded units, but only if the model is hierarchical! So, there are pros and cons to whether you have a hierarchical model or not. Of course, if the lower-order terms are all significant, it becomes a non-issue. If only a few are not significant, you can probably leave them in without problems. However, if many are not significant, you’ve got some thinking to do!
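Here is a small sketch of the multicollinearity point, using made-up data. It assumes two independent predictors, `x` and `z`, that both sit far from zero, and it compares how strongly `x` correlates with the product term `x*z` before and after standardizing. The helper names `pearson` and `standardize` are just for this illustration.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

rng = random.Random(0)
# Hypothetical predictors, independent of each other, both far from zero.
x = [rng.uniform(10, 20) for _ in range(500)]
z = [rng.uniform(10, 20) for _ in range(500)]

# Correlation between a predictor and the interaction term, raw units.
raw_r = pearson(x, [a * b for a, b in zip(x, z)])

# Same correlation after standardizing both predictors first.
xs, zs = standardize(x), standardize(z)
std_r = pearson(xs, [a * b for a, b in zip(xs, zs)])

print(f"raw predictors:          r(x, x*z) = {raw_r:.2f}")
print(f"standardized predictors: r(x, x*z) = {std_r:.2f}")
```

With raw values, the product term is strongly correlated with its own component. After standardizing, most of that structural correlation disappears.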

As for #13, I entirely agree with it. I discuss this concern in my blog post. When you have a significant interaction effect but you consider only the main effects, you can end up drawing the wrong conclusions. You might put mustard on your ice cream! The only quibble I have with the wording for #13 is that you’re not totally disregarding the main effects. You really need to consider the main effect in conjunction with the interaction effect. You can think of the interaction effect as an adjustment (positive or negative) to the main effect that depends on the value of a different variable. A statistically significant interaction effect indicates that this adjustment is unlikely to be zero. The graphs I use in this post are the combined effect of the main effect plus the interaction effect. That gives you the entire picture.

Paulo Quadri says

Hi Jim, thanks for all the time and useful explanation.

I am struggling with fully understanding the interpretation of my own work. I am exploring changes in poverty as a function of proximity to touristic attractions (localities with more attractions nearby should have more poverty reduction). However, in addition to a bunch of other covariates, my model includes an interaction term between “number of attractions” and the region of the country where my observations (localities) are located, and I have 5 regions (North, South, etc…). Here are my main questions:

1. Is the estimate of “number of attractions” telling me the effect of this variable overall, or just in the region that is omitted? My understanding is that, in experimental settings, this estimate would be the effect of the main variable of interest under “control” conditions. But there are no “levels” of treatment here, these are just geographic regions, so I am not sure how to interpret this.

2. The interaction coefficients between “number of attractions” and “region_north”, “region_south”, etc… are, as far as I understand, relative to the estimate of the omitted region, correct? But are these coefficients what I should report, or should I perform a linear combination (add the interaction estimate to the estimate of “number of attractions” alone)? Some readings highlight this last step as something necessary but others don’t even mention it. If I do perform this linear combination, then how does the relationship to the omitted region change?

3. Lastly, when plotting the estimates (my variables are all rescaled to have a mean of zero and sd = 2 so that we can plot the estimates and compare impacts on change in poverty), should I include both the coefficient of my main variable (“number of attractions”) AND the interactions? Or is the estimate of the main variable by itself irrelevant now?

Thank you so much and sorry for the multiple questions!

Paulo

Jim Frost says

Hi Paulo,

Keep in mind that an interaction effect is an “it depends” effect. In your analysis, the effect of tourist attractions on poverty reduction depends on the region. If you have a significant main effect and interaction effect, you need to consider both in conjunction. The main effect represents the portion of the effect that does not depend on other variables in the model. You can think of the interaction effect as an adjustment to the main effect (positive or negative) that depends on the other variable (region). A significant interaction indicates that this adjustment is unlikely to be zero.

To determine the total effect for the number of attractions on poverty reduction, you need to take the main effect and then adjust it based on region. I believe you’re correct that the interaction coefficients are relative to the omitted region. There are other coding schemes that are available, but the type you mention is the most common in regression analysis. In this case, the adjustment for the omitted region is zero.

Personally, I find it most helpful to graph the interaction effects like I do in this post where the y-axis represents the fitted values for the combined main effect and interaction effect. That way you’re seeing the entire effect for number of tourist attractions–the sum of both the effect that does NOT depend on other variables and the effect that DOES depend on other variables in the model. You can then see if the results are logical. Perhaps those regions that have a negative adjustment are harder or more expensive to travel to? I always find that graphs are particularly useful for understanding interaction effects. Otherwise, you’re plugging a bunch of numbers into the regression equation.
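If it helps to see the “main effect plus adjustment” idea as arithmetic, here is a minimal sketch. Every number and region name below is hypothetical, not from Paulo’s model. It assumes the common dummy coding where one region (called `central` here) is omitted and serves as the reference.

```python
# Hypothetical coefficients for illustration only, not results from any real study.
# Dummy coding with "central" as the omitted (reference) region.
main_effect = -0.50   # slope for number of attractions in the reference region
interaction = {       # adjustments to that slope, relative to the reference region
    "north": 0.20,
    "south": -0.10,
    "east": 0.05,
    "west": -0.30,
}

# The omitted region gets an adjustment of zero, so its slope is the main effect.
slopes = {"central": main_effect}
for region, adjustment in interaction.items():
    # Total slope in a region = main effect + that region's interaction adjustment.
    slopes[region] = round(main_effect + adjustment, 2)

for region, slope in slopes.items():
    print(f"{region:8s} total slope for attractions: {slope:+.2f}")
```

Each printed slope is what an interaction plot would show as the steepness of that region’s line.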

Best of luck with your study! I hope this helps!

Nik says

Hi Jim,

Thank you so much for the information! I was wondering if there is a way to use qfit in Stata and plot the confidence intervals and point out the statistical significance of the interaction terms. I need to understand whether different groups have different wage growth trajectories. So I interacted the group indicator with experience and with the square of experience. As expected, not all terms are significant. Is there a way to show this graphically?

Jim Frost says

Hi Nik,

I’m not the most familiar with Stata but I did look up qfit. That command seems to be mainly used to graph the quadratic relationship between a predictor and response variable, or multiple pairs of variables. I didn’t see options for confidence intervals but I can’t say for sure.

However, if you are looking for confidence intervals for the differences between group means, the method that I’m familiar with involves using the post-hoc comparisons that are commonly used with ANOVA. These comparisons will give CIs for the differences between the group means. When you have interactions with groups, you’ll have means for combinations of groups, and you can determine which differences between combinations of groups are statistically significant. I plan to write a blog post about that at some point! That’s a different way of illustrating an interaction effect and it might be more like what you’re looking for. Maybe–I’m not 100% sure what you need exactly.

Also, some software can plot the fitted value for interactions that include squared terms. Maybe that’s what you’re looking for? I’m including a picture of a significant interaction that includes a squared term. How to display this depends on your software and, as I mentioned, I’m not the most familiar with Stata.

Best of luck with your analysis!

Kristi says

Hi Jim,

I wanted to thank you for the useful resource, I really appreciate it!

I have a question about doing two-way ANOVAs. I did a plant tissue analysis (30 variables) in replicates of 12 in each of 3 treatment areas. I redid the test three years later, and I’m using treatment and year as my two factors. I want to determine (1) if there is a difference between treatments and (2) if they are changing over time.

The results of my two-way ANOVA showed about half the variables having a significant interaction between time and treatment. You mentioned in an earlier post that if the interaction is not significant, then you rerun without the interaction. If only treatment or only year is significant, though, can I rerun a simple one-way ANOVA using only the significant factor? If so, how do I summarize all these variables and different analyses (one-way and two-way ANOVAs) in a table?

Also, in your opinion, is a two-way ANOVA the best way to answer my 2 research questions?

Thank you!

Nicholas Lehker says

Hi Jim,

I am still having a hard time interpreting interaction effects and main effects. I am currently reading a study in which patients who have suffered a stroke undergo physical rehabilitation in two conditions to determine if a specific therapy is beneficial. The first group undergoes physical therapy with transcranial direct current stimulation, and the control group undergoes sham stimulation with physical therapy, given over five days. The dependent variable is upper extremity function measured by a scale called the Upper Extremity Fugl-Meyer score.

Here is the breakdown.

Dependent variable: Fugl-Meyer

Independent variables: within-subject, time (pre-intervention, post-intervention); between-subject, group (sham vs. real intervention)

The authors report this:

“an analysis of variance with factors TIME and GROUP showed a significant effect of TIME (F(1,12) = 24.9, p < 0.001) and a significant interaction between TIME and GROUP (F(1,12) = 4.8, p = 0.048) suggesting that the effect of TIME was different between the cathodal tDCS and sham tDCS groups for UE-FM scores”

Is it safe to say that the dependent variable depends on the interaction of time and group assignment, and that time is the only significant main effect? In other words, group assignment does not matter, just time?

Thank you,

Jim Frost says

Hi Nicholas,

Here’s how I’d interpret the results for this specific study. Keep in mind, I don’t know what the study is assessing, but I’m going strictly by the statistics that you report.

The results seem to make sense. You have two intervention groups and the pre- and post-test measurements.

The significant interaction indicates that the effect of the intervention depends on the time. That makes complete sense. For the pre-test observation, the subjects will have been divided between groups but presumably have not yet been exposed to the intervention. There should not be a difference at this point in time. If the intervention affects the dependent variable, you’d expect it to appear in the post-test measurement only. Hence, the intervention effect depends on the time, which makes it an interaction effect in this model. These results seem consistent with that idea based on the limited information that I have.

Time also has a significant main effect, which suggests that a portion of the changes in the dependent variable are independently associated with the time of the measurement (i.e., some of the changes occur over time regardless of the intervention). However, the intervention does have an effect that depends on the time (i.e., only after the subjects experience the intervention). So, it is inaccurate to say that group assignment does not matter. It does matter, but it depends on the time of the observation. If the study was conducted as I surmise, that makes sense! Subjects need to experience the intervention before you’d expect to observe an effect.
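A quick sketch with invented cell means may make this pattern easier to see. The values below are hypothetical and are not taken from the study Nicholas describes. They simply show a case where both groups improve over time (the time main effect) and the real-stimulation group improves more (the time by group interaction).

```python
# Hypothetical cell means on the UE-FM scale, for illustration only.
means = {
    ("sham", "pre"): 30.0,
    ("sham", "post"): 33.0,
    ("tdcs", "pre"): 30.0,
    ("tdcs", "post"): 38.0,
}

# Change over time within each group.
change_sham = means[("sham", "post")] - means[("sham", "pre")]
change_tdcs = means[("tdcs", "post")] - means[("tdcs", "pre")]

# The interaction contrast asks: does the change over time differ by group?
interaction_contrast = change_tdcs - change_sham

# Equivalent view: the group gap at each time point.
gap_pre = means[("tdcs", "pre")] - means[("sham", "pre")]
gap_post = means[("tdcs", "post")] - means[("sham", "post")]

print(f"change in sham group: {change_sham:+.1f}")
print(f"change in tDCS group: {change_tdcs:+.1f}")
print(f"interaction contrast: {interaction_contrast:+.1f}")
print(f"group gap pre: {gap_pre:+.1f}, group gap post: {gap_post:+.1f}")
```

In this made-up example, the groups start out equal and differ only at the post-test, so the group effect depends on time.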

That’s how I’d interpret the results.

Nicholas Lehker says

Jim,

Thank you so much, that makes a lot of sense.

Lan Chu says

Dear Jim,

Thanks so much for the great post !

I am working on my dissertation, comparing the treatment effect of an intervention on women’s empowerment in Uganda and Tanzania. The intervention is exactly the same in the 2 countries. To do so, I combine the 2 datasets and run a regression model in which I include a country dummy variable (1 for Tanzania and 0 for Uganda) and an interaction term between country and treatment in order to capture the heterogeneity of the treatment effect.

My question is, does the coefficient of the interaction term capture how much the difference is (if there is one) between Tanzania and Uganda?

For example, from running separate regression models in each country, there can be similarities in the treatment effect, meaning that the treatment has positive (or negative) effects in both countries. In that case, does the coefficient of the interaction term indicate how much the difference is? (Depending on the sign of the coefficient, I’ll conclude the treatment is stronger or weaker in one of the two countries.)

My second question is, what about insignificant interaction terms? In the separate regression models, for some indicators (let’s say decision-making over major household expenditure), the treatment effects go in opposite directions, e.g., a positive effect in Uganda and a negative effect in Tanzania. Hence I would expect the interaction term to show that the treatment effect is bigger in Uganda, but I got a statistically insignificant interaction term for that case. What exactly does an insignificant interaction term say?

Thank you so much. I would be very grateful if you could reply soon. My dissertation is due in a couple of days….

Jim Frost says

Hi Lan Chu,

That sounds like very important research you are conducting! Apologies for not replying sooner but I was away traveling.

I find that the coefficient for the interaction term is difficult to interpret by itself–although it is possible. I always prefer to graph them as I do in this blog post.

Is the intervention variable continuous or categorical? That affects the discussion of the interaction term that includes the intervention variable.

Unfortunately, the coefficient of the interaction term is not as simple as capturing the difference between the two countries. The full effect of country is captured by the main effect of country and the interaction effect. And, the interaction effect depends on the value of the other variable in the term (intervention). In fact, the effect of the interaction term alone varies based on the values of both variables and is not one set amount. Ultimately, that’s why I prefer using interaction plots, which take care of all that!

In simple terms, if the interaction term is significant, you know that the size of the intervention effect depends on the country. It cannot be represented by a single number. The intervention effect might be positive in one country and negative in the other. Alternatively, the treatment effect can be in the same direction in both countries (e.g., positive) but larger in one country than in the other.

Conversely, if the interaction term is not significant, you cannot conclude that the treatment effect differs between the countries. Your sample provides insufficient evidence to conclude that the treatment effects in the two countries are different. That is not the same as showing that the effects are equal.
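For what it’s worth, here is how the pieces fit together in the special case where the treatment is a simple 0/1 indicator. That is an assumption, since I don’t know how the intervention variable is coded. Using the country coding described in the question (1 for Tanzania, 0 for Uganda), and with the variable names chosen here for illustration:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1\,\text{Treat} + \beta_2\,\text{Country} + \beta_3\,(\text{Treat}\times\text{Country}) + \varepsilon \\
\text{Treatment effect in Uganda } (\text{Country}=0) &: \quad \beta_1 \\
\text{Treatment effect in Tanzania } (\text{Country}=1) &: \quad \beta_1 + \beta_3
\end{aligned}
```

Under that assumption, β3 is the adjustment to the treatment effect when moving from Uganda to Tanzania, while the country difference itself still depends on whether you look at treated or untreated subjects (β2 versus β2 + β3).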

I hope that answers your questions. If I missed something, please let me know!

Erick Turner says

Hello, I have a simple (I think) situation with variables A and B that both show significant effects. When the interaction variable A*B is added, it is not significant (P=0.3), and the statistics associated with A and B (beta coefficients, P values) remain essentially unchanged. Would you recommend reporting (a) the full model with the NS interaction or (b) the model with just A and B, adding a comment about what happened (didn’t happen) when the interaction term was added? Thanks.

Jim Frost says

Hi Erick,

Personally, I’d tend to not include the interaction in this case, but you can mention it in the discussion. There might be a few exceptions to that rule of thumb. If the interaction is of particular interest, such as something that you are particularly testing, you might include it. If there are strong theoretical considerations that indicate it should be included in the model despite the lack of significance, you might leave it in.

Generally, if a term is not significant and there is no other reason to include it in the model, I leave it out. Including unnecessary terms that are not significant can actually reduce the precision of the model.

Best of luck with your analysis!

Adil Bhatti says

Greetings, Respected Jim Frost!

I hope you are doing well.

Can I ask a question regarding interaction?

I have a question and some confusion regarding interaction analysis. Which is more important to report for an interaction: the regression analysis or the scatter plot?

If the regression analysis gives a significant p-value (<0.05) but the interaction plot does not show a proper interaction (the lines are parallel), how can we interpret this? Is the interaction considered real only on the basis of the p-value?

Sir, I have a total of only 612 samples consisting of an equal number of cases and controls.

My only problem is how to explain this: which is more important, the plots or the regression analysis (p-value)?

I assume that the regression analysis just shows a significant interaction, but the scatter plot shows a real interaction only when the lines cross each other.

So, should I explain that the p-values show significance but the plots tell a different (opposite) result, and that the plots show the real scenario?

How should I report this type of result? I do not have a proper reference to support this type of result. Kindly provide one.

I hope you will respond.

Awaiting your response.

Thank you for consideration.

Regards!

Adil Bhatti

Jim Frost says

Hi Adil,

I’m not 100% sure that I understand your question correctly. It sounds like you have a significant interaction term in your model but the lines in the interaction plot do not cross?

If that’s the case, there’s not necessarily a problem. Technically, when you have a significant interaction, you have sufficient evidence to conclude that the lines are not parallel. In other words, the null hypothesis is that the lines are parallel, but you can reject that notion with a significant p-value. The difference between the slopes is statistically significant. While you might not see the lines actually cross on the graph, their slopes are not equal. For interaction effects, we often picture the lines making an X shape–but it doesn’t have to be as dramatic as that image. Instead, the lines can both have a positive slope or both have a negative slope, but one line is just a bit steeper than the other. That can still be significant.

Let’s look at the opposite case. If the p-value for the interaction term is not significant, you cannot reject the null hypothesis that the slopes are the same. If you look at an interaction plot, you might see that the slopes are not exactly the same. However, in this case, you have insufficient evidence to conclude that any difference you observe reflects a true difference rather than random error.

The best approach is to use the interaction term p-value in conjunction with the interaction plot. The p-value tells you whether any observed difference in the slopes likely represents a real interaction effect or random error. Technically, the p-value indicates whether you can reject the notion that slopes are the same.

As for references, any textbook that covers linear models should cover this interpretation. My preferred textbook is Applied Linear Statistical Models by Neter et al.

I hope this helps!

Erick Turner says

Thanks, that’s very clear and helpful.

May I follow up with another question, still involving the above-mentioned variables A and B?

In a univariate logistic regression model, A has a highly significant effect and a very large odds ratio. (This finding is expected.)

In another univariate model, B–the “new” variable in the current study–has an effect that is NS (though some might use the controversial word “trend”).

However, using A and B together in a bivariate model, A remains highly significant, and now B becomes highly significant. Also, the odds ratio assoc’d w/ B bumps up quite a bit in magnitude.

As mentioned in our earlier exchange, the A*B interaction was NS (and no one could begin to call that a trend).

What does it mean that B becomes significant only after A is added to the model?

Related question: Would you recommend reporting results from both univariate models as well as the results from the bivariate model?

Thanks again!

Jim Frost says

Hi Erick,

Good to hear from you again!

There are several possibilities–good and not so good. So, you might need to do a little investigation to determine which it is.

First, the good. Remember that when you include a variable in a regression model you are holding it constant or controlling for it. When it’s not in the model, it’s uncontrolled. When you have uncontrolled confounding variables (not in the model), they can either mask a true effect, exaggerate an effect, or create an entirely false effect for the variables in the model. It’s also called omitted variable bias. The variables you leave out can affect the variables that you include. If this is the case for you, then it’s good because, barring other problems, it suggests that you can trust the model where both variables are significant. This problem usually occurs when there is some correlation between the two variables.

In your case, it appears that when you fit the model with only B, the model is trying to attribute counteracting effects to the one variable B, which produces the insignificant results. When you add A, the model can attribute those counteracting effects to each variable separately.

However, there are potential bad scenarios too. The above situation involves correlated predictors, but at a non-problematic level. You should check to make sure that you don’t have too much multicollinearity. Check those VIFs!

There are other possibilities, such as overfitting your model. But, with just two variables, I don’t think so, unless you have a tiny number of observations!

I’m guessing that those two variables are correlated to some degree. Check for that. If they are correlated, be sure it’s not excessive. Then, understanding how they’re correlated (assuming they are), try to figure out a logical reason why having only B without A is not significant. For example, if predictor B goes up, does predictor A tend to move in a specific direction? If so, would the combined movement mask B’s effect when A is not in the model?
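Here is a minimal simulation sketch of how that masking can happen. It is a simplification: it uses ordinary linear regression on a continuous outcome rather than Erick’s logistic models, and every number is invented. It assumes A and B each have a positive effect but are negatively correlated with each other (about -0.5).

```python
import random
import statistics

rng = random.Random(42)
n = 5000

# Hypothetical predictors: B is negatively correlated with A (about -0.5).
a = [rng.gauss(0, 1) for _ in range(n)]
b = [-0.5 * ai + (0.75 ** 0.5) * rng.gauss(0, 1) for ai in a]

# Hypothetical outcome: both predictors have a real positive effect.
y = [2.0 * ai + 1.0 * bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]

def cov(u, v):
    """Sample covariance between two equal-length lists."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

# Slope of B when A is left out of the model.
b_alone = cov(b, y) / cov(b, b)

# Slope of B when A is also in the model (two-predictor least squares).
det = cov(a, a) * cov(b, b) - cov(a, b) ** 2
b_with_a = (cov(a, a) * cov(b, y) - cov(a, b) * cov(a, y)) / det

# With two predictors, the VIF is 1 / (1 - r^2), where r is their correlation.
r = cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5
vif = 1 / (1 - r ** 2)

print(f"slope of B without A in the model: {b_alone:+.2f}")
print(f"slope of B with A in the model:    {b_with_a:+.2f}")
print(f"correlation(A, B) = {r:+.2f}, VIF = {vif:.2f}")
```

In this setup, B looks like it has almost no effect when A is omitted, because higher B tends to come with lower A, and the two pull the outcome in opposite directions. Adding A lets the model separate them, and the VIF stays low.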

Does the direction of the effect for B make sense theoretically?

As for whether to discuss this situation, I’ll assume that the model with both A and B is legitimate. Personally, I would spend most of the time discussing the model with both predictors. Perhaps add a bit of an aside about how B is only significant when A is included in the model, along with the logic of how leaving A out masks B’s effect. I wouldn’t spend much time discussing the separate univariate models themselves because if the model with both variables is legit, then the univariate models are biased and not valid. No point detailing biased results when you have a model that seems better!

Your question reminds me that I need to write a blog post about this topic! I’ve got a great example using real data from a study I was in that was similar–and ultimately it made complete sense.

Krzysztof says

Hello. I use SPSS and I have similar results to yours Jim. The p-values are slightly different but in general they look the same (Food has a high non-significant value, others are significant).The coefficient in temperature*pressure is the same.

I think that the slight differences can be an outcome of different algorithms in the two software packages. It is the same when I (SPSS) compare my results with my friend (Statistica).

Cheers,

Krzysztof

Jim Frost says

Hi Krzysztof,

Thanks for sharing that information! I guess the methodology must be a bit different, which is a little surprising, but I’m glad the results are similar in nature!

Adil Bhatti says

Hello Dear Jim Frost,

Please respond to my last comment.

Jim Frost says

Hi Adil,

I think I’ve answered everything in your comment. If there is something else you need to know, please ask about it specifically.

ita says

Dear Jim

I have a basic question concerning interactions.

I am looking at possible risk factors for an adverse event. Univariate analysis reveals three variables that are significant (A, B, and C).

In order to evaluate the model (in this case binary logistic regression), there are three possible basic interactions: A*B, B*C, and A*C that could be theoretically introduced into the model.

I have no previous data to support entering any of these possible interactions.

How should I proceed?

Thank you,

Ita

Jim Frost says

Hi Ita,

If there are no theoretical reasons or findings in the literature that support including those interactions in the model, I still think it’s ok to include them and see if they’re significant. It’s exploratory data analysis. You just have to be aware of that when it comes to the interpretation. You have to be extra aware that if they are significant, you’ll need to repeat studies to replicate the results to be sure that these effects really exist. Keep in mind that all hypothesis tests will produce false positives when the null hypothesis is true. This error rate equals your significance level. But, scientific understanding is built by pushing the boundaries out bit by bit.

There are a couple of things you should be aware of. One, be careful not to fit a model that is too complex for the number of observations. These extra terms in your model require a larger sample size than you’d need otherwise. Read about this in my post about overfitting your model. And, the second thing is that while it’s OK to check on a few things, you don’t want to go crazy and try lots and lots of different combinations. That type of data dredging is bound to uncover correlations that exist only by chance. Read my post on data mining to learn more.
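To put a rough number on the data dredging concern, here is a small arithmetic sketch. It assumes a significance level of 0.05, that every null hypothesis is true, and that the tests are independent, which is a simplification for terms in the same model. The function name `familywise_rate` is made up for this example.

```python
ALPHA = 0.05  # assumed significance level for each individual test

def familywise_rate(num_tests, alpha=ALPHA):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests

# 3 = the three two-way interactions (A*B, B*C, A*C); 20 = heavy dredging.
for k in (1, 3, 20):
    print(f"{k:2d} tests: chance of at least one false positive = {familywise_rate(k):.3f}")
```

Checking a few interactions raises the chance of a spurious finding modestly. Checking dozens makes at least one false positive more likely than not.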

I hope this helps!

ita says

Dear Jim,

First of all I would like to thank you for your answer and for your blog which is really nicely set up and informative.

I would like to expand on what I asked. I am working on two unrelated data sets, one with over 2000 subjects and one with over 100,000 subjects all with complete information on the variables of interest. Both data sets deal with different problems and have slightly different variables but I will unite both into one example to simplify the question.

The dependent variable is mortality. The independent variables are (A) age (years), (B) time from symptom onset to hospital admission (less than one day, more than one day), and (C) time to treatment -from admission till start of antibiotic treatment (hours). As I mentioned in the previous post, there is no clear data on the interactions for this specific topic. However, it makes sense that some interactions exist and here I present three theoretical explanations, one for each interaction + one for all (again – there is no proof that these explanations are correct):

A*B – age may impact how quickly a patient seeks medical advice;

B*C – the manifestation of disease may change with time – if this is true, different manifestation due to longer time till admission may lead to more tests being done before a treatment decision is made;

A*C – the number and type of diagnostic tests may depend on age (CT scans are done more commonly in the elderly and some of these tests take time);

A*B*C – if elderly patients really seek advice late, they may undergo more workup due to their age and also due to different manifestation of disease (difference in manifestation due to either increased age or time elapsed from symptom onset).

So I did some exploratory work on possible interactions to illustrate the impact of these on the model:

No interaction added A*B interaction added A*C interaction added

OR 95%CI OR 95%CI OR 95%CI

A 1.012 1.006,1.018 1.018 1.008,1.029 1.010* 1.000,1.021

B 3.697 3.004,4.550 4.665 3.136,6.939 3.698 3.005,4.551

C 1.022 1.011,1.034 1.022 1.011,1.034 1.018 .994,1.042

A*B .991 .979,1.004

A*C 1.000 .999,1.001

B*C

*p=0.048

B*C interaction added A*B and A*C interactions added A*B and B*C interactions added

OR 95%CI OR 95%CI OR 95%CI

A 1.012 1.006,1.018 1.017 1.003,1.031 1.018 1.007,1.029

B 5.306 3.824,7.363 4.657 3.131,6.927 6.496 4.077,10.352

C 1.043 1.025,1.062 1.018 .995,1.043 1.043 1.024,1.062

A*B .991 .979,1.004 .992 .980,1.005

A*C 1.000 .999,1.001

B*C .968 .946,.990 .968 .946,.990

A*C and B*C interactions added A*B, A*C and B*C interactions added

OR 95%CI OR 95%CI

A 1.011 1.001,1.021 1.017 1.003,1.031

B 5.305 3.822,7.363 6.477 4.065,10.322

C 1.040 1.013,1.067 1.040 1.013,1.068

A*B .992 .980,1.005

A*C .968 .946,.990 .999 .999,1.001

B*C 1.000 .999,1.001 .946 .946,.990

I just want to add here that what I think is interesting clinically (though this is a bias, from the statistical point of view) is the impact of variable C on mortality, since this is the only factor we can really improve on in the short term. Age cannot be changed. The time from symptom onset until elderly patients seek advice may be changed, but this is extremely difficult. Changing the time interval between admission and the start of treatment is the most feasible option. Whether variable C has any impact on mortality depends on which interactions are included in the model.

Is it legitimate to say C has no impact on mortality?

Ita

Jim Frost says

Hi Ita,

Unfortunately, the formatting is so bad that I can’t make heads or tails of your results. I know that’s difficult in these comments. I’m going to edit them out of your comment so they don’t take up so much vertical space. But, you can reply and include them in something that looks better. Maybe just list the odds ratio CI for each variable. I don’t even know which numbers are which in your comment!

As for the rationale, it sounds like you have built up great theoretical reasons to check these interactions!

I’ll be able to say more when I see numbers that make sense!

Thanks!

ita says

Dear Jim,

Once I saw the mess, I sent you the results in a Word document attached to a Facebook message. Maybe you have more control over how the data appear and could embed them in the blog in a way others could appreciate as well. If not, I will try again here.

I appreciate your time.

Ita

ita says

Dear Jim,

I am pasting a link to a table showing the impact of the different interactions when they are inserted into the model. I hope this works.

https://photos.google.com/photo/AF1QipPSgRow4k3QM6WDIJRCG7AqZ_LvsQ8N6zjV7KGh

Thanks again,

ita

ita says

or this link if the last one does not work

https://photos.google.com/share/AF1QipOMPXglTk0QhAKIvx3Jvd5jHP6-z7aTyqk2c3qkG87__4wS-pAq3r2twdNsMhwl5g?key=MzJnNTZwUllpRWxhOXFIaW1ZcHVnUTMyMEpqRG5n

ita

Jim Frost says

Hi Ita,

Sorry for the delay. I have had some extra work. I’ll look at your results soon!

Emily says

Hi Jim,

I have run a univariate GLM in SPSS on these variables:

IV – Condition (experimental vs control)

DV- state-anxiety

Covariate – social anxiety

There is a significant condition*social anxiety interaction on state-anxiety, which means I have violated the homogeneity of regression slopes assumption of ANCOVA. However, we predicted a condition*social anxiety interaction to begin with, and my supervisor still wants me to use it. Can I still use the ANCOVA, and if so, would I need to report that this assumption was violated, and what post-hoc tests could I use?

Thank you for your time

Jim Frost says

Hi Emily,

This is a weird “assumption” in my book. In fact, I don’t consider it an assumption at all. The significant interaction effect in your analysis indicates that the relationship between condition and anxiety depends on social anxiety. That’s the real description of the relationships in your data (assuming there were no errors conducting the study). In other words, when you know the condition, it’s impossible to predict anxiety unless you also know social anxiety. So, in my book, it’s a huge mistake to take out the interaction effect. I agree with your supervisor about leaving it in. Simply removing the interaction would likely bias your model and cause you to draw incorrect conclusions.

Why is it considered an assumption for ANCOVA? Well, I think that’s really for convenience. If the slopes are parallel, it’s easy to present a single average difference, or effect, between the treatment groups. For example, parallel lines let you say something like, group A is an average of 10 points higher than group B for all values of the covariate. However, when the slopes are different, you get different effect sizes based on the value of the covariate.

In your case, you have two lines. One for the control group and the other for the treatment group. Points on a fitted line represent the mean value for the condition given a specified social anxiety value. Therefore, the difference between means for the two groups is the difference between the two lines. When the lines are parallel, you get the nice, single mean difference value. However, when the slopes are not parallel, the difference varies depending on the X-value, which is social anxiety for your study.

Again, that’s not as nice and tidy to report as a single value for the effect, but it reflects reality much more accurately.

What should you do? One suggestion I’ve heard is to refer to the analysis as regression analysis, where homogeneity of slopes is not considered an assumption, rather than as ANCOVA. They’re the same analysis “under the hood,” so it’s not really an assumption for ANCOVA either. But, that might make reviewers happy if that is a concern.

As for what post hoc analysis you can use, I have not used any for this specific type of case, but statistical software should allow you to test for mean differences at specified values of your covariate. For example, you might pick a low value and a high value for social anxiety, and have the software produce adjusted P-values for you based on the multiple testing. In this case, you’d determine whether there was a significant difference between the two conditions at low social anxiety scores. And, you’d also determine whether there was a significant difference between the two conditions at high social anxiety scores. You could also use a middle value if it makes sense.

This approach doesn’t produce the nice and neat single value for the effect, but it does reflect the true nature of your results much more closely because the effect size changes based on the social anxiety score.
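To make the idea of a changing effect size concrete, here is a minimal Python sketch. The coefficients and the low, middle, and high social anxiety values are made up for illustration and do not come from your analysis. It only computes the difference between the two fitted lines at chosen covariate values; the significance tests and adjusted p-values would still come from your statistical software.

```python
# Hypothetical fitted coefficients (assumptions, not real results) for:
#   state_anxiety = b0 + b_cond*condition + b_cov*social_anxiety
#                   + b_int*condition*social_anxiety
# where condition is 0 for control and 1 for experimental.
b0, b_cond, b_cov, b_int = 20.0, 8.0, 0.50, -0.15

def predicted(condition, social_anxiety):
    """Fitted mean state anxiety for a condition at a covariate value."""
    return (b0 + b_cond * condition + b_cov * social_anxiety
            + b_int * condition * social_anxiety)

def group_difference(social_anxiety):
    """Experimental minus control: the vertical gap between the two lines."""
    return predicted(1, social_anxiety) - predicted(0, social_anxiety)

# Hypothetical low, middle, and high covariate values.
for label, x in [("low", 10.0), ("middle", 30.0), ("high", 50.0)]:
    print(f"{label:>6} social anxiety ({x:.0f}): "
          f"difference = {group_difference(x):.2f}")
```

With parallel lines (`b_int = 0`), all three differences would be identical. With a nonzero interaction coefficient, the gap shrinks or grows as social anxiety changes.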

Best of luck with your analysis. I hope this helps!

Emily says

Thank you very much for your quick and detailed reply! This has really helped me to understand the assumption isn’t necessary in our case and what our interaction means.

Thanks again for your advice & best wishes

Jim Frost says

Hi Emily,

You’re very welcome. I thought your question was particularly important. It highlights the fact that sometimes the results don’t match your expectations and, in general, it’s best to go with what your data are saying even when it’s unexpected!

hanis sofia says

Hi Jim. Thank you for the information and knowledge that you shared here. It helped me with my final year project.

Jana says

Hi Jim,

I’ve searched pretty much all of the internet but can’t find a solution for my interaction problem. So I thought maybe you can help.

I have a categorical variable (4 categories, nominal), one continuous variable (Risk) and a continuous output (Trust). My hypothesis says that I expect the categories to interact with Risk, in that I expect different correlations between risk and trust in the different groups.

I ran a multiple regression with the groups (as a factor) and risk as predictors and trust as the output in R. I do understand that the interaction terms show the difference of the slopes in the groups, but since risk and trust are not measured in the same unit, I have no idea how to get the correlations for each group.

I thought about standardizing risk and trust, because then the predictor in my reference group + the interaction term for each group should be the correlation in that specific group. But that somehow doesn’t work (if I split the data set and just calculate the correlation for each subset I get different correlations) and I can’t find my logical mistake.

Of course I could just use the correlations for the split data sets, but I don’t feel like it’s the “proper” statistical way.

Thank you for your time (I hope you understand my problem, it’s a bit complex and English is not my first language.)

Kind regards,

Jana

Jim Frost says

Hi Jana,

It can be really confusing with various different things going on. Let’s take a look at them.

To start, regression gives you a coefficient, rather than a correlation. Regression coefficients and correlation coefficients both describe a relationship between variables, but in different ways. So, you need to shift your focus to regression coefficients.

For your model, the significant interaction indicates that the relationship between risk and trust depends on which category a subject is in. In other words, you don’t know what that relationship is until you know which group you are talking about.

It’s ok that risk and trust use different units of measurement. That’s normal for regression analysis. To use a different example, you can use a person’s height to predict their weight even though height might be measured in centimeters and weight in kilograms. The coefficient for height tells you the average increase in kilograms for each one centimeter increase in height. For your data, the Risk coefficient tells you the average change in trust given a one unit increase in risk–although the interaction complicates that. See below.

Standardizing your continuous variables won’t do what you are trying to get it to do. But, that’s ok because it sounds like you’re performing the analysis correctly. From what you write, it seems like you might need to learn a bit more about how to interpret regression coefficients. I’ve written a post about that topic!

Understanding regression coefficients should help you understand your results. The main thing to keep in mind is that the significant interaction tells you that the Risk coefficients in your four groups are different. In other words, each group has its own Risk coefficient. Conversely, if the interaction was not significant, all groups would use the same Risk coefficient. I recommend that you create interaction plots like the ones I made in this blog post. That should help you understand the interaction effect more intuitively.

I hope this helps!

Mohsin says

Hi Jim,

I hope you are fine.

I face a problem interpreting the interaction term between two continuous variables, military expenditure and terrorism. My dependent variable is capital flight, and the model is:

capital flight = .768(terrorism) + .0854(military expenditure) - .3549(military*terrorism)

The coefficients for terrorism and the interaction term are significant.

I would be very thankful if you have some time to interpret these results broadly, or to suggest any related material.

I am waiting.

Jim Frost says

Hi Mohsin,

Interpreting the interaction term is fairly difficult if you just use the equation. You can try plugging multiple values into the equation and see what outcome values you obtain. But, I recommend using the interaction plots that I show in this blog post. These plots literally show you what is happening and make interpreting the interaction much easier.

For your data, these plots would show the relationship between military expenditure and capital flight. There would be two lines on the graph that represent that relationship for a high amount of terrorism and a low amount of terrorism. Or, you can display the relationship between terrorism and capital flight and follow the same procedure. Use whichever relationship makes the most sense for your study. These results are consistent and just show the same model from different points of view.

Most statistical software should be able to make interaction plots for you.
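If you do want to try the plug-in approach, here is a minimal Python sketch using the coefficients from your equation. The low and high terrorism values are placeholders I made up because I don’t know the units or ranges in your data. The equation as posted has no intercept, so only differences between predictions are meaningful here.

```python
# Coefficients as posted in the comment above (no intercept was given).
B_TERRORISM = 0.768
B_MILITARY = 0.0854
B_INTERACTION = -0.3549

def capital_flight(terrorism, military):
    """Fitted value from the posted equation, omitting the unknown intercept."""
    return (B_TERRORISM * terrorism + B_MILITARY * military
            + B_INTERACTION * military * terrorism)

def military_slope(terrorism):
    """Change in capital flight per one-unit increase in military expenditure."""
    return B_MILITARY + B_INTERACTION * terrorism

# Hypothetical "low" and "high" terrorism values; replace with values from your data.
for label, terrorism in [("low", 0.0), ("high", 1.0)]:
    print(f"{label} terrorism ({terrorism}): "
          f"slope for military expenditure = {military_slope(terrorism):.4f}")
```

The slope for military expenditure is not a single number. It is the military coefficient plus the interaction coefficient times the terrorism value, which is exactly what the two lines on an interaction plot display.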

Best of luck with your analysis!

kamawee says

Hello Jim!

Hope you are doing well.

Please help me interpret the following interaction term. The survey is about perception. The dependent variable is customers’ perception, and the interaction term is religiosity*location.

| Term | Coefficient | Std. Err. | t | P>\|t\| | [95% confidence interval] |
|---|---|---|---|---|---|
| religiosity*location | -.0888217 | .0374532 | -2.37 | 0.018 | -.1625531, -.0150903 |

I will be really thankful to you.

Jim Frost says

Hi Kamawee,

According to the p-value, your interaction term is significant. Consequently, you know that the relationship between religiosity and perception depends on location. Or, you can say that the relationship between location and perception depends on religiosity. Either is equally valid and depends on what makes the most sense for your study.
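As a quick sanity check on how the columns in your output relate to each other, here is a short Python sketch that uses only the numbers you posted. The t-value is the coefficient divided by its standard error, and a 95% confidence interval that excludes zero agrees with a p-value below 0.05. The implied critical value is backed out from the interval because the degrees of freedom are not shown in your comment.

```python
# Values copied from the posted output for religiosity*location.
coef = -0.0888217
std_err = 0.0374532
ci_low, ci_high = -0.1625531, -0.0150903

# t-statistic: coefficient divided by its standard error.
t_stat = coef / std_err

# Critical t-value implied by the reported interval width
# (the degrees of freedom are not given in the comment).
implied_critical = (ci_high - ci_low) / (2 * std_err)

# A 95% CI that excludes zero corresponds to p < 0.05.
ci_excludes_zero = not (ci_low <= 0.0 <= ci_high)

print(f"t = {t_stat:.2f}")
print(f"implied critical value = {implied_critical:.3f}")
print(f"95% CI excludes zero: {ci_excludes_zero}")
```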

For more information, see my reply to Mohsin directly above. Also, this entire post is about how to interpret interaction effects.

Tran Trong Phong says

Hi Jim, may I ask a question about running a regression to test interaction effects in SPSS?

In my case, I have several independent variables (for example, 6 IVs), and I want to test whether each of the 6 IVs interacts with a dummy variable. I am confused about whether, in SPSS, I should run only 1 regression model that includes all 6 IVs, the 6 new variables (created by multiplying each of the 6 IVs by the dummy variable), and the control variables, or 6 different regression models, each with all 6 IVs and 1 new interaction variable.

Thank you so much for your help.

taylor says

hello.. how should we treat main effects if there is also an interaction effect? thanks.

Jim Frost says

Hi Taylor,

When you have significant interaction effects, you can’t consider main effects by themselves because you risk drawing the wrong conclusion. You might put chocolate sauce on your hot dog!

You have to consider both effects together. The main effect is what the variable accounts for that is independent of the other variables. The interaction effect is the part that depends on the other variables. The total effect sums the main effect and the interaction effect.

Now, you can do this by entering values into the equation and seeing how the outcome changes. Or, you can do what I did in this post and create interaction plots, which really bring them to life. These plots include both the main and interaction effects.
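In equation form, a sketch for a model with two predictors and their interaction looks like the following. The symbols are generic placeholders rather than values from any analysis in this post, and categorical factors enter the same way through indicator variables.

```latex
\hat{y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
\qquad\Longrightarrow\qquad
\frac{\partial \hat{y}}{\partial X_1} = \beta_1 + \beta_3 X_2
```

The total effect of a one-unit change in X1 is the main-effect coefficient plus the interaction coefficient multiplied by the value of X2. Note that, with the interaction in the model, the main-effect coefficient by itself describes the effect of X1 only when X2 equals zero.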

I hope this answered your question. You still consider the main effect, but you have to add in the interaction effect.

sophea says

Hi Jim,

I would really appreciate it if you could help me 🙂

1) I applied a 2 (gender of respondent) x 2 (review: high/low) factorial design in my study. Based on the two-way ANOVA, both main effects were significant but the interaction effect was not. The graph showed parallel lines. Can I answer my hypothesis based on the graph (based on the group means) even though the interaction effect is not significant? Based on the graph, female respondents scored higher than male respondents.

2) If the main effect of gender is significant, the main effect of review is not significant, and the interaction effect is not significant, how can I explain the result?

Thank you so much for your help 🙂

Jim Frost says

Hi Sophea,

Yes, if the interaction effect is not significant, you can interpret the group means themselves. Assuming the graph is a main effects graph, yes, you can use that by itself as long as you check the p-value to make sure it is statistically significant. Sometimes the graphs show a difference that is nothing more than random noise caused by random sampling.

I’m not clear on all of your variables. You mention it’s a two-way ANOVA, which means you have two independent variables and a dependent variable. But you only mention a total of two variables. Unfortunately, I can’t fully tell you how to interpret them with incomplete information about your design.

Gender has to be an IV, and maybe review is the DV? If so, you can conclude that the mean difference between the male and female reviews is statistically significant. In other words, women give higher reviews on average. I’m not sure what the other IV is.

Tarek Jaber-Lopez says

Hi Jim,

Thanks a lot for your explanation. It is really helpful. I have a question.

How do we interpret the results if our dependent variable is binary (apply to a job or not), one of our dependent variables has 3 categories (for instance, Treatment 1, Treatment 2 and Treatment 3), and our other variable is binary (0=male, 1=female)? What is our benchmark?

Thanks

Jim Frost says

Hi Tarek,

You mention a binary dependent variable but then also a dependent variable with 3 categories. I’m going to assume the latter is an independent variable because it has treatment levels.

When you have a binary dependent variable, you need to use binary logistic regression. Using this analysis, you can determine how the independent variables relate to the probability of the outcome (job application) occurring.

The analysis will indicate whether changes in your independent variables are related to changes in the probability of the dependent variable occurring. Predicting human behavior often produces models that don’t fit the data particularly well (low R-squared values) but can still have significant independent variables. In other words, don’t expect really precise predictions. But, the analysis will tell you if you have sufficient evidence to conclude whether treatment and gender are associated with changes in the probability of applying for a job.

As for benchmarks, you’ll have to conduct subject-area research to find relevant benchmarks for effectiveness. Statistics can determine whether a relationship is statistically significant, but you’ll need to use subject-area knowledge to see if it is practically significant.
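If by benchmark you instead mean the reference (baseline) level that the model coefficients are compared against, here is a minimal Python sketch of how that works in a binary logistic regression with a treatment by gender interaction. All coefficient values are invented for illustration, and I am assuming Treatment 1 and male (0) are the reference levels. Your software may choose different reference levels by default.

```python
import math

# Hypothetical log-odds coefficients (assumptions, not real results).
# Reference cell: Treatment 1, male. Every other coefficient is a
# difference from that cell on the log-odds scale.
coef = {
    "intercept": -1.0,   # log-odds of applying for Treatment 1 males
    "T2": 0.4,           # Treatment 2 vs Treatment 1, for males
    "T3": 0.9,           # Treatment 3 vs Treatment 1, for males
    "female": 0.3,       # female vs male, within Treatment 1
    "T2:female": -0.2,   # extra shift for females in Treatment 2
    "T3:female": 0.5,    # extra shift for females in Treatment 3
}

def log_odds(treatment, female):
    """Linear predictor for a treatment ("T1", "T2", "T3") and gender."""
    z = coef["intercept"]
    if treatment in ("T2", "T3"):
        z += coef[treatment]
    if female:
        z += coef["female"]
        if treatment in ("T2", "T3"):
            z += coef[f"{treatment}:female"]
    return z

def probability(treatment, female):
    """Predicted probability of applying for the job."""
    return 1.0 / (1.0 + math.exp(-log_odds(treatment, female)))

for treatment in ("T1", "T2", "T3"):
    for female in (False, True):
        label = "female" if female else "male"
        print(f"{treatment} {label}: P(apply) = {probability(treatment, female):.3f}")
```

Because of the interaction terms, the gender difference is not the same in each treatment, so it is usually easier to compare the predicted probabilities for each cell than to read the coefficients one at a time.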

kamawee says

thank you so much Jim, what you are doing is really appreciated.

sophea says

Thank you, Jim, for helping me 🙂

“You mention it’s a two-way ANOVA, which means you have two independent variables and a dependent variable. But you only mention a total of two variables.”

My IVs: gender and review

My DV: trust

Thank you again, Jim…

Tos Rabiu says

Hi Jim,

I am working on a study guided by the question of what effect gender and employment status have on individuals’ political judgment, in the form of a trust in the government index, in African regions. I am using a 2×2 factorial design as the statistical test. In my ANOVA table, the main effects and the interaction effect are all significant (p<0.05), which implies that I reject my null hypothesis. In my plot, the lines are separate and do not cross. How do I interpret my results?

Thank you.

Tos.

Richard says

Thanks a lot, Jim, for your wonderful explanation. I really appreciate your continuous effort to help science.

I have a difficulty interpreting the results of my study. I would be glad to hear your response.

I incubated 3 soils of differing fertility with 7 contrasting organic materials for 120 days (7×3 factorial). After the incubation, I analysed dissolved organic carbon and microbial biomass contents.

I did a 2-WAY ANOVA using the three soils and the 7 organic materials as factors. The results revealed a significant interaction effect on the resultant dissolved organic carbon and microbial biomass.

Does it mean that the effects of a given organic material on dissolved soil organic carbon and microbial biomass cannot be generalized across soil types?

Please, how do I interpret the results of this interaction? Should it be based on what is common among the soil types? Thanks in advance.

Pablo Isit says

Hi Jim,

Thank you so much for your excellent blog and explanations! I hope you can help me even further.

I am using GLM (in SPSS), and looking at predictors of a specific outcome in a repeated-measures (single group) design. There are 3 time points (baseline, post, follow up). If I run the analysis with a main effect of Time, there is a large significant change in the outcome (with reference to T0=Baseline). Now, I want to see whether another variable (let’s call this Var1), which was also collected at the same 3 time points, predicts the outcome at post and follow up. To do this, I have included a Var1 by Time interaction in the analysis. Here are my questions:

(1) Should I continue to include the main effect of Time in this model, while assessing whether the Var1 predicts outcome?

(2) Does my Var1 * Time interaction mean that my results are separating both the IV and the DV at each time point (eg, Does Var1 at Timepoint 2 predict outcome at Timepoint 2?), or is it only that my IV is separated by Time, and I am seeing the ‘omnibus’ effect of the outcome (eg, Does Var1 at Timepoint 2 predict the combined outcome at all timepoints?).

(3) If I am interested in whether CHANGE in Var1 at Timepoint 2 is related to CHANGE in outcome at Timepoint 2, and the same for Timepoint 3, how would I go about doing this without producing change scores (which have various issues) and simply correlating them…?

Many thanks in advance!

pablo

Jim Frost says

Hi Pablo,

Yes, you should continue to include the main effect of time. If it is not significant when you add Var1 and the interaction term, you can consider removing it. However, traditionally, statisticians leave the lower-order terms that comprise a higher-order term in the model even when they’re not significant. So, if you include the Var1*Time interaction, you’d typically include Time even if it was not significant. The same applies to the Var1 main effect. If it’s significant, there’s no question that you should definitely leave it in.

For your second question, let’s assume that Time, Var1, and the interaction term are all significant. What this tells you is that for Time, some of its effect is independent of Var1. Time has an effect that does not depend on the value of Var1. This is the main effect. However, some of Time’s effect is in the interaction term, which means that a portion of the effect does depend on the value of Var1. That’s the interaction effect. Time’s total effect is across both terms. The same thing is true with Var1 in this scenario. It has a main effect that doesn’t depend on Time, and an interaction effect that does depend on Time.

Assuming both main effects and interaction effects are significant, if you want to make predictions, you’d need to factor in both main effects and interaction effects. I find that’s easier with interaction plots, which I show in this blog post.

As for question three. If you’re using the same subjects, it seems like you should be able to calculate change scores OK? You can also include subject as a variable in your analysis if you’re using the same subjects throughout. Read my post on repeated measures designs for more information about this process along with an example analysis.

Best of luck with your analysis!

Christopher Brancart says

Helpful; I’m a new subscriber.

Pablo Isit says

Dear Jim,

Thank you! Super helpful, clear, and fast! Really appreciate what you do!

So, there is just one aspect that remains unclear for me. The idea of an interaction term makes a lot of intuitive sense to me, until the interaction term includes Time. Then I’m not sure my intuition is correct any longer.

So to reiterate (forgive if not necessary) this is the basic situation:

T0 (baseline): Var1 and Outcome measured

T1 (post treatment): Var1 and Outcome measured

T2 (follow up): Var1 and Outcome measured

So is it correct to say that if I find a main effect of Var1 on outcome, the model is “combining” (averaging?) Var1 at T0, T1, and T2, and then assessing whether it relates to the “combined” Outcome at T0, T1, and T2?

What I’m unclear about (if I have the above correct) is how Var1 and Outcome are separated across Time if I include a Var1 * Time interaction in my model. The way I think of it is in terms of different slopes. Let’s say Outcome = Depression score, and without Var1, in general across the group, Depression score is improving at T1 and T2 (following treatment). Let’s say Var1 is the ratio of abstract vs concrete words used in a task, and that decreases in abstract words (lower Var1 scores) predict lower depression scores over T1 and T2. So the interaction between Var1 and Time would show a steeper ‘downward’ slope in depression scores over Time than the main effect of Time. So… I guess the simplest way to ask my question is: does the model consider each time point separately? (i.e., group mean Depression scores at T0 are multiplied by group mean Abstraction scores at T0 only, and group mean Depression scores at T1 are multiplied by group mean Abstraction scores at T1 only, and group mean Depression scores at T2 are multiplied by group mean Abstraction scores at T2 only). Or alternatively, is the model somehow looking at whether change (slope) of an individual’s Abstraction score over T0, T1, and T2 predicts their average (combined) Depression score over the three time points? Or alternatively, is the model assessing whether the average (combined) Abstraction score over the three time points is predicting the change/slope of Depression scores across T0, T1, and T2?

Hope this question makes sense?

Thank you so much,

pablo

Pablo Isit says

Hi Jim,

Perhaps one other follow up question to the previous post: What would you recommend is the best way to assess whether CHANGE in Var1 predicts CHANGE in Outcome, using a Generalized Linear Model (Outcome is count variable, negative binomial distribution; Var1 is continuous) ? Is the above interaction idea the right way to go? Or would you compute change scores? And if the latter, how? Would I see whether Var1 Change score between T0 and T1 correlates with Outcome Change score between T0 and T1, and then do the same for Change scores between T1 and T2? Would seem odd to me to separate this way, and what about change from T0 to T2?

Many thanks again!

pablo

Seren says

This is really helpful, thanks very much!

I have a question: What does it mean if the interaction between two factor variables is insignificant, but the main effects are significant (and adding in the interaction causes an increase in the adjusted R^2 value)? The model also has another factor variable and another continuous variable that are both significant.

Jim Frost says

Hi Seren,

Adjusted R-squared will increase any time the absolute t-value of the added term is greater than 1. Consequently, there is a range of absolute t-values between 1 and ~1.96 where adjusted R-squared increases for a model term even though it isn’t significant. This is a grey area in terms of what to do with the term.
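Here is a small Python sketch of that threshold for a term that adds a single coefficient to the model. The sample size, number of predictors, and starting R-squared are made-up values. The sketch uses the algebraic link between the added term's t-value and the change in R-squared. For an interaction between factors with several levels, which adds more than one coefficient, the analogous condition is that the F-statistic for the set of added terms exceeds 1.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def r2_after_adding_term(r2_old, t, n, p_new):
    """R-squared of the larger model, given the added term's t-value.

    Uses t^2 = (R2_new - R2_old) / ((1 - R2_new) / df_resid), where
    df_resid is the residual degrees of freedom of the larger model.
    """
    df_resid = n - p_new - 1
    return 1 - (1 - r2_old) * df_resid / (df_resid + t ** 2)

def adjusted_r2_change(t, r2_old=0.40, n=100, p_old=4):
    """Change in adjusted R-squared from adding one term with t-value t."""
    p_new = p_old + 1
    r2_new = r2_after_adding_term(r2_old, t, n, p_new)
    return adjusted_r2(r2_new, n, p_new) - adjusted_r2(r2_old, n, p_old)

# Hypothetical t-values below, at, and above the threshold of 1.
for t in (0.5, 1.0, 1.5, 2.5):
    print(f"|t| = {t}: change in adjusted R-squared = {adjusted_r2_change(t):+.5f}")
```

The change is negative below an absolute t-value of 1, essentially zero at 1, and positive above it, even for values such as 1.5 that fall short of conventional significance.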

Use your subject-area knowledge to help you decide. Use an interaction plot to see if the potential interaction effect fits theory. Does it change other aspects of the model much? The other coefficients and adjusted R-squared? Residual plots? It might not make much of a difference in terms of how well the model fits the data. If it doesn’t affect the other characteristics much, it’s not such an important decision. However, if it changes the other properties noticeably, it becomes a more important decision.

Best of luck with your analysis!

Seren says

Thank you very much-really helpful!!

Amelia says

Hi Jim,

Thank you very much for your super helpful blog. I was wondering if there is any chance you could help with clarifying an issue that I am currently having (I’ve tried searching an answer for this for a few hours and have not managed to find it).

I’ve conducted a multiple linear regression with 3 categorical (dummy coded) predictors:

Var1 has 4 categories (i.e. 3 dummies [Var1a, Var1b, Var1c, + reference Var1d]);

Var2 is binary (Var2a + reference Var2b); and

Var3 is also binary (Var3a + reference Var3b).

I have also tested for the interactions between Var1 and Var2; and Var1 and Var3. The latter is the one causing issues for me.

Looking at the SPSS GLM output, the overall F-value for “Var1 x Var3” is significant (6.14, p < .001).

However, none of the individual coefficients for the individual dummy coded interaction terms (i.e. Var1a x Var3a, Var1b x Var3a, Var1c x Var3a + reference categories) are significant (p = .95, .73 and .66, respectively).

The constant is significant.

I really don't understand if I should interpret this as meaning that the interaction was significant (as per the F value), or non-significant (as per the coefficients)? Any help would be hugely appreciated!

Jim Frost says

Hi Amelia,

I think I understand what is happening based on your description. To test the collective effect of a categorical variable that has multiple levels, you need to use dummy (indicator) variables, as you accurately describe. So, you have multiple terms in the model that represent the collective effect of one categorical variable. To determine whether that collective effect across multiple indicator terms is statistically significant, your software uses an F-test because that test can handle multiple terms. That test determines whether the difference between the model with that set of indicator variables versus the model without that set is statistically significant. That F-test tells you whether that entire categorical variable across its levels is jointly significant.

However, when you’re looking at the coefficients for specific levels, those p-values are based on t-tests, which only compare each individual coefficient to zero. It’s an individual assessment of significance rather than the F-test’s joint assessment of significance. Consequently, these test results might not always agree. In my experience, these tests agree more often than not. However, if they don’t, it’s not necessarily problematic statistically, although it limits how many conclusions you can draw from your data.

So, what does it mean? It does get more complicated in your case because you’re talking about interaction effects. What I write above is true for the main effects of a categorical term, but also true for interaction effects. In your case, because the interaction term itself is statistically significant, you have sufficient evidence to conclude that the nature of the relationship between Var1 and the DV depends on the value of Var3. Then, you go to the individual coefficients to determine the nature of how it changes. These coefficients provide the specific details of how the interaction affects the outcome.

Your results are telling you that you have enough evidence to conclude that interaction effect exists but not enough evidence to flesh out the details about the nature of the interaction effect. You can think of it as the F-test combines a little significance from all the different combinations of factor levels and collectively those little bits of significance add up to be statistically significant. However, when you look at each combination of factor levels by itself, there is not enough to be significant. You might need a larger sample size to flesh out those details.

So, yes, the interaction is significant, but you don’t have enough information to draw more specific conclusions about the detailed nature of that interaction.

I hope this helps!

m says

Thanks for the information. I want to ask a question: what are the techniques to control interaction and main effects? Please explain. I will be very thankful to you.

Amelia says

Hi Jim,

Thank you so much for your reply – that is very helpful. I am working with a large sample (over 11,000) as I am working with cohort data, so I am still a bit puzzled about how I might have found a more conclusive result. I wonder if this could be due to quite a low cell size in my reference category. In any case, thank you again for your help!

Jim Frost says

Hi Amelia,

That’s a large sample size! How many are in the reference category? That could be a possibility.

It’s also possible that the effect size is very small.

Ann says

Hi Jim, I just came across your website having spent 3 weeks trying to find a simple, relatable explanation of interactions. I am in the process of completing my assignment now and so have not perused the website in any great detail, but I had to take the time out to say thank you. I was beginning to wonder if I was the reason I could not find any material that I could relate to, and then I stumbled upon your website. Great job! Thank you.

Jim Frost says

Hi Ann, thanks so much for the nice comment! I’m so glad to hear that it was helpful!

sofea says

Dear Jim.

I have a question regarding main and interaction effects.

The main effects of both of my IVs (gender and language style) on language style matching are significant. However, there is no interaction effect between the two independent variables. My hypothesis is that language style positively impacts language style matching. How can I interpret this hypothesis, given that both main effects are significant but there is no interaction effect? Do I need to accept the null hypothesis?

Thank you so much for helping 🙂

Jim Frost says

Hi Sofea,

What this indicates is that the relationship between each IV and the DV does not depend on the value of the other IV. In other words, if you know the value of one IV, you know its entire effect without needing to know the value of the other IV.

However, if the interaction effect had been significant, then you’d know that a portion of each IV’s effect depends on the value of the other IV. In other words, you could not know the entire effect of one of the IVs without knowing the value of the other IV.

Technically, if the p-value for the interaction term is greater than your significance level, you fail to reject the null, which is a bit different than accepting the null. Basically, you have insufficient evidence to suggest that the null is false.
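To make the "does not depend on the value of the other IV" idea concrete, here is a minimal sketch using made-up cell means for a 2x2 design. The factor names (A, B), the level labels, and all of the numbers are hypothetical and are not from Sofea's study. It compares the effect of B at each level of A. When those effects are equal, the interaction contrast is zero. When they differ, the effect of B "depends" on A.

```python
# Hypothetical cell means for a 2x2 design (illustrative numbers only).
# Keys are (level of factor A, level of factor B); values are mean DV scores.
no_interaction = {
    ("A1", "B1"): 10.0,
    ("A1", "B2"): 14.0,
    ("A2", "B1"): 13.0,
    ("A2", "B2"): 17.0,
}

# Same design, but the A2/B2 cell is shifted so the effect of B depends on A.
with_interaction = dict(no_interaction)
with_interaction[("A2", "B2")] = 25.0


def effect_of_b(means, a_level):
    """Change in the mean DV when moving from B1 to B2 at a fixed level of A."""
    return means[(a_level, "B2")] - means[(a_level, "B1")]


def interaction_contrast(means):
    """Difference between the effect of B at A2 and the effect of B at A1."""
    return effect_of_b(means, "A2") - effect_of_b(means, "A1")


for label, means in [("no interaction", no_interaction),
                     ("with interaction", with_interaction)]:
    print(label)
    print("  effect of B at A1:", effect_of_b(means, "A1"))
    print("  effect of B at A2:", effect_of_b(means, "A2"))
    print("  interaction contrast:", interaction_contrast(means))
```

In the first set of means, the effect of B is 4 at both levels of A, so knowing B alone tells you its entire effect. In the second set, the effect of B is 4 at A1 but 12 at A2, so you cannot state the effect of B without specifying A. In real data, a hypothesis test on the interaction term assesses whether a nonzero contrast like this is larger than sampling error would explain.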

I hope this helps!

Vidal Romero says

Hi Jim, thanks so much for the blog. I have a question. I specified an OLS model with 3 interaction terms. It all works fine, but when I get the model predictions (y hats), for some values, these are out of sample (e.g. the dependent variable goes from 0 to 455, and for some combination of values of my Xs, I get a -10.5).

I ran the predictions in different ways using various commands and by hand step by step, taking care to include only observations in the sample, so I’m confident that it is not the issue.

Is it possible to get out of sample predictions (yhats) because of the interactions?

Thanks. Cheers.

Jim Frost says

Hi Vidal,

It’s not really a surprise that your model will produce predictions that fall outside the range of the data. Error is always part of the predictions. It’s not the interactions that are necessarily causing them to fall outside the range, but the error in your model. Error is just the unexplained variability.

For your real data, is zero a hard limit for the dependent variable? If so, the regression model doesn’t know that. And, again, it’s likely just the inherent error that causes some values to go outside the range of the data.

I’m assuming that your model isn’t biased. However, you should check the residual plots to be sure you’re fitting an unbiased model. If it is biased, you’d expect your residuals to be systematically too high or too low for different values of the predictor.
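As a rough illustration that interactions are not required for this to happen, here is a small sketch with made-up data (not Vidal's) that fits a simple one-predictor least-squares line by hand. Every observed value of the dependent variable is zero or higher, yet the fitted value for one in-sample observation is negative, because the fitted line has no knowledge of the lower limit.

```python
# Toy data (hypothetical): the dependent variable never goes below zero.
x = [0, 1, 2, 3, 4]
y = [0, 0, 0, 5, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a single predictor, computed from the usual formulas.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values (y hats) and residuals for the observations in the sample.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("slope:", slope, "intercept:", intercept)
print("observed minimum:", min(y))
print("fitted values:", fitted)
print("residuals:", residuals)
```

Here the fitted value at x = 0 is below the smallest observed value even though the model has no interaction terms. In this toy example, the residuals are positive at both ends and negative in the middle, which is the kind of systematic pattern a residual plot would flag.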

Sandeep Prabhu says

Great explanation