Interaction effects occur when the effect of one variable depends on the value of another variable. Interaction effects are common in regression analysis, ANOVA, and designed experiments. In this blog post, I explain interaction effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your model.

In any study, whether it’s a taste test or a manufacturing process, many variables can affect the outcome. Changing these variables can affect the outcome directly. For instance, changing the food condiment in a taste test can affect the overall enjoyment. In this manner, analysts use models to assess the relationship between each independent variable and the dependent variable. This kind of effect is called a main effect. However, it can be a mistake to assess only main effects.

In more complex study areas, the independent variables might interact with each other. Interaction effects indicate that a third variable influences the relationship between an independent and dependent variable. This type of effect makes the model more complex, but if the real world behaves this way, it is critical to incorporate it in your model. For example, the relationship between condiments and enjoyment probably depends on the type of food—as we’ll see in this post!

## Example of Interaction Effects with Categorical Independent Variables

I think of interaction effects as an “it depends” effect. You’ll see why! Let’s start with an intuitive example to help you understand these effects conceptually.

Imagine that we are conducting a taste test to determine which food condiment produces the highest enjoyment. We’ll perform a two-way ANOVA where our dependent variable is Enjoyment. Our two independent variables are both categorical variables: Food and Condiment.

Our ANOVA model with the interaction term is:

Enjoyment = Food + Condiment + Food*Condiment

To keep things simple, we’ll include only two foods (ice cream and hot dogs) and two condiments (chocolate sauce and mustard) in our analysis.

Given the specifics of the example, an interaction effect would not be surprising. If someone asks you, “Do you prefer mustard or chocolate sauce on your food?” you will undoubtedly respond, “It depends on the type of food!” That’s the “it depends” nature of an interaction effect. You cannot answer the question without knowing more about the other variable in the interaction term, which is the type of food in our example!

That’s the concept. Now, I’ll show you how to include an interaction term in your model and how to interpret the results.

## How to Interpret Interaction Effects

Let’s perform our analysis. Most statistical software lets you add interaction terms to a model. Download the CSV data file to try it yourself: Interactions_Categorical.

The p-values in the output below tell us that the interaction effect (Food*Condiment) is statistically significant. Consequently, we know that the satisfaction you derive from the condiment *depends* on the type of food.

But, how do we interpret the interaction effect and truly understand what the data are saying? The best way to understand these effects is with a special type of graph—an interaction plot. This type of plot displays the fitted values of the dependent variable on the y-axis while the x-axis shows the values of the first independent variable. Meanwhile, the various lines represent values of the second independent variable.

On an interaction plot, parallel lines indicate that there is no interaction effect while different slopes suggest that one might be present. Below is the plot for Food*Condiment.

The crossed lines on the graph suggest that there is an interaction effect, which the significant p-value for the Food*Condiment term confirms. The graph shows that enjoyment levels are higher for chocolate sauce when the food is ice cream. Conversely, satisfaction levels are higher for mustard when the food is a hot dog. If you put mustard on ice cream or chocolate sauce on hot dogs, you won’t be happy!
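To make the plot reading concrete, here is a minimal Python sketch. The cell means are made-up numbers chosen only to mimic the pattern described above; they are not the values from the data file. Each line on the plot corresponds to one condiment, and its "slope" is the change in mean enjoyment when moving from ice cream to hot dogs.

```python
# Hypothetical cell means (illustrative only, not from the post's data file).
cell_means = {
    ("ice cream", "chocolate sauce"): 93.0,
    ("ice cream", "mustard"): 60.0,
    ("hot dog", "chocolate sauce"): 66.0,
    ("hot dog", "mustard"): 90.0,
}

def line_slope(condiment):
    """Change in mean enjoyment from ice cream to hot dog for one condiment line."""
    return cell_means[("hot dog", condiment)] - cell_means[("ice cream", condiment)]

slope_chocolate = line_slope("chocolate sauce")  # line falls from ice cream to hot dog
slope_mustard = line_slope("mustard")            # line rises from ice cream to hot dog

# Parallel lines would have equal slopes, so this difference would be zero.
slope_difference = slope_mustard - slope_chocolate
lines_are_parallel = slope_difference == 0

print("chocolate sauce slope:", slope_chocolate)
print("mustard slope:", slope_mustard)
print("difference in slopes:", slope_difference)
print("parallel?", lines_are_parallel)
```

A nonzero difference in a sample does not prove an interaction on its own; the hypothesis test for the Food*Condiment term is what tells you whether the difference is larger than random error would explain.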

Which condiment is best? It depends on the type of food, and we’ve used statistics to demonstrate this effect.

## Overlooking Interaction Effects is Dangerous!

When you have statistically significant interaction effects, you can’t interpret the main effects without considering the interactions. In the previous example, you can’t answer the question about which condiment is better without knowing the type of food. Again, “it depends.”

Suppose we want to maximize satisfaction by choosing the best food and the best condiment. However, imagine that we forgot to include the interaction effect and assessed only the main effects. We’ll make our decision based on the main effects plots below.

Based on these plots, we’d choose hot dogs with chocolate sauce because they each produce higher enjoyment. That’s not a good choice despite what the main effects show! When you have statistically significant interactions, you cannot interpret the main effect without considering the interaction effects.
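The same trap can be sketched in a few lines of Python. The cell means below are hypothetical numbers picked to reproduce the pattern in this example, not the real data. The sketch averages over the other factor to get the main effects, picks the "best" level of each factor, and compares that pick with the best actual combination.

```python
from statistics import mean

# Hypothetical cell means (illustrative only, not from the post's data file).
cell_means = {
    ("ice cream", "chocolate sauce"): 93.0,
    ("ice cream", "mustard"): 60.0,
    ("hot dog", "chocolate sauce"): 66.0,
    ("hot dog", "mustard"): 90.0,
}
foods = ["ice cream", "hot dog"]
condiments = ["chocolate sauce", "mustard"]

# Main effects: average each factor level over the levels of the other factor.
food_means = {f: mean(cell_means[(f, c)] for c in condiments) for f in foods}
condiment_means = {c: mean(cell_means[(f, c)] for f in foods) for c in condiments}

# Decision based on main effects only.
main_effects_choice = (
    max(food_means, key=food_means.get),
    max(condiment_means, key=condiment_means.get),
)

# Decision based on the individual cells, which reflects the interaction.
best_cell = max(cell_means, key=cell_means.get)

print("main-effects choice:", main_effects_choice, cell_means[main_effects_choice])
print("best combination:", best_cell, cell_means[best_cell])
```

With these numbers, the main effects point to hot dogs with chocolate sauce, which is one of the least enjoyable combinations in the table.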

Given the intentionally intuitive nature of our silly example, the consequence of disregarding the interaction effect is evident at a passing glance. However, that is not always the case, as you’ll see in the next example.

## Example of an Interaction Effect with Continuous Independent Variables

For our next example, we’ll assess continuous independent variables in a regression model for a manufacturing process. The independent variables (processing time, temperature, and pressure) affect the dependent variable (product strength). Here’s the CSV data file if you want to try it yourself: Interactions_Continuous.

In the regression model, I’ll include temperature*pressure as an interaction effect. The results are below.

As you can see, the interaction term is statistically significant. But, how do you interpret the interaction coefficient in the regression equation? You could try entering values into the regression equation and piece things together. However, it is much easier to use interaction plots!

**Related post**: How to Interpret Regression Coefficients and Their P-values for Main Effects

In the graph above, the variables are continuous rather than categorical. To produce the plot, the statistical software chooses a high value and a low value for pressure and enters them into the equation along with the range of values for temperature.

As you can see, the relationship between temperature and strength changes direction based on the pressure. For high pressures, there is a positive relationship between temperature and strength while for low pressures it is a negative relationship. By including the interaction term in the model, you can capture relationships that change based on the value of another variable.
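If you do want to see what the equation is doing, the following Python sketch shows the idea. The coefficients and pressure values are invented for illustration and are not the estimates from this model. In a model with a temperature*pressure term, the slope for temperature is not a single number: it equals the temperature coefficient plus the interaction coefficient times the pressure.

```python
# Hypothetical coefficients for a model of the form
#   strength = b0 + b_time*time + b_temp*temp + b_press*press + b_int*temp*press
# These values are assumptions for illustration, not the post's estimates.
B_TEMP = -2.0   # temperature coefficient
B_INT = 0.03    # temperature*pressure interaction coefficient

def temperature_slope(pressure):
    """Change in predicted strength per one-unit increase in temperature,
    holding time fixed, at the given pressure."""
    return B_TEMP + B_INT * pressure

low_pressure, high_pressure = 50.0, 80.0  # hypothetical low and high settings
slope_low = temperature_slope(low_pressure)
slope_high = temperature_slope(high_pressure)

# Pressure at which the temperature slope changes sign.
flip_pressure = -B_TEMP / B_INT

print("temperature slope at low pressure:", round(slope_low, 3))
print("temperature slope at high pressure:", round(slope_high, 3))
print("slope changes sign near pressure =", round(flip_pressure, 2))
```

An interaction plot draws exactly these two lines: one using the low pressure value and one using the high pressure value.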

If you want to maximize product strength and someone asks you if the process should use a high or low temperature, you’d have to respond, “It depends.” In this case, it depends on the pressure. You cannot answer the question about temperature without knowing the pressure value.

## Important Considerations for Interaction Effects

While the plots help you interpret the interaction effects, use a hypothesis test to determine whether the effect is statistically significant. Plots can display non-parallel lines that represent random sample error rather than an actual effect. P-values and hypothesis tests help you sort out the real effects from the noise.

The examples in this post are two-way interactions because there are two independent variables in each term (Food*Condiment and Temperature*Pressure). It’s equally valid to interpret these effects in two ways. For example, the relationship between:

- Enjoyment and Condiment depends on Food.
- Enjoyment and Food depends on Condiment.

You can have higher-order interactions. For example, a three-way interaction has three variables in the term, such as Food*Condiment*X. In this case, the relationship between Enjoyment and Condiment depends on both Food and X. This type of effect is challenging to interpret, so analysts use it infrequently in practice. However, in some models, higher-order interactions might be necessary to provide an adequate fit.

Finally, when you have interaction effects that are statistically significant, do not attempt to interpret the main effects without considering the interaction effects. As the examples show, you will draw the wrong conclusions!

If you’re learning regression, check out my Regression Tutorial!

Neha says

Thank you for the amazing posts. The way you express concepts is matchless.

Jim Frost says

You’re very welcome! I’m glad they’re helpful!

Mona says

What does it mean when I have a significant interaction effect only when I omit the main effects of the independent variables (by choosing the interaction effect in “MODEL” in SPSS)? Is it “legal” to report the interaction effect without reporting the main effects?

Jim Frost says

Hi Mona,

That is a bit tricky.

If you had one model where the main effects are not significant, but the interaction effects are significant, that is perfectly fine.

However, it sounds like in your case you have to decide between the main effects or the interaction effects. Models where the statistical significance of terms changes based on the specific terms in the model are always difficult cases. This problem often occurs in (but is not limited to) cases where you have multicollinearity, so you might check on that.

This type of decision always comes down to subject area knowledge. Use your expertise, theory, other studies, etc. to determine what course of action is correct. It might be OK to do what you suggest. On the other hand, perhaps including the main effects is the correct route.

Jim

Apple says

What is the command for a continuous-by-continuous interaction plot in Stata?

Thanks

Jim Frost says

Hi, I’ve never used Stata myself, but I’ve seen people use “twoway contour” to plot two-way interaction effects in Stata. Might be a good place to start!

Sol says

Hi Jim, thank you very much for your post. My question is how do you interpret an insignificant interaction of a categorical and a continuous variable, when the main effects for both variables are significant? For the sake of simplicity, suppose our logit equation is as follows: Friendliness = α + β₁Age + β₂Dog + β₃Age*Dog, where Friendliness and Dog are coded as dummy variables that take the values of either 1 or 0 depending on the case. So if all but the interaction term, β₃Age*Dog, is significant, does that mean the probability of a dog being friendly is independent of its age?

Jim Frost says

If the Age variable is significant, then you know that friendliness is associated with age, and dog is as well if that variable is significant. A significant interaction effect indicates that the effect of one variable on the dependent variable depends on the value of another variable. In your example, let’s assume that the interaction effect was significant. This tells you that the relationship between age and friendliness changes based on the value of the dog variable. In that case, it’s not a fixed relationship or effect size. (It’s also valid to say that the relationship between dog and friendliness changes based on the value of age.)

Now, in your case, the interaction effect is not significant but the two main effects are significant. This tells you that there is a relationship between age and friendliness and a relationship between dog and friendliness. However, there is insufficient evidence to conclude that the nature of those relationships changes based on the value of the other variable. Those two variables affect the probability of observing the event in the outcome variable, but the data don’t show that one independent variable affects the relationship between the other independent variable and the dependent variable.

The fact that you have one categorical variable and a continuous variable makes it easier to picture. Picture a different regression line for each level of the categorical variable. These fitted lines display the relationship between the continuous independent variable and the response for each level of dog. A significant interaction effect indicates that the differences between those slopes are statistically significant. An insignificant interaction effect indicates that there is insufficient evidence to conclude that the slopes are different. I actually show an example of this situation (though not with a logistic model) that should help.

I hope that makes it more clear!

Luka says

Hello,

I am interested how to read for interaction effect if we just have a table of observations, for example

| A | B | C |
|---|---|---|
| 2 | 4 | 7 |
| 4 | 7 | 8 |
| 6 | 9 | 13 |

In the lecture I attended this was explained as “differences between differences” but I didn’t get what this refers to.

Thanks

Jim Frost says

Hi Luka, it’s impossible for me to interpret those observations because I don’t know the relationships between the variables and there are far too few observations.

In general, you can think of an interaction effect as an “it depends” effect as I describe in this blog post. Suppose you have two independent variables X1 and X2 and the dependent variable Y. If the relationship between X1 and Y changes based on the value of X2, that’s an interaction effect. The size of the X1 effect depends on the value of X2. Read through the post to see how this works in action. The value of the interaction term for each observation is the product of X1 and X2 (X1*X2).

An effect is the difference in the mean value of Y for different values of X. So, if the interaction effect is significant, you know that the differences of Y based on X will vary based on some other variable. I think that’s what your instructor meant by the differences between differences. I tend to think of it more as the relationship between X1 and Y depends on the value of X2. If you plot a fitted line for X1 and Y, you can think of it as the slope of the line changes based on X2. There’s a link in this blog post to another blog post that shows how that works.
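One way to write the “differences between differences” idea as a formula is sketched below. It assumes, purely for illustration, that X1 and X2 each take only two values (coded 0 and 1), and it uses my own notation rather than your instructor’s: $\bar{Y}_{jk}$ is the mean of Y when X1 = j and X2 = k.

```latex
\begin{aligned}
\text{Effect of } X_1 \text{ when } X_2 = 0 &: \quad \bar{Y}_{10} - \bar{Y}_{00} \\
\text{Effect of } X_1 \text{ when } X_2 = 1 &: \quad \bar{Y}_{11} - \bar{Y}_{01} \\
\text{Interaction} &= \left(\bar{Y}_{11} - \bar{Y}_{01}\right) - \left(\bar{Y}_{10} - \bar{Y}_{00}\right)
\end{aligned}
```

If the two effects of X1 are the same, the interaction is zero. Under this 0/1 coding, in a model of the form Y = b0 + b1·X1 + b2·X2 + b3·X1·X2, that difference of differences in the fitted means equals the interaction coefficient b3.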

I hope this helps!

Syahmi says

Your explanation is really great! Thank you so much. I totally will recommend you to my friends

Jim Frost says

You’re very welcome! Thank you for recommending me to your friends!

Luka says

Thanks for help, I appreciate it!

Yeasin says

Great work Jim! People get very vague idea whenever they look at google to learn the basic about interaction in statistics. Your writing is a must see and excellent work that demonstrated the basic of interaction. Thanks heaps.

Jim Frost says

Hi Yeasin, thank you! That means a lot to me!

Tanikan says

Hi Jim,

Thanks for the valuable tutorial.

I have 2 questions as follows:

1. In more complex study areas, the independent variables might interact with each other. What do you mean by complex area? Is it social science?

2. I have run Mancova and observed that results of two-way = interaction. I found that SPSS does not run post-hoc. Can I use the t-test after that?

My model is factorial design (2 levels of X1, 2 levels of X2, and 2 levels of X3) on Y.

I report in paper for two-way and three way interaction on below. Is it ok?

Two-way interaction

Among the X2 level 1 group, the mean of Y among subjects who viewed X3 level 2 (adjusted M = xxx, SE =xxx) is significantly higher than those who viewed X3 level 1 (adjusted M = xxx, SE = xxx) with t(xx) = xx, p < xx.

three-way interaction

Among the subjects who viewed the X3 level 2, the mean of Y of the subjects who expressed X1 level 2 (adjusted M = xxx, SE = xxx) is significantly greater than those who expressed X1 level 1 (adjusted M = xxx, SE = xxx) for those who had X2 level 1 [t(xx) = xxx, p < xxx].

Thank you in advance

Jim Frost says

Hi Tanikan,

Thanks for the great questions!

Regarding more and less complex study areas, in the context of this post, I’m simply referring to subject matter where only main effects are statistically significant as being simpler. And, subject areas where interaction effects are significant as more complex. I’m calling them more complex because the relationship between X and Y is not constant. Instead, that relationship depends on at least one other variable. It’s just not as simple.

I would not use t-tests for that purpose. I’m surprised if SPSS can’t perform post-hoc tests when there are interaction effects–but I use other statistical software more frequently. With your factorial design, there will be multiple groups based on the interactions of your factors. As you compare more groups, the need for controlling the family/joint/simultaneous error rate becomes even more important. Without controlling for that joint error rate, the probability that at least one of the many comparisons will be a false positive increases. T-tests don’t control that joint error rate. It’s important to use a post hoc test.
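A rough back-of-the-envelope calculation in Python shows how quickly that joint error rate can grow. This sketch assumes a 0.05 significance level for each comparison and treats the comparisons as independent, which pairwise comparisons are not exactly, so read the result as an approximation rather than an exact figure.

```python
from math import comb

alpha = 0.05              # per-comparison significance level (assumed)
groups = 2 * 2 * 2        # 2 levels each of X1, X2, and X3
comparisons = comb(groups, 2)  # number of distinct pairs of groups

# Approximate probability of at least one false positive across all
# comparisons, treating them as independent tests.
family_error_rate = 1 - (1 - alpha) ** comparisons

print("groups:", groups)
print("pairwise comparisons:", comparisons)
print("approximate family error rate:", round(family_error_rate, 3))
```

Post hoc procedures adjust for this so the error rate for the whole family of comparisons stays near the level you choose.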

At least for the two-way interaction effects, I highly recommend using an interaction plot (as shown in this post) to accompany your findings. I find that those graphs are particularly helpful in clarifying the results. Of course, that graph doesn’t tell you which specific differences between groups are statistically significant. The post hoc tests for those groups will identify the significant differences.

I hope this helps!

Alicia says

Hi, Jim!

I have a sort of somehow interaction-related question, but I didn’t know where to post it, so this entry seemed the most adequate to me.

I work with R and I would like to use an ANCOVA to evaluate the effect of a factor (age, for example, with two levels, adult and subadult) in the regression of body length (log transformed, logLCC) and weight (log transformed, logweight). This regression measures body condition of an individual (higher weights at same lengths indicate a better condition, that is, sort of “fluffiness”).

So, when I run the analysis:

aov(logweight~logLCC*age)

I obtain a significant interaction between logLCC:age (p=0.0068). I understand this means that the slopes for each age class are not parallel. However, the factor age alone is not significant (p=0.2059).

What does this mean? How is it interpreted?

I have tried deleting the interaction from the model, but it loses a lot of explicative power (p=0.0068). So, what should I do? I am quite lost with this issue…

Thank you so much in advance,

Alicia

Jim Frost says

Hi Alicia!

First, before I get into the interaction effect, a comment about the model in general. I don’t know if you’re analyzing human weight or not. But, I’ve modeled Percent Body Fat and BMI. While I was doing that, I had to decide whether to use Height, Weight, and Height*Weight as the independent variables and interaction effect or should I use body mass index (BMI). I found that both models fit equally as well but I went ahead with BMI because I could graph it. I did have to include a polynomial term because the relationship was curvilinear. I notice that you’re using a log transformation. That might well be just fine and necessary. But, I found that I didn’t need to go that route. Just some food for thought. You can read about this BMI and %body fat model.

Ok, so on to your interaction effect. It’s not problematic at all that the main effect for age is not significant. In fact, when you have a significant interaction you shouldn’t try to interpret the main effect alone anyway. Now, if it had been significant and you wanted to determine the entire effect of age, you would’ve had to assess both the main effect and the interaction effect together. Now, you just need to assess the interaction effect alone. But, it’s always easiest to interpret interaction effects with graphs, as I do in this blog post.

In the post, I show examples of interaction plots with two factors and another with two continuous variables. However, you can certainly create an interaction plot for a factor * continuous variable. For your model, this type of graph will display two lines–one for each level of the age factor. Because you already know the interaction term is significant, the difference between the two slopes is statistically significant. (If the main effect had been significant, the interaction plot would have included it in the calculations as well–but it is fine that it’s not significant.)

It sounds like you should leave the interaction effect in the model. Some analysts will also include the main effects in the model when they are included in a significant interaction effect even if the main effect is not significant by itself (e.g., age). I could go either way on that issue myself. Just be sure that the interaction makes theoretical/common sense for your study area. But, I don’t see any reason for concern. The insignificant main effect is not a problem.

I hope this helps!

Alicia says

Hi Jim,

first of all… thank you very much for your early response!

And after that… I am so sorry! I forgot to explain that I work with lizards, not with humans. My measurement of body length (logLCC) corresponds to the log-transformed Snout-Vent Length (logSVL, whose acronym in Spanish, given that it’s my mother tongue, is LCC; I forgot to translate it!). The relationship between these two variables tends to be linear.

So, in these animals, the regression of logSVL and logweight is a common and standardized method to assess body condition. Residuals from this regression are used to assess body condition; if they’re positive the animal is more “chubby” (better condition) and, if they’re negative, the animal is more “skinny” (worse condition). The aim of my ANCOVA is to compare the effect of age on this regression.

Anyway, following your advice I created an interaction plot which displays two lines, one for each level of the age factor. The two lines cross in a certain middle point, diverging prior and after that point. Thanks to your detailed answer, I understand that this means that age interacts somehow with body length (what sounds logical, as lizards grow together with aging), but I still don’t know how to interpret this in relation to body condition (regression).

Thanks again for your detailed, kind and early response!

Jim Frost says

You’re very welcome! And, subject area knowledge and standards definitely should guide your model building. I always enjoy learning how others use these types of analysis. And, that’s interesting actually using the residuals to assess a specimen’s health!

If you can, and are willing, post the interaction plot, I can take a stab at interpreting it. (I know I can post images in these comments but I’m not sure about other users.) Basically, the relationship between body length and weight depends on the age factor. Or, stated another way, you can’t use body length to predict weight until you know the age.

Alicia says

Hi, Jim!

Thank you again for your willingness! Unfortunately, I can’t /don’t know how to post the plot in the comments… If you are willing, you can contact me by email so I can send it to you, plus the results of the regression or whatever information that could be helpful.

Thank you!

Shruti says

Hi Jim,

Thanks for your explanation! It was really useful. I have a couple of follow-up questions. Let’s suppose a situation with 2 regression models, both of which have the exact same variables, except the second model has an additional interaction term between two variables already in the first model.

1. Now comparing the 2 regression equations, why do coefficients of other variables (apart from the interaction term and the 2 variables used to create the interaction term) change?

2. How do we compare and interpret the change in coefficients of variables which were used to create the interaction term in the first and second models?

Let me know in case it’s better for me to explain with an example here.

Thanks!

Jim Frost says

Hi Shruti,

I think I understand your questions.

1) Any time you add new terms in the model, the coefficients can change. Some of this occurs because the new term accounts for some of the variance that was previously accounted for by the other terms, which causes their coefficients to change. So, some change is normal. The changes can tend to be larger and more erratic when the model terms are correlated. The interaction term is clearly correlated with the variables that are included in the interaction. When you include an interaction term, you can help avoid this by standardizing your continuous variables.

2) I have heard about cases where analysts try to interpret the changes in coefficients when you add new terms. My take on this is that the changes are not very informative. Let’s assume that your interaction term is a valuable addition to the model. In that case, you can conclude the model without the interaction term is not as good of a model and its coefficient estimates might well be biased. Consequently, I wouldn’t attribute much meaning to the change in coefficient values other than your new model with the interaction term is likely to be better.

However, one caveat, I believe there are fields that do place value in understanding those changes. I’m not sure that I agree, but if your field is one that has this practice, you should probably check with an expert.
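On the standardizing point in 1), here is a minimal Python sketch of why it helps. It uses a small made-up balanced dataset with hypothetical variable names, and it shows only centering, which is the subtract-the-mean part of standardizing. It compares the correlation between a variable and the interaction term before and after centering.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Hypothetical balanced data: every temperature paired with every pressure.
temperature = [t for t in (10, 20, 30) for _ in range(3)]
pressure = [p for _ in range(3) for p in (100, 110, 120)]

# Interaction term built from the raw variables.
raw_product = [t * p for t, p in zip(temperature, pressure)]
corr_raw = pearson(temperature, raw_product)

# Interaction term built from the centered variables.
temp_c = center(temperature)
press_c = center(pressure)
centered_product = [t * p for t, p in zip(temp_c, press_c)]
corr_centered = pearson(temp_c, centered_product)

print("correlation with raw interaction term:", round(corr_raw, 3))
print("correlation with centered interaction term:", round(corr_centered, 3))
```

This tidy design drives the centered correlation all the way to zero. With real, unbalanced data, centering typically reduces the correlation substantially but does not necessarily eliminate it.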

I hope I covered everything!

Susanne says

Hello Jim!

Thanks for making such very clear posts. I tutor students with stats and it’s really tough to find good easy to follow material that EVERYONE can get. So to stumble on such a clear explanation is a breath of fresh air 😀

Now I recently saw in one of my students’ PowerPoints that they are taught they have to redo the ANOVA analysis without the interaction if the interaction is not significant. Maybe I’ve always missed something but I have never heard of this before. Does this sound familiar to you and if so can you explain to me why this is?

thanks!

Susanne

Jim Frost says

Hi Susanne, thanks so much for your kind words. They mean a lot to me–especially coming from a stats tutor!

I have always heard that you should not include the interaction term when it is not significant. The reason is that when you include insignificant terms in your model, it can reduce the precision of the estimates. Generally, you want to leave as many degrees of freedom for the error as you can.

Courtney Barrs says

Hi Jim,

Thank you for this post, I found it incredibly helpful.

I am having trouble interpreting my own results of a two-way repeated ANOVA and was wondering if you could help me out.

Participants were exposed to two different videos, controlled with a counter balance. Video 1 consisted of a comedy sketch, while video 2 was of a nature documentary. Every 2 mins the participants had to indicate on a likert scale how Bored they felt at the time. For the analysis I averaged the boredom score over the first and second half of the video.

IV1: Video (Comedy vs Nature)

IV2: Time (Time 1 vs Time 2)

DV: Boredom score

My analysis output reveals a significant main effect of video p<.000, and non significant effect for time p=.192. However I have an effect of interaction for video*time, p<.000.

How would you go about interpreting these results?

Thanks in advance!

Jim Frost says

Hi Courtney,

I’m happy to hear that you found this post helpful!

The first thing that I’d recommend is graphing your results using an interaction plot like I do in this post. That’s the easiest way to understand interactions. It’s great that you’ve done the ANOVA test because you already know that whatever pattern you see in the plot is, in fact, statistically significant. Given the significance, I can conclude the lines on your plot won’t be parallel.

For your results, you can state them one of two ways. Both ways are equally valid from a statistical standpoint. However, one way might make more sense than the other given your study area or what you’re trying to emphasize.

1) The relationship between Video and Boredom depends on Time. Or:

2) The relationship between Time and Boredom depends on Video.

For the sake of illustration, let’s go with #2. You might be able to draw the conclusion along the lines of: As subjects progress from time 1 to time 2, the average boredom score increases more slowly for those who watch comedy compared to those who watch a nature documentary. Of course, you’d have to adapt the wording to match your actual results. That’s the type of conclusion that you can draw, and you’re able to say that it is statistically significant given the p-value for the interaction term.

Given that the interaction term is significant, you don’t need to interpret the main effects terms at all. And, it’s no problem that one of the main effects is not significant.

I hope this helps!

Courtney says

Hi Jim,

Thank you so much for your quick and helpful response, it really means a lot!

This is what initially confused me when it came to interpreting my results, looking at my interaction graph there was no cross over. Both conditions are more or less parallel with one another, the gradient between time 1 and time 2 for comedy is almost 0. However, there is quite the drop for the nature video in the boredom rating at time 2.

Because the interaction graph does not cross over, does this mean that only in the Nature video does the boredom decrease significantly at Time 2? Will I need to conduct a t-test to check this?

Many thanks!

Courtney

Courtney Barrs says

Hi Jim,

Thank you for such a quick and helpful response!

Graphing the interaction effect is actually what confused me when it came to interpreting my results. The conditions are actually parallel to one another, there is no cross over. The gradient for the comedy condition is almost zero, whereas there is a dramatic drop in rating of boredom between time 1 and time 2 for the nature video.

With this in mind does the interpretation then mean: A difference in boredom is found across time depending on condition. Therefore, only if you are watching the nature video will you become significantly more bored at time 2. Will I need to conduct a t-test to confirm this?

Many thanks!

Courtney

Jim Frost says

Hi Courtney,

You bet! 🙂

Technically, a significant interaction effect means that the difference in slopes is statistically significant. The lines don’t actually have to cross on your graph–just have different slopes. Well, having different slopes means that the lines must cross at some point theoretically even if that point isn’t displayed on your graph.

As for the interpretation, the zero slope for comedy indicates that as time passes, there is no tendency to become more or less bored. However, for nature videos, as time passes, there is a tendency to become more bored. (I’m assuming that the drop in rating that you mention corresponds to “becomes more bored”.) This difference in tendencies is statistically significant. The significant interaction indicates that the relationship between the passage of time and boredom depends on the type of video the subjects watch.

Again, an interaction effect is an “it depends” effect. Do the subjects become more bored over time? It depends on what type of video they watch! You can’t answer that question without knowing which video they watch.

So, the interaction tells you that the difference in slopes is statistically significant, which is different from whether the differences between group means are statistically significant. To identify the specific differences between group means that are statistically significant, you’ll need to perform a post hoc test, such as Tukey’s test. These tests control the joint error rate because as you increase the number of group comparisons, the chance of a Type I error (false positive) increases if you don’t control it. I don’t have a blog post on this topic yet but plan to write one.

The interaction term tells you that the relationship changes while the post hoc test tells you whether the difference between specific group means is statistically significant.

Saheeda says

This is one of the best explanations I have read to explain ‘interactions’. Thanks!

Jim Frost says

Thanks so much, Saheeda! Your kind words mean a lot to me! I’m glad it was helpful.

Bill says

Hello. Jim. Thank for your great article.

Sorry in advance for my English. Moreover, my understanding for SPSS and stat is quite limited so some question might be silly.

I’m doing a 4×5 factorial ANOVA. One of the tests has a significant interaction effect, but I don’t know exactly which method I should use to interpret it. Some told me that I need to do a simple main effects test, and some told me that a post hoc test is enough, so I’m quite confused.

In another test, the graph shows some crossed lines (because there are a lot of levels of the IV), but the sig. value is 0.069, so the interaction effect is not significant, right? However, I’ve read that if the lines cross, an interaction exists. So how should I summarize?

I’m willing to send the information for you if u need.

Thank you.

Bill

Jim Frost says

Hi Bill,

You have some excellent questions!

When you have a significant interaction effect, you know you can’t interpret the main effects without considering the interaction effects. As I show in the post, interaction effects are an “it depends” effect. The interpretation for one factor depends on the value of another factor. If you don’t assess the interaction effect, you might end up putting ketchup on your ice cream!

Assessing the post hoc test results can be fine by itself as long as you include the interaction term in the post hoc test. Taking that approach, you’ll see the groupings based on the interaction term and know which groups are significantly different from each other. I also like to graph the interaction plots (as I do in this post) because they provide a great graphical overview of the interaction effect.

There’s an important point about graphs. They can be very valuable in helping you understand your results. However, they are not a statistical test for your data. An interaction plot can show non-parallel lines even when the interaction effect is not significant. When you work with sample data, there’s always the chance that sample error can produce patterns in your sample that don’t exist in the population. Statistical tests help you distinguish between real effects and sample error. These tests indicate when you have sufficient evidence to conclude that an effect exists in the population.

When you have crossed lines in an interaction plot but the test results are not statistically significant, it tells you that you don’t have enough evidence to conclude that the interaction effect actually exists in the population. Basically, the graph says that the effect exists in the sample data but the statistical test says that you don’t have enough evidence to conclude that it also exists in the population. If you were to collect another random sample from the same population, it would not be surprising if that pattern went away!
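A small simulation can illustrate how sampling error alone produces non-parallel lines. The sketch below is an illustration under stated assumptions: the population cell means, the standard deviation, and the sample size per cell are all hypothetical, and the population has exactly zero interaction by construction.

```python
import random
from statistics import mean

def sample_interaction_contrast(rng, n_per_cell=10, sd=5.0):
    """Draw one 2x2 sample from a population with NO interaction and
    return the sample interaction contrast (difference of differences)."""
    # Hypothetical population cell means with purely additive effects, so the
    # population contrast (80 - 70) - (60 - 50) is exactly 0.
    pop_means = {("A1", "B1"): 50.0, ("A1", "B2"): 60.0,
                 ("A2", "B1"): 70.0, ("A2", "B2"): 80.0}
    cell = {key: mean([rng.gauss(mu, sd) for _ in range(n_per_cell)])
            for key, mu in pop_means.items()}
    return ((cell[("A2", "B2")] - cell[("A2", "B1")])
            - (cell[("A1", "B2")] - cell[("A1", "B1")]))

rng = random.Random(1)
contrasts = [sample_interaction_contrast(rng) for _ in range(1000)]

# The plotted lines are non-parallel whenever the sample contrast is not 0.
share_nonparallel = sum(abs(c) > 1.0 for c in contrasts) / len(contrasts)
print(f"Samples whose slopes differ by more than 1 unit: {share_nonparallel:.0%}")
print(f"Average contrast across samples: {mean(contrasts):.2f}")
```

Individual samples routinely show differing slopes even though the average contrast across many samples stays near zero. Telling those two situations apart is the job of the significance test.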

I hope this helps!

Bill says

Thanks for your help. I really appreciate.

Might need your help again after I finished the post hoc.

Hope you okay with that. Haha.

Again, THANK YOU.

Sincere,

Bill

Hakim says

Thanks Jim, your explanation is very nice to follow. By the way, I have a model, i.e., growth = average years of schooling + political stability + average years of schooling*political stability. The Stata output gives positive individual coefficients while the interaction coefficient is negative. Unfortunately, I have been asked by the reviewer to explain why the interaction sign is negative. Any statistical or theoretical explanation, please?

Jim Frost says

Hi Hakim, it’s difficult to interpret the coefficients for interaction terms directly. However, I can tell you that there is nothing at all odd about having a negative sign for an interaction term. Interaction terms modify the main effects. Sometimes it adds to them while other times it subtracts. It all depends on the nature of what you’re studying.

I’d suggest creating interaction plots, like I do in this post, because they’re much easier to understand than the interaction coefficients. Look through the plots to see whether they make sense given your understanding of the subject-area. These plots are a graphical representation of the interaction terms. Therefore, if the plots make sense, your model is good to go. If they don’t, then you need to figure out what happened. I think the reviewers will find the plots easier to understand than the coefficient.

I hope this helps!

Hakim says

Thanks Jim for your quick response and comprehensive explanation..

Ting-Chun Chen says

Hi Jim,

May I ask what reference about interaction effect do you suggest to study?

I want to know more about interaction effect in clinical trial.

Thank you.

Sincere,

Ting-Chun

Jim Frost says

Hi Ting-Chun, most any textbook about regression analysis, ANOVA, or linear models in general will explain interaction effects. My preferred source is Applied Linear Regression Models. That’s a huge textbook of 1400 pages, but that’s why I like it! I don’t have a reference specifically for interaction effects, but would recommend something that discusses linear models in all of its aspects.

I hope this helps!

Jim

Ting-Chun Chen says

Thanks for your help and your quick response. I really appreciate.

Again, THANK YOU.

Sincere,

Ting-Chun

demmie says

how does interaction affect my study statistically

Jim Frost says

Hi Demmie, this is the correct post for finding that answer. Read through it and you’ll find the answer you’re looking for. If you have a more specific question, please don’t hesitate to ask!

Anoop says

Hello jim,

What if want to know 1. How does Icecream and hotdog affect enjoyment by itself

2. How does icecream and hotdog affect enjoyment when condiments are included?

In this case, isn’t both the main effect and interaction are equally important for a researcher?

Jim Frost says

Hi Anoop,

Great questions! You can see how ice cream and hot dog affect enjoyment by themselves by looking at the main effects plot. This plot shows the enjoyment level that each food produces is approximately equal.

Yes, understanding main effects like these is important. However, when there are significant interaction effects, you know that the main effects don’t tell the full story. In this case, the main effect for, say, hot dog doesn’t describe the full effect on enjoyment. The interaction term includes hot dog, so you know that some of hot dog’s effect is also in the interaction. If you ignore that, you risk misinterpreting the results. As I point out in the blog, if you go only by main effects, you’ll choose a hot dog . . . with chocolate sauce. You’d pick the chocolate sauce because its main effect is larger than mustard’s main effect.

To see how ice cream and hot dogs affect enjoyment when you include the interaction effect, just look at the interaction plot. The four points on that plot show the mean enjoyment for all four possible combinations of hot dog/ice cream with chocolate sauce/mustard. It displays the total effects of main effects plus interaction effects. For example, the interaction plot shows that for hot dogs with mustard, the mean enjoyment is about 90 units (the top-left red dot in the graph). Alternatively, you could enter values into the equation to obtain the same values.

I’d agree that understanding both main effects and interaction effects is important. My point is that when you have significant interaction effects, don’t look at only the main effects because that can lead you astray!

Satu says

Hi Jim!

Thank you very much for your blog site, you explain things well and understandable, thank you for that!!

I would still like to make sure that I understand correctly what you said before. I am running a repeated measures ANOVA and I am struggling with interpretations of interactions. So, is it so that if the interaction effect is not significant, then you should not interpret the multivariate comparisons between groups? I have a model with 5 groups and I am trying to see if there are any differences between them in the change of X variable in two time points. In multivariate tests it shows that the change would be different in one of the groups (also the plot figure shows that), but the overall interaction effect is not significant. So what would be the right way to interpret the results? Just say that there was no significant interaction, i.e., the change was similar in all groups, or say that one group was different but the interaction effect was not (for some reason?).

Thank you already for your answer!

Satu

Michela says

Hi Jim,

This blog post is so useful, thank you very much! However, I still fail to interpret one of my statistics outputs. I carried out a two-way mixed ANOVA analysis and inputted these data:

– between-subject variable is two therapy techniques (MD and RT)

– within-subject variable (Time with 3 levels: pre, mid and post)

– dependent variable was well-being scores.

I ran the analysis and found that for the between-subject variable there was no significant difference between the well-being scores for MD and RT therapies. However, when looking at my within-subject variable, the table stated that there was a significant main effect of Time on well-being scores but no significant Time*Therapy interaction on well-being scores.

Am I right in inferring that the significant main effect of time basically states that over time, well-being scores improved, independent of the therapy techniques? Can I then conclude that RT and MD improved well-being in general and that neither is better than the other? Or is that wrong? One of my hypotheses states that MD and RT will have a positive effect on well-being scores.

Thank you so much for taking time to read this and helping me !!

Michela

Jim Frost says

Hi Michela,

Your interpretation sounds correct. The data you have suggests that as time passes, well being increases. You don’t have evidence that either treatment works better than the other. Often you’d include a control group (no treatment). It’s possible that there is only a time effect and no treatment effect. A control group would help you establish whether it was the passage of time and/or the treatments.

In other words, right now it could be that both treatments are equally effective. But, it’s also possible that neither treatment is effective and it’s only the passage of time–as the saying goes, time heals all wounds!

Michela says

Hi Jim,

Thanks for your reply. Yes, that was one of the problems pointed out in my dissertation: it did not have a control group for comparison :/ Alongside time constraints, the sample size was already so small that it was difficult to get enough people to make 3 separate groups :/ So am I wrong to accept the hypothesis that both RT and MD have a positive effect on well-being levels? Or do I have to reject it because I did not have a control group?

Kind Regards,

Michela

Jim Frost says

Hi Michela,

Unfortunately, it is hard to draw any conclusions about the treatments. It’s possible that both had the same positive effect on well being. However, it’s also possible that neither had an effect and instead it was entirely the passage of time. I definitely understand how it is hard to work with a small sample size!

If other researchers have studied the same treatments, you can evaluate their results. That might lend support towards your notion. But, that’s a tenuous connection without a control group.

Best wishes to you!

Jim

Anoop says

Hi Jim,

I have a significant interaction (0.004) between supplement use and physical activity. The nonusers had a hazard ratio of 0.61 (.46-0.80) (lower risk), whereas users had an HR of 1.40 (.85-2.3) (higher risk). My question is: although it looks like a qualitative interaction (opposite in direction), since the users’ CI crosses the margin of no difference, how do you interpret it? Can we say users had a higher hazard when combined with PA?

Thank you

Jim Frost says

Hi Anoop,

I can’t interpret the main effect of supplement use without understanding the interaction effect. Can you share the hazard ratios for your interaction? In other words, the ratios for the following groups: user/high activity, user/low activity, non-user/high activity, and non-user/low activity.

I don’t know how you recorded activity, so those groups are just an example. Then we can see how to interpret it!

Thanks!

Jim

Anoop says

Hey Jim,

Not sure why ur posting doesn’t show. But it shows in my email.

This is a trial looking at whether physical activity vs. control can reduce physical disability. We are looking at users vs. nonusers of a certain supplement in the trial. The interaction was significant (p = .003).

| Group | PA | C | HR (CI) |
|---|---|---|---|
| Users | 7.1 | 6.1 | 1.40 (.85 – 2.3) |
| Nonusers | 5.4 | 10.2 | 0.61 (.46 – 0.80) |

How do you interpret this result?

Thank you so much. Also you should start a youtube page. We need more people like you in this world 🙂

Jim Frost says

Hi again Anoop,

I checked and I see my comment showing up under yours. I think it might be a browser caching thing that is causing you not to see my reply on the blog post. Refresh might do the trick.

At any rate, this example will also show the importance of several other concerns in statistics–namely understanding the subject area, the variables involved, and statistical vs. practical significance. So, with that said, let’s take a look at your results!

I’m not sure what the dependent variable is, but I’ll assume that higher values represent a greater degree of disability. If that’s not the case, you got really strange results! In the interaction table you provided, I see three group means that are roughly equal and one that stands out. I’m not sure if the differences between any of those three group means (5.4, 6.1, and 7.1) are statistically significant. You can perform a post hoc analysis in ANOVA to check this (I plan to write a blog post about that at some point). Even if they are significant, you have to ask yourself if those differences are practically significant given your knowledge of the subject area and the dependent variable. I don’t know the answer to that.

And then there is the one group mean (10.2) that is noticeably different from the other three. To me, it looks like subjects in the control group who don’t use the supplement have particularly bad results. And the other three groups might represent a better outcome. Again, use your subject-area knowledge to determine how meaningful this result is in a practical sense.

If that’s the case, it suggests to me that subjects have better outcomes as long as they use the supplement and/or engage in physical activity. In other words, the worst case is to not do either the activity or use the supplement. If you do one or both of physical activity and supplement usage, you seem to be better off in an approximately equal manner. And, again, I don’t know if the differences between the other three outcomes are statistically significant and practically significant. In other words, those differences could just represent random sample error and/or not be meaningful in a practical sense.

I hope this clarifies things! And, yes, I do plan to start a YouTube channel at some point. I need to finish a book that I’m working on first though!

Take care,

Jim

Anoop says

Thank you for the long post Jim!

I used a Cox regression model and the results are hazard ratios. The trial is physical activity vs. control. And we are doing a subgroup analysis with the supplement.

The above table shows for Users the CI is 1.40 ( .85 to 2.3) and not significant.

For nonusers, the HR shows 0.61 ( 0.46-.80) and significant.

And the interaction between these two is significant. My question is isn’t this an example of qualitative interaction where the direction is opposite for users vs non-users. Like if you plot the forest plot, the lines are on 2 sides of no difference line.?

Jim Frost says

Hi Anoop,

The interesting thing about statistics is that the analyses are used in a wide array of fields. Often, these fields develop their own terminology for things. In this case, I wasn’t familiar with the term qualitative interaction, but it seems to be used in medical research. I’ve read that a qualitative interaction occurs when one treatment is better for some subsets of patients while another treatment is better for another subset of patients. It sounds like a qualitative interaction occurs when there is a change in direction of treatment effects. A non-crossover interaction applies to situations where there is a change in magnitude among subsets but not of direction.

So, I learned something new about how different fields apply different terminology to statistical concepts!

I’m not sure why you’d have only two hazard ratios when you know that the interaction effect is significant? Right there you know that you can’t interpret the main effect for supplement usage without knowing the physical activity level. It seems like you’d need 4 hazard ratios.

As for whether this classifies as a qualitative interaction given the definition above, you’ll first have to determine whether those differences between the three groups I identified before are both statistically significant and practically significant. If the answers to both questions are yes, then it would seem to be a qualitative interaction. However, if either answer is no, then I don’t think it would. And, I’m going by your dependent variable. If you want to answer that using hazard ratios, you’d need four of them as I indicate above. You can’t answer that question with only two ratios.

I hope this helps!

Mei says

Hi Sir. Thank you for this wonderful post as this is very helpful. But I still can’t seem to understand or interpret my interaction plot. My main effects are significant and my interaction effect are also significant but then looking at the regression coefficient (result from SPSS), moderator(IV2) is a negative significant predictor of DV but looking at my interaction plot, they are both positive significant predictor? I’m not sure if you get it because I am also having difficulty explaining the situation because I am just a beginner when it comes to psychological statistics. Thank you in advance, Sir!

Jim Frost says

Hi Mei, I don’t understand your scenario completely. However, there is nothing wrong with having positive coefficients for main effects and negative coefficients for interaction effects. When you have significant interaction effects, then the total effect is the main effect plus interaction effect. In some cases, the interaction effect adds to the main effect but sometimes it subtracts from it. It’s ok either way. I find that assessing the interaction plots is the easiest way to interpret the results when you have significant interaction effects.

Habtamu Tolera says

I have 20 binary or categorical IVs and one binary DV. My question is: shall I check collinearity first and then run a bivariate analysis, or otherwise? Help me please.

Mei says

Thank you for the reply, Sir. I will do my best to interpret the interaction plot. 🙂

Erick Turner says

Jim, like many others here, I love your intuitive explanation.

I thought it would be a good exercise to replicate what you did in your example. (I’m using Stata, and I understand you don’t use that, but the results should still be the same.) Unfortunately, I’m having trouble replicating your results and I don’t know why. Using values of 0 and 1 for each of the IVs, I’m getting significant results for both of them and for the interaction variable, while you got NS results for one of the IVs.

I’ll paste the output below. (Sorry, the formatting got lost.)

. regress enjoyment food_01 condiment_01 food_cond

| Source | SS | df | MS |
|---|---|---|---|
| Model | 15974.9475 | 3 | 5324.98248 |
| Residual | 1905.09733 | 76 | 25.0670701 |
| Total | 17880.0448 | 79 | 226.329681 |

Number of obs = 80, F(3, 76) = 212.43, Prob > F = 0.0000, R-squared = 0.8935, Adj R-squared = 0.8892, Root MSE = 5.0067

| enjoyment | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. Interval] |
|---|---|---|---|---|---|
| food_01 | -28.29677 | 1.583258 | -17.87 | 0.000 | -31.45011, -25.14344 |
| condiment_01 | -24.28908 | 1.583258 | -15.34 | 0.000 | -27.44241, -21.13574 |
| food_cond | 56.02826 | 2.239065 | 25.02 | 0.000 | 51.56877, 60.48774 |
| _cons | 89.60569 | 1.119533 | 80.04 | 0.000 | 87.37594, 91.83543 |

Any clue as to what I’m doing wrong?

Jim Frost says

Hi Erick, offhand I don’t know what could have happened. As you say, the results should be the same. I’ll take a closer look and see if I can figure anything out.

Michela says

Dear Jim,

I found your blog while trying to find an answer to a reviewer comment to a paper I submitted.

So now I am looking for answers.

One of my hypothesis was on a moderated mediation model.

Considering the moderation I have (measured as continuous variables):

X=job demands

M (moderator)= team identification

Y= workplace bullying

The fact is that when I looked at the results the effect of X on Y is positive; the effect of M on Y is negative but my problem is that I have the interaction term (X*M) that is positive, while I (and especially the reviewer) was expecting a negative effect.

The graph makes sense to me (and partly the reviewer) but he/she is expecting that I am giving him/her some explanation about this positive interaction effect.

I hope you could help me in explaining me why and explain that to the reviewer!

Jim Frost says

Hi Michela,

I seem to have been encountering this question frequently as of late! The answer is that the coefficient for an interaction term really doesn’t mean much by itself. After all, the interaction term is a product of multiple variables in the model and the coefficient. Depending on the combination of variable values and the coefficient, a positive coefficient can actually represent a negative effect (i.e., if the product of the variable values is negative). Additionally, the overall combined effect of the main effect and interaction effect can be negative. It might be that the interaction effect just makes it a little less negative than it would’ve been otherwise. The interaction term is basically an adjustment to the main effects.

Also, realize that there is a bit of arbitrariness in the coefficient sign and value for the interaction effect when you use categorical variables. Linear models need to create indicator variables (0s and 1s) to represent the levels of the categorical variable. Then, the model leaves out the indicator variable for one level to avoid perfect multicollinearity. Suppose you have group A and group B. If the model includes the indicator variable for group A, then 1s represent group A and 0 represents not group A. Or, it could include the indicator variable for group B, then 1s represent group B and 0 represents not group B. If you have only two groups A and B, then the 1s and 0s are entirely flipped depending on which indicator variable the model includes. You can include either indicator variable and the overall results would be the same. However, the coefficient value will change including conceivably the sign! You can try changing which categorical level the software leaves out of the model, which doesn’t change the overall interpretation/significance of the results but can make the interpretation more intuitive.
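The effect of switching the omitted level can be sketched in a few lines of Python. This is an illustration under assumptions: the group means are hypothetical, the helper `dummy_coefficients` is made up for this example, and the model is a saturated 2×2 model with 0/1 indicators, where the coefficients can be read directly off the cell means.

```python
def dummy_coefficients(cell_means, ref_a, ref_b):
    """Coefficients of a saturated 2x2 model with 0/1 indicator coding.

    cell_means maps (level_a, level_b) -> mean response; ref_a and ref_b
    are the levels whose indicator variables are left out of the model.
    """
    levels_a = sorted({a for a, _ in cell_means})
    levels_b = sorted({b for _, b in cell_means})
    other_a = next(a for a in levels_a if a != ref_a)
    other_b = next(b for b in levels_b if b != ref_b)
    const = cell_means[(ref_a, ref_b)]
    coef_a = cell_means[(other_a, ref_b)] - const
    coef_b = cell_means[(ref_a, other_b)] - const
    interaction = cell_means[(other_a, other_b)] - const - coef_a - coef_b
    return {"const": const, "a": coef_a, "b": coef_b, "a*b": interaction}

# Hypothetical group means, not taken from any real study.
means = {("A", "X"): 10.0, ("A", "Y"): 14.0, ("B", "X"): 12.0, ("B", "Y"): 9.0}

coded_with_ref_A = dummy_coefficients(means, ref_a="A", ref_b="X")
coded_with_ref_B = dummy_coefficients(means, ref_a="B", ref_b="X")
print(coded_with_ref_A)
print(coded_with_ref_B)
```

Both codings reproduce the same four group means, yet the interaction coefficient changes sign between them. In this sketch, then, the sign reflects the coding choice as much as the data.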

Finally, it’s really hard to gain much meaning from an interaction coefficient itself for all the reasons above. However, you can see the effect of this term in the interaction plot. As long as the interaction plot makes sense theoretically, I wouldn’t worry much about the specific sign or value of the coefficient. I’d only be worried if the interaction plots didn’t make sense.

I hope this helps!

Erick Turner says

Mystery solved! It wasn’t an issue of the difference in software but rather in the type of model. I had asked Stata to run a regression model and got output that didn’t match up. However, when I ask Stata to run ANOVA (including the interaction term), I got output that matched yours. For other Stata users, the syntax to use is “anova enjoyment food condiment food#condiment”.

Jim Frost says

Hi Erick, thanks so much for the update! I had rerun the analysis to be sure that I hadn’t made a mistake, but it produced the same results as in the blog post. I guess this goes to show how crucial it is to know exactly what your statistical software is doing! I still wonder what produced the difference between the regression and the ANOVA model because they both use the same math under the hood. In other words, what is different between Stata’s regression and ANOVA models?

Erick Turner says

However, I’m still puzzled as to why I got such different output when I transformed the data to 0/1 dummy variables, created an interaction variable, and then ran regression.

Erick Turner says

I see our replies crossed in cyberspace and that we are similarly puzzled. I’m assuming you ran an ANOVA routine and that it gives you regression output automatically. Just out of curiosity, what if you were to convert your variables to 0/1 and ask your software to just run regression?

Jim Frost says

I used regression analysis in Minitab and it automatically creates the indicator variables behind the scenes. So, I just told it to fit the model. Depending on which level of each categorical variable the software leaves out, you’d expect different numeric results (although they’d tell the same story). You wouldn’t expect differences in what is and is not significant though. I wonder if Stata possibly uses sequential SS for one of its analyses? Minitab by default uses adjusted SS. Using Seq. SS could change which variables are significant. I was going to test that but haven’t tried yet.
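One possible reading of the mismatch, offered as an assumption rather than a confirmed diagnosis: in a regression on 0/1 indicators plus their product, each single-variable coefficient is the effect of that variable when the other indicator equals 0, whereas an ANOVA main effect averages over the levels of the other factor. The Python sketch below plugs in the coefficients from Erick’s Stata regression output above and assumes a balanced design (equal observations per cell). Which food or condiment each 0/1 value stands for is not stated in the thread, so the labels stay generic.

```python
# Coefficients copied from the Stata regression output posted above.
const = 89.60569
b_food = -28.29677
b_condiment = -24.28908
b_interaction = 56.02826

def fitted_mean(food: int, condiment: int) -> float:
    """Fitted cell mean for 0/1 indicator values of food and condiment."""
    return (const + b_food * food + b_condiment * condiment
            + b_interaction * food * condiment)

cells = {(f, c): fitted_mean(f, c) for f in (0, 1) for c in (0, 1)}

# Effect of food at each condiment level (simple effects).
food_effect_c0 = cells[(1, 0)] - cells[(0, 0)]  # equals b_food
food_effect_c1 = cells[(1, 1)] - cells[(0, 1)]

# Averaging the simple effects gives the main effect, assuming balance.
food_main_effect = (food_effect_c0 + food_effect_c1) / 2
condiment_main_effect = ((cells[(0, 1)] - cells[(0, 0)])
                         + (cells[(1, 1)] - cells[(1, 0)])) / 2

print(f"food effect when condiment = 0: {food_effect_c0:.2f}")
print(f"food effect when condiment = 1: {food_effect_c1:.2f}")
print(f"averaged food effect:           {food_main_effect:.2f}")
print(f"averaged condiment effect:      {condiment_main_effect:.2f}")
```

Under these assumptions, the two simple effects of food are large and opposite in sign, so their average is close to zero. That would be consistent with a food coefficient that is significant in the 0/1 regression while the food main effect in the ANOVA table is not.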

Dan Mark says

First of all, thank you for the clear explanation. It is hard sometimes to find someone who can explain it in plain English!

Secondly, I still face an issue what to put on my axis in my research. I saw in your explanation that you put the dependent variable, the interaction term and one independent variable on the axis. My question is why you did not put both the independent variables that are in the interaction term, and the interaction term on the axis.

Already many thanks!

Jim Frost says

Hi Dan,

Thanks so much. I work really hard to find the simplest way to explain these concepts yet staying accurate!

Graphing relationships for multiple regression can get tricky. The problem is that the typical two-dimensional graph has only two-axes. So, you have to figure out the best way to arrange these two axes to produce a meaningful graph. This isn’t a problem for simple regression where you have one dependent variable and one independent variable. You can graph those two variables easily on fitted line plots. You have as many variables as you have axes.

Once you get to multiple regression, you will have more variables (one DV, at least two IVs, and possibly other terms such as interaction terms) than axes. You definitely want to include the dependent variable on an axis (typically the y-axis) because that is the variable you are creating the model for. Then, you can include one IV on the x-axis. At this point, you’ve used up your available axes! The solution is to use separate lines to represent another variable (as shown in the legend). That’s how you get the two IVs into the graph that you need for a two-way interaction. Then you just assess the patterns of those lines.

Instead, if I had put an IV on both X and Y-axes, the graph would not display the value of the DV. The whole point of regression/ANOVA is to explore the relationships with the DV. Consequently, the DV has to be on the graph.

I hope this helps clarify the graphs! The interaction plots I show in this post are the standard form for two-way interactions.

Marlie Greeff says

Dear Jim

Your blog is amazing! Makes everything more understandable for someone with no stats background! Thank you!

Jim Frost says

Hi Marlie, thanks so much for your nice comment. It means a lot to me because that’s my goal for the blog! I’m glad it’s been helpful for you.

Joe R says

Hi Jim,

Thanks for this blog post, really appreciate your efforts to break things down in a simple, intuitive and visual way.

I am a bit confused by the continuous variable example (regarding interactions), specifically your interpretation.

I used your linear model, plotting the coefficients in Excel and manually calculating the Strength for several points of ‘test’ data.

In the article you write – “For high pressures, there is a positive relationship between temperature and strength while for low pressures it is a negative relationship”.

This is what your interaction plot also shows, but plugging actual values in (see below) to the equation – using your coefficients outline above – proves that this is not true.

Test Data

| Temperature | Pressure | Time | Temperature*Pressure | Predicted Strength Values | Difference |
|---|---|---|---|---|---|
| 95 | 81 | 32 | 7695 | 3,891 | |
| 115 | 81 | 32 | 9315 | 4,258 | 367 |
| 95 | 63 | 32 | 5985 | 3,477 | |
| 115 | 63 | 32 | 7245 | 3,800 | 323 |

As you can see, for 2 ‘sets’ of data above, each with a low (63) and high (81) pressure setting, Predicted Strength increases as Temperature Increases.

Am i missing something?

Joe

Jim Frost says

Hi Joe,

I can’t quite tell from your comment how you set up your data. So, I’m unable to figure out how things are not working correctly. However, I can assure you that when you plug the values in the equation, the fitted values behave according to the interpretation (i.e., that the relationship changes direction for low and high values of pressure).

To illustrate how this works, I put together an Excel spreadsheet. In the spreadsheet, there are two tables: one for low pressure and the other for high pressure. Both tables contain the same values for Temperature and Time. However, each table uses a different value for Pressure. The low pressure table uses 63.68 while the high pressure table uses 81.1. I then take these values and plug them into the equation in the Strength column to calculate the fitted values for strength.

As you can see from the numbers in the tables and associated graphs, there is a negative relationship between Strength and Temperature when you use a low pressure but a positive relationship when you use a high pressure.
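The change in direction can also be written out algebraically. The derivation below is a sketch that assumes the model has the form used in the post (Temperature, Pressure, Time, and Temperature*Pressure); the coefficient symbols are notation introduced here for illustration and are not the fitted values.

```latex
\begin{align*}
\text{Strength} &= \beta_0 + \beta_T\,T + \beta_P\,P + \beta_t\,t + \beta_{TP}\,(T \cdot P) \\
\frac{\partial\,\text{Strength}}{\partial T} &= \beta_T + \beta_{TP}\,P \\
\beta_T + \beta_{TP}\,P^{*} = 0 \quad &\Longrightarrow \quad P^{*} = -\frac{\beta_T}{\beta_{TP}}
\end{align*}
```

The slope of Strength with respect to Temperature is therefore a function of Pressure, and it changes sign at the pressure P*. A negative relationship at the low pressure (63.68) and a positive one at the high pressure (81.1) corresponds to P* falling between those two values, with a negative Temperature coefficient and a positive interaction coefficient.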

You can find all of this in my spreadsheet with the calculations for the continuous interaction. The two graphs below are also in the spreadsheet.

I hope this helps clarify how this works!

Michela says

Thanks for this, very helpful!

I hope the reviewer will be satisfied as well 🙂

Marieke says

Hi Jim,

I am working on a model which includes an interaction variable. Pro-immigration attitude = educational level + employment (dummy) + educational level * employment . When including the interaction variable, the employment variable becomes insignificant (p=0.83). I was wondering how to interpret this?

Jim Frost says

Hi Marieke,

There are several ways to look at this issue. The first is explaining how the dummy variable goes from being significant to insignificant. When you fit the model without the interaction effect, the model was forced to try to include that effect in with the variables that were included in the model. Apparently, it apportioned enough of the explained variance to the employment variable to make it significant. However, after you added the interaction effect, the model could more appropriately assign the explained variance to that term. Your example illustrates how leaving important terms out of the model (such as the interaction effect) can bias the terms that you do include in the model (the employment dummy variable).

Now, on to the interpretation itself! It’s easiest to picture your results as if you are comparing the constant and slope between two different regression lines–one for the unemployed and the other for the employed. Hypothetically speaking, if the employment dummy variable had been significant, you’d have a case where the constant would tell you the average pro-immigration attitude for someone who is unemployed (the zero value for the dummy variable) and has no education. You could then add the coefficient for the dummy variable to the constant and you’d know the average pro-immigration attitude for someone who is employed (the 1 value for the dummy variable) and has no education. In other words, you have sufficient evidence to conclude that there are two different y-intercepts for the two regression lines. However, because your actual p-value for the dummy variable is not significant, you have insufficient evidence to conclude that the y-intercepts for these two lines are different.

On the other hand, because the interaction term is significant, you have sufficient evidence to conclude that the slope of the line for the employed is different from the slope of the line for the unemployed.

I’ve written a post about these ideas, which includes graphs to make it easier to understand. Read my post about comparing regression lines.

I hope this helps!

sarim mohd says

Hi Jim,

Thanks for the wonderful and simple tutorial.

I have a panel dataset that consists of 146 companies for 7 years. My dependent variable is Profit and Independent variables are Board Size, Number of meetings, board dividend decision, CEO duality (it is a dummy variable, 1 if the CEO is also the chairman, 0 otherwise).

Results for non-parametric test indicated that the size of the board is significantly different for firms with CEO duality and for firms with non-duality.

Therefore, after testing for the main effect, I want to test if such differences in the board size of firms with CEO duality and firms with non-duality is getting reflected in the performance. For this purpose I introduced an interaction effect:

Profitability = Board size*Duality + number of meetings + board dividend decision

So, if my interaction is significant (positively), can I interpret it as “the firms with CEO duality are performing better than the firms with non-duality”? Is the coefficient on the interaction telling me how the coefficient changes when we go from duality to non-duality?

Also, is the interaction creating any linearity problem for my estimations?

Am I right in doing so?

I hope my question is understandable.

Jim Frost says

Hi Sarim,

Unlike main effects, you typically don’t interpret the coefficients of the interaction effects. Yes, it is possible to plug in values for the variables in the interaction term and then multiply them by the coefficient, and repeat that multiple times, to see what values come out. However, it’s much easier to use interaction plots–as I do in this blog post. Those plots essentially plug in a variety of values into the equation to show you what the interaction effect means. It’s just a whole lot easier to understand using those plots.

I don’t have enough information to tell you what the interaction means for your case specifically. There’s no way I could say what a positive interaction coefficient represents. But, here is what it means generally. Keep in mind that an interaction effect is basically an “it depends” effect as I describe in this post. In your case:

If the interaction term is significant, you know that the effect of board size on profitability depends on CEO duality. In other words, you can’t know the effect of board size on profitability without also knowing the CEO status. Think of a scatter plot with profitability on the Y-axis and board size on the X-axis. You have two lines on this plot. One line is for dual CEOs and the other is for non-dual CEOs. When the interaction term is significant, you have sufficient evidence to conclude that the slopes of those two lines are significantly different. The specific interpretation depends on the exact nature of those two lines–maybe the two slopes are in opposite directions (positive and negative) or maybe one is just steeper than the other in the same direction. That’s what you’ll see on the interaction plot and you can interpret the results accordingly.

If the interaction term is not significant, the effect of board size on profitability does NOT depend on CEO duality. You don’t need to know CEO status in order to understand the predicted effect of board size on profitability. On the graph that I describe, you cannot conclude that the slopes of the two lines are different.

As for correlation among your independent variables, yes, multicollinearity can be a problem when you include interaction terms. If you had an interaction term with two continuous variables, I’d recommend standardizing them, but it might not make much of a difference for your interaction between a continuous variable and a binary variable. If you want to read about that, I’ve written a post about standardizing the variables in a regression model that you can read.

I hope that helps!

Katie says

Hi Jim,

I am interpreting a model with the fixed effects of: diet injection diet*injection group. The P-value for diet*injection is P = 0.09 which would be a tendency. My question is if this is a tendency but not below 0.05 is it appropriate to leave the interaction in the model? When discussing my results is it appropriate to only describe the interaction or the fixed effects of diet and injection?

Jim Frost says

Hi Katie,

This is a tricky question to answer in general because it really depends on the specific context of your study.

First off, I hesitate to call any effect with a p-value of 0.09 a tendency. A p-value around 0.05 really isn’t that strong of evidence by itself. For more information about that aspect, read my post about interpreting p-values. Towards the end of that post I talk about the strength of evidence associated with different p-values.

As for leaving it in the model or taking it out, there are multiple things to consider. You should review the literature, similar studies, etc. and see what results they have found. Let theoretical considerations guide you during the model specification process. If there are any strong theoretical, practical, or literature-related reasons for either including or excluding the interaction term, take those to heart. Model specification shouldn’t be only by the numbers. I write about this process in my post about specifying the correct model. The part about letting theory guide you is towards the end.

And, one final thought. There is a school of thought that says that if you have doubts about whether you should include or exclude a variable or term, it’s better to include it. If you exclude an important term, you risk introducing bias into your model–which means you might not be able to trust the rest of the results. Adding unnecessary terms can reduce the precision and power of your model, but at least you wouldn’t be biasing the other terms. I’d fit the model with and without the interaction term and see if and how the other terms change.

If the coefficients and/or p-values of the other terms change enough to change the overall interpretation of the model, then you have to really think about which model is better and that probably takes you back to the theoretical underpinnings I mention above. If they don’t change noticeably, then whether you include or exclude the interaction term depends on your assessment of the importance of that interaction term specifically in the context of your subject area. And, again, that takes you back to theory, other studies, etc., but it’s not as broad a question to grapple with compared to the previous case where the rest of the model changes.
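As a rough sketch of the "fit it both ways and compare" check, the Python below fits a model with and without the interaction term and prints the coefficients side by side. Everything here is an assumption for illustration: the 2x2 diet-by-injection data are made up, the 0/1 coding is one of several possible codings, and `ols` is a small hypothetical helper that solves the normal equations. In practice you would use your statistical software, which also reports p-values.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up balanced 2x2 data: (diet, injection, response). Illustration only.
data = [
    (0, 0, 9.5), (0, 0, 10.5),
    (0, 1, 10.5), (0, 1, 11.5),
    (1, 0, 11.5), (1, 0, 12.5),
    (1, 1, 15.5), (1, 1, 16.5),
]
y = [resp for _, _, resp in data]
X_main = [[1, d, j] for d, j, _ in data]          # intercept, diet, injection
X_full = [[1, d, j, d * j] for d, j, _ in data]   # plus diet*injection

main_only = ols(X_main, y)
with_interaction = ols(X_full, y)

print("without interaction:", [round(b, 3) for b in main_only])
print("with interaction:   ", [round(b, 3) for b in with_interaction])
```

In this made-up data set the diet coefficient is about 3.5 without the interaction term and about 2.0 with it. Under 0/1 coding, part of that shift is a change in meaning: once the interaction is in the model, the diet coefficient is the diet effect when injection is 0 rather than an average over both injection levels. Keep that in mind when judging whether the other terms "changed".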

That’s why the correct answer depends on your specific study area, but hopefully that gives you some ideas to consider.

Sir Yiadus says

Please, suppose that you include an interaction term and all the other variables, including the interaction term, become insignificant even though they were significant before introducing the interaction term. What does that mean?

Jim Frost says

Hi, it sounds like the model might be splitting the explanatory power of each term between the main effects and the interaction effects and the result is that there isn’t enough explanatory power for any individual term to be significant by itself. If that’s the case, you might need a larger sample size. Is the overall model significant?

Also, whenever you include interaction terms you’re introducing multicollinearity into the model (correlation among the independent variables). You might gain some power by standardizing your continuous predictors. Read my post about standardizing your variables for more information about how it helps with multicollinearity.

Those would be my top 2 thoughts. You should also review the literature, your theories, etc. and hypothesize the results that you think you should obtain, and then back track from there to look for potential issues. After all, insignificant results might not be a problem if that’s the right answer. And, you should at least consider that possibility.

But, the fact that they’re significant without the interaction term and that goes away when you add the interaction term makes me think there is something more going on.

Jessy Grootveld says

Hi Jim,

I have a question about interpreting output of the MANCOVA.

I myself am conducting research to see whether people’s tech-savviness perceptions have an effect on the effect that assignments to an experimental condition had on peoples brand attitude, purchase intention, and product liking.

In the MANCOVA, my supervisor told me to add the Conditions_All variable as a main effect to the customized model, and Conditions_All*Tech-savviness_perceptions as an interaction effect.

I got the following output:

Conditions_All p = .013

Conditions_All*Tech-savviness perceptions p = .011

How do I interpret these p-values? What does the significance of the first p-value on Conditions_All tell me? And how is that related to the significance of the interaction effect of Conditions_All and Tech-savviness perceptions?

Thank you in advance for your help.

Kind regards,

Jessy Grootveld

Jim Frost says

Hi Jessy,

Your output indicates that both the main effect and interaction effects are statistically significant assuming that you’re using a significance level of 0.05.

The main effect for Conditions_All is the portion of the effect that is independently explained by that variable. If you know the value of Conditions_All, then you know that portion of its effect without needing to know anything else about the other variables in the model.

However, because the interaction effect is also statistically significant and that term includes Conditions_All, you know that the main effect is only a portion of the total effect. Some of Conditions_All’s effect is included in the interaction term. However, to understand this portion of the effect, you need to know the value of the other variable (Tech-savviness perceptions).

To understand the complete effect of Conditions_All, you need to sum the main effects (the portion that is independent from the other variables in the model) and the interaction effect (the portion that depends on the other variable).

I hope this helps!

Sir Yiadus says

Thank you very much. I am grateful

Redina says

Hello Jim,

I am a master’s student and I have included interaction terms in my thesis. The problem is that the main effects are significant and the interaction term is insignificant. Moreover, the interaction term has an opposite sign to what was expected. The problem is that I have a very theoretical part that supports that there actually is an interaction term between my variables. What might be an answer to this?

Thank you in advance for your help,

Redina

Jim Frost says

Hi Redina,

There are a couple of things you should realize about your results.

The first thing is that insignificant results do not necessarily suggest that an effect doesn’t exist in the population. Keep in mind that you fail to reject the null hypothesis, which is very different than accepting the null hypothesis.

For your study, your results aren’t necessarily suggesting that the interaction effect doesn’t exist in the population. Instead, you have insufficient evidence in your sample to conclude that the interaction effect exists in the population. That’s very different even though it might sound the same. Remember that you can’t prove a negative. Consequently, your results don’t necessarily contradict theory.

In other words, the interaction effect may well exist in the population but for some reason your sample and analysis failed to detect it. I can think of four key reasons offhand.

1) The sample size is too small to detect the effect.

2) The sample variability is high enough to reduce the power of the test. If the variability is inherent in the population (rather than say measurement error or some other variability that you can reduce), then increasing the sample size is the easiest way to address this problem.

3) Sampling error by chance produced a fluky sample that doesn’t exhibit this effect. This would be a Type II error where you fail to reject a null hypothesis that is false. It happens.

4) There was some issue in your design that caused the experimental conditions to not match the conditions for which the theory applies.

I think exploring those options, and possibly others, would be helpful, and probably useful discussion for your thesis.

As for the sign being the opposite of what you expected, I have a couple of thoughts. For one thing, you don’t typically interpret the signs and coefficients for interaction terms. Given the way the values in interaction terms are multiplied, the signs and coefficients often are not intuitive to interpret. Instead, use graphs to understand the interaction effects and see if those make theoretical sense.

Additionally, because your interaction term is not significant, you have insufficient evidence to conclude that the coefficient is different from zero. So, you cannot say that the coefficient is negative for the population. In other words, the CI for the interaction effect includes zero along with both positive and negative values. I hope that makes sense. Again, the CI is not ruling out the possibility that the coefficient could be positive, which is what you expect. But, you don’t have enough evidence to support concluding that it is either positive or negative.
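A quick way to see this is to compute the interval. The Python sketch below uses a hypothetical coefficient and standard error (not values from Redina's model) and a large-sample normal approximation. Statistical software typically uses the t-distribution instead, which gives a slightly wider interval.

```python
from statistics import NormalDist


def approx_ci(coef, std_err, level=0.95):
    """Large-sample (normal-approximation) confidence interval for a coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coef - z * std_err, coef + z * std_err


# Hypothetical estimate and standard error for an interaction term.
low, high = approx_ci(coef=-0.40, std_err=0.35)
includes_zero = low < 0 < high

print(f"95% CI: ({low:.3f}, {high:.3f}); includes zero: {includes_zero}")
```

The point estimate in this example is negative, but the interval runs from below zero to above zero, so the data are consistent with a negative, zero, or positive coefficient in the population.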

I hope this helps!

Redina says

Thank you a lot! I’m grateful.

Naman says

Hi Jim,

Thank you for that super useful explanation. I am doing my thesis and have a few questions. I would be grateful if you can answer these within 24 hrs as my thesis is due in 2 days.

I am doing a time series cross section fixed effects regression. The theory on the topic suggests an interaction between main independent variable (N- dummy variable) and S(continuous). I have included them in an interaction in one of the models. I also have another interaction between main independent variable (N- dummy variable) and A(continuous variable). I have also included them in an interaction in a separate model.

However, I also need a main model in which these interactions are not there, so that I can get the exact impact of the scheme N, my question is do I include the independent and control variables S and A in that main model ? If yes, won’t the thesis defense committee ask me why do you have N in interaction with S and A in one model each and not in interaction in the main model?

The previous studies would have different analysis with analysing the impact of the interactions and they would have some kind of main model with a few different IV’s without any interactions.

I have to include S and A in the main model because they are the control variables but I don’t know if I should include their interaction terms in that main model as well or not. Won’t that be too much ?

Thanks so much in advance,

Naman

Jim Frost says

Hi Naman,

I think I understand your analysis, and I have a couple of thoughts.

One, I don’t understand why you want to produce separate models that leave out significant effects. When you omit an important effect, you risk biasing your model. Why not present one final model that represents the best possible model that describes all of the significant effects? Separate models with only some of the significant effects in each don’t seem like a good idea.

Two, you want to gain the exact impact of N. However, you won’t gain this by removing the interaction terms. In fact, you’d be specifically removing some of N’s effect by doing that.

Both the main effect and interaction effect for N are significant. The main effect is the independent effect of N. That is the portion of N’s effect that does not depend on the other variables in the model. However, because the interaction term is significant, you know that some of N’s effect does depend on the other variables in the model. So, some of N’s effect is independent of those other variables while some of it depends on those other variables. That’s why both the main effect and interaction effect are significant.

By excluding the interaction you are excluding some of N’s effect. Is this important? Well, reread this post and see how trying to interpret the main effects without factoring in the interaction effects can lead you to the wrong conclusions. You might end up putting mustard on your ice cream sundae! When you have significant interaction effects, it’s crucial that you don’t attempt to interpret the main effects in isolation.

Consequently, I would include the interaction effects in your main model. The results might not seem as clean and clear cut, but they are more accurate. They reflect the true nature of the study area.

I hope this helps!

Victoria says

Hi Jim

This page is very helpful. I was wondering about a particular scenario I have with my data. I have a predictor that is positively correlated with an outcome in a bivariate correlation. In a linear regression model including a control variable, the predictor is no longer significant. However, when I explore interactions between the control variable and the predictor in a regression model, both the interaction term and the predictor by itself are significant.

My first question is – can I “trust” the model with the interaction term (model 2), even though in the model without the interaction term (model 1) the predictor was not significant?

I should add that the interaction is theoretically sound (which is why I explored it in the first place).

My second question is – what if the same scenario occurs for predictors that were not even correlated with the outcome in initial exploratory bivariate correlations? I am wondering if I should even be entering these into a model in the first place. However, again, I am looking at these particular predictors because there is a theory that says they should relate to the outcome, and again, the interaction can be explained by the theory.

Thank you very much for your time and sorry if my query is a bit confusing!

Victoria (UK)

Jim Frost says

Hi Victoria,

I’m glad you found this helpful! I think I understand your question. And, it reminds me that I need to write a blog post about omitted variable bias and trying to model an outcome with too few explanatory variables!

I think part of the confusion is the difference between how pairwise correlations and multiple regression model the relationships between variables. Pairwise correlations only assess whether a pair of variables are correlated. They do not account for any other variables. Multiple regression accounts for all variables that you include in the model and holds them constant while evaluating the relationship between each independent variable and the dependent variable. Because multiple regression factors in a lot more information than pairwise correlation, the results can differ.

This issue is particularly problematic when there is a correlation structure amongst the independent variables themselves. When you leave out important variables from the analysis, this correlation structure can either strengthen or weaken the observed relationship between a pair of variables. This is known as omitted variable bias. This can happen in regression analysis when you leave an important variable out of the model. It can also happen in pairwise correlation because that procedure only assesses two variables at a time and can leave out important variables. I think this might explain why you observe different results between pairwise correlation and your multiple regression analysis. Check for a correlation between your control variable and predictor. If there is one, it probably at least partly explains what is going on.

As for whether you can trust the significant interaction term, given that it fits theory and that it is significant after you add the other variables, I’d lean towards saying that yes you can trust it. However, as is always the case in statistics, there are caveats. One, I of course don’t know what you’re studying, so it’s hard to give any blanket advice. You should be sure that you have a sufficient number of observations to support your model. With two independent variables and an interaction term, you’d need around 30 observations. If you have notably fewer, you might be overfitting your model, which can produce unreliable results. Also, be sure to check those residual plots because that can help you avoid an underspecified model. And, as discussed earlier, if you leave out any important variables from your regression model, it can bias the variables and interaction terms in your model.

Regarding the other variables that don’t appear to have any correlation with the outcome variable, you can certainly consider adding them to the model to see what happens. Although, if you’re adding them just to check, it’s a form of data mining that can lead to its own problems of chance correlations. You can also check the pairwise correlations between all of these potential predictors. Again, if they are correlated with predictors, that correlation structure can bias their apparent correlation with the outcome variable. If they are correlated with any of the predictors in the model or with the response, there’s some evidence that you should include them. Ideally you should have a theoretical reason to include them as well.

I’d also recommend reading my post about regression model specification because it covers a lot of these topics.

I hope this helps!

ahmed says

Thank you for astonishing posts.

From understanding to statistics, it can explain the following cases

1) The factors under study are significant and the interaction is not significant?

This is because the main factors have separated effects from each other. That means that factor A has an effect on the character under study ( Ex. Root Yield) separate from the effect of factor B. The meaning of the interaction is not significant, under different levels of factor A that factor B gives the same results. (As a hypothetical example and not true).

Nitrogen fertilizer is used at different rates and potassium fertilizer at other rates. For example, the effect of nitrogen fertilization increases the yield by increasing the concentration of nitrogen and potassium reduces the yield. At each nitrogen concentration, the different levels of potassium reduce the yield and vice versa at each concentration of potassium, the different levels of nitrogen increase the yield

2) The factors under study are insignificant and the interaction is significant?

This means that the factors under study had the different influences for each level from other factor. For example levels of nitrogen and varieties of plants, under each level of nitrogen arrangement of varieties of plants is different. For example, at the high concentration the order of the varieties is ABC,

ACB for medium concentration and CAB for low concentration

What do you think of this interpretation?

With complement

Prof. Dr. Ahmed Ebieda

Jim Frost says

Hi Ahmed, thank you for your kind words about my posts! I really appreciate that!

Yes, your interpretations sound correct to me. I’d just add another case where both the main effects and interaction effects are significant. In that case, some proportion of the effects are separate or independent from the other factor while some proportion depends on the value of the other factor.

Fergal says

Hi Jim,

I have found both your initial piece on interaction effects, and the forum section to be extremely helpful.

Just looking to bounce something off you very quickly please.

I’m completing my MSc dissertation and for my stat analysis, I’ve carried out 2 (Gender: Male & Female) x 2 (Status: Middle & Low) between-between ANOVA.

For all my 5 dependent variables, there have been either main effects of Gender or Status, however there have been no interaction effects.

My 3 main questions are:

1. Although there was no main interaction effect, is it still possible to run a post hoc test (using a Bonferroni correction on Gender*Status) and report on some of the findings if they come up as significant?

Otherwise, all I’ll be reporting on is the main effect(s) (**as below) which I’m conscious may leave my analysis rather shallow…

2. In William J. Vincent’s ‘Statistics in Kinesiology’, he states that if either the main effects or interaction are significant, then further analysis is appropriate. He advocates conducting ‘a simple ANOVA’ across Gender at each of the levels of status and vice-versa.

Firstly, excuse my ignorance, I’m not exactly sure what’s meant by ‘simple ANOVA’ or how to do one, and apparently Jamovi (my stat analysis software), doesn’t have the facility to conduct one as of yet.

The question, can I just go straight into my post hoc tests instead of conducting the simple ANOVA as from what I gather, they’re basically running the same ??

3. I’m planning on reporting the results of my 2 x 2 ANOVA as: mean ± standard deviation, and the p values (significance accepted at p<.05). Is this acceptable/sufficient or is it best practice to include the f value as well?

A rough example of what I'm on about is something like this:

**

Figure …. shows the ……. Standard Scores. There was a main effect of Gender (p=0.009), whereas no Status effect was detected (p=0.108). There was no interaction effect between Gender and Status (p=0.0.669). Females scored significantly better than males in the ….. test (7.62±2.13 vs. 6.66±2.21, p=0.009), whereas the Low and Middle group scores were statistically unchanged at 6.83±2.07 to 7.44±2.31 (p=0.108) respectively. These standard scores equate to a 4.8% difference between females and males, and 3.05% difference between Middle and Low group participants.

(graphs will be included)

Does this seem sufficient or should/can I dig further into the Gender main effect?

The post hoc tests (Gender*Status) are what will enable me to do that, if it's a thing you deem them acceptable to conduct.

Once again, this whole page has been of huge help to me. Thanks very much in advance for your time and apologies if the query is rather confusing.

Regards,

Fergal.

ahmed says

Hi Jim

Thanks a lot for your fast reply and your explanations.

But, I have the simple question?

Can I write recommendation for all three cases (Factors=significant & Interaction Not , Factors=significant & Interaction Not and Factors=Not significant & Interaction Significant) or some of them it can’t recommend?.

Please, explain by examples for each case (This is one example from my results )

My example:

2 Factors (3 levels of nitrogen & 3 levels of Potassium)

Increasing Nitrogen and Potassium increase the root yield

( also in case one factor increase root yield and other decrease it)

For each case what is the recommendation?

Because some friends said: if interaction is not significant, there is no recommendation.

I think this is not true?

Please, what is your opinion?

Jim Frost says

Hi Ahmed, yes, when there is an interaction, you can make a recommendation. You just need the additional information. I explain this process in this post. For example, in the food and condiment example, to make a recommendation to maximize your enjoyment, you can make a condiment recommendation, but you need to know what the food is first. That’s how interactions work. Apply that approach to your example. It helps if you graph the interactions as I do.

Aidan says

Hi Jim,

Firstly, I can’t believe I have only found this site today – it’s awesome, thanks!

I’m trying to interpret some results and having read your blog, can you please tell me if i’m correct in my understanding regarding main effects and interactions?

I’ve performed a 2-way mixed-model ANOVA (intervention x time) to assess the effects of three interventions on the primary outcomes (weight-loss).

There was a significant main effect for weight-loss but when I perform post-hoc analysis, there is no significant result.

My understanding of this is that, over time, weight-loss was significant as an entire group however, no one intervention was better than the other?

Any input from anyone would be welcomed!

Thanks

Jim Frost says

Hi Aidan, I’m glad you’ve found my website to be helpful!

Which main effect was significant? Was the interaction effect significant?

Sometimes the hypothesis test results can differ from the post-hoc analysis results. Usually that happens when the results are borderline significant. However, I can’t suggest an interpretation without knowing the other details.

Tom says

Hi Jim: thank you for this post. I am working on a couple of hypotheses to test both direct and interaction effects…results are a bit more nuanced than examples above, so I would be interested in your advice…I am using PLA-SEM…direct effect of X on Y (Beta = 0.19) is not significant (t statistic greater than 1.96). Nevertheless I still have to run second hypothesis to determine if a third variable moderates relationship between X and Y. When adding the interaction term, R2 did increase on Y. however, interaction effect was also not significant. it seems I fail to reject null hypothesis. This being said I am shaky on how I would interpret this, for the results were not as anticipated…it is exploratory research, if that matters…thoughts? Tom

Tom says

Rather t statistic less than 1.96…my mistake

Jim Frost says

Hi Tom,

It sounds like neither your main effects nor interaction effect are significant? Is that the case?

If so, you need to remember that failing to reject the null does not mean that the effect doesn’t exist in the population. Instead, your sample provided insufficient evidence to conclude that the effect exists in the population. There’s a difference. It’s possible that the effects do exist but for a number of possible reasons, your hypothesis test failed to detect it.

These potential reasons include random chance causing the sample to underestimate the effect, the effect size being too small to detect, too much variability in the sample that obscures the effect, or a sample size that is too small. If the effect exists but you fail to reject the null hypothesis, it is known in statistics as a Type II error. For more information about this error, read my post Types of Errors in Hypothesis Testing.

I hope this helps!

Tom says

Thank you, Jim…this is very helpful…of course I was hoping for a better outcome…but I am guessing the predictor variable is not quite nuanced enough to produce a noticeable effect…thanks again..and I will definitely check the source material you provided…tom