Repeated measures designs, also known as within-subjects designs, can seem like oddball experiments. When you think of a typical experiment, you probably picture an experimental design that uses mutually exclusive, independent groups. These experiments have a control group and treatment groups that have clear divisions between them. Each subject is in only one of these groups.

These rules for experiments seem crucial, but repeated measures designs regularly violate them! For example, a subject is often in all the experimental groups. Far from causing problems, repeated measures designs can yield significant benefits.

In this post, I’ll explain how repeated measures designs work along with their benefits and drawbacks. Additionally, I’ll work through a repeated measures ANOVA example to show you how to analyze this type of design and interpret the results.

## Drawbacks of Independent Groups Designs

To understand the benefits of repeated measures designs, let’s first look at the independent groups design to highlight a problem. Suppose you’re conducting an experiment on drugs that might improve memory. In a typical independent groups design, each subject is in one experimental group. They’re either in the control group or one of the treatment groups. After the experiment, you score them on a memory test and then compare the group means.

In this design, you obtain only one score from each subject. You don’t know whether a subject scores higher or lower on the test because of an inherently better or worse memory. Some portion of each observed score reflects the subject’s memory traits rather than the drug. This example illustrates how the subjects themselves introduce an uncontrollable factor into the study.

Imagine that a person in the control group scores high while someone else in a treatment group scores low, not due to the treatment, but due to differing baseline memory capabilities. This “fuzziness” makes it harder to assess differences between the groups.

If only there were some way to know whether subjects tend to measure high or low. We need some way of incorporating each person’s variability into the model. Oh wait, that’s what we’re talking about—repeated measures designs!

## How Repeated Measures Designs Work

As the name implies, you need to measure each subject multiple times in a repeated measures design. Shocking! However, there’s more to it. The subjects usually experience all of the experimental conditions, which allows them to serve as experimental blocks or as their own control. What does that mean? Let me break this down one piece at a time.

The effects of the controllable factors in an experiment are what you really want to learn. However, as we saw in our example above, there can also be uncontrolled sources of variation that make it harder to learn about those things that we can control.

Experimental blocks explain some of the uncontrolled variability in an experiment. While you can’t control the blocks, you can include them in the model to reduce the amount of unexplained variability. By accounting for more of the uncontrolled variability, you can learn more about the controllable variables that are the entire point of your experiment.

Let’s go back to our drug test. We saw how subjects are an uncontrolled factor that makes it harder to assess the effects of the drugs. However, if we take multiple measurements from each person, we gain more information about their personal outcome measures under a variety of conditions. We might see that some subjects tend to score high or low on the memory tests. Then, we can compare their scores under each treatment to their general baseline.

And, that’s how repeated measures designs work. You understand each person better so that you can place their personal reaction to each experimental condition into their particular context.
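To make the idea of a personal baseline concrete, here is a minimal Python sketch. The subjects, conditions, and scores are invented for illustration and are not data from this post. It subtracts each subject’s own mean from their scores so that what remains is their reaction to each condition relative to their baseline.

```python
from statistics import mean

# Hypothetical memory-test scores, invented for illustration only.
# Layout: scores[subject][condition]
scores = {
    "A": {"control": 80, "drug": 86},
    "B": {"control": 60, "drug": 65},
    "C": {"control": 70, "drug": 77},
}

def center_by_subject(scores):
    """Subtract each subject's own mean so scores are relative to their baseline."""
    centered = {}
    for subject, by_condition in scores.items():
        baseline = mean(list(by_condition.values()))
        centered[subject] = {
            condition: score - baseline
            for condition, score in by_condition.items()
        }
    return centered

centered = center_by_subject(scores)
for subject, by_condition in centered.items():
    print(subject, by_condition)
```

In these made-up numbers, the raw scores differ a lot between subjects, but after centering, every subject shows a similar shift between the two conditions. That shift is the part a repeated measures analysis focuses on.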

## Benefits of Repeated Measures Designs

In statistical terms, we say that experimental blocks reduce the variance and bias of the model’s error by controlling for factors that cause variability between subjects. The error term contains only the variability within subjects and *not* the variability between subjects. The result is that the error term tends to be smaller, which produces the following benefits:

**Greater statistical power**: As we saw, by controlling for differences between subjects, this type of design can be much more powerful. If an effect exists, your statistical test is more likely to detect it.

**Requires a smaller number of subjects:** Because of the increased power, you can recruit fewer people and still have a good probability of detecting an effect that truly exists. If you’d need 20 people in each group for a design with independent groups, you might only need a total of 20 for repeated measures.

**Faster and less expensive:** The time and costs associated with administering repeated measures designs can be much lower because there are fewer people to recruit, train, and compensate.

**Time-related effects:** As we saw, an independent groups design collects only one measurement from each person. By collecting data from multiple points in time for each subject, repeated measures designs can assess effects over time. This tracking is particularly useful when there are potential time effects, such as learning or fatigue.
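The smaller error term behind these benefits can be written out for the simplest case. As a sketch, assume a balanced design with one treatment factor, where each of $n$ subjects is measured once under each of $k$ conditions (the symbols $n$ and $k$ are my notation, not from the software output). The total sum of squares then splits into three pieces:

$$
SS_{\text{total}} = SS_{\text{treatment}} + SS_{\text{subjects}} + SS_{\text{error}}
$$

$$
F = \frac{SS_{\text{treatment}} / (k - 1)}{SS_{\text{error}} / \big((k - 1)(n - 1)\big)}
$$

If the same scores were analyzed as independent groups, $SS_{\text{subjects}}$ would stay inside the error term. Pulling it out is what tends to shrink the denominator of $F$ when subjects differ a lot from each other.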

## Managing the Challenges of Repeated Measures Designs

Repeated measures designs have some great benefits, but there are a few drawbacks that you should consider. The largest downside is the problem of order effects, which can happen when you expose subjects to multiple treatments. These effects are associated with the treatment order but are not caused by the treatment.

Order effects can impede the ability of the model to estimate the effects correctly. For example, in a wine taste test, subjects might give a dry wine a lower score if they sample it after a sweet wine.

You can use different strategies to minimize this problem. These approaches include randomizing or reversing the treatment order and providing sufficient time between treatments. Don’t forget, using an independent groups design is an efficient way to eliminate order effects.

## Crossover Repeated Measures Designs

I’ve diagrammed a crossover repeated measures design, which is a very common type of experiment. Study volunteers are randomly assigned to one of the two groups. Everyone in the study receives all of the treatments, but the order is reversed for the second group to reduce the problems of order effects. In the diagram, there are two treatments, but the experimenter can add more treatment groups.
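As a rough illustration of the assignment step, the Python sketch below randomly splits volunteers into two groups and gives the second group the reversed treatment order. The function name `assign_crossover`, the treatment labels, and the subject IDs are hypothetical, and a real study would follow its own randomization protocol.

```python
import random

def assign_crossover(subject_ids, treatments=("A", "B"), seed=None):
    """Randomly split subjects into two groups; the second gets the reversed order."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)              # random assignment to groups
    half = len(shuffled) // 2
    forward = list(treatments)
    backward = forward[::-1]           # reversed order for the second group
    schedule = {}
    for position, subject in enumerate(shuffled):
        order = forward if position < half else backward
        schedule[subject] = list(order)
    return schedule

# Eight hypothetical volunteers, numbered 1 through 8.
schedule = assign_crossover(range(1, 9), seed=42)
for subject in sorted(schedule):
    print(subject, schedule[subject])
```

With an even number of volunteers, each treatment order is used equally often, so any order effect is spread evenly across the treatments rather than loading onto one of them.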

Studies from a diverse array of subject areas use crossover designs. These areas include weight loss plans, marketing campaigns, and educational programs among many others. Even our theoretical memory pill study can use it.

Repeated measures designs come in many flavors, and it’s impossible to cover them all here. You need to look at your study area and research goals to determine which type of design best meets your requirements. Weigh the benefits and challenges of repeated measures designs to decide whether you can use one for your study.

## Repeated Measures ANOVA Example

Let’s imagine that we used a repeated measures design to study our hypothetical memory drug. For our study, we recruited five people, and we tested four memory drugs. Everyone in the study tried all four drugs and took a memory test after each one. We obtained the data below. You can also download the CSV file for the Repeated_measures_data.

In the dataset, you can see that each subject has an ID number so we can associate each person with all of their scores. We also know which drug they took for each score. Together, this allows the model to develop a baseline for each subject and then compare the drug-specific scores to that baseline.

How do we fit this model? In your preferred statistical software package, you need to fit an ANOVA model like this:

- Score is the response variable.
- Subject and Drug are the factors.
- Subject should be a random factor.

Subject is a random factor because we randomly selected the subjects from the population and we want them to represent the entire population. If we were to include Subject as a fixed factor, the results would apply only to these five people and would not be generalizable to the larger population.

Drug is a fixed factor because we picked these drugs intentionally and we want to estimate the effects of these four drugs particularly.
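If you want to see the arithmetic behind such a model, here is a minimal hand calculation in plain Python. Treat it as a sketch under stated assumptions: the scores are invented for illustration (they are not the post’s dataset), the design is balanced with one score per subject per drug, and the function name `rm_anova` is my own. For this simple layout, the F-statistic for Drug is the drug mean square divided by the error mean square that remains after removing the subject variability. The sketch also computes the F-statistic you would get by ignoring Subject, for comparison. It does not compute a p-value, which your statistical software would report.

```python
from statistics import mean

# Hypothetical scores, invented for illustration only.
# Rows are subjects 1-5; columns are drugs 1-4.
scores = [
    [30, 28, 16, 34],
    [14, 18, 10, 22],
    [24, 20, 18, 30],
    [38, 34, 20, 44],
    [26, 28, 14, 30],
]

def rm_anova(scores):
    """One-factor repeated measures ANOVA for a balanced design (sketch)."""
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of drugs
    grand = mean([x for row in scores for x in row])
    subject_means = [mean(row) for row in scores]
    drug_means = [mean([row[j] for row in scores]) for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_drug = n * sum((m - grand) ** 2 for m in drug_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    # Residual left over after removing both drug and subject effects.
    ss_error = sum(
        (scores[i][j] - subject_means[i] - drug_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )

    ms_drug = ss_drug / (k - 1)
    ms_error = ss_error / ((k - 1) * (n - 1))
    # For comparison: error term if Subject were left out of the model.
    ms_error_no_subject = (ss_subject + ss_error) / (k * (n - 1))

    return {
        "ss_total": ss_total,
        "ss_drug": ss_drug,
        "ss_subject": ss_subject,
        "ss_error": ss_error,
        "f_repeated": ms_drug / ms_error,
        "f_independent": ms_drug / ms_error_no_subject,
    }

result = rm_anova(scores)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In these made-up numbers, the F-statistic for Drug is several times larger when Subject is in the model, because the subject-to-subject variability no longer inflates the error term.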

## Repeated Measures ANOVA Results

After we fit the repeated measures ANOVA model, we obtain the following results.

The P-value for Drug is displayed as 0.000, which means it rounds to zero at three decimal places. This low P-value indicates that not all four group means are equal. Because the model includes Subject, we know that the Drug effect and its P-value account for the variability between subjects.

Below is the main effects plot for Drug, which displays the fitted mean for each drug.

Clearly, Drug 4 is the best. Tukey’s multiple comparisons (not shown) indicate that the differences Drug 4 – Drug 3 and Drug 4 – Drug 2 are statistically significant.

Have you used a repeated measures design for your study?

Jaime says

Hi Jim,

I am getting conflicting advice.

I ran a: pre-test, intervention, post-test study. Where I had 4 groups (3 experimental and one control). I tested hamstring strength. In my repeated measures ANOVA I had an effect of time but NO interaction effect. I have been told due to no interaction effect I do NOT run a post-hoc analysis. Is this correct as someone else has told me the complete opposite (I only run a post-hoc analysis when I do not have an interaction effect)?

Jim Frost says

Hi Jaime,

The correct action to take depends on the specifics of your study, which might be why you’re getting conflicting advice!

As a general statistical principle, it’s perfectly fine to perform post-hoc tests regardless of whether the interaction effect is significant or not. The only time that it makes no sense to perform a post-hoc test is when no terms in your model are statistically significant. Even in that case, post-hoc tests can sometimes detect statistical significance, but that’s another story. In a nutshell, you can perform post-hoc tests whether or not your interaction term is significant.

However, I suspect that the real question is whether it makes sense given the pre-test post-test nature of your study. You have measurements before and after the intervention. If the intervention is effective, you’d expect the differences to show up after the intervention but not before. Consequently, that is an interaction effect because the group differences depend on the time of measurement. Read my blog post about interaction effects to see how these are “it depends” effects. So, if your interaction effect is not significant, it might not make sense to analyze your data further.

If the main effect for the treatment group variable is significant but not the interaction effect, it’s a bit difficult because it says that the treatment groups cause a difference between group means even in the pre-test measurement! That might represent only the differences between the subjects within those groups–it’s hard to say. You really want that interaction term to be significant!

If only the time effect is significant and nothing else, it’s probably not worth further investigation.

One thing I can say definitively is that the person who said that you can only perform a post-hoc analysis when the interaction is not significant is wrong! As a general principle, it’s OK to perform post-hoc analyses when an interaction term is significant. For your study, you particularly want a significant interaction term!