When it comes to hypothesis testing, statistics help you avoid relying on opinions about whether an effect is large and how many samples you need to collect. Opinions about these things can be way off, even among those who regularly perform experiments and collect data, and that can lead you to draw incorrect conclusions. Always perform the appropriate hypothesis tests so you understand the strength of your evidence.
In my house, we’re all big fans of the Mythbusters. This fun show tests whether different myths and urban legends could have really happened. Along the way, they perform experiments in a controlled and repeatable manner and collect data. This involves lots of planning, custom equipment, reducing potential sources of variation, and a large number of explosions. All good stuff. However, they’re not always the best when it comes to statistical analysis and hypothesis testing.
Don’t get me wrong. I think the Mythbusters are great because they make science fun and place a high value on using data to make decisions. It’s a great way to bring science to life for kids! Unfortunately, they occasionally draw incorrect conclusions from their data because they don’t use statistics.
One of the things I love about statistics is that hypothesis testing helps you objectively evaluate the evidence. You set the significance level before the study, analyze the data, and then make a decision based on the p-value. You don't have to make a subjective assessment of whether an effect appears to be large enough while simultaneously trying to factor in the sample size and sample variability!
In this post, I’ll detail their investigation into the myth that yawns are contagious and show how they would have benefited from using statistical analysis to perform hypothesis testing and to estimate a good sample size.
Are Yawns Contagious?
I think we’ve all heard that yawns are contagious. If you see someone yawn, it sure seems like you’re more likely to yawn too. The Mythbusters decided they were going to test this myth. They recruited 50 people under the pretense that they were looking for people to appear on the show.
The recruiter spoke to each subject one-on-one and intentionally either yawned or did not yawn during the session. After listening to the recruiter, the subjects were left by themselves in a small room for a fixed amount of time. The Mythbusters secretly observed the subjects and recorded whether the subject yawned or not.
The Mythbusters recorded these data:
- Recruiter did not yawn (control group): 4 out of 16 (25%) of the subjects yawned.
- Recruiter did yawn (treatment group): 10 out of 34 (29%) of the subjects yawned.
When it came time to determine the results of their experiment, Jamie Hyneman said that the data confirmed the myth. Yawns are contagious. He stated that the difference of 4 percentage points is significant thanks to the large sample size (n=50). Unfortunately, this conclusion was based on intuition rather than a statistical test. I'm going to analyze this more meticulously to see if hypothesis testing agrees with Jamie!
Using the Two Proportions Hypothesis Test to Assess Yawns
The data contain proportions for two groups, so we’ll use the two proportions hypothesis test. Specifically, we’ll use a one-tailed test to determine whether the treatment group has a proportion that is greater than the control group. You can do this in your own preferred statistical software without a dataset. Just use the summary statistics for the two groups—10/34 and 4/16.
The two proportions hypothesis test produces the following results for the yawn data. The output displays two P values, and we'll use the one for Fisher's exact test because it is designed for small samples like ours. The P value of 0.513 is well above any standard significance level.
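If you don't have dedicated statistical software handy, you can reproduce these tests from the summary counts alone. Here's a minimal sketch in Python using SciPy and statsmodels (my substitutions, not the software that produced the output above); the Fisher's exact p-value should come out around 0.513.

```python
from scipy.stats import fisher_exact
from statsmodels.stats.proportion import proportions_ztest

# 2x2 table implied by the counts above
# rows: recruiter yawned (treatment), recruiter did not yawn (control)
# columns: subject yawned, subject did not yawn
table = [[10, 24],
         [4, 12]]

# Fisher's exact test, one-sided: is the treatment proportion greater than the control's?
_, p_exact = fisher_exact(table, alternative="greater")
print(f"Fisher's exact one-sided p-value: {p_exact:.3f}")   # ~0.513

# Normal-approximation two proportions test on the same summary counts, for comparison
_, p_approx = proportions_ztest(count=[10, 4], nobs=[34, 16], alternative="larger")
print(f"Two proportions z-test one-sided p-value: {p_approx:.3f}")
```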
We fail to reject the null hypothesis. The sample does not contain sufficient evidence to conclude that the subjects who were exposed to yawns tended to yawn more frequently themselves. Additionally, the output indicates that the sample size is small! When you're working with categorical data, you often need larger sample sizes than are typical for continuous data.
Unfortunately, Jamie was wrong about both the statistical significance and having a large sample size!
Assess Statistical Power to Estimate the Correct Sample Size
When the Mythbusters conclude that a myth isn’t true, they often find the extreme conditions that can force the myth to occur. Usually, this involves an explosion. I’d love to include an explosion in this blog post but I don’t want to damage your device!
Instead, I’ll produce a figurative bang by estimating how many subjects the Mythbusters should have recruited. I’ll perform a power and sample size calculation to figure out what sample size is required so that a hypothesis test has a respectable chance of detecting an effect if one actually exists. Hint: The answer is bound to prompt Adam Savage to wave his arms around in his characteristic manner!
In many fields, a good benchmark power value to aim for is 0.8. At this level, a hypothesis test has an 80% probability of detecting a difference if it exists.
The study estimated an effect of 0.04, which was not statistically significant. For the power analysis, I’m going to find the sample size that yields a statistical power of 0.8 for a difference of 0.10 (rather than 0.04) for a two proportions hypothesis test. After all, if the difference really is 0.04, that’s so tiny that it’s not practically significant in the real world even if a study found it to be statistically significant. I’ll calculate power using a one-tailed test.
If the true population difference between the groups is 10 percentage points (25% vs 35%), the Mythbusters need to recruit 329 subjects per group (658 total)! Well, they were only off by 600 subjects!
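To show where a number like this comes from, here's a small sketch of the standard normal-approximation formula for comparing two proportions, written in Python. This is a generic textbook formula rather than necessarily the exact calculation my statistical software performs; depending on the variance assumptions and whether you treat the test as one- or two-sided, the answer lands somewhere around 260 to 330 subjects per group.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Approximate subjects per group needed to compare two proportions
    using the standard normal-approximation (pooled-variance) formula."""
    p_bar = (p1 + p2) / 2                      # pooled proportion under the null
    z_alpha = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Control group yawns 25% of the time; we want 80% power to detect an increase to 35%.
print(n_per_group(0.25, 0.35, two_sided=True))    # 329 per group
print(n_per_group(0.25, 0.35, two_sided=False))   # 259 per group
```

Either way, the message is the same: detecting a 10 percentage point difference between two proportions takes hundreds of subjects per group, not 50 subjects in total.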
The sample size is so large because the effect size is still fairly small and because hypothesis tests for categorical data require larger samples than tests for continuous data.
Related post: Estimating a Good Sample Size for Your Study Using Power Analysis
The Mythbusters Need Statistics and Hypothesis Testing!
Using the two proportions hypothesis test and power calculation, we learned a couple of things:
- The sample data do not support the hypothesis that yawns are contagious.
- The sample size was too small to provide adequate statistical power.
I have a lot of research experience working in labs at a university. Based on this experience, I don’t see the Mythbusters experiment as a failure at all. Instead, I see it as a pilot study. For an experiment, you often need to conduct a small pilot study to work the kinks out and develop the initial estimates. It helps you avoid costly mistakes by not going straight to a large-scale experiment where things might not go as planned.
That’s how the scientific method works. You state the hypothesis, design and set up the controlled conditions for an experiment, and then evaluate the data with a statistical hypothesis test. You assess those results and, if necessary, make adjustments to improve the next study.
If this study occurred in the research arena, the researchers would be asking themselves whether it’s worth conducting additional research on the subject. Are the potential benefits worth the costs? In this case, the benefits of learning whether yawns are contagious are small in comparison to the costs associated with a study of 650 subjects. It’s probably not going to happen!
Even though the results of this study are not statistically significant, we still learned something important!
We are still big fans of the Mythbusters! This study just reconfirms that science, research, and statistical analysis are tricky. Sometimes your intuition can lead you astray. Statistics can help keep you grounded by providing an objective assessment of your data with hypothesis testing. After all, the Mythbusters went to a lot of effort to collect their data. They ought to know what the data are really telling them!
Do you have any stories of surprising results or tricky data?
Be sure to read my other Mythbusters related post where I use hypothesis tests to bust myths about the battle of the sexes!
Another problem with this study is that the subjects might have suppressed their yawning because they believed they were being recruited for the show. Contagious yawning, if it exists, should also appear almost immediately, not minutes after the initial yawn. There's also the question of whether the stimulus was genuine; to my understanding, the recruiter faked the yawns. Finally, the study should define what counts as contagiousness more precisely.