Causation indicates that an event affects an outcome. Do fatty diets cause heart problems? If you study for a test, does it cause you to get a higher score?
In statistics, causation is a bit tricky. As you’ve no doubt heard, correlation doesn’t necessarily imply causation. An association or correlation between variables simply indicates that the values vary together. It does not necessarily suggest that changes in one variable cause changes in the other variable. Proving causality can be difficult.
If correlation does not prove causation, what statistical test do you use to assess causality? That’s a trick question because no statistical analysis can make that determination. In this post, learn why determining causation matters and how you can go about assessing it.
Relationships and Correlation vs. Causation
The expression is, “correlation does not imply causation.” Consequently, you might think that it applies only to statistics like Pearson’s correlation coefficient, and it does apply to that statistic. However, we’re really talking about relationships between variables in a broader sense. Pearson’s correlation measures the relationship between two continuous variables, but a relationship can involve other types of variables, such as categorical variables, counts, binary data, and so on.
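To make the literal statistic concrete, here’s a minimal sketch of computing Pearson’s correlation in Python. The variable names and numbers are hypothetical, simulated purely for illustration, and I’m assuming NumPy and SciPy are available.

```python
# A minimal sketch: Pearson's correlation for two continuous variables.
# The data are simulated; the variable names are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
hours_studied = rng.uniform(0, 10, size=100)
test_score = 60 + 3 * hours_studied + rng.normal(0, 5, size=100)

# pearsonr returns the correlation coefficient and its p-value.
r, p_value = stats.pearsonr(hours_studied, test_score)
print(f"Pearson's r = {r:.2f}, p = {p_value:.4f}")
```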
For example, in a medical experiment, you might have a categorical variable that defines which treatment group subjects belong to: control group, placebo group, and several treatment groups. If the health outcome is a continuous variable, you can assess the differences between group means. If the means differ by group, then you can say that mean health outcomes depend on the treatment group. There’s a correlation, or relationship, between the type of treatment and the health outcome. Or, maybe we have the same treatment groups but a binary outcome, say infected and not infected. In that case, we’d compare the proportions of infected subjects across groups to determine whether the treatment correlates with the infection rate.
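Here’s a sketch of both comparisons on simulated data, assuming SciPy and statsmodels are installed. The group sizes, means, and infection counts are all made up for illustration.

```python
# Comparing group means (continuous outcome) and group proportions
# (binary outcome) across treatment groups. All numbers are simulated.
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(0)

# Continuous outcome: one-way ANOVA on mean health scores by group.
control = rng.normal(50, 10, size=40)
placebo = rng.normal(51, 10, size=40)
treatment = rng.normal(58, 10, size=40)
f_stat, p_means = stats.f_oneway(control, placebo, treatment)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_means:.4f}")

# Binary outcome: two-proportion z-test on infection rates.
infections = np.array([12, 5])   # infected counts: control vs. treatment
subjects = np.array([40, 40])    # subjects per group
z_stat, p_props = proportions_ztest(infections, subjects)
print(f"Proportions: z = {z_stat:.2f}, p = {p_props:.4f}")
```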
Throughout this post, I’ll refer to correlation and relationships in this broader sense: not just literal correlation coefficients, but relationships between variables, such as differences between group means and proportions, regression coefficients, associations between pairs of categorical variables, and so on.
Why Determining Causality Is Important
What’s the big deal about the difference between correlation and causation? For example, if you observe that as one variable increases, the other variable also tends to increase, isn’t that good enough? After all, you’ve quantified the relationship and learned something about how the variables behave together.
If you’re only predicting events, not trying to understand why they happen, and you don’t want to alter the outcomes, correlation can be perfectly fine. For example, ice cream sales correlate with shark attacks. If you just need to predict the number of shark attacks, ice cream sales might be a good thing to measure even though they’re not causing the shark attacks.
However, if you want to reduce the number of attacks, you’ll need to find something that genuinely causes a change in the attacks. As far as I know, sharks don’t like ice cream!
There are many occasions where you want to affect the outcome. For example, you might want to do the following:
- Improve health by using medicine, exercising, or getting flu vaccinations.
- Reduce the risk of adverse outcomes, such as by using procedures that reduce manufacturing defects.
- Improve outcomes, such as by studying for a test.
For intentional changes in one variable to affect the outcome variable, there must be a causal relationship between the variables. After all, if studying does not cause an increase in test scores, there’s no point in studying. If the medicine doesn’t cause an improvement in your health or ward off disease, there’s no reason to take it.
Before you can state that some course of action will improve your outcomes, you must be sure that a causal relationship exists between your variables.
Confounding Variables and Their Role in Causation
How does it come to be that variables are correlated but do not have a causal relationship? A common reason is a confounding variable that creates a spurious correlation. A confounding variable correlates with both of your variables of interest. It’s possible that the confounding variable might be the real causal factor! Let’s go through the ice cream and shark attack example.
In this example, the number of people at the beach is a confounding variable. A confounding variable correlates with both variables of interest—ice cream and shark attacks in our example.
Imagine that as the number of people at the beach increases, ice cream sales also tend to increase. Meanwhile, more people at the beach cause shark attacks to increase. This structure creates an apparent, or spurious, correlation between ice cream sales and shark attacks, but it isn’t causation.
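To see the structure in action, here’s a small simulation, assuming NumPy and SciPy, in which beach crowd size drives both variables while sales and attacks never influence each other. Every coefficient below is invented for illustration.

```python
# Simulating the confounding structure: crowd size causes both ice cream
# sales and shark attacks; the two outcomes share no causal link.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
beach_crowd = rng.normal(1000, 200, size=200)                     # confounder
ice_cream_sales = 0.5 * beach_crowd + rng.normal(0, 40, size=200)
shark_attacks = 0.01 * beach_crowd + rng.normal(0, 1.5, size=200)

# A strong correlation appears even though neither outcome causes the other.
r, p = stats.pearsonr(ice_cream_sales, shark_attacks)
print(f"Sales vs. attacks: r = {r:.2f}, p = {p:.4f}")
```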
Confounders are common reasons for associations between variables that are not causally connected.
Related post: Confounding Variables Can Bias Your Results
Causation and Hypothesis Tests
Before moving on to determining whether a relationship is causal, let’s take a moment to reflect on why statistically significant hypothesis test results do not signify causation.
Hypothesis tests are inferential procedures. They allow you to use relatively small samples to draw conclusions about entire populations. For the topic of causation, we need to understand what statistical significance means.
When you see a relationship in sample data, whether it is a correlation coefficient, a difference between group means, or a regression coefficient, hypothesis tests help you determine whether your sample provides sufficient evidence to conclude that the relationship exists in the population. You can see it in your sample, but you need to know whether it exists in the population. It’s possible that random sampling error (i.e., luck of the draw) produced the “relationship” in your sample.
Statistical significance indicates that you have sufficient evidence to conclude that the relationship you observe in the sample also exists in the population.
That’s it. It doesn’t address causality at all.
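For concreteness, here’s a minimal two-sample test on simulated data, assuming SciPy. A small p-value licenses only the conclusion that the difference exists in the population; nothing in the output says why the groups differ.

```python
# A two-sample t-test: significance speaks to the population, not to cause.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(100, 15, size=50)  # simulated outcomes, group A
group_b = rng.normal(108, 15, size=50)  # simulated outcomes, group B

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```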
Related post: Understanding P-values and Statistical Significance
Hill’s Criteria of Causation
Determining whether a causal relationship exists requires far more in-depth subject area knowledge and contextual information than you can include in a hypothesis test. In 1965, Austin Bradford Hill, a medical statistician, tackled this question in a paper that’s become the standard. While he introduced it in the context of epidemiological research, you can apply the ideas to other fields.
Hill describes nine criteria to help establish causal connections. The goal is to satisfy as many criteria as possible. No single criterion is sufficient. However, it’s often impossible to meet all the criteria. These criteria are an exercise in critical thought. They show you how to think about determining causation and highlight essential qualities to consider.
Studies can take steps to increase the strength of their case for a causal relationship, which statisticians call internal validity. To learn more about this, read my post about internal and external validity.
Strength
A strong, statistically significant relationship is more likely to be causal. The idea is that causal relationships are likely to produce statistical significance. If you have significant results, at the very least you have reason to believe that the relationship in your sample also exists in the population—which is a good thing. After all, if the relationship only appears in your sample, you don’t have anything meaningful! Correlation still does not imply causation, but a statistically significant relationship is a good starting point.
However, there are many more criteria to satisfy! There’s a critical caveat for this criterion as well. Confounding variables can mask a correlation that actually exists. They can also produce a genuine correlation where no causal connection exists, as the ice cream and shark attack example shows. A strong relationship is simply a hint.
Consistency and causation
When there is a real, causal connection, the result should be repeatable. Other experimenters in other locations should be able to produce the same results. It’s not one and done. Replication builds up confidence that the relationship is causal. Preferably, the replication efforts use other methods, researchers, and locations.
In my post with five tips for using p-values without being misled, I emphasize the need for replication.
Specificity
It’s easier to determine that a relationship is causal if you can rule out other explanations. I write about ruling out other explanations in my posts about randomized experiments and observational studies. In a more general sense, it’s essential to study the literature, consider other plausible hypotheses, and, hopefully, be able to rule them out or otherwise control for them. You need to be sure that what you’re studying is causing the observed change rather than something else of which you’re unaware.
It’s important to note that you don’t need to prove that your variable of interest is the only factor that affects the outcome. For example, smoking causes lung cancer, but it’s not the only thing that causes it. However, you do need to perform experiments that account for other relevant factors and be able to attribute some causation to your variable of interest specifically.
For example, in regression analysis, you control for other factors by including them in the model.
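As a sketch of what that looks like, the code below, assuming pandas and statsmodels and reusing the simulated beach data from earlier, fits the model with and without the confounder. Watch the ice cream coefficient shrink once the confounder enters the model.

```python
# Controlling for a confounder by including it in a regression model.
# The data and variable names are simulated/hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"beach_crowd": rng.normal(1000, 200, size=200)})
df["ice_cream_sales"] = 0.5 * df["beach_crowd"] + rng.normal(0, 40, size=200)
df["shark_attacks"] = 0.01 * df["beach_crowd"] + rng.normal(0, 1.5, size=200)

naive = smf.ols("shark_attacks ~ ice_cream_sales", data=df).fit()
controlled = smf.ols("shark_attacks ~ ice_cream_sales + beach_crowd",
                     data=df).fit()

# The sales coefficient falls toward zero once beach_crowd is in the model.
print(f"Naive:      {naive.params['ice_cream_sales']:.4f}")
print(f"Controlled: {controlled.params['ice_cream_sales']:.4f}")
```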
Temporality and causation
Causes should precede effects. Ensure that what you consider to be the cause occurs before the effect. Sometimes it can be challenging to determine which way causality runs. Hill uses the following example. It’s possible that a particular diet leads to an abdominal disease. However, it’s also possible that the disease leads to specific dietary habits.
The Granger Causality Test assesses potential causality by determining whether earlier values in one time series predict later values in another time series. Analysts say that time series A Granger-causes time series B when statistical tests indicate that earlier values of series A significantly predict future values of series B.
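Here’s a minimal sketch of running the test with statsmodels on simulated series in which x genuinely leads y by one step. The series, noise level, and lag choice are illustrative only.

```python
# Granger causality test: does the second column help predict the first?
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = np.empty(n)
y[0] = rng.normal()
y[1:] = x[:-1] + rng.normal(scale=0.5, size=n - 1)  # y_t depends on x_{t-1}

# Column order matters: this asks whether x Granger-causes y.
# The function prints F-test results for each lag up to maxlag.
results = grangercausalitytests(np.column_stack([y, x]), maxlag=2)
```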
Despite being called a “causality test,” it really is only a test of prediction. After all, the increase of Christmas card sales Granger-causes Christmas!
Temporality is just one aspect of causality!
Biological Gradient
Hill worked in medicine and epidemiology, hence the focus on biological questions. He suggests that for a genuinely causal relationship, there should be a dose-response type of relationship. If a little bit of exposure causes a little bit of change, a larger exposure should cause more change. Hill uses cigarette smoking and lung cancer as an example: greater amounts of smoking are linked to a greater risk of lung cancer. You can apply the same type of thinking in other fields. Does more studying lead to even higher scores?
However, be aware that the relationship might not remain linear. As the dose increases beyond a threshold, the response can taper off. You can check for this by modeling curvature in regression analysis.
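One way to run that check, sketched below with simulated dose-response data and statsmodels, is to add a squared term to the model; a significant quadratic coefficient flags the tapering. The data and variable names are made up for illustration.

```python
# Detecting a tapering dose-response by adding a quadratic term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
dose = rng.uniform(0, 10, size=150)
response = 4 * dose - 0.3 * dose**2 + rng.normal(0, 2, size=150)
df = pd.DataFrame({"dose": dose, "response": response})

# I(dose**2) adds the squared term inside the formula.
model = smf.ols("response ~ dose + I(dose**2)", data=df).fit()
print(model.summary().tables[1])  # check the quadratic coefficient
```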
Plausibility
If you can find a plausible mechanism that explains the causal nature of the relationship, it supports the notion of a causal relationship. For example, biologists understand how antibiotics inhibit microbes on a biological level. However, Hill points out that you have to be careful because there are limits to scientific knowledge at any given moment. A causal mechanism might not be known at the time of the study even if one exists. Consequently, Hill says, “we should not demand” that a study meets this requirement.
Coherence and causation
The probability that a relationship is causal is higher when it is consistent with related causal relationships that are generally known and accepted as facts. If your results outright disagree with accepted facts, the relationship is more likely to be correlation without causation. Assess causality in the broader context of related theory and knowledge.
Experiments and causation
Randomized experiments are the best way to identify causal relationships. Experimenters control the treatment (or factors involved), randomly assign the subjects, and help manage other sources of variation. Hill calls satisfying this criterion the strongest support for causation. However, randomized experiments are not always possible, as I write about in my post about observational studies.
Related posts: Randomized Experiments, Observational Studies, and Experimental Design: Definition, Types and Examples
Analogy
If there is an accepted, causal relationship that is similar to a relationship in your research, it supports causation for the current study. Hill writes, “With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.”
Determining whether a correlation also represents causation requires much deliberation. Properly designing experiments and using statistical procedures can help you make that determination. But there are many other factors to consider.
Use your critical thinking and subject-area expertise to think about the big picture. If there is a causal relationship, you’d expect to see consistent, replicated results, other causes ruled out, findings that fit with established theory and other research, a plausible mechanism, and a cause that precedes the effect.
Laurence D. Robinson says
I believe there is a logical flaw in the movie “Good Will Hunting”. Specifically, in the scene where psychologist Dr. Sean Maguire (Robin Williams) tells Will (Matt Damon) about the first time he met his wife, there seems to be an implied assumption that if Sean had gone to “the game” (Game 6 of the World Series in 1975), instead of staying at the bar where he had just met his future wife, then the very famous home run hit by Carlton Fisk would still have occurred. I contend that if Sean had gone to the game, the game would have played out completely differently, and the famous home run which actually occurred would not have occurred; that’s not to say that some other famous home run could not have occurred. It seems clear that neither character, Sean nor Will, understands this, and I contend these two supposedly brilliant people would have known better! It is certainly clear that neither Matt Damon nor Ben Affleck (the writers) understands this. What do you think?
Owen Davis says
Hi Jim. Thanks for the great site and content. Being new to statistics, I am finding it daunting to understand all of these concepts. I have read most of the articles in the basics section, and whilst I am gaining some insights, I feel like I need to take a step back in order to move forward. Could you recommend some resources for a rank beginner such as myself? Maybe some books that you read when you were starting out that were useful. I am really keen to jump in and start doing some statistics, but I am wondering if it is even possible for someone like me to do so. To clearly define my question: where is the best place to start? I realize this doesn’t really relate to the above article, but hopefully this question might be useful to others as well. Thanks.
Jim Frost says
Hi Owen,
I’m glad that my website has been helpful! I do understand your desire to get the big picture, especially when starting out. In just about a week, September 3rd to be exact, I’m launching a new ebook that does just that. The book is titled Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries. My goal is to provide the big picture of the field of statistics. It covers the basics of data analysis up to larger issues such as using experiments and data to make discoveries.
To be sure that you receive the latest about this book, please subscribe to my email list using the form in the right column of every page in my website.
Thanks!
Anjan kumar Sinha says
Jim, I am new to stats and find your blog very useful. Yet I am facing an issue of very low R-squared values, as low as 1 percent or 3 percent. Do we still hold these values valid? Any references on research that accepts such low values?
I request your valuable inputs, please.
Jim Frost says
Hi Anjan,
Low R-squared can be a problem. It depends on several other factors. Are any independent variables significant? Is the F-test of overall significance significant?
I have posts about this topic that answer those questions. Please read: Low R-squared values and F-test of overall significance.
If you have further questions, please post them in the comments section of the relevant post. It helps keep the questions and answers organized for other readers. Thanks!
Mario says
Hello Jim
Thank you so much for your website. It has helped me tremendously with my stats, particularly regression. I have a question concerning correlation testing. I have a continuous dependent variable, quality of life, and 3 independent variables, which are categorical (education = 4 levels, marital status = 3 levels, stress = 3 levels). How can I test for a relationship among the dependent and independent variables? Thank you Jim.
Jim Frost says
Hi Mario,
You can use either ANOVA or OLS regression to assess the relationship between categorical IVs and a continuous DV.
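As a rough sketch with simulated data and hypothetical variable names (assuming pandas and statsmodels), the formula interface handles the categorical coding for you via C():

```python
# OLS with categorical IVs: C() creates the indicator (dummy) coding.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 120
df = pd.DataFrame({
    "quality_of_life": rng.normal(60, 10, size=n),  # continuous DV
    "education": rng.integers(1, 5, size=n),        # 4 levels
    "marital_status": rng.integers(1, 4, size=n),   # 3 levels
    "stress": rng.integers(1, 4, size=n),           # 3 levels
})

model = smf.ols(
    "quality_of_life ~ C(education) + C(marital_status) + C(stress)",
    data=df,
).fit()
print(model.summary())
```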
I write about this in my ebook, Regression Analysis: An Intuitive Guide, where I discuss how regression works with categorical IVs in detail. I recommend getting the ebook to learn more. Unfortunately, I don’t have a blog post to point you towards.
Best of luck with your analysis!
jjames says
great post, Jim. Thanks!
khouiled says
Useful post
Just Bayle says
Very nice and interesting post. And very educational. Many thanks for your efforts!
Jim Frost says
Thank you very much! I appreciate the kind words!