What is a Case Control Study?
A case control study is a retrospective, observational study that compares two existing groups. Researchers form these groups based on the existence of a condition in the case group and the lack of that condition in the control group. They evaluate the differences in the histories between these two groups looking for factors that might cause a disease.
Medical and epidemiological researchers frequently use case-control studies to identify potential risk factors for diseases and other medical conditions. In this context, scientists create the case and control groups based on the presence and absence of the condition, respectively. The researchers strive to create otherwise similar groups outside the condition and the risk factors they’re explicitly evaluating.
For example, medical researchers study disease X and use a case-control study design to identify risk factors. They create two groups using available medical records from hospitals. Individuals with disease X are in the case group, while those without it are in the control group. If the case group has more exposure to a risk factor than the control group, that exposure is a potential cause for disease X. However, case-control studies establish only correlation and not causation. Be aware of spurious correlations!
Case-control studies are observational studies because researchers do not control the risk factors—they only observe them. They are retrospective studies because the scientists create the case and control groups after the outcomes for the subjects (e.g., disease vs. no disease) are known.
This post explains the benefits and limitations of case-control studies, controlling confounders, and analyzing and interpreting the results. I close with an example case control study showing how to calculate and interpret the results.
Learn more about Experimental Design: Definition, Types, and Examples.
Benefits of a Case Control Study
A case control study is a relatively quick and simple design. They frequently use existing patient data, and the experimenters form the groups after the outcomes are known. Researchers do not conduct an experiment. Instead, they look for differences between the case and control groups that are potential risk factors for the condition. Small groups and individual facilities can conduct case-control studies, unlike other more intensive types of experiments.
Case-control studies are perfect for evaluating outbreaks and rare conditions. Researchers simply need to let a sufficient number of known cases accumulate in an established database. The alternative would be to select a large random sample and hope that the condition afflicts it eventually.
A case control study can provide rapid results during outbreaks where the researchers need quick answers. They are ideal for the preliminary investigation phase, where scientists screen potential risk factors. As such, they can point the way for more thorough, time-consuming, and expensive studies. They are especially beneficial when the current state of science knows little about the connection between risk factors and the medical condition. And when you need to identify potential risk factors quickly!
Cohort studies are another type of observational study that are similar to case-control studies, but there are some important differences. To learn more, read my post about Cohort Studies.
Limitations of a Case Control Study
Because case-control studies are observational, they cannot establish causality and provide lower quality evidence than other experimental designs, such as randomized controlled trials. Additionally, as you’ll see in the next section, this type of study is susceptible to confounding variables unless experimenters correctly match traits between the two groups.
A case-control study typically depends on health records. If the necessary data exist in sources available to the researchers, all is good. However, the investigation becomes more complicated if the data are not readily available.
Case-control studies can incorporate biases from the underlying data sources. For example, researchers frequently obtain patient data from hospital records. The population of hospital patients is likely to differ from the general population. Even the control patients are in the hospital for some reason—they likely have serious health problems. Consequently, the subjects in case-control studies are likely to differ from the general population, which reduces the generalizability of the results.
A case-control study cannot estimate incidence or prevalence rates for the disease. The data from these studies do not allow you to calculate the probability of a new person contracting the condition in a given period nor how common it is in the population. This limitation occurs because case-control studies do not use a representative sample.
Case-control studies cannot determine the time between exposure and onset of the medical condition. In fact, case-control studies cannot reliably assess each subject’s exposure to risk factors over time. Longitudinal studies, such as prospective cohort studies, can better make those types of assessment.
Related post: Causation versus Correlation in Statistics
Use Matching to Control Confounders
Because case-control studies are observational studies, they are particularly vulnerable to confounding variables and spurious correlations. A confounder correlates with both the risk factor and the outcome variable. Because observational studies don’t use random assignment to equalize confounders between the case and control groups, they can become unbalanced and affect the results.
Unfortunately, confounders can be the actual cause of the medical condition rather than the risk factor that the researchers identify. If a case-control study does not account for confounding variables, it can bias the results and make them untrustworthy.
Case-control studies typically use trait matching to control confounders. This technique involves selecting study participants for the case and control groups with similar characteristics, which helps equalize the groups for potential confounders. Equalizing confounders limits their impact on the results.
Ultimately, the goal is to create case and control groups that have equal risks for developing the condition/disease outside the risk factors the researchers are explicitly assessing. Matching facilitates valid comparisons between the two groups because the controls are similar to cases. The researchers use subject-area knowledge to identify characteristics that are critical to match.
Note that you cannot assess matching variables as potential risk factors. You’ve intentionally equalized them across the case and control groups and, consequently, they do not correlate with the condition. Hence, do not use the risk factors you want to evaluate as trait matching variables.
Statistical Analysis of a Case Control Study
Researchers frequently include two controls for each case to increase statistical power for a case-control study. Adding even more controls per case provides few statistical benefits, so studies usually do not use more than a 2:1 control to case ratio.
For statistical results, case-control studies typically produce an odds ratio for each potential risk factor. The equation below shows how to calculate an odds ratio for a case-control study.
Notice how this ratio takes the exposure odds in the case group and divides it by the exposure odds in the control group. Consequently, it quantifies how much higher the odds of exposure are among cases than the controls.
In general, odds ratios greater than one flag potential risk factors because they indicate that exposure was higher in the case group than in the control group. Furthermore, higher ratios signify stronger associations between exposure and the medical condition.
An odds ratio of one indicates that exposure was the same in the case and control groups. Nothing to see here!
Ratios less than one might identify protective factors.
Let’s bring this to life with an example!
Example Odds Ratio in a Case-Control Study
The Kent County Health Department in Michigan conducted a case-control study in 2005 for a company lunch that produced an outbreak of vomiting and diarrhea. Out of multiple lunch ingredients, researchers found the following exposure rates for lettuce consumption.
|Exposure to Lettuce||53||33|
By plugging these numbers into the equation, we can calculate the odds ratio for lettuce in this case-control study.
The study determined that the odds ratio for lettuce is 11.2.
This ratio indicates that those with symptoms were 11.2 times more likely to have eaten lettuce than those without symptoms. These results raise a big red flag for contaminated lettuce being the culprit!