What is an Odds Ratio?
An odds ratio (OR) quantifies the relationship between a variable and the likelihood of an event occurring. A common use for odds ratios is identifying risk factors by assessing the relationship between exposure to a risk factor and a medical outcome. For example, is there an association between exposure to a chemical and a disease?
To calculate an odds ratio, you must have a binary outcome. And you’ll need either a grouping variable or a continuous variable that you want to relate to your event of interest. Then, use an OR to assess the relationship between your variable and the likelihood that an event occurs.
When you have a grouping variable, an odds ratio interpretation answers the question, is an event more or less likely to occur in one condition or another? It calculates the odds of an outcome occurring in one context relative to a baseline or control condition. For example, your grouping variable can be a subject’s exposure to a risk factor—yes or no—to see how that relates to disease status.
With a continuous variable, calculating an odds ratio can determine whether the odds of an event occurring change as the continuous variable changes.
In this post, learn about ORs, including how to use the odds ratio formula to calculate them, different ways to arrange them for several types of studies, and how to interpret odds ratios and their confidence intervals and p-values.
What Are Odds in Statistics?
Before you can calculate and interpret an odds ratio, you must know what the odds of an event represent. In common usage, people tend to use odds and probability interchangeably. However, in statistics, odds have an exact definition. They are calculated from probabilities but are not the same thing as a probability.
Odds relate to a binary outcome where the outcome either occurs or does not occur. For example, study subjects were either infected or not infected. A person graduates or does not graduate from college. You win a game, or you lose.
Odds definition: The probability of the event occurring divided by the probability of the event not occurring.
As you can see from the definition, odds tell you how likely an event is to occur relative to it not happening. For example, imagine playing a die-rolling game where a six is very good. Your odds of rolling a six are the following:
Odds of rolling a six = (1/6) / (5/6) = 1/5 = 0.20
Your odds of rolling a six are 0.20, or 1 to 5. Because the number of die outcomes is a constant six, you can replace 1/6 and 5/6 in the formula with 1 and 5 to derive the same answer (1/5 = 0.20). I’ll use that format in the examples throughout this post.
Imagine you’re playing a game. If your odds of winning are 2 (or 2 wins to 1 loss), that indicates you are twice as likely to win as to lose. On the other hand, if your odds of winning are 0.5 (or 1 win to 2 losses), you’re half as likely to win as to lose.
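To make the definition concrete, here is a minimal Python sketch of the odds calculation for the die and game examples. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Convert a probability p (0 <= p < 1) into odds: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and less than 1")
    return p / (1 - p)


def odds_from_counts(events, non_events):
    """Odds as a simple count ratio, e.g. 2 wins to 1 loss gives 2.0."""
    return events / non_events


# Die example: the probability of a six is 1/6, so the odds are 1 to 5.
six_odds = odds_from_probability(Fraction(1, 6))
print(six_odds, float(six_odds))  # 1/5 0.2

# Game example: 2 wins to 1 loss, then 1 win to 2 losses.
print(odds_from_counts(2, 1))  # 2.0
print(odds_from_counts(1, 2))  # 0.5
```

Using Fraction keeps the die result exact, which shows that working from the probabilities (1/6 and 5/6) and working from the counts (1 and 5) give the same odds.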
As you can see, the odds of an event occurring are themselves a ratio. Therefore, an OR is a ratio of two ratios.
Related posts: Probability Fundamentals and Understanding Ratios
Odds Ratios Interpretation for Two Conditions
Odds ratios with groups quantify the strength of the relationship between two conditions. They indicate how likely an outcome is to occur in one context relative to another.
The odds ratio formula below shows how to calculate it for conditions A and B.
Odds ratio = (odds of the event in condition A) / (odds of the event in condition B)
The denominator (condition B) in the odds ratio formula is the baseline or control group. Consequently, the OR tells you how much more or less likely the numerator events (condition A) are to occur relative to the denominator events.
If you have a treatment and control group, the treatment will be in the numerator while the control group is in the denominator of the formula. This calculation of an odds ratio indicates how your treatment group fares compared to the controls.
For example, a study assesses infections in a treatment and control group. Infections are the events for the binary outcome. By calculating the following OR, analysts can determine how likely infections are in the treatment group relative to the control group.
Odds ratio = (odds of infection in the treatment group) / (odds of infection in the control group)
The interpretation of this odds ratio is that when the treatment is effective, the odds of infection in the treatment group will be lower than in the control group, producing an OR of less than one.
Let’s move on to more interpretation details!
Related post: Control Groups in Experiments
How to Interpret Odds Ratios
Due to the odds ratio formula, the value of one becomes critical during interpretation because it indicates both conditions have equal odds. Consequently, analysts always compare their OR results to one when interpreting the results. As the OR moves away from one in either direction, the association between the condition and outcome becomes stronger.
Odds Ratio = 1: The ratio equals one when the numerator and denominator are equal. This equivalence occurs when the odds of the event occurring in one condition equal the odds of it happening in the other condition. There is no association between condition and event occurrence.
Odds Ratio > 1: The numerator is greater than the denominator. Hence, the event’s odds are higher for the group/condition in the numerator. This is often a risk factor.
Odds Ratio < 1: The numerator is less than the denominator. Hence, the odds of the outcome occurring are lower for the group/condition in the numerator. This can be a protective factor.
In the hypothetical infection experiment, the researchers hope that the OR is less than one because that indicates the treatment group has lower odds of becoming infected than the control group.
Caution: ORs are a type of correlation and do not necessarily represent causal relationships!
Odds ratios are similar to relative risks and hazard ratios, but they are different statistics. Learn more about Relative Risks and Hazard Ratios.
For a broader look at various types of risk, read my post Risk Calculations: Relative vs. Absolute & Risk Reduction.
How to Calculate an Odds Ratio
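The equation below expands the earlier odds ratio formula for calculating an OR with two conditions (A and B). Again, it’s the ratio of two odds. Hence, the numerator and denominator are also ratios.
Odds ratio = (odds of A) / (odds of B) = (events in A / non-events in A) / (events in B / non-events in B)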
In the infection example above, we assessed the relationship between treatment and the odds of being infected. Our two conditions were the treatment (condition A) and the control group (B). On the right-hand side, we’d enter the numbers of infections (events) and non-infections (non-events) from our sample for both groups.
Example Odds Ratio Calculations for Two Groups
Let’s use data from an actual study to calculate an odds ratio. The North Carolina Division of Public Health needed to identify risk factors associated with an E. coli outbreak. We’ll calculate the OR for one risk factor, but they assessed multiple possibilities in their study.
In this study, the event is an exposure to a risk factor for E. coli infection. Our two conditions are those who are sick versus not sick. It’s an example of a case-control study, which analysts use to identify candidate risk factors using odds ratios.
Exposure | Got Sick (Cases) | Did Not Get Sick (Controls) |
Visited Petting Zoo | 36 | 64 |
Did Not Visit | 9 | 123 |
By plugging these numbers into the odds ratio formula, we can calculate the odds ratio to assess the relationship between visiting a petting zoo and becoming infected by E. coli. In case-control studies, all infected cases go in the numerator while the uninfected controls go in the denominator. The next section explains why.
Odds ratio = (36 / 9) / (64 / 123) = 4 / 0.52 ≈ 7.7
The interpretation of the odds ratio is that those who became infected with E. coli (cases) were 7.7 times more likely to have visited the petting zoo than those without symptoms (controls). That’s a big red flag for the petting zoo being the E. coli source!
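The petting zoo calculation can be sketched in a few lines of Python. The function and argument names below are hypothetical. The counts come from the table above.

```python
def odds_ratio_case_control(exposed_cases, unexposed_cases,
                            exposed_controls, unexposed_controls):
    """Case-control arrangement: odds of exposure among cases
    divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


# Cases: 36 visited the petting zoo, 9 did not.
# Controls: 64 visited, 123 did not.
zoo_or = odds_ratio_case_control(36, 9, 64, 123)
print(round(zoo_or, 1))  # 7.7
```

Note that this sketch does not handle a zero count in any cell, which would cause a division by zero.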
This study also assessed whether awareness of the disease risk from contacting livestock was a protective factor. For this factor, the study calculated an odds ratio of 0.1.
For interpreting the odds ratio, the value of 0.1 indicates that those who became infected with E. coli were only one-tenth as likely to be aware of the disease risk from contacting livestock as those who were not infected. Knowledge is power! Presumably, those who were aware of the risk took precautions!
Related post: Case-Control Studies
Different Arrangements
You might have noticed differences between the treatment and control group experiment and the case-control study’s OR arrangements. Different types of studies require specific types of ORs.
For the experiment, we put the treatment group in the numerator and the control group in the denominator of the odds ratio formula. Both odds in the ratio relate to infections and divide the number of infections by the number of uninfected. This arrangement allows you to calculate the odds ratio of disease in the treatment group compared to the control group.
However, in case-control studies, you put only the cases (sick) in the numerator and the controls (healthy) in the denominator of the odds ratio formula. Both odds in the ratio relate to exposure rather than illness. To calculate the odds ratio, you take the number of exposures and divide it by the non-exposures for both the case and control groups. Case-control studies use this arrangement because they start with the disease outcome as the basis for sample selection, and then the researchers need to identify risk factors.
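Although the two arrangements are set up differently, for the same 2x2 table they produce the same number, because both reduce algebraically to the cross-product (a × d) / (b × c). The short Python check below illustrates this with the petting zoo counts. The variable names are assumptions made for this sketch.

```python
import math

# 2x2 table from the petting zoo example:
a, b = 36, 64    # exposed:   cases, controls
c, d = 9, 123    # unexposed: cases, controls

# Case-control arrangement: odds of exposure in cases vs. controls.
exposure_or = (a / c) / (b / d)

# Experiment-style arrangement: odds of illness in exposed vs. unexposed.
outcome_or = (a / b) / (c / d)

# Both simplify to the cross-product ratio.
cross_product_or = (a * d) / (b * c)

print(math.isclose(exposure_or, outcome_or))        # True
print(math.isclose(exposure_or, cross_product_or))  # True
```

The arithmetic is identical either way. What differs between study types is which interpretation the sampling design supports.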
Odds Ratios for Continuous Variables
When you perform binary logistic regression using the logit transformation, you can obtain ORs for continuous variables. Those odds ratio formulas and calculations are more complex and go beyond the scope of this post. However, I will show you how to interpret odds ratios for continuous variables.
Unlike the groups in the previous examples, a continuous variable can increase or decrease in value. Fortunately, the interpretation of an odds ratio for a continuous variable is similar and still centers around the value of one. When an OR is:
- Greater than 1: As the continuous variable increases, the event is more likely to occur.
- Less than 1: As the variable increases, the event is less likely to occur.
- Equals 1: As the variable increases, the likelihood of the event does not change.
Interpreting Odds Ratios for Continuous Variables
In another post, I performed binary logistic regression and obtained ORs for two continuous independent variables. Let’s interpret those odds ratios!
In that post, I assess whether measures of conservativeness and establishmentarianism predict membership in the Freedom Caucus within the U.S. House of Representatives in 2014.
Here’s how you interpret these scores:
- Conservativeness: Higher scores represent more conservative viewpoints.
- Establishmentarianism: Higher scores represent viewpoints that favor the political establishment.
For this post, I’ll focus on the ORs for this binary logistic model. For more details, read the full post: Statistical Analysis of the Republican Establishment Split.
The odds ratio interpretation for conservativeness indicates that for every 0.1 increase (the unit of change) in the conservativeness score, a House member is ~2.7 times as likely to belong to the Freedom Caucus.
Conversely, the odds ratio interpretation for establishmentarianism indicates that for every 0.1 increase in the establishmentarianism score, a House member is only ~73% as likely to belong to the Freedom Caucus.
Taking both results together, House members who are more conservative and less favorable towards the establishment make up the Freedom Caucus.
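The sketch below shows two properties of odds ratios from a logistic model that help when reading these results. First, ORs multiply across steps, so a change of two units corresponds to the OR squared. Second, the OR multiplies the odds, not the probability. The values 2.7 and 0.73 are the approximate ORs quoted above. The 10% baseline probability and the function name are hypothetical, chosen only for illustration.

```python
def probability_after_odds_ratio(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


OR_CONSERVATIVENESS = 2.7    # approximate OR per 0.1 increase (from the text)
OR_ESTABLISHMENT = 0.73      # approximate OR per 0.1 increase (from the text)

# A 0.2 increase is two steps of 0.1, so the odds ratios multiply.
or_two_steps = OR_CONSERVATIVENESS ** 2
print(round(or_two_steps, 2))  # 7.29

# Hypothetical member with a 10% baseline probability of membership.
p_new = probability_after_odds_ratio(0.10, OR_CONSERVATIVENESS)
print(round(p_new, 3))  # 0.231
```

In this hypothetical case the odds are multiplied by 2.7, but the probability rises from 0.10 to about 0.23, which is less than 2.7 times the baseline. The gap between the two grows as the baseline probability gets larger.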
Interpreting Confidence Intervals and P-values for Odds Ratios
So far, we’ve only looked at the point estimates for odds ratios. Those are the sample estimates that are a single value. However, sample estimates always have a margin of error thanks to sampling error. Confidence intervals and hypothesis tests (p-values) can account for that margin of error when you’re using samples to draw conclusions about populations (i.e., inferential statistics). Sample statistics are always wrong to some extent!
As with any hypothesis test, there is a null and alternative hypothesis. In the context of interpreting odds ratios, the value of one represents no effect. Hence, these hypotheses focus on that value.
- Null Hypothesis: The OR equals 1 (no relationship).
- Alternative Hypothesis: The OR does not equal 1 (relationship exists).
If the p-value for your odds ratio is less than your significance level (e.g., 0.05), reject the null hypothesis. The interpretation is that the difference between your sample’s odds ratio and one is statistically significant. Your data provide sufficient evidence to conclude that a relationship between the variable and the event’s probability exists in the population.
Alternatively, you can use the confidence interval to interpret an odds ratio and draw the same conclusions as using the p-value. If your CI excludes 1, your results are significant. However, if your CI includes 1, you can’t rule out 1 as a likely value. Consequently, your results are not statistically significant.
The confidence intervals for the two Freedom Caucus odds ratios both exclude 1. Hence, they are statistically significant.
Additionally, the width of the confidence interval indicates the precision of the estimate. Narrower intervals represent more precise estimates.
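As an illustration of how a confidence interval for an odds ratio can be computed from a 2x2 table, here is a minimal Python sketch. It assumes the common large-sample (Wald) method on the log odds ratio, which this post does not specify, and it applies that method to the petting zoo counts from earlier. Statistical software may use a different method, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and an approximate 95% CI for a 2x2 table.

    a, b = exposed cases, exposed controls
    c, d = unexposed cases, unexposed controls
    Assumes the Wald interval on the log scale and no zero cells.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


zoo_or, zoo_lower, zoo_upper = odds_ratio_with_ci(36, 64, 9, 123)
excludes_one = zoo_lower > 1 or zoo_upper < 1
print(round(zoo_or, 2), round(zoo_lower, 2), round(zoo_upper, 2))
print("CI excludes 1:", excludes_one)
```

Under these assumptions the interval for the petting zoo OR runs from roughly 3.5 to 17. It excludes 1, but it is wide, which reflects the small number of cases who did not visit.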
Related posts: Descriptive vs. Inferential Statistics and Hypothesis Testing Overview
Dean Morbeck says
Thanks Jim!
Dean Morbeck says
Thanks for the prompt reply, Jim! I’ll use a published study to illustrate what I see. Here’s the reference: Rose RD et al. The BlastGen study: a randomized controlled trial of blastocyst media supplemented with granulocyte-macrophage colony-stimulating factor. Reprod Biomed Online. 2020 May;40(5):645-652. doi: 10.1016/j.rbmo.2020.01.011.
Table 2 provides livebirth rates (n=50/group) for two treatments. The odds of LB are .34 and .22, with the former the control and the latter the treatment. OR calculated as a fraction of these two odds is 0.65. Per the results, unadjusted OR is 0.55 and adjusted is 0.59.
I believe unadjusted (or crude) is applying only 1 IV, whereas adjusted applies additional IVs (3 in this case).
The above case as cited is quite common. I’ve considered your suggestions and nothing turns up. What am I missing?
Jim Frost says
Hi Dean,
I haven’t reviewed the study but I wouldn’t be surprised if one or more of the possibilities I mentioned were responsible. You wouldn’t necessarily be able to evaluate all of them using the typical journal article.
I can’t determine the cause from the summary you provide. You’d need the raw data and would have to dig into the nuts and bolts of the analysis procedure to really know what is going on. However, I’m not really that surprised. The same type of thing happens in ANOVA when you have raw data means and fitted means that differ. These differences you’re seeing seem like the same type of thing. Typically, you prefer the fitted values from the model, assuming you have a good model.
Unlike calculating the simple OR directly from the data, the crude and adjusted ORs are a product of the model. So, it’s an entirely different calculation method for starters. But also if the model with the single IV isn’t adequately fitting the data, you might find discrepancies. Indeed, that’s probably why they need several other IVs for the adjustments. So, my top guess would be the model isn’t fitting the data that well with just one IV.
But the other possibilities I mention could play a role as well.
Dean Morbeck says
Hi Jim-
The examples you give are helpful. I have one question: when I perform a crude OR with only my variable of interest without any confounders, an OR is returned that is not the same as calculating a simple odds 1 divided by odds 2 as in your examples. I’ve looked at other studies and see the same. Can you explain why this occurs?
Thanks!
Jim Frost says
Hi Dean,
Background information for other readers. Typically, a crude OR is from a logistic regression model that has only 1 independent variable (IV). The formulas I show in this post are simple ORs. They are calculated using different methods.
For your scenario without confounders (1 IV), you’d theoretically expect crude and simple ORs to be the same.
However, they can be different for various reasons. The following possibilities assume that your model contains only one IV. If it contains more than one IV, whether you call it a confounder or not, that would likely explain the differences right there. When you have more than one IV in a logistic regression model, it produces adjusted ORs.
First and foremost, your model might be misspecified and not adequately fitting the data. Perhaps using just one IV is insufficient? Check those assumptions!
There could be other issues such as a zero count in the table, which can cause software to treat it differently. There could also be missing data that is being handled differently between the two different methods. Is there a chance that the model has a continuous variable that is being represented dichotomously in the table?
It’s impossible for me to know what is causing the difference. But those are some of the top possibilities.
I hope that helps!
Sanghamitra Satpathi says
Hi, I have a doubt. I am clear about finding out the odds ratio between the exposed and control group when they are expressed in actual numbers. Can we find out the odds ratio between two groups when they are expressed in mean and standard deviation?
Jim Frost says
Hi Sanghamitra,
If you can convert your observations to a probability (p), you can then use the odds formula: p / (1 – p).
Now, if you’re talking about a mean and standard deviation, those are summaries for an entire dataset–a distribution of values. Odds apply to a specific event. So, suppose you want to know the odds of obtaining less than a particular value (say 115) in a normal distribution that has a mean and standard deviation of 100 and 15, respectively. You could then use the mean and standard deviation to find the odds for obtaining a value less than 115. Note that we have to specify a range (such as less than or greater than) because we’re now working in a continuous distribution where any specific value (e.g., 115) has a probability of zero.
In that case, using a z-table, I know that the probability of obtaining less than 115 in a normal distribution (100, 15) is 0.841. Consequently, the odds of obtaining less than 115 in that distribution are 0.841 / (1 – 0.841) ≈ 5.289.
You could then repeat that for the other group in your study to calculate its odds. Then use the ratio of those two odds.
So, yes, you can use the mean and standard deviation to help find the odds ratio, but they supply the parameters for the distribution as I’ve shown above.
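The steps above can be sketched in Python with the standard library’s NormalDist. The first group uses the mean of 100 and standard deviation of 15 from this reply. The second group’s mean of 110 is a hypothetical value added only to show the final ratio, and the function name is also an assumption.

```python
from statistics import NormalDist


def odds_below(value, mean, sd):
    """Odds of observing less than `value` in a normal distribution."""
    p = NormalDist(mu=mean, sigma=sd).cdf(value)
    return p / (1 - p)


# Group 1: normal distribution with mean 100 and SD 15 (from the reply).
odds_group_1 = odds_below(115, mean=100, sd=15)
print(round(odds_group_1, 2))  # 5.3

# Group 2: hypothetical mean of 110 with the same SD, for illustration only.
odds_group_2 = odds_below(115, mean=110, sd=15)

ratio_of_odds = odds_group_1 / odds_group_2
print(round(ratio_of_odds, 2))
```

The result for the first group (about 5.30) differs slightly from the 5.289 above because the z-table probability was rounded to 0.841 before dividing.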
Nebyu M says
Terrific explanations. But
The interpretations above are a bit flawed. As the odds is a unit by itself, it should be interpreted as odds just like relative risk is treated as a unit. If we are comparing the odds of ‘getting sick’ among visitors and non-visitors of the Zoo, it should be interpreted as – The odds of being infected is 7.7 times higher among people who visited the Zoo compared to the odds of those who did not visit.
While interpreting odds ratios below 1, we should try to make sense while being in the bounds of statistics. Rather than taking the result as is i.e. 0.1, subtracting it from one could do the trick. Thus, 1 – 0.1=0.9 then we can transform 0.9 to a percentage by multiplying it with 100 which will give 90%. 0.9 or 90% tells us the amount or the percentage of odds respectively that the result is lower compared to the control (In the above 7.7 was higher). Our interpretation takes a similar shape – The odds of disease risk awareness among people who are sick is 90% lower compared to the odds of people who are healthy. (NB: in this example the outcome variable is awareness not health status as in the earlier one.) It is not right to treat awareness as the result of getting sick.
Jim Frost says
Hi Nebyu,
Thanks for the comment. Unfortunately, I think you’re confusing odds and odds ratios. They are related but different concepts. Also, you seem to have a misunderstanding about the nature of the two example studies, particularly the second example about livestock disease awareness.
Your interpretation of the odds ratios relating to the petting zoo example is correct and it matches my own wording. However, you describe it as the odds rather than odds ratio.
For the other example, your interpretation is incorrect. Transforming it as you’ve done is incorrect. The correct interpretation is to say that livestock disease awareness is a protective factor because those with awareness were only one-tenth as likely to become infected as those without the awareness. Changing it to 90% is incorrect.
Additionally, this type of analysis identifies risk and protective factors. The outcome is health status (contrary to what you say) while livestock disease awareness is the factor that is being assessed as potentially protective. You can think of that odds ratio as being similar to a regression coefficient. It is associated with a specific factor in your analysis (disease awareness) and indicates how the factor relates to the outcome (infection status).
Both examples were included in a real NIH study that assessed an outbreak. For more information about this type of study, read my post about Case-Control studies.