## What Is the Base Rate Fallacy?

Base rate fallacy is a cognitive bias that occurs when a person misjudges an outcome by giving too much weight to case-specific details and overlooks crucial probability information that applies to all cases in a population. That vital probability is the outcome’s base rate of occurrence in the population.

In essence, a person misinterprets an outcome because they get tripped up by specific details and overlook the overall frequency of occurrence. People tend to make predictions using similarity rather than statistical likelihoods. Hence, this bias is also known as base rate neglect.

But why does this happen? It’s a tale of our brain’s preference for narrative over numbers, for compelling details over cold, hard statistics. Let’s see an example of that in action!

Imagine you’re at a local park and notice someone using a telescope and making detailed notes in a notebook. They’re dressed casually but have a badge with astronomical symbols. Based on these specific details, you conclude that this person is an astrophysicist or a professional astronomer. However, this assumption could be an instance of the base rate fallacy.

While the equipment and the badge suggest a rare and specific career in astronomy, the base rate of encountering a professional astronomer in a general population is extremely low. This person could be an amateur astronomy enthusiast or a teacher preparing for a class. By focusing on the specific observable details and ignoring the base rate, your conclusion might be far from the truth.

In this blog post, we’ll explore the concept of the base rate fallacy, why it occurs, and how this cognitive bias influences our everyday decision-making. Finally, I work through a medical testing example that poses the problem and calculates the probabilities step by step.

Learn about other Types of Cognitive Bias.

## Understanding Why the Base Rate Fallacy Occurs

Why do we fall prey to this fallacy? The answer lies in our cognitive wiring. Our brains are wired to love stories and specific details; they engage us emotionally and are easier to recall than abstract probabilities. While often helpful, this preference can lead us to ignore the statistical realities that should guide our decisions.

Let’s define “base rate.” Imagine it as the backdrop of probability against which we should weigh any new information. For instance, if only 1 in 1000 people have a rare disease, that’s the base rate. Simple, right? Yet, when faced with specific, often vivid details, our brains tend to put this crucial statistical context in the backseat.

The base rate fallacy illustrates a common cognitive challenge: integrating specific situational details with broader, more generalized data. People typically rely on general base-rate information when it’s the only data available but struggle when both specific and general information are present.

Our minds prefer situational details over the statistical probabilities! In this sense, the base rate fallacy is similar to the conjunction fallacy.

Learn more in Probability Definition and Fundamentals.

## Effects of Base Rate Fallacy

This cognitive quirk can affect our judgments in various contexts, from medical diagnosis to financial forecasting, legal decision-making, and everyday life choices. The base rate fallacy can lead us astray, causing misjudgments and inefficiencies, as it skews our perception of risk and likelihood.

In our social interactions, for instance, the base rate fallacy might cause us to overlook how someone has behaved in similar past scenarios (the base rate), leading us to form judgments based on immediate, observable traits. This approach can oversimplify complex human behavior.

In the financial setting, an investor might make decisions based on recent events rather than long-term trends (the base rate).

In the legal system, the base rate fallacy can significantly influence courtroom decisions. Judges and juries, for example, may overly focus on the compelling details of a specific case or testimony while neglecting broader statistical data, such as crime rates or the likelihood of certain events occurring. This imbalance can lead to misinterpretation of the evidence and potentially unjust verdicts.

Perhaps you’ve heard that the majority of COVID-infected people were vaccinated against the disease? If that leads you to think the vaccines are ineffective, welcome to another example of the base rate fallacy!
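A quick sketch with made-up numbers shows how this happens. The 90% vaccination coverage, 10% infection risk for unvaccinated people, and 80% risk reduction below are hypothetical assumptions chosen for illustration, not real COVID data, and `share_of_infected_who_are_vaccinated` is a hypothetical helper name.

```python
def share_of_infected_who_are_vaccinated(population, coverage, unvaccinated_risk, efficacy):
    """Return the fraction of all infections that occur among vaccinated people.

    All inputs are hypothetical illustration values, not real COVID data.
    """
    vaccinated = population * coverage
    unvaccinated = population - vaccinated
    # The vaccine lowers each vaccinated person's risk by the efficacy fraction.
    vaccinated_risk = unvaccinated_risk * (1 - efficacy)
    infected_vaccinated = vaccinated * vaccinated_risk
    infected_unvaccinated = unvaccinated * unvaccinated_risk
    return infected_vaccinated / (infected_vaccinated + infected_unvaccinated)


# Hypothetical: 90% coverage, 10% risk if unvaccinated, vaccine cuts risk by 80%.
share = share_of_infected_who_are_vaccinated(1_000_000, 0.90, 0.10, 0.80)
print(f"Vaccinated share of infections: {share:.1%}")  # prints: Vaccinated share of infections: 64.3%
```

With these assumed inputs, roughly 64% of infections occur in vaccinated people even though each vaccinated person’s risk is five times lower. The vaccinated group is simply much larger, and that group size is the base rate people overlook.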

In all these examples, you need to correctly balance case-specific and general information to obtain a more accurate understanding of complex scenarios. This approach helps you avoid the base rate fallacy trap.

In the following worked example, you’ll see how healthcare professionals can fall into this trap by overly focusing on individual test results while overlooking broader statistical data. And I’ll show you how to find the correct answer!

## Base Rate Fallacy Worked Example

Consider a base rate fallacy that is common in medical testing. What does a positive test result really mean?

Suppose a patient gets a positive test result for a serious condition. Can you successfully merge the case-specific and base rate information to find the correct answer?

Here are the details:

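- **Case-specific**: A positive result on a test that is 95% accurate. The test has a 95% true positive rate (it produces a positive result 95% of the time when a person has the disease) and a 5% false positive rate (it produces a positive result 5% of the time when a person doesn’t).
- **Base rate**: 1 in 1,000 people have this condition.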

Given this information, what is the probability that the patient with the positive test result has the condition?

The most common answer is 95%, which sounds logical given the test’s high accuracy.

However, the correct answer is about 2%. If that’s a big surprise, it’s because you fell victim to the base rate fallacy!

Specifically, you failed to incorporate the base rate of the disease’s occurrence in the population, which has a probability of 0.001.

Let’s solve this problem! Here is a simple Excel worksheet that calculates the answer and allows you to change the parameters: Base Rate Fallacy calculations.
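If you prefer code to a spreadsheet, the Python sketch below follows the same counting logic as the walkthrough. It is not a copy of the worksheet: `positive_result_breakdown` and its parameter names are hypothetical, and it assumes you supply the false positive rate directly.

```python
def positive_result_breakdown(base_rate, true_positive_rate, false_positive_rate, population=1_000_000):
    """Return (true positives, false positives, P(condition | positive)).

    Splits a hypothetical population into cases and non-cases, then applies
    the test's positive rates to each group.
    """
    cases = population * base_rate
    non_cases = population - cases
    true_positives = cases * true_positive_rate
    false_positives = non_cases * false_positive_rate
    return true_positives, false_positives, true_positives / (true_positives + false_positives)


# Example parameters; the population size is varied to show it doesn't matter.
for population in (100_000, 1_000_000, 100_000_000):
    tp, fp, p = positive_result_breakdown(0.001, 0.95, 0.05, population)
    print(f"population {population:>11,}: TP={tp:,.0f}, FP={fp:,.0f}, P={p:.2%}")
```

Every population size reports P = 1.87%, because scaling the population scales the true and false positives by the same factor.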

Now, I’ll walk you through the calculations incorporating the base rate.

## Solving the Base Rate Fallacy

The trick to avoiding the base rate fallacy is to correctly evaluate the case-specific information within the context of the population’s overall probability.

For the medical test example, the test is very accurate for specific cases (95%), but we need to interpret that in a context where the medical condition is rare—a base rate of only 0.001. When the overall likelihood is low, we need to worry about the role of false positives.

To answer the base rate fallacy question, consider the following:

We know that in the population, 0.001 have the condition, so 1 – 0.001 = 0.999 don’t have it.

Also, the test has a 0.95 true positive rate. Because we assume the test is equally accurate for people without the condition, its false positive rate is 1 – 0.95 = 0.05.

Now, let’s apply that information to a population of 1 million to find the numbers of true positives and false positives.

### Cases & True Positives

Let’s take our population of 1,000,000 and multiply it by the condition’s base rate to find the number of cases: 1,000,000 * 0.001 = 1,000.

Now, we’ll take our 1,000 cases and multiply them by the test’s true positive rate to find the number of true positives: 1,000 * 0.95 = 950.

We’d expect to obtain 950 true positives in our population.
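Here is the same step as a minimal Python sketch, using the example’s values (a population of 1,000,000, a base rate of 0.001, and a 0.95 true positive rate); the variable names are illustrative.

```python
population = 1_000_000
base_rate = 0.001            # 1 in 1,000 people have the condition
true_positive_rate = 0.95    # the test flags 95% of actual cases

cases = population * base_rate
true_positives = cases * true_positive_rate
print(f"Cases: {cases:,.0f}")                    # Cases: 1,000
print(f"True positives: {true_positives:,.0f}")  # True positives: 950
```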

### Non-Cases & False Positives

Now, we’ll find the number of non-cases using the share of the population without the condition: 1,000,000 * 0.999 = 999,000.

999,000 don’t have the condition, but when they take the test, there’s a false positive probability. Let’s use that to calculate the number of false positives: 999,000 * 0.05 = 49,950.
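The non-case step can be sketched in Python as well, assuming the example’s 0.05 false positive rate; the variable names are illustrative.

```python
population = 1_000_000
base_rate = 0.001
false_positive_rate = 0.05   # 5% of people without the condition still test positive

non_cases = population * (1 - base_rate)
false_positives = non_cases * false_positive_rate
print(f"Non-cases: {non_cases:,.0f}")              # Non-cases: 999,000
print(f"False positives: {false_positives:,.0f}")  # False positives: 49,950
```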

### Putting It All Together

Most of the 1,000 people with the condition obtained true positive test results (950). However, for those without the condition, a whopping 49,950 get false positive results! While the false positive rate is low (0.05), so many people don’t have the condition that the test produces far more false positives than true positives.

Wow! That lopsided ratio is why the base rate fallacy gives us such a distorted picture!

| Types of Positive Results | Number |
|---|---|
| True | 950 |
| False | 49,950 |
| Total | 50,900 |

Hence, only 950 out of 50,900 total positives are true: 950 / 50,900 = 1.87%.
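The final division is easy to check in Python; the two counts come from the table, and the variable names are illustrative.

```python
true_positives = 950
false_positives = 49_950

total_positives = true_positives + false_positives
p_condition_given_positive = true_positives / total_positives
print(f"Total positives: {total_positives:,}")                       # Total positives: 50,900
print(f"P(condition | positive): {p_condition_given_positive:.2%}")  # P(condition | positive): 1.87%
```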

In the example, our calculations show that the probability of actually having the disease given a positive test result is about 1.87%. The low base rate (1 in 1000) dramatically impacts the likelihood of having the disease with a positive result, even when the test is 95% accurate.
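To see how strongly the base rate drives this result, the sketch below holds the test fixed (0.95 true positive rate, 0.05 false positive rate) and varies only the base rate. The function name is hypothetical, and the base rates other than 0.001 are illustrative values that don’t come from the example. It works with probabilities directly rather than counts of people, which gives the same answer.

```python
def prob_condition_given_positive(base_rate, true_positive_rate=0.95, false_positive_rate=0.05):
    """P(condition | positive result) for a given base rate (prevalence)."""
    true_pos = base_rate * true_positive_rate          # share of everyone who is a true positive
    false_pos = (1 - base_rate) * false_positive_rate  # share of everyone who is a false positive
    return true_pos / (true_pos + false_pos)


# Same test, different hypothetical base rates.
for base_rate in (0.001, 0.01, 0.1, 0.5):
    p = prob_condition_given_positive(base_rate)
    print(f"base rate {base_rate:>5}: P(condition | positive) = {p:.1%}")
```

The printed probabilities are roughly 1.9%, 16.1%, 67.9%, and 95.0%. The identical test goes from producing mostly false alarms to mostly correct positives as the condition becomes more common.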

The base rate fallacy causes most people to totally misjudge the meaning of a positive result for this test due to the rarity of the disease.

A couple of notes. If you change the population size, the answer remains the same. You can also solve this problem with Bayes’ theorem, which is more complex but gives the same result. This counting approach better emphasizes how the base rate affects the ratio of true to false positives.
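For comparison, here is the Bayes’ theorem version of the calculation, with $D$ standing for having the condition and $+$ for a positive result. It plugs in the example’s base rate (0.001), true positive rate (0.95), and false positive rate (0.05).

$$
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
= \frac{0.95 \times 0.001}{0.95 \times 0.001 + 0.05 \times 0.999}
= \frac{0.00095}{0.0509} \approx 0.0187
$$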

## Avoid the Base Rate Fallacy!

In conclusion, the base rate fallacy is a pervasive cognitive bias that significantly impacts our decision-making. As we’ve seen, this fallacy occurs when we give too much weight to specific details of a scenario while overlooking the general probability or base rate of an event happening in a population.

Understanding the base rate fallacy is crucial for making more informed, rational decisions. Remember, the next time you face a decision or form an opinion, pause and think about the base rate. Try to combine the specific and general information correctly.


rick says

Thanks for the prompt reply Jim, I think my confusion came from thinking about the impact of a false negative result, because that would mean an individual with a positive result was not counted. If it was a very high false negative % rate, it feels intuitively like it would impact the results overall. I think I will have to play with your spreadsheet to see this visually so I can get my head around it! Cheers, Rick

Jim Frost says

Hi Rick, you’re very welcome!

It is a bit confusing and it does seem like you’d need to know about negative results, but you don’t. A person who receives a false negative result gets a negative result (although they don’t know it’s false at first). And the example asks, what is the meaning of a positive result? So, the information about negative results is entirely unrelated!

There are other questions that would use that information. Obviously, we could ask, what is the meaning of a negative result? Or, what percentage of people get the correct result, positive or negative? The former requires only the information about true and false negatives. The latter requires the complete information about both the positive and negative results.

So, there are definitely scenarios where you need both types of information. But the example scenario isn’t one of them. That example is from the point of view of someone who gets a positive test result and wants to know what that means given true and false positives and the population prevalence. Because they got the positive result, they don’t need to worry about negative results! In other words, a true or false negative doesn’t apply to someone who got a positive result.

rick says

Hi Jim, in your example and in the spreadsheet, you assume a 95% true positive rate, and therefore a 5% false positive rate, but it seems to me a critical piece of data that is missing is the rate of true & false negatives? I realise it would complicate the calculations but I think it’s critical to calculating the actual probability that a positive result is true?

Jim Frost says

Hi Rick,

That’s a great question and, perhaps surprisingly, the answer is that when interpreting a positive test result, the true and false negative rates aren’t necessary. Keep in mind that the example focuses only on interpreting a positive result.

The true positive rate (sensitivity) and the false positive rate are specifically relevant to scenarios where the test result is positive. The true positive rate tells us how often the test correctly identifies someone with the condition, while the false positive rate tells us how often the test incorrectly identifies someone without the condition as having it.

When using Bayes’ Theorem to update our beliefs based on a positive test result, we only use the likelihood of testing positive, whether it’s truly positive (true positive rate) or falsely positive (false positive rate), along with the population prevalence rate. This enables us to calculate the probability that a person actually has the disease given a positive test result.

The true negative rate (specificity) and false negative rate are critical when analyzing negative test results: they tell us about the test’s ability to correctly identify those without the disease and those with the disease who are mistakenly identified as disease-free, respectively. However, these rates do not influence the calculation or interpretation of a positive test result.

Put more simply, you don’t need to know the negative results information when you’re assessing only positive results. The true and false positive rates provide all the necessary information.

Now, if the example assessed the interpretation of a negative result, then we’d need the true and false negative rates.

I hope this clarifies why we don’t need the negative results information for this example!

Martry Shudak says

I was sensing I had it misapplied! Thank you, Jim!!

Jim Frost says

You’re very welcome!

Marty Shudak says

This is one of the best I’ve read of your blogs, Jim! Reason being it is so close to home. I took what you had here and did the same with an issue my son is highly concerned about. He’s 22 and weighs 130 lbs but has high cholesterol (inherited from his mom and grandfather) and is considering statins (at least by the time he’s 40).

Now here’s my line of thought. I found 805,000 heart attacks in the US this year (assuming all are over 40). This year we have about 155,000,000 adults over 40 in the US. From Internet searches, cholesterol level as a predictor is about 75% accurate (75% of heart attacks occur in people with high levels). I multiplied this by a 95% cholesterol test accuracy rate to get an overall accuracy rate of 71.25%.

Following your procedure above, I arrive at a 1.28% probability of having a heart attack given a high cholesterol test result. Am I understanding the process and is this accurate?

Jim Frost says

Hi Marty

I’m so glad you appreciated this blog post!

Unfortunately, your son’s scenario is different than the type of situation in the blog post. The calculation of risk in this case should not be directly equated to the false positive rate in a medical test scenario. In the base rate fallacy example with medical testing, the focus is on the probability of having a condition given a positive test result, considering the overall prevalence of the condition and the accuracy of the test.

Cholesterol levels are a known risk factor with a causal association with heart attacks. So, it’s a different ballgame.

You need to evaluate the increased risk associated with the higher cholesterol level. However, when assessing heart attack risk due to high cholesterol, several factors come into play. It’s not just about the accuracy of cholesterol level as a predictor or the accuracy of the cholesterol test. The actual risk of a heart attack involves a more complex interplay of various risk factors, including cholesterol levels, age, family history, lifestyle, and other health conditions.

The way you could do this statistically would require, say, a binary logistic regression model. You could then enter the values of the risk factors to estimate the probability of a heart attack. Unfortunately, I don’t have a model like that!

Therefore, while it’s insightful to consider the prevalence of heart attacks and the relationship between cholesterol levels and heart attacks, the method oversimplifies the risk assessment. The probability of having a heart attack, especially in individual cases, requires a comprehensive evaluation of all risk factors, ideally in consultation with a healthcare professional.

Can Justin Kiessling says

A great overview. The phenomena described here are the positive and negative predictive values of a test (PPV, NPV). As Jim points out, PPV and NPV are greatly influenced by the prevalence of disease; lower prevalence generally results in lower PPVs. That is, if the test is positive, what is the probability that you actually have that disease? Remember, PPV = TP / (TP + FP). Doctors use sensitivity and specificity to determine which tests should be used for the patient, while patients should be concerned about the NPV and PPV of the test to get an idea of what their results mean for them. People generally assume that 99% sensitivity and specificity give sure results, but that’s not necessarily true once you factor in disease prevalence.