Test accuracy

Incidence

In epidemiology, health research, and statistical analysis, incidence refers to the occurrence of new cases in a population over a specified time. Analysts report it as a raw count, a proportion (cumulative incidence), or a rate (incidence rate).

While prevalence captures all existing cases, incidence focuses strictly on newly occurring cases. It helps researchers and public health professionals understand how rapidly a condition is spreading, whether interventions are working, and how risks differ across groups.

Key Characteristics of Incidence

It is typically calculated as:

Incidence = (Number of new cases during a time period) ÷ (Number of people at risk during that period)

Requires a clearly defined time frame (e.g., new cases per year).
Includes only people who were initially at risk (i.e., did not already have the condition).
Reflects rate of occurrence, not an overall total.

There are two main types:

Cumulative incidence: The proportion of people who develop the condition over a set time frame.
Incidence rate (or density): Accounts for person-time and allows for varying follow-up periods among individuals.

It is especially useful for identifying emerging health threats, comparing risk across populations, and evaluating the impact of preventive strategies. Unlike prevalence, it is not affected by how long the condition lasts—only how often it appears.

Example

In a year-long study of 5,000 people who did not have diabetes at the start, 200 are newly diagnosed during the year. The incidence is:

200 ÷ 5,000 = 0.04, or 4% per year

This means that 4% of the at-risk population developed diabetes during the one-year study period.
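The two types of incidence described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function names are hypothetical, and the person-time figures are assumed purely to show how the rate differs from the proportion.

```python
def cumulative_incidence(new_cases, at_risk):
    """Proportion of the initially at-risk group that develops the condition."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return new_cases / at_risk


def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time (for example, per person-year)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time


# Diabetes example from the text: 200 new cases among 5,000 at-risk people.
risk = cumulative_incidence(200, 5000)
print(f"Cumulative incidence: {risk:.1%}")

# Assumed follow-up (not from the text): each of the 200 cases was at risk
# for half a year on average, and the other 4,800 people for the full year.
person_years = 200 * 0.5 + 4800 * 1.0
rate = incidence_rate(200, person_years)
print(f"Incidence rate: {rate * 1000:.1f} per 1,000 person-years")
```

The rate comes out slightly higher than the proportion because people stop contributing at-risk time once they are diagnosed.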

Prevalence

By Jim Frost

What Does Prevalence Mean?

In epidemiology, health research, and statistical analysis, prevalence refers to the proportion of individuals in a population who have a specific condition or characteristic at a given time. It is used to describe how widespread a disease, risk factor, or trait is within a defined population.

In classification and diagnostic testing, prevalence is also known as the base rate and represents the proportion of actual positive cases in the dataset. This base rate is critical for interpreting the meaning of test results and evaluating model performance.

Prevalence is a type of proportion and is typically expressed as a percentage, decimal, or fraction. It answers the question: “How common is this condition in the population right now?”

Key Characteristics of Prevalence

It is calculated as:

Prevalence = (Number of people with the condition) ÷ (Total number of people in the population)

It reflects existing cases, not new ones. This distinguishes it from incidence, which tracks only new cases over a defined time period.
It includes both newly diagnosed and long-standing cases present at the time of measurement.
Prevalence depends on both the rate of new cases and how long the condition lasts.

Prevalence plays a crucial role in interpreting the results of diagnostic tests. It directly affects measures such as positive predictive value (PPV) and negative predictive value (NPV). For example, even a highly accurate test can produce many false positives if the condition has a low base rate in the population. Learn more about the Base Rate Fallacy and the False Positive Paradox.
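To see how strongly the base rate drives predictive values, here is a rough sketch that computes PPV across several prevalence levels. The 99% sensitivity and specificity are hypothetical values chosen for illustration, and the function name is my own.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from test characteristics and prevalence (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 99% sensitivity and 99% specificity.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv = positive_predictive_value(0.99, 0.99, prevalence)
    print(f"Prevalence {prevalence:>6.1%} -> PPV {ppv:.1%}")
```

Under these assumptions, a positive result at 1% prevalence is right only about half the time, even though the test itself is highly accurate.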

Example

In a health survey of 10,000 people, 250 are found to have high blood pressure. The prevalence is:

250 ÷ 10,000 = 0.025, or 2.5%

This means that at the time of the study, 2.5% of the population had high blood pressure.

Confusion Matrix

What is a Confusion Matrix?

A confusion matrix is a 2×2 table that summarizes the accuracy of a classification model or diagnostic test by comparing predicted outcomes to actual outcomes. It shows where the model or test made correct predictions and where it was wrong. This summary shows whether the model confuses positives and negatives, which helps you improve its accuracy.

The matrix organizes outcomes into four mutually exclusive categories based on whether the results were positive or negative and whether they were correct or incorrect.

Each cell in the confusion matrix represents the following:

True Positive (TP): The model correctly predicts a positive case.
False Positive (FP): The model incorrectly predicts a positive case when it is actually negative.
False Negative (FN): The model incorrectly predicts a negative case when it is actually positive.
True Negative (TN): The model correctly predicts a negative case.
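As a minimal sketch of how the four cells are tallied, the following Python counts each outcome from paired actual and predicted labels. The encoding (1 for positive, 0 for negative), the function name, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally TP, FP, FN, and TN from paired labels (1 = positive, 0 = negative)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1  # predicted positive, actually positive
        elif guess and not truth:
            counts["FP"] += 1  # predicted positive, actually negative
        elif truth:
            counts["FN"] += 1  # predicted negative, actually positive
        else:
            counts["TN"] += 1  # predicted negative, actually negative
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Made-up labels for eight cases.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(actual, predicted))
```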

Common Metrics from a Confusion Matrix

A confusion matrix is the foundation for calculating many useful performance metrics in both diagnostic testing and classification models. Each metric describes a different way of evaluating how well the model performs based on the values in the matrix.

Sensitivity (True Positive Rate)

Sensitivity measures how well the test or model detects actual positives. It is the proportion of true positives out of all people who actually have the condition.

Sensitivity = TP / (TP + FN)

For example, a COVID-19 test with a sensitivity of 90% correctly identifies 90% of infected individuals. High sensitivity is important when missing positive cases is costly or dangerous.

Specificity (True Negative Rate)

Specificity measures how well the test or model identifies actual negatives. It is the proportion of true negatives among all people who do not have the condition.

Specificity = TN / (TN + FP)

A cancer screening test with 95% specificity correctly rules out cancer in 95% of healthy individuals, reducing false positives and unnecessary follow-up procedures.

Learn in-depth about Sensitivity and Specificity: Definition, Formulas & Interpreting.

Positive Predictive Value (Precision)

Positive predictive value, also known as precision, is the proportion of positive predictions in the confusion matrix that are actually correct.

PPV = TP / (TP + FP)

If an email spam filter has a PPV of 80%, then 80% of emails it flags as spam truly are spam. This metric helps evaluate how trustworthy a positive result is.

Learn in-depth about Positive Predictive Value: Meaning, Formula, & Interpreting.

Negative Predictive Value (NPV)

Negative predictive value shows how often a negative prediction is accurate. It’s the proportion of true negatives among all negative predictions.

NPV = TN / (TN + FN)

If a pregnancy test has an NPV of 92%, then 92% of its negative results are correct. This builds confidence when the test result is negative.

Accuracy

Accuracy is the overall proportion of correct predictions, both positive and negative, based on the full confusion matrix.

Accuracy = (TP + TN) / (TP + FP + FN + TN)

If a classification model has 94% accuracy, it gets the right answer 94% of the time. However, accuracy can be misleading if most outcomes fall into one category. The confusion matrix helps reveal whether high accuracy reflects balanced performance or a skewed dataset.

F1 Score

The F1 score combines sensitivity and precision into one metric by taking their harmonic mean. It’s especially useful when there’s class imbalance or when both false positives and false negatives matter.

F1 Score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)

A fraud detection model with an F1 score of 0.75 is doing a solid job at both catching fraudulent transactions and avoiding false alarms. The confusion matrix provides the values needed to calculate this score and assess that balance.
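The formulas in this section can be gathered into one small helper. This is only a sketch: the function name is hypothetical, the counts are invented to reproduce the 94% accuracy and 0.75 F1 score mentioned above, and the code assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion matrix counts.

    Assumes each denominator is nonzero (at least one actual positive,
    one actual negative, one positive prediction, one negative prediction).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * (ppv * sensitivity) / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Invented counts for 1,000 cases with 100 actual positives.
metrics = classification_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(f"{name:<12}{value:.3f}")
```

With these counts, accuracy is 94% while precision is only about 64%, which illustrates how accuracy alone can hide weak performance on the rarer class.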

Post-Test Probability

What is a Post-Test Probability?

Post-test probability is the probability that a person has (or does not have) a condition after receiving the results of a diagnostic test. It provides a personalized estimate based on both the individual’s pre-test probability (such as their symptoms, risk factors, or local disease prevalence) and the accuracy of the test.

Unlike sensitivity, specificity, and likelihood ratios, which are fixed characteristics applicable only to the test itself, the post-test probability varies with the pre-test probability and applies to individual patients. It answers the question, “Given this test result, how likely is it that this person has the condition?”

Clinicians estimate post-test probability using Bayes’ theorem, which updates the pre-test probability based on the test’s positive or negative likelihood ratio. A positive test result uses the positive likelihood ratio (LR⁺) to raise the probability of disease. A negative test result uses the negative likelihood ratio (LR⁻) to lower it.

How to Calculate the Post-Test Probability

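To calculate post-test probability, you first convert the pre-test probability into odds:

Pre-test odds = Pre-test probability / (1 – Pre-test probability)

Then apply the appropriate likelihood ratio (LR⁺ or LR⁻) depending on the test result:

Post-test odds = Pre-test odds × Likelihood ratio

Finally, convert the post-test odds back into a probability:

Post-test probability = Post-test odds / (1 + Post-test odds)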

A higher post-test probability suggests the condition is more likely after a positive test result. A lower value suggests the condition is less likely after a negative test result.

Example Calculations

A patient has a 30% pre-test probability of having strep throat based on symptoms and local prevalence. The test result is positive, and the rapid strep test has a positive likelihood ratio (LR⁺) of 4.5.

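First, convert the pre-test probability to odds:

Pre-test odds = 0.30 / (1 – 0.30) = 0.30 / 0.70 ≈ 0.429

Multiply by LR⁺ to obtain the post-test odds:

Post-test odds = 0.429 × 4.5 ≈ 1.93

Convert to post-test probability:

Post-test probability = 1.93 / (1 + 1.93) ≈ 0.66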

The post-test probability is about 66%, meaning that after the positive result, the patient is more likely than not to have strep, though it’s not certain.
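The probability-to-odds-and-back procedure can be sketched as a single function. The function name is my own, and the code is a minimal illustration of the three steps rather than a clinical tool.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0 <= pre_test_probability < 1:
        raise ValueError("pre_test_probability must be in [0, 1)")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Strep throat example from the text: 30% pre-test probability, LR+ of 4.5.
strep_probability = post_test_probability(0.30, 4.5)
print(f"Post-test probability: {strep_probability:.1%}")
```

The same function handles a negative result: pass the negative likelihood ratio instead, and the probability moves down rather than up.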

Negative Likelihood Ratio [LR⁻]

What is the Negative Likelihood Ratio (LR⁻)?

The negative likelihood ratio (LR⁻) is a diagnostic testing assessment that indicates how much less likely a negative test result is in someone with the condition compared to someone without it. A lower LR⁻ value means a stronger ability to rule out the disease. It does not tell you the probability that a person is disease-free if they test negative—that’s the negative predictive value, which incorporates disease prevalence.

The negative likelihood ratio formula expresses the ratio of two probabilities: the chance of a false negative result in someone with the condition, divided by the chance of a true negative result in someone without it. It is calculated using both sensitivity and specificity:

LR⁻ = (1 – Sensitivity) / Specificity

The negative likelihood ratio tells you how much less likely a person with the condition is to test negative compared to a person without the condition. For instance, a value of 0.2 means someone with the disease is only 20% as likely to test negative as someone without the disease.

The lower the LR⁻, the more informative a negative result is. A value of 1 means the test result provides no diagnostic value, while values below 0.1 are often considered strong evidence to rule out the condition.

Like sensitivity and specificity, the negative likelihood ratio reflects the inherent ability of the test to distinguish between those with and without the condition. It does not depend on how common the condition is in the population.

However, the likelihood ratio serves as a bridge between test accuracy and clinical decision-making. You can use it with a patient’s pre-test odds to calculate their post-test odds using Bayes’ theorem. Because pre-test odds typically reflect the condition’s prevalence in the relevant population, this approach incorporates prevalence into the interpretation, providing a more personalized assessment of the test result for that individual.

For the related measure that applies to positive test results, see the positive likelihood ratio (LR⁺).

LR⁻ Example Calculation and Interpretation

A test for influenza has a sensitivity of 95% and a specificity of 70%. The negative likelihood ratio is:

LR⁻ = (1 – 0.95) / 0.70 = 0.05 / 0.70 ≈ 0.071

This value means that if a person has the disease, they are only about 7.1% as likely to test negative as someone without the disease. In other words, negative results are far more common in people who don’t have the condition. You can apply this likelihood ratio to a pre-test probability based on clinical symptoms or known prevalence to estimate the post-test probability that the person truly does not have the condition.

To express this in more intuitive terms, you can take the reciprocal:

1 / LR⁻ = 0.70 / 0.05 = 14

This reciprocal of the negative likelihood ratio indicates that a person without influenza is about 14 times more likely to test negative than a person with influenza.
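As a quick check on the influenza example, the calculation can be written as a short Python sketch. The function name is hypothetical; the inputs are the sensitivity and specificity given above.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = false negative rate divided by true negative rate."""
    if specificity <= 0:
        raise ValueError("specificity must be greater than 0")
    return (1 - sensitivity) / specificity


# Influenza example from the text: 95% sensitivity, 70% specificity.
lr_negative = negative_likelihood_ratio(0.95, 0.70)
print(f"LR-: {lr_negative:.3f}")
print(f"Reciprocal: {1 / lr_negative:.1f}")
```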

Positive Likelihood Ratio [LR⁺]

What is the Positive Likelihood Ratio (LR⁺)?

The positive likelihood ratio (LR⁺) is a diagnostic testing assessment that indicates how much more likely a positive test result is in someone with the condition compared to someone without it. A higher LR⁺ value means a stronger ability to rule in the disease. It does not directly tell you the probability that a person has the disease if they test positive—that’s the positive predictive value, which incorporates disease prevalence.

The positive likelihood ratio formula expresses the ratio of two probabilities: the chance of a true positive result in someone with the condition, divided by the chance of a false positive result in someone without it. It is calculated using both sensitivity and specificity:

LR⁺ = Sensitivity / (1 – Specificity)

The positive likelihood ratio tells you how much more likely a person with the condition is to test positive compared to a person without the condition. For instance, a value of 2 means a person with the disease is twice as likely to test positive as someone without the disease.

The higher the LR⁺, the more informative a positive result is. A value of 1 means the test result provides no diagnostic value, while values above 10 are often considered strong evidence to rule in the condition.

Like sensitivity and specificity, the positive likelihood ratio reflects the inherent ability of the test to distinguish between those with and without the condition. It does not depend on the prevalence of the condition in the population.

For the complementary measure that describes how to interpret negative test results, see the negative likelihood ratio (LR⁻).

LR⁺ Example Calculation and Interpretation

A test for strep throat has a sensitivity of 90% and a specificity of 80%. The positive likelihood ratio is:

LR⁺ = 0.90 / (1 – 0.80) = 0.90 / 0.20 = 4.5

This result indicates that a person with strep throat is 4.5 times more likely to test positive than someone who does not have it. You can apply this positive likelihood ratio to a pre-test probability based on clinical symptoms or prevalence to estimate the post-test probability that the person truly has the condition.
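In practice, you often start from study counts rather than published sensitivity and specificity. The sketch below derives LR⁺ from confusion matrix counts. The counts are invented so that they match the 90% sensitivity and 80% specificity in this example, and the function name is my own.

```python
def positive_likelihood_ratio_from_counts(tp, fp, fn, tn):
    """LR+ = true positive rate divided by false positive rate."""
    true_positive_rate = tp / (tp + fn)   # sensitivity
    false_positive_rate = fp / (fp + tn)  # 1 - specificity
    if false_positive_rate == 0:
        # No false positives observed, so the ratio is unbounded.
        return float("inf")
    return true_positive_rate / false_positive_rate


# Invented counts: 100 people with strep (90 detected) and
# 100 people without strep (20 falsely flagged).
lr_positive = positive_likelihood_ratio_from_counts(tp=90, fp=20, fn=10, tn=80)
print(f"LR+: {lr_positive:.2f}")
```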

False Positive Rate [FPR]

False Positive Rate (FPR) is a testing accuracy measure that describes the likelihood of incorrectly identifying a condition when it is not actually present. It is the proportion of people who do not have the condition but test positive anyway. In other words, it is the rate of false alarms generated by the test. A low FPR indicates that the test rarely produces misleading positives, which is especially important when false positives can lead to unnecessary treatments, anxiety, or follow-up tests. This statistic assesses a test’s inherent accuracy and does not incorporate real-world disease prevalence.

False Positive Rate is a measure derived from a confusion matrix, which records the number of true positives, false positives, true negatives, and false negatives. FPR is calculated using the number of false positives and true negatives.

The false positive rate formula is the following:

FPR = False Positives / (False Positives + True Negatives)

FPR expresses false alarms as a proportion. The numerator is the number of false positives—people who don’t have the condition but receive a positive result. The denominator includes all people without the condition: those correctly identified as negative (TN) and those falsely identified as positive (FP).

Because it reflects how the test behaves across all people without the condition, the false positive rate is not affected by disease prevalence, which makes it useful for assessing the inherent tendency of a test to produce false alarms. If you want to understand the meaning of an individual’s test result in a way that incorporates disease prevalence, use measures such as positive predictive value (PPV) and negative predictive value (NPV) instead.

Additionally, the false positive rate is the complement of specificity, meaning:

FPR = 1 – Specificity

Suppose a test evaluates 1,000 people who do not have a disease. If 950 are correctly identified as negative and 50 are incorrectly flagged as positive, then:

FPR = 50 / (50 + 950) = 0.05 or 5%

This result indicates 5% of healthy individuals received a false positive result.
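Here is a small Python sketch of the same calculation, which also confirms that FPR and specificity sum to 1. The function names are hypothetical.

```python
def false_positive_rate(fp, tn):
    """Share of people without the condition who test positive."""
    return fp / (fp + tn)


def specificity(fp, tn):
    """Share of people without the condition who test negative."""
    return tn / (fp + tn)


# Example from the text: 1,000 people without the disease,
# 950 correctly negative and 50 falsely positive.
fpr = false_positive_rate(fp=50, tn=950)
spec = specificity(fp=50, tn=950)
print(f"FPR: {fpr:.1%}, specificity: {spec:.1%}, sum: {fpr + spec:.1f}")
```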

Negative Predictive Value [NPV]

Negative Predictive Value (NPV) is a measure of a diagnostic test’s accuracy that represents the probability a person who tests negative truly does not have the condition. NPV focuses on how trustworthy a negative result is in real-world testing scenarios. Hence, it is best for interpreting an individual negative test result.

Conversely, if you’re evaluating a test’s inherent accuracy, it’s better to use sensitivity and specificity because a disease’s prevalence in the real world does not affect them.

Negative Predictive Value is one of several measures calculated from a confusion matrix, which categorizes test results into four mutually exclusive outcomes: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). NPV uses the counts of true negatives and false negatives to determine how often a negative test result truly indicates the absence of the condition.

NPV is calculated using this formula:

NPV = True Negatives / (True Negatives + False Negatives)

This calculation finds the proportion of all negative test results that are correct. Like positive predictive value (PPV), the NPV depends on the prevalence of the condition in the population being tested. NPV tends to be higher when the condition is rare.

For example, imagine a disease screening test applied to 1,000 people, where 900 are truly disease-free and test negative, and 30 people with the disease mistakenly test negative. The NPV would be:

NPV = 900 / (900 + 30) = 0.9677 or 96.8%

This result indicates there is a 96.8% chance that a person with a negative result truly does not have the disease.
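The following sketch reproduces this calculation and then shows how NPV shifts with prevalence. The function names are my own, and the 90% sensitivity and specificity in the second part are hypothetical values used only to show the trend.

```python
def npv_from_counts(tn, fn):
    """Proportion of negative results that are correct."""
    return tn / (tn + fn)


def npv_from_rates(sensitivity, specificity, prevalence):
    """NPV implied by test characteristics at a given prevalence."""
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Example from the text: 900 true negatives and 30 false negatives.
screening_npv = npv_from_counts(tn=900, fn=30)
print(f"NPV: {screening_npv:.1%}")

# Hypothetical test with 90% sensitivity and 90% specificity.
for prevalence in (0.01, 0.10, 0.30, 0.60):
    value = npv_from_rates(0.90, 0.90, prevalence)
    print(f"Prevalence {prevalence:.0%} -> NPV {value:.1%}")
```

Under these assumptions, NPV falls as the condition becomes more common, which is consistent with the point that NPV tends to be higher when the condition is rare.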

Learn about the positive version of this statistic in my post: Positive Predictive Value: Mean, Formula, and Interpretation.

True Negative [TN]

A true negative occurs when a test correctly identifies the absence of a condition. That is, the individual does not have the condition, and the test appropriately returns a negative result. It is one of the four mutually exclusive outcomes in a confusion matrix, which categorizes test results into: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

True negatives contribute to several important accuracy measures, including specificity, negative predictive value (NPV), and overall accuracy. These metrics help evaluate how reliably a test avoids false alarms and confirms the absence of a condition.

For example, if a tuberculosis test correctly returns a negative result for someone who does not have the disease, that person avoids unnecessary anxiety, treatment, and follow-up testing. This outcome builds confidence in the test’s ability to rule out the condition.

False Negative [FN]

A false negative occurs when a test or model fails to identify a case where the condition or outcome is actually present. It is one of the four mutually exclusive outcomes in a confusion matrix, which categorizes test results into: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

False negatives directly affect measures like sensitivity, negative predictive value (NPV), and false negative rate (FNR). These metrics help determine how often the test fails to detect a condition that is truly present—an especially serious issue in high-stakes settings like medicine or safety.

For example, a false negative result on a mammogram means a person with breast cancer is told they don’t have it. This can delay treatment, allow the disease to progress, and significantly reduce the chance of a successful outcome.

True Positive [TP]

A true positive occurs when a test correctly identifies a positive case—that is, the test detects the condition when it is truly present. It is one of the four mutually exclusive outcomes in a confusion matrix, which categorizes test results into: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

True positives are essential for calculating test performance metrics such as sensitivity, positive predictive value (PPV), and overall accuracy. These measures assess how effectively a test detects real cases of the condition it aims to identify.

For example, a true positive result in a COVID-19 test means the person who actually has the virus was correctly identified. This is beneficial because it allows for timely isolation and treatment, which can reduce spread and improve health outcomes.

False Positive [FP]

A false positive occurs when a test or classification system incorrectly labels a negative case as positive. In other words, the test indicates the presence of a condition or outcome that isn’t actually there. It is one of the four outcomes in a confusion matrix, which categorizes test results into: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

False positives play a key role in calculating several accuracy measures, including specificity, positive predictive value (PPV), and false positive rate (FPR). Each of these metrics helps evaluate a different aspect of a test’s accuracy.

For example, a false positive result on a screening test for colon cancer means that a healthy person is incorrectly told they might have cancer. This can lead to unnecessary emotional distress, invasive follow-up procedures like colonoscopies, and increased healthcare costs—all without any medical benefit.