What is a Negative Binomial Distribution?
The negative binomial distribution describes the number of trials required to generate an event a particular number of times. When you provide an event probability and the number of successes (r), this distribution calculates the likelihood of observing the Rth success on the Nth attempt. Statisticians also refer to this discrete probability distribution as the Pascal distribution.
Use the negative binomial distribution for binary outcomes, which have only two possible values that are mutually exclusive.
For example, the negative binomial distribution can answer the following questions. What is the probability of the following:
- Rolling the 5th six on the 20th roll of a die?
- Getting the 10th defective item on the 1000th item inspected?
- Selecting the 10th woman as the 15th participant?
Statisticians refer to it as the negative binomial distribution because it models the number of failures, unlike the binomial distribution, which models the number of successes. It models the failures that occur before the number of successes you specify. This distribution is described by a Probability Mass Function (PMF) because it assigns likelihoods to the values of a discrete random variable.
In this post, learn when to use the negative binomial distribution, its formula, and how to calculate negative binomial probabilities by hand. I also include a negative binomial calculator to help you practice what you learn.
For an overview of other distributions you can use with binary data, read my posts Maximize the Value of Your Binary Data and those on the Bernoulli, Binomial, Geometric, and Hypergeometric distributions.
Negative Binomial Probabilities
The negative binomial distribution models the probabilities for the Rth success occurring on the Nth trial when you know the event probability. Let’s bring it to life with an example!
I’ll start by using statistical software to find negative binomial probabilities and create distribution graphs. This approach will help you understand what you can learn from it.
Imagine you’re playing a game where rolling sixes on a die is beneficial. You want to find the probability of getting the 5th six on the 20th roll. In this example, the number of successes is 5 (r), the number of trials is 20 (N), and the success probability (p) is 1/6 ≈ 0.1667.
The software indicates that the likelihood of rolling the 5th six on the 20th roll is 0.0323655.
That’s interesting but perhaps not useful by itself. We’d also like to know the chances for rolling the 5th six on other rolls. Seeing the distribution of probabilities for a range of trials is much more helpful.
Negative Binomial Distribution Graph
Negative binomial probabilities are helpful because they provide the probability of the Rth success occurring on a specific trial (N). Expanding upon this approach, a negative binomial distribution graph displays the probabilities of the Rth success occurring on each attempt over a range of trials.
For example, the distribution graph below displays the likelihood of rolling the fifth six on the 6th, 7th, 8th, etc., die rolls. On the chart, each bar indicates the likelihood of rolling the 5th six on the specified number of die rolls. The fifth roll is theoretically your first chance for getting the 5th six, but the probability is too low to display. Instead, the negative binomial distribution graph starts with the sixth roll. The graph stops at 81 rolls because the probability of obtaining the 5th six after that is too low to display.
On the negative binomial distribution graph, I’ve highlighted in red the bar that corresponds to the previous statistical output for the probability of rolling the 5th 6 on the 20th roll. More generally, the chart indicates that the maximum likelihood (0.03563) of rolling the fifth six happens on the 24th roll, which is the tallest bar. Before 24 rolls, your probability of throwing the 5th 6 increases for each successive roll. After 24 rolls, the likelihood for each roll decreases. On the declining portion of the curve, you’ve had so many rolls that you’ve probably already rolled five 6s.
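The bars in a graph like this can be recomputed with a few lines of Python. This is only a sketch: the helper name nb_pmf is made up for illustration, it uses the rounded probability 0.1667 from the example, and it covers rolls 5 through 81 to mirror the range described above rather than any particular software's output.

```python
from math import comb

def nb_pmf(n, r, p):
    """Probability that the r-th success lands exactly on trial n."""
    if n < r:
        return 0.0  # you cannot have r successes in fewer than r trials
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r, p = 5, 0.1667  # five sixes; rounded value of 1/6 used in the example

# Probability of rolling the 5th six on each roll from 5 through 81.
probabilities = {n: nb_pmf(n, r, p) for n in range(r, 82)}

# The roll with the tallest bar, and the bar highlighted for roll 20.
peak_roll = max(probabilities, key=probabilities.get)
print(peak_roll, round(probabilities[peak_roll], 4))
print(20, round(probabilities[20], 5))
```

With these inputs the tallest bar should land on roll 24 at roughly 0.0356, and roll 5 comes out near 0.0001, which is why a chart would struggle to show it.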
Related post: Understanding Probability Distributions
Cumulative Distribution Function
The negative binomial distribution is excellent for understanding the probability of the Rth success occurring on the Nth trial. However, you’re frequently not interested in the chances for only one particular attempt. Instead, you might want to learn the total probability of the Rth success occurring over a range of trials.
For example, imagine that rolling five sixes indicates you’re doing well. You might be interested in the range of rolls where you have a 50% chance of rolling the 5th six.
The negative binomial cumulative distribution function can help you out!
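Technically, the negative binomial cumulative probability calculates the likelihood of obtaining the specified number of successes in N or fewer trials. When you need the probability that it takes more than N trials, use the complement, 1 minus the cumulative probability. To go the other way and find the number of trials that corresponds to a given cumulative probability, use the inverse cumulative distribution function. Modern statistical software usually allows you to choose the direction of the cumulative function.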
For our example, we’d like to find the number of rolls in which we have a ~50% chance of throwing the 5th six. Below, the shaded region shows the cumulative probability of rolling the 5th six in 27 or fewer die rolls.
In the negative binomial distribution graph, the shaded area indicates that the cumulative probability of obtaining the 5th six in the first 27 rolls is nearly 0.5.
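The cumulative probability for the die example can be sketched in Python by summing the individual probabilities. The function names nb_pmf, nb_cdf, and trials_for_probability are hypothetical helpers written for this illustration, and the max_trials cutoff is an arbitrary safety limit, not something taken from statistical software.

```python
from math import comb

def nb_pmf(n, r, p):
    """Probability that the r-th success lands exactly on trial n."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def nb_cdf(n, r, p):
    """Probability that the r-th success arrives on or before trial n."""
    return sum(nb_pmf(k, r, p) for k in range(r, n + 1))

def trials_for_probability(target, r, p, max_trials=10_000):
    """Smallest n whose cumulative probability reaches `target`."""
    total = 0.0
    for n in range(r, max_trials + 1):
        total += nb_pmf(n, r, p)
        if total >= target:
            return n
    raise ValueError("target not reached within max_trials")

r, p = 5, 1 / 6  # five sixes on a fair die

within_27 = nb_cdf(27, r, p)        # 5th six in 27 or fewer rolls
more_than_27 = 1 - within_27        # complement: it takes 28 or more rolls
first_at_least_half = trials_for_probability(0.5, r, p)

print(round(within_27, 4), round(more_than_27, 4), first_at_least_half)
```

Under these assumptions, the cumulative probability for 27 rolls sits a little below 0.5, and 28 is the first roll count where it reaches at least 0.5.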
Learn more about Cumulative Distribution Functions: Uses, Graphs & vs PDF.
Negative Binomial Distribution Assumptions and Notation
The negative binomial distribution models the probabilities for obtaining the Rth success on the Nth trial. However, your data must satisfy the following assumptions for the negative binomial distribution to be valid.
- Only two possible outcomes per trial. For example, pass or fail, sale or no sale, defective or not defective, etc.
- Independent trials. One trial’s result does not affect the following trial. For example, a coin toss doesn’t affect the next coin toss. Learn more about Independent Events.
- The probability remains constant over time. In some contexts, this assumption is valid due to the physical attributes of the trials, such as coin tosses and die rolls. However, the probability won’t necessarily be steady in other areas. For example, a manufacturing plant’s chances of producing defective parts can vary over time. If the likelihood can change, use a P chart (a control chart) to assess this assumption.
Learn about independent and identically distributed (IID) data, the assumption relating to the second and third items above.
Parameters and Notation
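The negative binomial distribution has two parameters, r and p. The number of trials, n, is the value that the random variable takes. The notation uses the following symbols: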
- r: number of successes.
- n: number of trials.
- n − r: number of failures.
- p: the event or success probability.
You denote a negative binomial distribution as nb(r,p).
Alternatively, you can write X∼NB(r,p), which means that your random variable X follows a negative binomial distribution with r successes and an event probability of p.
The die rolling example assesses probabilities for rolling five 6s in a series of die rolls. In this scenario, rolling a six is a success, and failure is anything else. In this case, the success probability is 1/6 ≈ 0.1667.
For the example, our random variable X is the number of rolls needed to get the 5th six, we specify 5 successes, and the probability is 0.1667. The negative binomial distribution notation for this scenario is the following:
X∼nb(5,0.1667)
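To make the random variable concrete, here is a small simulation sketch in Python. It assumes a fair die and uses a hypothetical helper, rolls_until_rth_six, along with an arbitrary seed and sample size, so treat it as an illustration rather than a replacement for the exact probabilities. Each simulated value is one draw of X, the roll on which the 5th six appears.

```python
import random

def rolls_until_rth_six(r, rng):
    """Roll a fair die until the r-th six appears; return the roll count."""
    rolls = 0
    sixes = 0
    while sixes < r:
        rolls += 1
        if rng.randint(1, 6) == 6:
            sixes += 1
    return rolls

rng = random.Random(42)  # fixed seed, chosen arbitrarily for repeatability
draws = [rolls_until_rth_six(5, rng) for _ in range(20_000)]

# Share of simulated games in which the 5th six landed on roll 20.
share_on_20 = draws.count(20) / len(draws)
print(round(share_on_20, 4))
```

With enough simulated games, the share landing on roll 20 should sit close to the 0.032 probability reported earlier, though it will wobble with the seed and the sample size.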
Negative Binomial Distribution Calculator
Use this calculator to find negative binomial probabilities.
Let’s use this calculator to solve the previous die example. In the calculator, enter n (number of events) = 20, r (number of successes) = 5, and Probability of one success = 0.1667. The calculator displays a probability of 0.03237, matching our results above to within rounding error.
Now, try one yourself. Imagine you’re drawing a random sample of 13 individuals for jury duty. Assume that females comprise 50% of the population. What is the probability of selecting the 7th woman on the 13th selection?
Find the answer at the end of this post.
Now, let’s proceed to the formula for those who want to calculate the probabilities manually.
Negative Binomial Distribution Formula
Usually, you’ll use online calculators or statistical software to find the probabilities for the negative binomial distribution. However, here is the negative binomial distribution formula to calculate them manually. Additionally, I’ll work through an example calculation using the formula.
The negative binomial distribution formula is the following:
$$P(X = n) = C(n-1,\, r-1)\; p^{r}\,(1-p)^{n-r}$$
Where:
- n = number of trials.
- r = the number of successes
- p = the probability of a success.
C (n-1, r-1) is the binomial coefficient which finds the number of ways to obtain a set of trials with the specified number of successes when the order of outcomes does not matter—except the final attempt must be a success. Technically, it’s the number of combinations without repetition. Learn more in my post about Finding Combinations.
The negative binomial distribution formula takes the number of combinations, multiplies that by the success probability raised to the power of the number of successes, and multiplies that by the failure probability raised to the power of the number of failures.
This equation is similar to the one for binomial probabilities. The differences between the two are the following:
| Binomial | Negative Binomial |
| --- | --- |
| You specify the probability and number of trials, and the distribution finds the chances for a range of successes. | You specify the probability and the number of successes, and the distribution finds the chances for a range of trials. |
| The binomial coefficient is larger because there are more possible combinations for r successes in n trials when the final attempt’s outcome can be a success or failure. | The binomial coefficient is smaller because there are fewer possible combinations for r successes in n trials when the final attempt must be a success. |
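The difference between the two coefficients can be checked numerically. The sketch below uses the die example (n = 20, r = 5) with a fair-die probability of 1/6, and the variable names are chosen only for illustration. It also shows a standard identity: C(n-1, r-1) equals (r/n) × C(n, r), so the negative binomial probability is r/n times the binomial probability for the same n, r, and p.

```python
from math import comb

n, r, p = 20, 5, 1 / 6

# Binomial: the 5 sixes can fall anywhere among the 20 rolls.
ways_binomial = comb(n, r)

# Negative binomial: roll 20 must be a six, so only the other
# 4 sixes are free to fall among the first 19 rolls.
ways_negative_binomial = comb(n - 1, r - 1)

# Both formulas share the same probability terms.
shared_terms = p**r * (1 - p) ** (n - r)
binomial_prob = ways_binomial * shared_terms
negative_binomial_prob = ways_negative_binomial * shared_terms

print(ways_binomial, ways_negative_binomial)
print(round(binomial_prob, 5), round(negative_binomial_prob, 5))
```

The second coefficient is exactly one quarter of the first here because r/n = 5/20.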
Working through a calculation will bring the formula to life!
Worked Example Using the Formula
We’ll use the negative binomial distribution formula to calculate the probability of rolling the 5th six on the 20th die roll. Enter these values into the formula:
- n = 20
- r = 5
- p = 0.1667
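For the number of combinations, we have:
$$C(19, 4) = \frac{19!}{4!\,15!} = 3876$$
Now, let’s enter our values into the negative binomial distribution formula.
$$P(X = 20) = 3876 \times 0.1667^{5} \times 0.8333^{15} \approx 0.03237$$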
This hand calculation verifies the statistical software solutions within rounding error.
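The same hand calculation can be sketched step by step in Python, using only the standard library. The variable names are illustrative, and p is the rounded 0.1667 used in the worked example rather than an exact 1/6.

```python
from math import comb

n, r, p = 20, 5, 0.1667

combinations = comb(n - 1, r - 1)       # C(19, 4): ways to place the first 4 sixes
success_term = p**r                     # success probability to the power of r
failure_term = (1 - p) ** (n - r)       # failure probability to the power of n - r

probability = combinations * success_term * failure_term
print(combinations, round(probability, 5))
```

Keeping the three pieces separate makes it easy to compare each one against the corresponding step of a hand calculation.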
If you need to find the cumulative negative binomial probability for a range of trials, calculate the probability for each attempt and then sum the probabilities for all trials of interest.
For example, to calculate the likelihood of getting the 5th six sometime within 20 rolls, calculate the probability for the 5th six on the 5th, 6th, 7th, . . ., 20th rolls. Then sum that set of probabilities.
In the calculator example, there is a probability of 0.11279 for picking the 7th woman on the 13th selection.
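As a quick check on that answer, here is a short Python sketch that plugs n = 13, r = 7, and p = 1/2 into the formula. It uses exact fractions so that rounding does not blur the comparison; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

n, r = 13, 7
p = Fraction(1, 2)  # females assumed to make up 50% of the population

# C(12, 6) ways to place the first 6 women among the first 12 selections,
# times the probability of any one such sequence.
answer = comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

print(answer, round(float(answer), 5))
```

Because p equals 1/2, the success and failure terms collapse to (1/2)^13, so the result is just C(12, 6) divided by 8192.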
David Van Camp says
Hi Jim, many thanks for this excellent blog!
I found this post after months of searching for the best function for what you describe: the probability of success on the last of N trials with R successes. THANK YOU!
However, I wanted to note that most implementations (it seems), including the Excel NEGBINOM.DIST function, Wolfram MathWorld, the PascalDistribution class in the Apache Commons Java library, etc., provide an alternate formula which calculates using the number of k failures before finding r successes (so N=R+K, if I understand this correctly.)
This implementation is described as Alternate Formulation #1 while the above is given as #2 in http://en.wikipedia.org/wiki/Negative_binomial_distribution.
I offer this information in case any of your readers might be as confused about this as I was.
thanks again!
David Van Camp