## What is a Negative Binomial Distribution?

The negative binomial distribution describes the number of trials required to generate an event a particular number of times. When you provide an event probability and the number of successes (r), this distribution calculates the likelihood of observing the R^{th} success on the N^{th} attempt. Statisticians also refer to this discrete probability distribution as the Pascal distribution.

Use the negative binomial distribution for binary outcomes, which have only two possible values that are mutually exclusive.

For example, the negative binomial distribution can answer the following questions. What is the probability of the following:

- Rolling the 5^{th} six on the 20^{th} roll of a die?
- Getting the 10^{th} defective item on the 1000^{th} item inspected?
- Selecting the 10^{th} woman as the 15^{th} participant?

Statisticians refer to it as the *negative* binomial distribution because it models the number of failures, unlike the binomial distribution, which models the number of successes. It models the failures that occur before the number of successes you specify. This distribution is an example of a Probability Mass Function (PMF) because it calculates likelihoods for discrete random variables.

In this post, learn when to use the negative binomial distribution, its formula, and how to calculate negative binomial probabilities by hand. I also include a negative binomial calculator to help you practice what you learn.

For an overview of other distributions you can use with binary data, read my posts on Maximize the Value of Your Binary Data and on the Bernoulli, Binomial, Geometric, and Hypergeometric distributions.

## Negative Binomial Probabilities

The negative binomial distribution models the probabilities for the r^{th} success occurring on the N^{th} trial when you know the event probability. Let’s bring it to life with an example!

I’ll start by using statistical software to find negative binomial probabilities and create distribution graphs. This approach will help you understand what you can learn from it.

Imagine you’re playing a game where rolling sixes on a die is beneficial. You want to find the probability of getting the 5^{th} six on the 20^{th} roll. In this example, the number of successes is 5 (r), the number of trials is 20 (N), and the success probability (p) is 1/6 = 0.1667.

The negative binomial distribution calculates a probability of 0.0323655 for rolling the 5^{th} six on the 20^{th} roll.

That’s interesting but perhaps not useful by itself. We’d also like to know the chances for rolling the 5^{th} six on other rolls. Seeing the distribution of probabilities for a range of trials is much more helpful.

## Negative Binomial Distribution Graph

Negative binomial probabilities are helpful because they provide the probability of the R^{th} success occurring on a specific trial (N). Expanding upon this approach, a negative binomial distribution graph displays the probabilities of the R^{th} success occurring on each attempt over a range of trials.

For example, the distribution graph below displays the likelihood of rolling the fifth six on the 6^{th}, 7^{th}, 8^{th}, etc., die rolls. On the chart, each bar indicates the likelihood of rolling the 5^{th} six on the specified number of die rolls. The fifth roll is theoretically your first chance for getting the 5^{th} six, but the probability is too low to display. Instead, the negative binomial distribution graph starts with the sixth roll. The graph stops at 81 rolls because the probability of obtaining the 5th six after that is too low to display.

On the negative binomial distribution graph, I’ve highlighted in red the bar that corresponds to the previous statistical output for the probability of rolling the 5^{th} 6 on the 20th roll. More generally, the chart indicates that the maximum likelihood (0.03563) of rolling the fifth six happens on the 24^{th} roll, which is the tallest bar. Before 24 rolls, your probability of throwing the 5^{th} 6 increases for each successive roll. After 24 rolls, the likelihood for each roll decreases. On the declining portion of the curve, you’ve had so many rolls that you’ve probably already rolled five 6s.
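If you don’t have statistical software handy, here is a minimal Python sketch that rebuilds the numbers behind a graph like this one. The function name `nb_pmf` and the variable names are my own for illustration, and I’m assuming the rounded success probability of 0.1667 used in this post.

```python
from math import comb

def nb_pmf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on exactly the n-th trial."""
    if n < r:
        return 0.0  # r successes need at least r trials
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r, p = 5, 0.1667            # fifth six, rounded success probability
rolls = range(5, 82)        # same range of rolls as the graph
probs = {n: nb_pmf(n, r, p) for n in rolls}

peak_roll = max(probs, key=probs.get)
print(f"Tallest bar: roll {peak_roll}, probability {probs[peak_roll]:.5f}")
print(f"Roll 20: {probs[20]:.5f}")

# Crude text version of the bar chart, every fifth roll
for n in range(5, 41, 5):
    print(f"{n:>3} {'#' * round(probs[n] * 1000)}")
```

With the rounded probability, the peak lands on the 24th roll. With the exact value p = 1/6, the 24th and 25th rolls are mathematically tied, so which one a program reports can depend on rounding.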

**Related post**: Understanding Probability Distributions

## Cumulative Distribution Function

The negative binomial distribution is excellent for understanding the probability of the R^{th} success occurring on the N^{th} trial. However, you’re frequently not interested in the chances for only one particular attempt. Instead, you might want to learn the total probability of the R^{th} success occurring over a range of trials.

For example, imagine that rolling five sixes indicates you’re doing well. You might be interested in the range of rolls where you have a 50% chance of rolling the 5^{th} six.

The negative binomial cumulative distribution function can help you out!

Technically, the negative binomial cumulative probability is the likelihood of obtaining the specified number of successes in N or fewer trials. When you need the probability that it takes more than N trials, use the upper tail, which is 1 minus the cumulative probability. To work backward from a target probability to a number of trials, use the inverse cumulative distribution. Modern statistical software usually allows you to choose the direction of the cumulative function.

For our example, we’d like to find the number of rolls in which we have a ~50% chance of throwing the 5^{th} six. Below, the shaded region shows the cumulative probability of rolling the 5^{th} six in 27 or fewer die rolls.

In the negative binomial distribution graph, the shaded area indicates that the cumulative probability of obtaining 5^{th} six in the first 27 rolls is nearly 0.5.
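As a rough check on the shaded region, the sketch below sums the individual probabilities to get a cumulative probability and then searches for the first roll where that total reaches 50%. The helper names (`nb_cdf`, `trials_for_probability`) are hypothetical, and I’m using the exact p = 1/6 here rather than the rounded value.

```python
from math import comb

def nb_pmf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on exactly the n-th trial."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def nb_cdf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on or before the n-th trial."""
    return sum(nb_pmf(k, r, p) for k in range(r, n + 1))

def trials_for_probability(target: float, r: int, p: float,
                           max_trials: int = 10_000) -> int:
    """Smallest n whose cumulative probability is at least `target`."""
    total = 0.0
    for n in range(r, max_trials + 1):
        total += nb_pmf(n, r, p)
        if total >= target:
            return n
    raise ValueError("target probability not reached within max_trials")

r, p = 5, 1 / 6
print(f"P(5th six within 27 rolls) = {nb_cdf(27, r, p):.4f}")
print(f"First roll with at least a 50% chance: {trials_for_probability(0.5, r, p)}")
```

The cumulative probability for 27 rolls comes out a little under 0.5, which is why the search crosses the 50% line one roll later.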

Learn more about Cumulative Distribution Functions: Uses, Graphs & vs PDF.

## Negative Binomial Distribution Assumptions and Notation

The negative binomial distribution models the probabilities for obtaining the R^{th} success on the N^{th} trial. However, your data must satisfy the following assumptions for the negative binomial distribution to be valid.

1. **Only two possible outcomes per trial**. For example, pass or fail, sale or no sale, defective or not defective, etc.
2. **Independent trials**. One trial’s result does not affect the following trial. For example, a coin toss doesn’t affect the next coin toss. Learn more about Independent Events.
3. **The probability remains constant over time**. In some contexts, this assumption is valid due to the physical attributes of the trials, such as coin tosses and die rolls. However, the probability won’t necessarily be steady in other areas. For example, a manufacturing plant’s chances of producing defective parts can vary over time. If the likelihood can change, use a P chart (a control chart) to assess this assumption.

Learn about independent and identically distributed (IID) data, the assumption relating to items #2 and #3.
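To see what these assumptions look like in practice, here is a small simulation sketch. Each simulated trial has two outcomes, is independent of the others, and uses the same success probability every time. The function names, the seed, and the number of repeats are arbitrary choices of mine, not something from a particular package.

```python
import random

def trials_until_rth_success(r: int, p: float, rng: random.Random) -> int:
    """Run independent trials with constant success probability p and
    return the trial number on which the r-th success occurs."""
    successes = 0
    trials = 0
    while successes < r:
        trials += 1
        if rng.random() < p:  # two outcomes only: success or failure
            successes += 1
    return trials

def simulate_probability(n: int, r: int, p: float,
                         repeats: int = 100_000, seed: int = 1) -> float:
    """Estimate the chance that the r-th success lands on exactly trial n."""
    rng = random.Random(seed)
    hits = sum(trials_until_rth_success(r, p, rng) == n for _ in range(repeats))
    return hits / repeats

estimate = simulate_probability(n=20, r=5, p=1 / 6)
print(f"Simulated P(5th six on 20th roll) = {estimate:.4f}")
```

When the assumptions hold, the simulated proportion should land close to the theoretical value of about 0.032 for the die example. If you changed the simulation so that p drifts over time, the match would break down.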

### Parameters and Notation

The negative binomial distribution has three parameters, r, n, and p.

- **r**: number of successes.
- **n**: number of trials.
- **n − r**: number of failures.
- **p**: the event or success probability.

You denote a negative binomial distribution as nb(r,p).

Alternatively, you can write X∼NB(r,p), which means that your random variable X follows a negative binomial distribution with r successes and an event probability of p.

The die rolling example assesses probabilities for rolling five 6s in a series of die rolls. In this scenario, rolling a six is a success, and failure is anything else. In this case, the success probability is 1/6 = 0.1667.

For the example, rolling sixes on a die is our random variable X, we specify 5 successes, and the probability is 0.1667. The negative binomial distribution notation for this scenario is the following:

X∼nb(5,0.1667)

## Negative Binomial Distribution Calculator

Use this calculator to find negative binomial probabilities.

Let’s use this calculator to solve the previous die example. In the calculator, enter n (number of events) = 20, r (number of successes) = 5, and Probability of one success = 0.1667. The calculator displays a probability of 0.03237, matching our results above to within rounding error.

Now, try one yourself. Imagine you’re drawing a random sample of 13 individuals for jury duty. Assume that females comprise 50% of the population. What is the probability of selecting the 7^{th} woman on the 13^{th} selection?

Find the answer at the end of this post.

Now, let’s proceed to the formula for those who want to calculate the probabilities manually.

## Negative Binomial Distribution Formula

Usually, you’ll use online calculators or statistical software to find the probabilities for the negative binomial distribution. However, here is the negative binomial distribution formula to calculate them manually. Additionally, I’ll work through an example calculation using the formula.

The negative binomial distribution formula is the following:

$$P(X = n) = C(n-1,\, r-1)\; p^{r}\, (1-p)^{n-r}$$

Where:

- n = number of trials.
- r = the number of successes
- p = the probability of a success.

C(n-1, r-1) is the binomial coefficient, which finds the number of ways to obtain a set of trials with the specified number of successes when the order of outcomes does not matter, except that the final attempt must be a success. Technically, it’s the number of combinations without repetition. Learn more in my post about Finding Combinations.

The negative binomial distribution formula takes the number of combinations, multiplies that by the success probability raised to the power of the number of successes, and multiplies that by the failure probability raised to the power of the number of failures.

This equation is similar to the one for binomial probabilities. The differences between the two are the following:

| Binomial | Negative Binomial |
|---|---|
| You specify the probability and number of trials, and the distribution finds the chances for a range of successes. | You specify the probability and the number of successes, and the distribution finds the chances for a range of trials. |
| The binomial coefficient is larger because there are more possible combinations for r successes in n trials when the final attempt’s outcome can be a success or failure. | The binomial coefficient is smaller because there are fewer possible combinations for r successes in n trials when the final attempt must be a success. |
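The comparison above can be made concrete with a short sketch. It computes both coefficients for the die example and checks the identity C(n-1, r-1) = (r/n) × C(n, r), which means the negative binomial probability is r/n times the matching binomial probability. The function names are illustrative only.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n trials, in any order."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def nb_pmf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on exactly the n-th trial."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

n, r, p = 20, 5, 1 / 6

binom_coef = comb(n, r)          # any arrangement of 5 sixes in 20 rolls
nb_coef = comb(n - 1, r - 1)     # last roll fixed as a six
print(binom_coef, nb_coef)       # 15504 3876

# The negative binomial probability is r/n times the binomial probability
print(nb_pmf(n, r, p), (r / n) * binomial_pmf(r, n, p))
```

For n = 20 and r = 5, fixing the last roll as a success keeps exactly one quarter (5/20) of the arrangements that the binomial coefficient counts.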

Working through a calculation will bring the formula to life!

## Worked Example Using the Formula

We’ll use the negative binomial distribution formula to calculate the probability of rolling the 5^{th} six on the 20^{th} die roll. Enter these values into the formula:

- n = 20
- r = 5
- p = 0.1667

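For the number of combinations, we have:

$$C(19,\, 4) = \frac{19!}{4!\,15!} = 3876$$

Now, let’s enter our values into the negative binomial distribution formula.

$$P(X = 20) = 3876 \times 0.1667^{5} \times 0.8333^{15} \approx 0.03237$$

This hand calculation matches the statistical software result to within rounding error.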

If you need to find the cumulative negative binomial probability for a range of trials, calculate the probability for each attempt and then sum the probabilities for all trials of interest.

For example, to calculate the likelihood of getting the 5th six sometime within 20 rolls, calculate the probability for the 5th six on the 5th, 6th, 7th, . . ., 20th rolls. Then sum that set of probabilities.
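The summing approach just described is easy to sketch in Python. The names below are mine, and I’m assuming the rounded p = 0.1667 from the worked example.

```python
from math import comb

def nb_pmf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on exactly the n-th trial."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r, p = 5, 0.1667

# Probability of the 5th six on the 5th, 6th, ..., 20th roll
per_roll = [nb_pmf(n, r, p) for n in range(r, 21)]

# Summing them gives the chance of the 5th six sometime within 20 rolls
cumulative = sum(per_roll)
print(f"P(5th six within 20 rolls) = {cumulative:.4f}")
```

The total works out to roughly 0.23, so you have a bit less than a one-in-four chance of seeing five sixes within the first 20 rolls.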

Answer to the calculator exercise: there is a probability of 0.11279 for picking the 7^{th} woman on the 13^{th} selection.

David Van Camp says

Hi Jim, many thanks for this excellent blog!

I found this post after months of searching for the best function for what you describe: the probability of success on the last of N trials with R successes. THANK YOU!

However, I wanted to note that most implementations (it seems), including the EXCEL Negbinom.Dist Function, Wolfram Mathworld, the PascalDistribution class in the Apache Commons Java library, etc., provide an alternate formula which calculates using the number of k failures before finding r successes (so N=R+K, if I understand this correctly.)

This implementation is described as Alternate Formulation #1 while the above is given as #2 in http://en.wikipedia.org/wiki/Negative_binomial_distribution.

I offer this information in case any of your readers might be as confused about this as I was.

thanks again!

David Van Camp
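The alternate formulation described in the comment above counts failures (k) before the r^{th} success instead of total trials, with n = r + k. As a hedged sketch, the two forms can be compared side by side. Both function names are hypothetical, and this is not the code of any of the libraries mentioned.

```python
from math import comb

def nb_pmf_trials(n: int, r: int, p: float) -> float:
    """Trials form: the r-th success occurs on exactly the n-th trial."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def nb_pmf_failures(k: int, r: int, p: float) -> float:
    """Failures form: exactly k failures occur before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

n, r, p = 20, 5, 0.1667
k = n - r  # 15 failures before the 5th six

print(nb_pmf_trials(n, r, p))
print(nb_pmf_failures(k, r, p))  # same probability, different input
```

The two functions return the same probability because C(k + r - 1, k) equals C(n - 1, r - 1) when n = r + k. If a tool asks for a number of failures, enter n − r rather than n.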