What is a Binomial Distribution?
The binomial distribution is a discrete probability distribution that calculates the probability an event will occur a specific number of times in a set number of opportunities. Use the binomial distribution when your outcome is binary. Binary outcomes have only two possible values that are mutually exclusive.
For example, the binomial distribution can answer the following questions. What is the probability of getting:
- Six heads when you toss a coin ten times?
- 12 women in a sample of 20?
- Three defective items in a batch of 100?
- Two flu infections over 20 years?
In this post, learn how to use the binomial distribution and its cumulative form, when you can use it, its formula, and how to calculate binomial probabilities by hand. I also include a binomial calculator that you can use with what you learn.
Related post: Understanding Probability Distributions
Using the Binomial Distribution
The binomial distribution models the probabilities for exactly X events occurring in N trials when the probability of an event is known. Let’s get into some examples because that brings it to life!
I’ll start by using statistical software to calculate the binomial distribution probabilities and create distribution plots. This process will help you understand what you can learn from it.
Suppose you’re playing a game where rolling sixes on a die is really good. You want to know the probability of rolling exactly three sixes in ten die rolls. In this example, the number of events is 3 (X), the number of trials is 10 (N), and the probability (p) is 1/6 ≈ 0.1667.
My software tells me that the likelihood is:
The binomial distribution calculates a probability of 0.155095 for rolling precisely three sixes in ten rolls.
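If you do not have statistical software at hand, a few lines of Python can reproduce this number. The following is a minimal sketch using only the standard library; the function name `binomial_pmf` is my own label for illustration, and the rounded value p = 0.1667 is taken from the example above.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Exactly three sixes in ten rolls, using the rounded p = 0.1667 from the text.
prob_three_sixes = binomial_pmf(3, 10, 0.1667)
print(f"P(X = 3) = {prob_three_sixes:.6f}")
```

The printed value should agree with the software result above to within rounding. Using the exact fraction 1/6 instead of 0.1667 shifts the result slightly in the fourth decimal place.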
That’s interesting but perhaps not so helpful by itself. We’re also interested in the probabilities for rolling other numbers of sixes.
Graphing the Full Distribution of Outcomes
The binomial distribution is beneficial because it can describe the probability of all possible numbers of successes (Xs) out of the total number of trials (N). In the graph below, the distribution finds the probability of rolling exactly no sixes, 1 six, 2 sixes, 3 sixes, . . ., and up to 10 sixes in the ten die rolls. Using this approach, the distribution covers the complete range of possible successes up to the total number of trials.
In the chart, each bar represents the probability of rolling a specific number of sixes out of ten die rolls. The graph does not show the chances for seven and higher because the likelihoods of that many sixes in just ten rolls are too low to display on the chart.
The graph indicates the probability of rolling no sixes is about 16%. The highest chance is rolling one six (32%), although rolling two sixes occurs almost as frequently (29%). Probabilities drop off quickly starting with three sixes. Additionally, the bar for three sixes matches our earlier result of 0.155095.
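To see the same pattern without a chart, you can tabulate the whole distribution. This sketch assumes a fair die (p = 1/6 exactly) and prints a crude text bar for each outcome; the variable names are illustrative only.

```python
from math import comb

n, p = 10, 1 / 6  # ten rolls of a fair die, success = rolling a six

# Probability of each possible number of sixes, from 0 through 10.
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    bar = "#" * round(prob * 100)  # one '#' per percentage point
    print(f"{x:2d} sixes: {prob:.4f} {bar}")

# The outcome with the tallest bar.
most_likely = max(range(n + 1), key=lambda x: pmf[x])
print("Most likely number of sixes:", most_likely)
```

The bars for seven or more sixes come out empty, which is why those outcomes do not show up on a chart at this scale.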
Binomial Cumulative Distribution Function
The binomial distribution is excellent for understanding the probability of obtaining an exact number of events (X) within a certain number of trials (N). However, many times you’re not interested in just one specific value. For example, in the die rolling example above, you might know from experience that rolling three or more sixes within ten rolls means you’re doing well. So, you actually want to learn the probability of rolling at least three sixes.
Let me introduce you to the binomial cumulative distribution function.
Technically, the binomial cumulative probability calculates the likelihood of obtaining less than or equal to X events in N trials. If you need a ≥ probability, use the complement of the cumulative distribution, which some software labels the inverse cumulative probability: P(≥ X) = 1 − P(≤ X − 1). These days, statistical software will generally let you specify the direction of the cumulative function for the binomial distribution from the start.
For our example, we want to know the chances of rolling ≥ 3 sixes in 10 rolls. Below, the shaded region shows the inverse cumulative probability of rolling at least three sixes in ten die rolls.
The probability for rolling three or more sixes in ten rolls is 0.2249, not quite 1 in 4.
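One way to get this "at least three" probability is to compute the ordinary cumulative probability of two or fewer sixes and subtract it from 1. The sketch below does that with the standard library; the helper names are hypothetical, and p = 0.1667 is the rounded value used in this example.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x: int, n: int, p: float) -> float:
    """Cumulative probability of x or fewer successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


def binomial_at_least(x: int, n: int, p: float) -> float:
    """Probability of x or more successes: 1 - P(X <= x - 1)."""
    return 1.0 - binomial_cdf(x - 1, n, p)


p_at_least_three = binomial_at_least(3, 10, 0.1667)
print(f"P(X >= 3) = {p_at_least_three:.4f}")
```

Note the `x - 1` in the last helper: subtracting P(X ≤ 3) instead of P(X ≤ 2) would wrongly exclude the outcome of exactly three sixes.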
For a real-world example, see how I’ve used the binomial distribution to model the number of flu infections (X) for the vaccinated vs. unvaccinated over 20 years (N).
Binomial Distribution Assumptions and Notation
The binomial distribution models the probabilities for exactly X successes occurring in N trials. However, your data must satisfy the following requirements for the binomial distribution to be appropriate.
- There must be only two possible outcomes per trial. For example, defective or not defective, sale or no sale, pass or fail, etc.
- The trials are independent. One trial’s outcome does not affect the subsequent trial. For instance, one coin toss doesn’t affect the result of the following coin toss.
- The probability remains constant over time. In some areas, this assumption is true due to the physical characteristics of the process, such as coin tosses and die rolls. However, the probability won’t necessarily remain constant in other contexts. For example, the likelihood that a manufacturing process creates defective parts can change over time. If the probability can change, use the P chart (a control chart) to confirm this assumption.
Typically, you’ll use the binomial distribution when you have a series of Bernoulli trials, which together make up a binomial experiment. These trials involve variables that satisfactorily follow the binomial distribution assumptions above. In each trial, analysts label one of the possible outcomes as a success and the other outcome a failure.
A binomial experiment contains a set number of Bernoulli trials where the probability of a success is constant. The experiment counts the number of successes (X) out of the total number of trials (N).
You can think of the binomial distribution as modeling the number of successes (X) in a sample size of N.
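A small simulation can make these assumptions concrete. The sketch below treats each die roll as an independent trial with a constant success probability and repeats the ten-roll experiment many times. The function name, the seed, and the number of repetitions are arbitrary choices for illustration.

```python
import random


def count_successes(n: int, p: float, rng: random.Random) -> int:
    """Run n independent trials with constant success probability p."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(42)  # fixed seed so the run is repeatable
n, p = 10, 1 / 6         # ten rolls, success = rolling a six

# Repeat the ten-roll experiment many times and record the number of sixes.
counts = [count_successes(n, p, rng) for _ in range(20_000)]

share_three = counts.count(3) / len(counts)
print(f"Share of experiments with exactly three sixes: {share_three:.3f}")
```

The simulated share should land close to the binomial probability of about 0.155. If the trials were not independent, or if p drifted between rolls, the simulated shares would no longer line up with the binomial distribution.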
Parameters and Notation
The binomial distribution has two parameters, n and p.
- n: the number of trials.
- p: the event or success probability.
You denote a binomial distribution as b(n,p).
Alternatively, you can write X∼b(n,p), which means that your random variable X follows a binomial distribution with n trials and an event probability of p.
The previous examples assess probabilities corresponding with rolling sixes in a series of 10 die rolls. In this scenario, success is rolling a six, while a failure is rolling anything other than a six. The probability of rolling a six is 1/6 ≈ 0.1667.
If the number of sixes rolled is our random variable X, and we roll the die ten times, we can use the following notation for the binomial distribution: X∼b(10, 0.1667).
Binomial Distribution Calculator
Use this binomial distribution calculator to calculate the binomial probabilities and cumulative probabilities. Note that it uses “events” to indicate the number of trials (n).
Let’s use this calculator to recreate the preceding die examples. In the calculator, enter Number of events (n) = 10, Probability of success per event (p) = 16.67%, choose exactly r successes, and Number of successes (r) = 3. The calculator displays a probability of 15.51%, matching our results above for this specific number of sixes.
Next, change exactly r successes to r or more successes. The calculator displays 22.487%, matching the results for our example with the binomial inverse cumulative distribution.
Now, try one yourself. Imagine you’re drawing a random sample of 20 from a population where 10% are statisticians. You’re hoping that your study will have 3 or fewer statisticians because they’ll gang up and ask too many pesky questions about your study design. What is the probability of obtaining ≤ 3 statisticians?
See the correct answer at the end of this post. Next, onto the formula for those who want to calculate the probabilities manually.
Binomial Distribution Formula
Typically, you’ll use statistical software or online calculators to calculate the probabilities for the binomial distribution. However, I’ll show you the binomial distribution formula to calculate them manually. Additionally, I’ll walk you through the formula with a worked example.
Mean of the Binomial Distribution
Let’s start with the formula for the mean of the binomial distribution.
n * p
Multiply the number of trials by the success probability. This value represents the average or expected number of successes.
For example, we roll the die ten times, and the probability of rolling a six is 0.1667.
10 * 0.1667
The mean for this binomial distribution is 1.667. On average, we’d expect to roll that many sixes in ten rolls. Of course, the actual counts of successes will always be either zero or a positive integer.
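As a quick check on the n * p shortcut, you can also compute the mean the long way, as the probability-weighted average of every possible count. This is a sketch assuming a fair die (p = 1/6 exactly); the variable names are illustrative.

```python
from math import comb

n, p = 10, 1 / 6

# Shortcut: number of trials times the success probability.
mean_shortcut = n * p

# Long way: weight each possible count of sixes by its probability.
mean_from_pmf = sum(
    x * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)
)

print(f"n * p            = {mean_shortcut:.4f}")
print(f"sum of x * P(x)  = {mean_from_pmf:.4f}")
```

Both lines should print the same value, which is the expected number of sixes in ten rolls.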
Binomial Distribution Probabilities
The binomial distribution formula is the following:
$$P(X) = {}_{n}C_{X} \; p^{X} (1 - p)^{n - X}$$
- n is the number of trials.
- X is the number of successes.
- p is the probability of a success.
nCx is the number of ways to obtain samples with the specified number of successes occurring within the set number of trials where the order of outcomes does not matter. Specifically, it’s the number of combinations without repetition. For more information, read my post about Finding Combinations.
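To make the combinations term less abstract, the sketch below counts the ways to place three sixes among ten rolls in three different ways: the factorial formula, Python's built-in `math.comb`, and brute-force enumeration with `itertools.combinations`. The variable names are my own.

```python
from itertools import combinations
from math import comb, factorial

n, x = 10, 3  # ten rolls, three of which are sixes

# Factorial formula: n! / (x! * (n - x)!)
by_formula = factorial(n) // (factorial(x) * factorial(n - x))

# Built-in shortcut for the same quantity.
by_builtin = comb(n, x)

# Brute force: list every set of three roll positions that could hold the sixes.
by_enumeration = sum(1 for _ in combinations(range(n), x))

print(by_formula, by_builtin, by_enumeration)
```

All three approaches should give the same count. The enumeration treats {roll 1, roll 4, roll 9} and {roll 9, roll 4, roll 1} as the same outcome, which is what "order does not matter" means here.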
The binomial distribution formula takes the number of combinations, multiplies that by the probability of success raised to the power of the number of successes, and multiplies that by the probability of failure raised to the power of the number of failures.
Let’s work through an example calculation to bring the formula to life!
Worked Example of Finding a Binomial Distribution Probability
We’ll use the binomial distribution formula to calculate the probability of rolling exactly three sixes in ten die rolls for this example. Here are the values to enter into the formula:
- n = 10
- X = 3
- p = 0.1667
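For the number of combinations, we have:
$${}_{10}C_{3} = \frac{10!}{3! \, (10 - 3)!} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120$$
Now, let’s enter our values into the binomial distribution formula.
$$P(X = 3) = 120 \times 0.1667^{3} \times (1 - 0.1667)^{7} \approx 0.1551$$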
This calculation by hand confirms the previous statistical software results within rounding error.
If you need to calculate the cumulative binomial probability for a range of outcomes, calculate the probability for each individual outcome and then sum the probabilities for all outcomes of interest.
For example, if you want to calculate the probability of ≥ 3 sixes in 10 rolls, calculate the probability for three sixes, four sixes, etc., on up to ten sixes. Then sum that set of probabilities.
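The summation described above can be sketched directly in Python. The code computes each individual probability from three through ten sixes and then adds them up. It assumes the rounded p = 0.1667 from the die example, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.1667

# One probability per outcome of interest: 3, 4, ..., 10 sixes.
terms = {x: binomial_pmf(x, n, p) for x in range(3, n + 1)}

for x, prob in terms.items():
    print(f"P(X = {x:2d}) = {prob:.6f}")

p_at_least_three = sum(terms.values())
print(f"P(X >= 3) = {p_at_least_three:.4f}")
```

The total should match the cumulative result of roughly 0.2249 reported earlier for three or more sixes.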
In the calculator example, there is an 86.7% chance of having ≤ 3 statisticians in your sample of 20 people.
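If you would like to check that answer yourself, here is a minimal sketch that sums the probabilities of drawing 0, 1, 2, or 3 statisticians. It uses n = 20 and p = 0.10 from the exercise; the variable names are illustrative.

```python
from math import comb

n, p = 20, 0.10  # sample of 20, 10% of the population are statisticians

# Add up P(X = 0) through P(X = 3).
p_at_most_three = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(4)
)

print(f"P(X <= 3) = {p_at_most_three:.1%}")
```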
Finally, the binomial and beta distributions are closely related.