What is a Binomial Distribution?
The binomial distribution is a discrete probability distribution that calculates the likelihood an event will occur a specific number of times in a set number of opportunities. Use this distribution when you have a binomial random variable. These variables count how often an event occurs within a fixed number of trials, where each trial has only two possible, mutually exclusive outcomes.
For example, the binomial probability distribution can answer the following questions. What is the probability of getting:
- Six heads when you toss a coin ten times?
- 12 women in a sample size of 20?
- Three defective items in a batch of 100?
- Two flu infections over 20 years?
The binomial distribution is defined by a Probability Mass Function (PMF) because it calculates likelihoods for discrete random variables.
In this post, learn how to use the binomial distribution and its cumulative form, when you can use it, its formula, and how to calculate binomial probabilities by hand. I also include a binomial calculator that you can use with what you learn. I’ll walk you through the formulas for calculating the mean, variance, and probabilities for the binomial probability distribution.
For more information about how to use binary data, read my posts, Maximize the Value of Your Binary Data, the Negative Binomial Distribution, the Geometric Distribution, and the Hypergeometric Distribution.
Binomial Probabilities
The binomial distribution models the probabilities for exactly X events occurring in N trials when the probability of an event is known for a binomial random variable. Let’s get into some examples because that brings it to life!
I’ll start by using statistical software to calculate the binomial probabilities and create distribution plots. This process will help you understand what you can learn from it. Then we’ll move on to the binomial distribution formula.
Suppose you’re playing a game where rolling sixes on a die is really good. You want to know the probability of rolling exactly three sixes in ten die rolls. In this example, the number of events is 3 (X), the number of trials is 10 (N), and the probability (p) is 1/6 = 0.1667.
My software calculates a likelihood of 0.155095 for rolling precisely three sixes in ten rolls.
That’s interesting but perhaps not so helpful by itself. We’re also interested in the chances for rolling other numbers of sixes. Seeing the distribution of probabilities for different numbers of sixes is much more helpful.
Binomial Distribution Graph
The binomial distribution graph is useful because it displays the probability of differing numbers of successes (Xs) out of the total number of trials (N). In the graph below, the distribution plot finds the likelihood of rolling exactly no sixes, 1 six, 2 sixes, 3 sixes, . . ., and up to 10 sixes in the ten die rolls. Using this approach, the binomial distribution graph covers the complete range of possible successes up to the total number of trials.
I like these graphs because they emphasize how we’re working with a distribution, and it’s easy to see which values happen more frequently.
In the chart, each bar represents the probability of rolling a specific number of sixes out of ten die rolls. The graph does not show the chances for seven and higher because the likelihoods of that many sixes in just ten rolls are too low to display on the chart.
The binomial distribution graph indicates the probability of rolling no sixes is about 16%. The highest chance is rolling one six (32%), although rolling two sixes is almost as likely (29%). Probabilities drop off quickly starting with three sixes. Additionally, the bar for three sixes matches our earlier result of 0.155095.
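If you don’t have statistical software handy, a short script can reproduce the numbers behind the graph. The following is only a sketch, not the software used for the plots in this post. It assumes Python 3.8 or later and uses a hypothetical helper, binomial_pmf, that applies the binomial formula covered later in this post. It also uses p = 0.1667 (1/6 rounded) so the output lines up with the values quoted here.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.1667  # ten die rolls; p is 1/6 rounded, as in the text

# One probability per possible count of sixes, 0 through 10.
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(distribution):
    bar = "#" * round(prob * 50)  # crude text bar: 50 characters would be 100%
    print(f"{x:2d} sixes: {prob:.6f} {bar}")
```

The bars for seven or more sixes come out empty, which mirrors why the chart does not display them.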
Related post: Understanding Probability Distributions
Binomial Cumulative Distribution Function
The binomial probability distribution is excellent for understanding the likelihood of obtaining an exact number of events (X) within a certain number of trials (N). However, many times you’re not interested in just one specific value for a binomial random variable. For example, in the die rolling example above, you might know from experience that rolling three or more sixes within ten rolls means you’re doing well. So, you actually want to learn the probability of rolling at least three sixes.
Let me introduce you to the binomial cumulative distribution function.
Technically, the binomial cumulative probability calculates the likelihood of obtaining less than or equal to X events in N trials. If you need a ≥ probability, use the complement of the cumulative probability, P(≥ X) = 1 − P(≤ X − 1), which I refer to here as the inverse cumulative probability. These days, statistical software will generally let you specify the direction of the cumulative function for the binomial distribution from the start. I’ll use the binomial distribution graph again to show you how it works.
For our example, we want to know the chances of rolling ≥ 3 sixes in 10 rolls. Below, the shaded region shows the inverse cumulative probability of rolling at least three sixes in ten die rolls.
The likelihood for rolling three or more sixes in ten rolls is 0.2249, not quite 1 in 4.
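As a rough check on that figure, the sketch below adds up the probabilities for zero, one, and two sixes and subtracts the total from 1. The helper names binomial_pmf and binomial_cdf are hypothetical, chosen for this illustration, and p = 0.1667 is the rounded value used throughout this post.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add up the probabilities for 0 through x successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.1667

at_most_two = binomial_cdf(2, n, p)   # P(X <= 2)
at_least_three = 1 - at_most_two      # P(X >= 3), the complement

print(f"P(X <= 2) = {at_most_two:.4f}")
print(f"P(X >= 3) = {at_least_three:.4f}")  # 0.2249
```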
For a real-world example, see how I’ve used the binomial distribution to model the number of flu infections (X) for the vaccinated vs. unvaccinated over 20 years (N).
Learn more about Cumulative Distribution Functions: Uses, Graphs & vs PDF.
Binomial Distribution Assumptions and Notation
The binomial distribution models the probabilities for a binomial random variable having exactly X successes occurring in N trials. Your variable must satisfy the following requirements to be a binomial random variable. The binomial distribution is appropriate only for data that fulfill these assumptions.
- There must be only two possible outcomes per trial. For example, defective or not defective, sale or no sale, pass or fail, etc.
- The trials are independent. One trial’s outcome does not affect the subsequent trial. For instance, one coin toss doesn’t affect the result of the following coin toss. Learn more about Independent Events.
- The probability remains constant over time. In some areas, this assumption is true due to the physical characteristics of the process, such as coin tosses and die rolls. However, the probability won’t necessarily remain constant in other contexts. For example, the likelihood that a manufacturing process creates defective parts can change over time. If the probability can change, use the P chart (a control chart) to confirm this assumption.
Bernoulli Trials
Typically, you’ll use the binomial distribution when you have a series of Bernoulli trials, which together make up a binomial experiment. These trials involve binomial random variables that satisfactorily follow the assumptions above. In these trials, analysts label one of the possible outcomes as a success and the other outcome a failure.
A binomial experiment contains a set number of Bernoulli trials where the probability of a success is constant. The experiment counts the number of successes (X) out of the total number of trials (N).
You can think of the binomial probability distribution as modeling the number of successes (X) in a sample size of N.
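One way to picture this is to simulate it. The sketch below is an illustration under stated assumptions: a fair die, Python’s standard random module, an arbitrary fixed seed, and a hypothetical helper named count_successes. Each call runs ten trials and counts the sixes, and repeating it many times shows how the counts are distributed.

```python
import random

def count_successes(n_trials, rng):
    """Run one experiment: roll a fair die n_trials times and count the sixes."""
    return sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)

rng = random.Random(2024)  # fixed seed (arbitrary) so reruns give the same counts
n_trials = 10
n_experiments = 50_000

counts = [count_successes(n_trials, rng) for _ in range(n_experiments)]

average_sixes = sum(counts) / n_experiments
share_with_three = counts.count(3) / n_experiments

print(f"Average sixes per {n_trials} rolls: {average_sixes:.3f}")
print(f"Share of experiments with exactly 3 sixes: {share_with_three:.3f}")
```

With this many repetitions, the share of experiments with exactly three sixes should land close to the binomial probability of about 0.155, though a simulation will never match it exactly.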
Parameters and Notation
The binomial distribution has two parameters, n and p.
- n: the number of trials.
- p: the event or success probability.
You denote a binomial distribution as b(n,p).
Alternatively, you can write X∼b(n,p), which means that your binomial random variable X follows a binomial probability distribution with n trials and an event probability of p.
The previous examples assess probabilities corresponding with rolling sixes in a series of 10 die rolls. In this scenario, success is rolling a six, while a failure is rolling anything other than a six. The probability of rolling a six is 1/6 = 0.1667.
If rolling sixes is our random variable X, and we roll the die ten times, we can use the following notation for the binomial distribution:
X∼b(10,0.1667)
Binomial Distribution Calculator
Use this binomial distribution calculator to calculate the binomial probabilities and cumulative probabilities. Note that it uses “events” to indicate the number of trials (n).
Let’s use this calculator to recreate the preceding die examples. In the calculator, enter Number of events (n) = 10, Probability of success per event (p) = 16.67%, choose exactly r successes, and Number of successes (r) = 3. The calculator displays a binomial probability of 15.51%, matching our results above for this specific number of sixes.
Next, change exactly r successes to r or more successes. The calculator displays 22.487%, matching the results for our example with the binomial inverse cumulative distribution.
Now, try one yourself. Imagine you’re drawing a random sample of 20 from a population where 10% are statisticians. You’re hoping that your study will have 3 or fewer statisticians because they’ll gang up and ask too many pesky questions about your study design. What is the likelihood of obtaining ≤ 3 statisticians?
See the correct answer at the end of this post. Next, onto the formula for those who want to calculate the probabilities manually.
Binomial Distribution Formula
Typically, you’ll use statistical software or online calculators to calculate the probabilities for the binomial distribution. However, I’ll show you the binomial distribution formula to calculate them manually. The following formulas show you how to calculate the mean, variance, and probabilities for binomial distributions. Additionally, I’ll walk you through the formulas with worked examples.
Mean of Binomial Distribution
Let’s start with the formula for the mean of the binomial distribution.
μ = np
Multiply the number of trials by the success probability. This value represents the average or expected number of successes.
For example, we roll the die ten times, and the probability of rolling a six is 0.1667.
10 * 0.1667
The mean for this binomial distribution is 1.667. On average, we’d expect to roll that many sixes in ten rolls. Of course, the actual counts of successes will always be either zero or a positive integer.
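To see why n times p is the expected value, you can compare it with the long way of finding a mean: weight each possible count of sixes by its probability and add the results. This is a minimal sketch with a hypothetical binomial_pmf helper that uses the probability formula presented later in this post.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.1667

mean_shortcut = n * p  # the formula from the text
# Probability-weighted average of every possible count, 0 through n.
mean_weighted = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

print(f"n * p        = {mean_shortcut:.3f}")  # 1.667
print(f"weighted sum = {mean_weighted:.3f}")  # 1.667
```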
Variance of Binomial Distribution
The formula for the variance of the binomial distribution is the following:
σ² = npq
As before, n and p are the number of trials and success probability, respectively. The failure probability is q, which equals 1 − p.
Notice that, for a given number of trials, the variance of the binomial distribution is at its maximum when the probabilities for success and failure are both 0.5. As those probabilities move away from 0.5 in opposite directions, the variance decreases. Additionally, the variance increases as the number of trials increases.
For our die example, we have n = 10 rolls, a success probability of p = 0.1667, and a failure probability of q = 0.8333.
10 * 0.1667 * 0.8333 = 1.3891
The variance for this binomial distribution is 1.3891.
The variance of the binomial distribution represents the variability of the number of successes around the mean of the binomial distribution. Variances use squared units. Learn more about Variances. The standard deviation is the square root of the variance of the binomial distribution.
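Here is a small sketch of those calculations, using a hypothetical helper named binomial_variance. It computes the variance and standard deviation for the die example, then holds n at 10 and tries a few success probabilities (values I picked for illustration) to show the variance peaking at 0.5.

```python
from math import sqrt

def binomial_variance(n, p):
    """Variance of a binomial distribution: n * p * q, where q = 1 - p."""
    return n * p * (1 - p)

n, p = 10, 0.1667
variance = binomial_variance(n, p)
std_dev = sqrt(variance)

print(f"variance = {variance:.4f}")  # 1.3891
print(f"standard deviation = {std_dev:.2f}")

# Same number of trials, different success probabilities.
for trial_p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {trial_p}: variance = {binomial_variance(n, trial_p):.2f}")
```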
Binomial Distribution Formula
The binomial distribution formula is the following:
$$P(X) = {}_{n}C_{X} \, p^{X} (1 - p)^{n - X}$$
where:
- n is the number of trials.
- X is the number of successes
- p is the probability of a success.
Use this formula to calculate the binomial probability for X successes occurring in n trials.
nCx is the number of ways to obtain samples with the specified number of successes occurring within the set number of trials where the order of outcomes does not matter. Specifically, it’s the number of combinations without repetition. For more information, read my post about Finding Combinations.
The binomial distribution formula takes the number of combinations, multiplies that by the probability of success raised to the power of the number of successes, and multiplies that by the probability of failure raised to the power of the number of failures.
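That verbal description translates almost directly into code. The sketch below keeps the three pieces separate so you can see each one. The function name binomial_probability is my own label, and math.comb (Python 3.8 or later) supplies the number of combinations. The usage line answers the coin question from the start of this post, assuming a fair coin.

```python
from math import comb

def binomial_probability(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    combinations = comb(n, x)          # ways to place x successes among n trials
    success_term = p ** x              # p raised to the number of successes
    failure_term = (1 - p) ** (n - x)  # (1 - p) raised to the number of failures
    return combinations * success_term * failure_term

# Six heads in ten tosses of a fair coin.
six_heads = binomial_probability(6, 10, 0.5)
print(f"{six_heads:.4f}")  # 0.2051
```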
Let’s work through an example calculation to bring the formula to life!
Worked Example of Finding a Binomial Probability
We’ll use the binomial distribution formula to calculate the chances of rolling exactly three sixes in ten die rolls for this example. Here are the values to enter into the formula:
- n = 10
- X = 3
- p = 0.1667
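For the number of combinations, we have:
$${}_{10}C_{3} = \frac{10!}{3!\,(10 - 3)!} = 120$$
Now, let’s enter our values into the binomial distribution formula.
$$P(3) = 120 \times 0.1667^{3} \times (1 - 0.1667)^{10 - 3} \approx 0.1551$$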
This calculation by hand confirms the previous statistical software results within rounding error.
If you need to calculate a cumulative probability for a binomial random variable, calculate the likelihood for each individual outcome and then sum them for all outcomes of interest.
For example, if you want to calculate the probability of ≥ 3 sixes in 10 rolls, calculate the likelihoods for three sixes, four sixes, etc., on up to ten sixes. Then sum that set of binomial probabilities.
In the calculator example, there is an 86.7% chance of having ≤ 3 statisticians in your sample of 20 people.
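If you’d like to confirm that answer without the calculator, the summing approach described above takes only a few lines. This is a sketch with a hypothetical binomial_pmf helper. It prints the probability for each count from zero to three statisticians and then the total.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.10  # sample of 20; 10% of the population are statisticians

# Probabilities for exactly 0, 1, 2, and 3 statisticians.
terms = [binomial_pmf(x, n, p) for x in range(4)]
at_most_three = sum(terms)

for x, prob in enumerate(terms):
    print(f"P(X = {x}) = {prob:.4f}")
print(f"P(X <= 3) = {at_most_three:.4f}")  # 0.8670
```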
Finally, the binomial and beta distributions are closely related.
Is there some way to combine binomial distributions? Here’s an example. Ann, Bob, and Carol are shooting threes on a basketball court. Ann takes 50 shots and has a 30% success rate. Bob takes 30 shots and has a 20% success rate. Carol takes 20 shots and has a 10% success rate. I can use the cumulative binomial distribution to calculate the chance that Ann makes 10 or more shots or that Bob makes 10 or more shots. How do I calculate the probability that the three of them combine to make 20 or more shots?
Would binomial distributions be suitable for determining the probability of a prisoner re-offending once released from prison? Thank you.