What is the Binomial Distribution?
The binomial distribution is a discrete probability distribution that calculates the likelihood that an event will occur a specific number of times in a set number of opportunities. Use this distribution when you have a binomial random variable. These variables count how often an event occurs within a fixed number of trials, where each trial has only two possible, mutually exclusive outcomes.
For example, the binomial probability distribution can answer the following questions. What is the probability of getting:
- Six heads when you toss a coin ten times?
- 12 women in a sample of 20?
- Three defective items in a batch of 100?
- Two flu infections over 20 years?
This distribution is described by a Probability Mass Function (PMF) because it calculates likelihoods for discrete random variables. It extends the Bernoulli distribution, which models only a single trial.
In this post, learn how to use the binomial distribution, its cumulative form, and when you can use it. I also include a binomial calculator that you can use with what you learn.
Note that this post focuses on how to use and graph the binomial distribution. If you want to learn how to calculate the probabilities by hand, please read Binomial Distribution Formula: Probability, Standard Deviation & Mean.
For a binomial random variable, the binomial distribution models the probability of exactly X events occurring in N trials when the probability of an event on each trial is known. Let’s get into some examples because that brings it to life!
I’ll start by using statistical software to calculate the binomial probabilities and create distribution plots. This process will help you understand what you can learn from it.
Suppose you’re playing a game where rolling sixes on a die is really good. You want to know the probability of rolling exactly three sixes in ten die rolls. In this example, the number of events is 3 (X), the number of trials is 10 (N), and the probability (p) is 1/6 ≈ 0.1667.
My software calculates a binomial probability of 0.155095 for rolling precisely three sixes in ten rolls.
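If you don't have statistical software handy, the same number can be checked with a few lines of code. This is a minimal sketch in plain Python (3.8 or later, for `math.comb`), not the software used in this post. The function name `binom_pmf` is an illustrative choice, and the formula is the standard binomial probability for exactly x events in n trials.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x events in n independent trials,
    each with event probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Three sixes in ten rolls, using the rounded probability quoted in the text.
p_rounded = binom_pmf(3, 10, 0.1667)

# The same question using the unrounded probability 1/6.
p_exact = binom_pmf(3, 10, 1 / 6)

print(f"p = 0.1667: {p_rounded:.6f}")
print(f"p = 1/6:    {p_exact:.6f}")
```

With the rounded input p = 0.1667, the sketch reproduces 0.155095. With exactly 1/6, it gives about 0.155045, so the two differ only in the fifth decimal place.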
That’s interesting but perhaps not so helpful by itself. We’re also interested in the chances for rolling other numbers of sixes. Seeing the distribution of probabilities for different numbers of sixes is much more helpful.
Binomial Distribution Graph
The binomial distribution graph is useful because it displays the probability of differing numbers of successes (Xs) out of the total number of trials (N). In the graph below, the distribution plot finds the likelihood of rolling exactly no sixes, 1 six, 2 sixes, 3 sixes, . . ., and up to 10 sixes in the ten die rolls. Using this approach, the binomial distribution graph covers the complete range of possible successes up to the total number of trials.
I like these graphs because they emphasize how we’re working with a distribution, and it’s easy to see which values happen more frequently.
In the chart, each bar represents the probability of rolling a specific number of sixes out of ten die rolls. The graph does not show the chances for seven and higher because the likelihoods of that many sixes in just ten rolls are too low to display on the chart.
The binomial distribution graph indicates the probability of rolling no sixes is about 16%. The highest chance is rolling one six (32%), although rolling two sixes occurs almost as frequently. Probabilities drop off quickly starting with three sixes. Additionally, the bar for three sixes matches our earlier result of 0.155095.
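The values behind a graph like this can be tabulated directly. The sketch below is a rough text-only stand-in for the distribution plot, assuming plain Python and the unrounded probability 1/6. The `#` bars are just an illustration, with one character per percentage point.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x events in n trials with event probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 1 / 6

# Probability for every possible number of sixes, from 0 up to n.
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    bar = "#" * round(prob * 100)  # one '#' per percentage point
    print(f"{x:2d} sixes  {prob:8.6f}  {bar}")
```

The bars for zero, one, and two sixes come out near 16%, 32%, and 29%. From seven sixes upward, the probabilities round to zero percentage points, so no bar is drawn.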
Related post: Understanding Probability Distributions
Binomial Cumulative Distribution Function
The binomial probability distribution is excellent for understanding the likelihood of obtaining an exact number of events (X) within a certain number of trials (N). However, many times you’re not interested in just one specific value for a binomial random variable. For example, in the die rolling example above, you might know from experience that rolling three or more sixes within ten rolls means you’re doing well. So, you actually want to learn the probability of rolling at least three sixes.
Let me introduce you to the binomial cumulative distribution function.
Technically, the binomial cumulative probability calculates the likelihood of obtaining less than or equal to X events in N trials. If you need a ≥ probability, use the complement of the cumulative distribution, also called the upper-tail probability: the chance of at least X events equals 1 minus the chance of X − 1 or fewer events. These days, statistical software will generally let you specify the direction of the cumulative function for the binomial distribution from the start. I’ll use the binomial distribution graph again to show you how it works.
For our example, we want to know the chances of rolling ≥ 3 sixes in 10 rolls. Below, the shaded region shows the upper-tail probability of rolling at least three sixes in ten die rolls.
The likelihood for rolling three or more sixes in ten rolls is 0.2249, not quite 1 in 4.
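The cumulative calculation is just a sum of individual binomial probabilities. Here is a minimal sketch in plain Python. The helper names `binom_cdf` and `binom_upper_tail` are illustrative, and the rounded probability 0.1667 is assumed so the result lines up with the figure quoted above.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x events in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x: int, n: int, p: float) -> float:
    """Cumulative probability P(X <= x)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

def binom_upper_tail(x: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= x), computed as 1 - P(X <= x - 1)."""
    return 1.0 - binom_cdf(x - 1, n, p)

n, p = 10, 0.1667
at_most_two = binom_cdf(2, n, p)            # P(X <= 2)
at_least_three = binom_upper_tail(3, n, p)  # P(X >= 3)

print(f"P(X <= 2) = {at_most_two:.4f}")
print(f"P(X >= 3) = {at_least_three:.4f}")
```

The two results add up to 1 because "two or fewer" and "three or more" cover every possible outcome between them.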
For a real-world example, see how I’ve used the binomial distribution to model the number of flu infections (X) for the vaccinated vs. unvaccinated over 20 years (N).
Learn more about Cumulative Distribution Functions: Uses, Graphs & vs PDF.
Binomial Distribution Assumptions and Notation
The binomial distribution models the probabilities for a binomial random variable having exactly X successes occurring in N trials. Your variable must satisfy the following requirements to be a binomial random variable. The binomial distribution is appropriate only for data that fulfill these assumptions.
- There must be only two possible outcomes per trial. For example, defective or not defective, sale or no sale, pass or fail, etc.
- The trials are independent. One trial’s outcome does not affect the subsequent trial. For instance, one coin toss doesn’t affect the result of the following coin toss. Learn more about Independent Events.
- The probability remains constant over time. In some areas, this assumption is true due to the physical characteristics of the process, such as coin tosses and die rolls. However, the probability won’t necessarily remain constant in other contexts. For example, the likelihood that a manufacturing process creates defective parts can change over time. If the probability can change, use the P chart (a control chart) to confirm this assumption.
Typically, you’ll use the binomial distribution when you have a series of Bernoulli trials, which together form a binomial experiment. These trials involve binomial random variables that satisfactorily follow the assumptions above. In these trials, analysts label one of the possible outcomes as a success and the other outcome as a failure.
A binomial experiment contains a set number of Bernoulli trials where the probability of a success is constant. The experiment counts the number of successes (X) out of the total number of trials (N).
You can think of the binomial probability distribution as modeling the number of successes (X) in a sample size of N.
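To see how these assumptions produce the distribution, a quick simulation can help. The sketch below assumes plain Python, a fixed random seed, and 20,000 repeated experiments. These are all illustrative choices, as are the names `run_experiment`, `simulated`, and `theoretical`. Each experiment runs ten independent trials with a constant success probability and counts the successes.

```python
import random
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """Theoretical probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def run_experiment(n: int, p: float, rng: random.Random) -> int:
    """Count successes in n independent trials, each succeeding with probability p."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(42)  # fixed seed so the run is repeatable
n, p, repeats = 10, 1 / 6, 20_000

# Repeat the ten-roll experiment many times and record the number of sixes.
counts = [run_experiment(n, p, rng) for _ in range(repeats)]

simulated = counts.count(3) / repeats  # share of experiments with exactly 3 sixes
theoretical = binom_pmf(3, n, p)

print(f"simulated   P(X = 3): {simulated:.4f}")
print(f"theoretical P(X = 3): {theoretical:.4f}")
```

The simulated share should land close to the theoretical probability, and the match typically tightens as the number of repeats grows.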
Parameters and Notation
The binomial distribution has two parameters, n and p.
- n: the number of trials.
- p: the event or success probability.
You denote a binomial distribution as b(n,p).
Alternatively, you can write X∼b(n,p), which means that your binomial random variable X follows a binomial probability distribution with n trials and an event probability of p.
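For reference, this notation corresponds to the standard binomial probability mass function. In the sketch below, the lowercase x for an observed count is an added symbol, while n and p are the parameters defined above.

```latex
X \sim b(n, p)
\quad\Longrightarrow\quad
P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x},
\qquad x = 0, 1, \dots, n
```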
The previous examples assess probabilities corresponding with rolling sixes in a series of 10 die rolls. In this scenario, success is rolling a six, while a failure is rolling anything other than a six. The probability of rolling a six is 1/6 ≈ 0.1667.
If the number of sixes is our random variable X, and we roll the die ten times, we can use the following notation for the binomial distribution: X∼b(10, 0.1667).
Binomial Distribution Calculator
Use this binomial distribution calculator to calculate the binomial probabilities and cumulative probabilities. Note that it uses “events” to indicate the number of trials (n).
Let’s use this calculator to recreate the preceding die examples. In the calculator, enter Number of events (n) = 10, Probability of success per event (p) = 16.67%, choose exactly r successes, and Number of successes (r) = 3. The calculator displays a binomial probability of 15.51%, matching our results above for this specific number of sixes.
Next, change exactly r successes to r or more successes. The calculator displays 22.487%, matching the results for our example of rolling at least three sixes in ten rolls.
Now, try one yourself. Imagine you’re drawing a random sample of 20 from a population where 10% are statisticians. You’re hoping that your study will have 3 or fewer statisticians because they’ll gang up and ask too many pesky questions about your study design. What is the likelihood of obtaining ≤ 3 statisticians?
See the correct answer at the end of this post.
Finally, the binomial and beta distributions are closely related.
For more information about how to use binary data, read my posts, Maximize the Value of Your Binary Data, the Negative Binomial Distribution, the Geometric Distribution, and the Hypergeometric Distribution.
In the calculator example, there is an 86.7% chance of having ≤ 3 statisticians in your sample of 20 people.
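The calculator results in this post can be cross-checked with a short script. This is a hypothetical stand-in written in plain Python, not the calculator's actual code. The function name `binomial_calculator` and the mode labels `"exactly"`, `"or_more"`, and `"or_fewer"` are assumptions made for illustration.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_calculator(n: int, p: float, r: int, mode: str = "exactly") -> float:
    """Hypothetical stand-in for the calculator's modes.

    mode: "exactly" (X = r), "or_more" (X >= r), or "or_fewer" (X <= r).
    """
    if mode == "exactly":
        values = [r]
    elif mode == "or_more":
        values = range(r, n + 1)
    elif mode == "or_fewer":
        values = range(0, r + 1)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Add up the probabilities of every count included in the chosen mode.
    return sum(binom_pmf(k, n, p) for k in values)

# Die examples: n = 10 rolls, p entered as 16.67%.
print(f"exactly 3 sixes:          {binomial_calculator(10, 0.1667, 3, 'exactly'):.2%}")
print(f"3 or more sixes:          {binomial_calculator(10, 0.1667, 3, 'or_more'):.3%}")

# Statistician example: n = 20, p = 10%.
print(f"3 or fewer statisticians: {binomial_calculator(20, 0.10, 3, 'or_fewer'):.1%}")
```

The three printed values should agree with the 15.51%, 22.487%, and 86.7% figures given in this post.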