What is a Geometric Distribution?
The geometric distribution is a discrete probability distribution that calculates the probability of the first success occurring during a specific trial. In other words, during a series of attempts, what is the probability of success first occurring during each attempt? Use this distribution when you need to understand how many attempts are necessary to produce the first successful outcome.
For example, the geometric distribution can answer the following questions. What is the probability of the first:
- Six in a series of die rolls?
- Year of catching the flu over the years?
- Person to support a law during a repeated sampling for an interview?
- Product to have a defect in a random sample from an assembly line?
- Successful attempt for a project or task?
In this post, learn when to use the geometric distribution and its cumulative form, about its formula, and how to calculate probabilities by hand. I also include a geometric distribution calculator that you can use with what you learn.
Related post: Understanding Probability Distributions
The Geometric Distribution is Memoryless
The geometric distribution is “memoryless.” Memoryless is a distribution attribute indicating that the occurrence of the next success does not depend on when the last success occurred or when you start looking for successes. This type of process has independent events that occur with a constant probability.
For example, when you start observing die rolls looking for the next six, the probability of when the next six appears does not depend on when the last six appeared or when you start watching for sixes.
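A quick numerical check can make memorylessness concrete. The sketch below assumes a fair die and uses illustrative values (m = 4 rolls already watched without a six, n = 3 further rolls); the helper name is hypothetical. It compares the chance of waiting more than n additional rolls, given that m rolls have already passed without a six, to the chance of waiting more than n rolls from a fresh start.

```python
from fractions import Fraction

def prob_no_success_in(n, p):
    """P(X > n): no success in the first n trials."""
    return (1 - p) ** n

p = Fraction(1, 6)  # probability of a six on a fair die
m, n = 4, 3         # illustrative values: m rolls already seen, n more to go

# P(X > m + n | X > m): still waiting after n more rolls, given m failures so far
conditional = prob_no_success_in(m + n, p) / prob_no_success_in(m, p)

# P(X > n): waiting more than n rolls from a fresh start
unconditional = prob_no_success_in(n, p)

print(conditional, unconditional)  # 125/216 125/216
```

The two values match for any choice of m and n, which is what "memoryless" means in practice.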
Both the geometric distribution and exponential distribution are memoryless. Learn more about the Exponential Distribution.
Using the Geometric Distribution
The geometric distribution models the probabilities for the first event occurring during various trials when the likelihood of an event is known. Let’s bring it to life with an example!
I’ll start by using statistical software to calculate the geometric distribution probabilities and create distribution plots. This progression will help you know what you can learn from this distribution.
Imagine you’re playing a game where rolling sixes on a die is beneficial. You’d like to know the probability of when you will throw the first six. Suppose you want to learn the chance of getting the first six on precisely the third roll. In this example, the probability of success on each roll (p) is 1/6 ≈ 0.1667, and we’re interested in the third roll.
My software displays the probability as the following:
The geometric distribution calculates that the probability of the first six occurring on the third die roll is 0.115755.
Please note the full implications of this scenario. The final probability incorporates the fact that the first two rolls are not sixes, and then the third roll is a six—that’s the only sequence of events allowing the first success to occur on the third attempt.
Consequently, the calculations factor in the probability of the event not occurring on particular trials (failures) and then appearing on a specific attempt (success). Later, you’ll see that the geometric distribution formula incorporates both the probabilities of failure and success.
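To see where that number comes from, here is a minimal sketch that multiplies out the one qualifying sequence (not a six, not a six, six). It assumes the rounded value p = 0.1667 used in the text; the variable names are illustrative.

```python
# Probability of a six on any single roll, rounded as in the text.
p = 0.1667

# The only sequence that puts the first six on roll 3: not-six, not-six, six.
prob_first_six_on_third = (1 - p) * (1 - p) * p

print(round(prob_first_six_on_third, 6))  # 0.115755
```

With the exact value p = 1/6, the result is 25/216 ≈ 0.115741. The small difference comes only from rounding p.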
That single probability is interesting but perhaps not so useful by itself. We’re also interested in the chances of rolling the first six on other rolls.
Graphing the Full Distribution of Outcomes
The geometric distribution is valuable because it can describe the probability of the first event occurring on all possible numbers of trials. Graphing the distribution excels at displaying the larger context for various attempts. Theoretically, the distribution goes up to an infinite number of trials, but we’ll stop before then!
Returning to the die example, we’ll use the geometric distribution to find the probability of rolling the first six on various rolls.
The geometric distribution graph below displays the probability of rolling the first six in precisely 1, 2, 3, etc. rolls, up to 30.
Each bar in the geometric distribution graph indicates the probability of rolling the first six on a specific trial. For instance, the likelihood of rolling the first six on the third roll is 0.115755, corresponding to the preceding statistical output. The distribution continues beyond 30 rolls, but the probabilities become very small.
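If you don't have plotting software handy, a rough text version of the same picture is easy to produce. This sketch assumes p = 0.1667 as in the text; the bar scaling factor of 200 is an arbitrary choice for readability.

```python
p = 0.1667  # probability of a six on one roll, rounded as in the text

# Probability that the first six lands on roll x, for x = 1..30.
probs = {x: (1 - p) ** (x - 1) * p for x in range(1, 31)}

for x, prob in probs.items():
    bar = "#" * round(prob * 200)  # crude text bar, scaled for readability
    print(f"{x:2d}  {prob:.6f}  {bar}")

# Probability left over beyond roll 30 (no six in the first 30 rolls).
print(f"beyond 30: {(1 - p) ** 30:.6f}")
```

Each bar is shorter than the one before it by the same factor, (1 – p), which gives the distribution its steadily declining shape.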
Geometric Cumulative Distribution Function
The geometric distribution is superb for understanding when an event might first occur.
However, many times you’re not interested in a specific trial. For example, you might be interested in the probability of getting your first six within the first six rolls. In that case, you’re interested in the total probability across multiple trials.
Let me introduce you to the geometric cumulative distribution function. The cumulative distribution simply sums the probabilities for a range of trials. Again, a geometric distribution graph brings it to life.
Technically, the geometric cumulative probability calculates the likelihood of obtaining the first event in less than or equal to N trials. If you need a ≥ probability, use the complement of the cumulative probability: the chance that the first event occurs on trial N or later equals 1 minus the cumulative probability through trial N – 1. These days, most statistical software lets you specify the direction.
For our example, we want to learn the cumulative probability that the first 6 appears within the first six rolls (i.e., ≤ 6). In the geometric distribution graph below, the shaded region displays the cumulative probability.
Interestingly, you might think you’re virtually guaranteed to get a 6 when you roll the die six times. However, the red shaded region in the geometric distribution graph indicates you have roughly a two-thirds (about 66.5%) cumulative chance of getting the first six within the first six rolls. This cumulative probability sums the individual probabilities of the first six occurring on each of rolls 1 through 6.
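The cumulative figure can be reproduced by adding up the six individual probabilities. This is a minimal sketch that assumes the exact value p = 1/6 for a fair die; the variable names are illustrative.

```python
p = 1 / 6  # exact probability of a six on a fair die

# Probability of the first six landing on each of rolls 1 through 6.
per_roll = [(1 - p) ** (x - 1) * p for x in range(1, 7)]

# Cumulative probability of the first six within six rolls.
cumulative = sum(per_roll)

for roll, prob in enumerate(per_roll, start=1):
    print(f"roll {roll}: {prob:.4f}")
print(f"within six rolls: {cumulative:.4f}")  # 0.6651
```

The remaining third or so is the chance of rolling six times without seeing a six at all.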
For a real-world example, learn how I use the geometric distribution to model the number of years to the first flu infection for the vaccinated vs. unvaccinated.
For information about other distributions for binary data, read my posts, Maximize the Value of Your Binary Data, the Binomial Distribution, the Negative Binomial Distribution, and the Hypergeometric Distribution.
Geometric Distribution Assumptions and Notation
The geometric distribution models the probabilities for the first success occurring on the Xth trial. However, your data must meet the following requirements for the geometric distribution to be appropriate.
- Your data must be binary: For example, infected or uninfected, 6 or not 6, pass or fail, etc.
- Independent trials: One trial’s result does not affect the next trial. For example, a coin toss doesn’t affect the following coin toss.
- The probability remains constant over time. In some contexts, this supposition is true due to the physical attributes of the process, such as coin tosses and die rolls. However, the likelihood won’t necessarily remain constant in other contexts. For example, the probability of a product defect at a manufacturing plant can change over time. Use a P chart (a control chart) to verify this assumption when the chance can change.
Geometric Experiments and Bernoulli Trials
Typically, you’ll use the geometric distribution when you have Bernoulli Trials. These trials satisfy the assumptions listed above. In these trials, analysts label one outcome a success and the other a failure.
Each Bernoulli trial is a single attempt with two possible outcomes and a constant probability of success. A geometric experiment performs a series of these trials until the first success.
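A geometric experiment is simple to simulate, and a simulation is a useful sanity check on the theory. The sketch below is one way to do it, assuming a fair die (p = 1/6), a fixed random seed, and 100,000 repetitions; the function name and these settings are illustrative choices.

```python
import random

def trials_until_first_success(p, rng):
    """Run Bernoulli trials until the first success; return the trial count."""
    trials = 1
    while rng.random() >= p:  # a draw at or above p counts as a failure
        trials += 1
    return trials

rng = random.Random(42)  # fixed seed so the run is repeatable
p = 1 / 6
n_experiments = 100_000

results = [trials_until_first_success(p, rng) for _ in range(n_experiments)]
share_on_third = results.count(3) / n_experiments

print(f"simulated P(first six on roll 3): {share_on_third:.4f}")
print(f"formula   P(first six on roll 3): {(1 - p) ** 2 * p:.4f}")
```

The simulated share should land close to the theoretical value, with some run-to-run noise if you change the seed.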
Two Forms of the Geometric Distribution
Note that there are two forms of the geometric distribution. The two forms model:
- The first success occurring on the Xth trial.
- The number of failures before the first success.
Fortunately, these two forms are equivalent. The difference comes down to how you count the trials. The former definition counts all trials, including the final, successful one. The latter definition counts only the failures before the success. Success on the Xth trial is equivalent to having X – 1 failures. Statisticians sometimes refer to the first form as the shifted geometric distribution because it moves the final trial over by counting the success.
For the die example, I calculated the probability of first rolling a six on the third trial. Using the number of failures form, we don’t count the success trial, just the preceding failures. Hence, if we have success on the third roll, that means we have two failures, and those two forms of the geometric distribution calculate the same probability.
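The equivalence is easy to confirm in code. In this sketch, the two function names are hypothetical labels for the two forms, and p = 0.1667 is the rounded value used in the text.

```python
def pmf_trials(x, p):
    """P(first success on trial x), for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def pmf_failures(k, p):
    """P(k failures before the first success), for k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

p = 0.1667

# Success on the third trial is the same event as two failures first.
print(pmf_trials(3, p), pmf_failures(2, p))
```

When you use software or an online calculator, check which form it expects. Entering 3 into a tool that counts failures would give the probability of the first six on the fourth roll instead.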
I use the success on the Xth trial version throughout this post, except for the geometric distribution calculator section, where we need to convert to the number of failures form.
Parameter and Notation
The geometric distribution has one parameter, p = the probability of success for each trial. You denote the distribution as G(p), which indicates a geometric distribution with a success probability of p.
Geometric Distribution Calculator
Use this geometric distribution calculator to calculate probabilities and cumulative probabilities. Note that it uses the number of failures version of the geometric distribution.
For the die example, I calculated the probability of the first success of rolling a six on the third trial as 0.115755. Using the number of failures form, we don’t count the success trial, just the preceding failures. If we have success on the third roll, that means we have two failures, which we enter in this calculator.
In the calculator, enter Number of failures = 2 and Probability of success = 0.1667. The calculator displays a probability of 0.1157546, matching our results above within rounding error.
Now, try one yourself. Suppose you have a team working on a project, and you give them four attempts to complete the project successfully due to time constraints. It’s a very complex task, and they have only a 30% chance of correctly completing the project on each attempt. If they fail, they’ll need to try again. What is the cumulative probability of succeeding on attempts 1 – 4? Remember, for this calculator, you need to convert to the number of failures. Answer at the end!
Geometric Distribution Formula
Generally, you’ll use statistical software or online calculators to calculate probabilities for the geometric distribution. However, I’ll teach you how to use the geometric distribution formula so you can calculate them manually. Additionally, I’ll walk you through the formula with a worked example.
The geometric distribution formula for the probability of the first success occurring on the Xth trial is the following:
$$P(X = x) = (1 - p)^{x - 1}\,p$$
Where:
- x is the trial on which the first success occurs.
- p is the probability of a success for each trial.
The geometric distribution formula takes the probability of failure (1 – p) and raises it to the power of the number of failures (x – 1). That produces the likelihood of having failures for all trials before the trial of interest (x). Then the equation multiplies that result by the probability of success (p) occurring on the trial of interest. That gives you the probability of the first success happening on the Xth trial.
Let’s work through an example calculation to bring the formula to life!
Worked Example of Finding a Geometric Distribution Probability
We’ll use the geometric distribution formula to calculate the probability of rolling the first six on the third roll. That’s the example we used before, and now we’ll calculate it by hand. Here are the values to enter into the formula:
- x = 3
- p = 0.1667
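Entering these values into the formula gives:
$$P(X = 3) = (1 - 0.1667)^{3 - 1} \times 0.1667 = 0.8333^{2} \times 0.1667 \approx 0.115755$$
This calculation by hand confirms the previous statistical software results within rounding error.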
If you need to calculate the cumulative geometric probability for a range of trials, calculate the probability for each attempt and then sum the probabilities.
For example, if you need to calculate the probability of the first six occurring in the first six rolls, calculate the likelihood of it happening on rolls 1, 2, 3, 4, 5, and 6. Then sum those probabilities.
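Summing term by term always works, but the sum is a finite geometric series, so it also collapses to a shortcut. The derivation below is a sketch using the same x and p as the formula section, with n (a symbol introduced here) standing for the last trial in the range:

```latex
P(X \le n) = \sum_{x=1}^{n} (1 - p)^{x - 1}\, p
           = p \cdot \frac{1 - (1 - p)^{n}}{1 - (1 - p)}
           = 1 - (1 - p)^{n}
```

In words, the chance of at least one success in the first n trials is 1 minus the chance that all n trials fail. For the die example with p = 1/6 and n = 6, that is 1 – (5/6)^6 ≈ 0.6651.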
Finally, the solution for the problem in the geometric distribution calculator section is that the cumulative probability of completing the project successfully during attempts 1 – 4 is 0.7599. To solve this problem:
- Enter 0.3 for the Probability of success.
- In Number of failures, enter 0, 1, 2, and 3 individually and record each probability.
- Sum those probabilities.
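If you prefer to check the answer without the calculator, the same three steps can be sketched in a few lines. This uses the number of failures form, and the variable names are illustrative.

```python
p = 0.3  # chance of success on each attempt

# Number-of-failures form: k failures followed by a success, for k = 0..3.
probs = [(1 - p) ** k * p for k in (0, 1, 2, 3)]
cumulative = sum(probs)

print([round(q, 4) for q in probs])  # [0.3, 0.21, 0.147, 0.1029]
print(round(cumulative, 4))          # 0.7599
```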