What is a Hypergeometric Distribution?
The hypergeometric distribution is a discrete probability distribution that calculates the likelihood an event happens k times in n trials when you are sampling from a small population without replacement.
This distribution is like the binomial distribution except for the sampling without replacement aspect. When you sample without replacement, the probabilities change with each subsequent trial. Conversely, the binomial distribution assumes the chances remain constant over the trials.
For instance, when you draw an ace from a deck of cards, the probability of drawing another ace on the next draw decreases from 4/52 to 3/51 because the deck has fewer aces.
The hypergeometric distribution can answer questions like the following. What is the probability of:
- Getting two red candies when we draw five candies from a jar containing five red candies and 10 white candies?
- Drawing five cards of the same suit from a regular deck of cards?
- Seating 8 women on a jury of 13 people when drawing randomly from a jury pool of 50 people evenly split between men and women?
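To make these questions concrete, here is a minimal Python sketch that answers all three by counting combinations. The helper name hypergeom_pmf and its argument order are my own illustrative choices, not part of any library. The card example assumes that "same suit" means any of the four suits, so it multiplies the single-suit probability by four.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k events in n draws without replacement
    from a population of N items that contains K events."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Two red candies in five draws from a jar of 5 red and 10 white candies.
p_candies = hypergeom_pmf(k=2, N=15, K=5, n=5)

# Five cards of one suit: the chance for one particular suit (13 of 52
# cards), multiplied by the four suits.
p_same_suit = 4 * hypergeom_pmf(k=5, N=52, K=13, n=5)

# Eight women on a 13-person jury drawn from 25 women and 25 men.
p_jury = hypergeom_pmf(k=8, N=50, K=25, n=13)

print(f"Two red candies:   {p_candies:.4f}")
print(f"Five of one suit:  {p_same_suit:.5f}")
print(f"Eight women:       {p_jury:.6f}")
```

The candy outcome is fairly common (roughly 0.40), while five cards of one suit is rare (roughly 0.002).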
As the population size increases relative to the sample size, the hypergeometric distribution more closely approximates the binomial distribution. The hypergeometric distribution is described by a Probability Mass Function (PMF) because it calculates likelihoods for a discrete random variable.
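The approach toward the binomial distribution is easy to check numerically. The sketch below is a rough illustration under my own assumptions: it keeps the sample at 13 draws and the population at 50% events, then grows the population size. The function names are hypothetical helpers, not library calls.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exactly k events in n draws without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """Exactly k events in n independent trials with constant probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k, p = 13, 8, 0.5
binomial = binom_pmf(k, n, p)

gaps = []
for N in (50, 500, 5000, 50000):
    K = N // 2  # keep the population half events, half non-events
    hyper = hypergeom_pmf(k, N, K, n)
    gaps.append(abs(hyper - binomial))
    print(f"N={N:>6}  hypergeometric={hyper:.6f}  binomial={binomial:.6f}")
```

The gap between the two probabilities shrinks as N grows because removing 13 items barely changes the makeup of a very large population.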
In this post, learn how to use the hypergeometric distribution and its cumulative form, when you can use it, and how to calculate probabilities by hand using its formula. I also include a hypergeometric distribution calculator that you can use with what you learn.
For more information about other ways to use binary data, read my posts: Maximize the Value of Your Binary Data, and the Bernoulli, Binomial, Negative Binomial, and Geometric distributions.
Hypergeometric Probabilities
The hypergeometric distribution models the probabilities for exactly k events occurring in n trials when you know the composition of a small population. Let’s look at an example to bring it to life!
I’ll start by using statistical software to calculate the hypergeometric probabilities and create distribution plots. This process will help you understand what you can learn from it. Then we’ll move on to the hypergeometric distribution formula.
Suppose we’re interested in the possible outcomes for a jury selection. We want to know the probability of drawing 8 women for a jury of 13 when there are 25 female and 25 male candidates. For this example, assume the jurors are randomly selected from the pool of candidates.
We’ll need the following information to solve this problem:
- Total population size is 50 candidates (N).
- Number of events (women) in the population (all candidates) is 25 (K).
- The jury size is 13 (n).
- Outcome of interest is selecting 8 women (k).
The hypergeometric distribution accounts for how the probabilities change with each selection. As we select men and women from the candidate pool, it affects the makeup of the remaining population in the pool because there are no replacements. Each woman we choose reduces the number of women in the candidate pool, thus lowering the likelihood that the following selection will be a woman. Conversely, selecting a man increases the chances that the next juror will be a woman.
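A few exact fractions show how a single selection shifts the odds for the next one. This is only a small sketch, and p_next_is_woman is a hypothetical helper name used for illustration.

```python
from fractions import Fraction

def p_next_is_woman(women_left, men_left):
    """Chance the next juror drawn is a woman, given who remains in the pool."""
    return Fraction(women_left, women_left + men_left)

first_draw = p_next_is_woman(25, 25)    # full pool: 25 women, 25 men
after_woman = p_next_is_woman(24, 25)   # one woman already selected
after_man = p_next_is_woman(25, 24)     # one man already selected

print(f"First draw:             {first_draw}")
print(f"After selecting a woman: {after_woman}")
print(f"After selecting a man:   {after_man}")
```

The second-draw chances are 24/49 after a woman and 25/49 after a man, one just below and one just above the starting value of 1/2.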
Example Results
My statistical software calculates a likelihood of 0.161934 for selecting eight women in 13 draws.
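If you want to reproduce that value without statistical software, the following sketch counts the ways to build the jury using Python's standard library. The variable names are mine, chosen to match the N, K, n, and k labels in the list above.

```python
from math import comb

N, K, n, k = 50, 25, 13, 8

ways_women = comb(K, k)          # ways to choose 8 of the 25 women
ways_men = comb(N - K, n - k)    # ways to choose the other 5 jurors from the 25 men
ways_total = comb(N, n)          # ways to choose any 13 of the 50 candidates

# Favorable juries divided by all possible juries.
probability = ways_women * ways_men / ways_total
print(f"{probability:.6f}")  # 0.161934
```

The result is the share of all possible 13-person juries that contain exactly eight women.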
That’s interesting but perhaps not so helpful by itself. We’re also interested in the chances of selecting other numbers of female jurors. Seeing the distribution of probabilities for different numbers is much more helpful.
Related post: Understanding Probability Distributions
Hypergeometric Distribution Graph
The hypergeometric distribution graph is helpful because it displays the probability of differing numbers of successes (k) out of the total number of trials (n). In the chart below, the distribution plot shows the likelihood of selecting exactly 0 women, 1 woman, 2 women, 3 women, . . ., and up to 13 women in the 13 selections. With this approach, the hypergeometric distribution graph covers the complete range of possible successes up to the total number of trials.
I like these graphs because they emphasize how we’re working with a distribution, and it’s easy to see which values happen more frequently. The graph below does not show the chances for fewer than 2 or more than 11 because those likelihoods are too low to display on the chart.
In the chart, each bar represents the probability of selecting a specific number of women during the 13 selections. The bar for 8 corresponds with the probability (0.161934) shown in the output above. At a glance, we can see that 6 and 7 female jurors are the most likely outcomes, each with a probability of approximately 0.24.
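For readers without plotting software, here is a rough text version of the same distribution. It is a sketch that assumes the jury example's values (N = 50, K = 25, n = 13), and the helper name is illustrative. Each # stands for about one percentage point of probability.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exactly k events in n draws without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 25, 13

# Probability for every possible number of women, 0 through 13.
pmf = {k: hypergeom_pmf(k, N, K, n) for k in range(n + 1)}

for k, p in pmf.items():
    bar = "#" * round(p * 100)
    print(f"{k:2d}  {p:.6f}  {bar}")
```

The rows for 6 and 7 women have the longest bars, and the probabilities across all 14 rows sum to 1.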
Hypergeometric Cumulative Distribution Function
The hypergeometric distribution is excellent for understanding the likelihood of obtaining an exact number of events (k) within a certain number of trials (n) for a small population without replacement. However, you’re often not interested in just one specific number of outcomes.
For example, in the jury selection example above, you might want to learn the probability of selecting at least eight women.
Let me introduce you to the hypergeometric cumulative distribution function.
Technically, the hypergeometric cumulative probability calculates the likelihood of obtaining less than or equal to k events in n trials. When you need a ≥ probability, use the upper tail of the cumulative distribution, which is the complement of the lower tail: the chance of at least k events equals 1 minus the chance of k – 1 or fewer. These days, most statistical software will let you indicate the direction of the cumulative function for the hypergeometric distribution. I’ll use the hypergeometric distribution graph again to show you how it works.
For example, we want to know the chances of selecting ≥ 8 women in 13 attempts. Below, the shaded region shows the upper-tail cumulative probability of choosing at least eight women in 13 draws.
The likelihood of randomly choosing eight or more women in 13 selections is 0.2601, approximately 1 in 4.
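The sketch below reproduces that cumulative value two ways, assuming the same jury values as before. The helper names hypergeom_pmf and hypergeom_cdf are hypothetical. Note that "at least eight" is the complement of "seven or fewer," not "eight or fewer."

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exactly k events in n draws without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k, N, K, n):
    """Cumulative probability of k or fewer events."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k + 1))

N, K, n = 50, 25, 13

# Route 1: complement of the lower tail, 1 - P(7 or fewer women).
p_at_most_7 = hypergeom_cdf(7, N, K, n)
p_at_least_8 = 1 - p_at_most_7

# Route 2: add the individual probabilities for 8 through 13 women.
p_direct = sum(hypergeom_pmf(i, N, K, n) for i in range(8, n + 1))

print(f"P(7 or fewer women) = {p_at_most_7:.4f}")
print(f"P(8 or more women)  = {p_at_least_8:.4f}")  # 0.2601
```

Both routes give the same answer, which is a handy check when your software only reports the lower tail.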
Learn more about Cumulative Distribution Functions: Uses, Graphs & vs PDF.
Hypergeometric Distribution Calculator
Use my hypergeometric distribution calculator below to calculate probabilities and cumulative probabilities. Click the link for its standalone page that you might want to bookmark.


