The Monty Hall Problem is where Monty presents you with three doors, one of which contains a prize. He asks you to pick one door, which remains closed. Monty opens one of the other doors that does not have the prize. This process leaves two unopened doors—your original choice and one other. He allows you to switch from your initial choice to the other unopened door. Do you accept the offer?
If you accept his offer to switch doors, you’re twice as likely to win (66% versus 33%) as you are if you stay with your original choice.
The solution to the Monty Hall Problem is tricky and counter-intuitive. It tripped up many experts back in the 1980s. However, the correct answer to the Monty Hall Problem is now well established using a variety of methods. It has been demonstrated mathematically, with computer simulations, and through empirical experiments, including on television by both the Mythbusters (CONFIRMED!) and James May’s Man Lab. You won’t find any statisticians who disagree with the solution.
In this post, I’ll explore aspects of this problem that have arisen in discussions with some stubborn resisters to the notion that you can increase your chances of winning by switching!
The Monty Hall problem provides a fun way to explore issues that relate to hypothesis testing. I’ve got a lot of fun lined up for this post, including the following!
- Using a computer simulation to play the game 10,000 times.
- Assessing sampling distributions to compare the 66% hypothesis to another contender.
- Performing a power and sample size analysis to determine the number of times you need to play the Monty Hall game to get an answer.
- Conducting an experiment by playing the game repeatedly myself, recording the results, and using a proportions hypothesis test to draw conclusions!
I won’t re-explain the logic behind how the Monty Hall Problem works in this post. To learn about that, read my other post about the Monty Hall Problem.
Motivations for Writing this Post
Despite the universal acceptance among statisticians, there are stubborn resisters to the solution. They’re convinced that you have a 50% chance of winning by switching rather than a 66% chance. My response has been, “test it empirically by playing it yourself!” It’s a simple enough experiment to perform on your own. You just need a friend to play the game with multiple times. It sounds fun!
However, it dawned on me that, in the minds of the doubters, we need to establish that the winning percentage is 66% rather than 50%. Of course, in reality, the comparison is 33% for staying versus 66% for switching, but they disagree with that notion. That 16-percentage-point difference is small enough to be difficult to detect through an experiment. A small sample size won’t reveal that difference with an acceptable degree of confidence.
I’ll use this problem to highlight various statistical concepts that relate to answering this question. Plus, it gives me another opportunity to use the Statistics 101 simulation giftware, which I love using and recommend! It’s free to use, but they do ask for a donation. I’ve used this application to illustrate how both bootstrapping and the central limit theorem work in statistics.
I’ll start by using Statistics 101 to simulate the Monty Hall Problem thousands of times to show the solution that way. Then, I’ll highlight the difficulty in discriminating between a 50% and 66% chance of winning using sampling distributions. I’ll also perform a power and sample size analysis to determine what sample size I should recommend to the doubters! Finally, I’ll conduct an empirical experiment and use a hypothesis test.
Simulating the Monty Hall Problem
Conveniently, the creators of Statistics 101 include a variety of example scripts with their software, including one that simulates the Monty Hall Problem. Now, I’ve been told by deniers that any simulation that shows switching produces a 66% chance of winning must be a case of “garbage in, garbage out.” So, I’ll briefly explain the script below. It’s pretty straightforward . . . and garbage free!
- Defines the door arrangement.
- Specifies the number of times to play the game.
- The software randomly assigns the prize to one of the doors.
- The simulated contestant randomly chooses a door.
- The software records the result for both staying and switching.
After that, it’s a simple matter to take the two counts of wins and convert them to winning percentages for both staying and switching.
So, with no further ado, let’s run the Monty Hall game 10,000 times. And, the answer is!
After playing the game 10,000 times and switching every time, the simulated contestant won 66.36% of the time. Not far from the predicted percentage at all!
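For readers who want to try this outside Statistics 101, here is a minimal Python sketch of the same steps. It is not the Statistics 101 script, and the function names (`play_once`, `simulate`) are made up for illustration. It assumes Monty picks at random when he has two non-prize doors to choose from.

```python
import random


def play_once(rng):
    """Play one game; return (stay_wins, switch_wins) as booleans."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)  # the prize is assigned at random
    pick = rng.choice(doors)   # the contestant picks at random
    # Monty opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    # Switching means taking the one door that is neither picked nor opened.
    switched = next(d for d in doors if d != pick and d != opened)
    return pick == prize, switched == prize


def simulate(n_games, seed=None):
    """Return (stay win rate, switch win rate) over n_games games."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        stay, switch = play_once(rng)
        stay_wins += stay
        switch_wins += switch
    return stay_wins / n_games, switch_wins / n_games


if __name__ == "__main__":
    stay_rate, switch_rate = simulate(10_000)
    print(f"stay: {stay_rate:.2%}  switch: {switch_rate:.2%}")
```

Because every game is scored for both strategies, the two rates always add up to 100%. The exact percentages will vary a little from run to run.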
Distinguishing between Winning Percentages of 50% and 66%
Case closed, right? I can still hear some complaints of “garbage in, garbage out”—even though that’s not true. Consequently, let’s suppose we still want to demonstrate this solution empirically. As I mentioned earlier, it can be difficult detecting that small difference.
In comparing winning percentages of 66% to 50%, the difficulty is that we’re dealing with binary data (Win/Lose) and a binomial distribution of outcomes. The values 50% and 66% are just the expected values, the average winning percentages that you’d see over many repetitions. However, just like flipping a coin multiple times, there’s a distribution of outcomes around the mean. If you flip a coin 10 times, you don’t expect that it’ll always be heads-up precisely 50% of the time. If you play the game, always switch, and end up winning 58% of the time, how do you determine whether the expected winning percentage was 50% or 66%?
That’s what we’re going to tackle in this post!
Graphing the Sampling Distributions for the Monty Hall Problem
I could use the binomial distribution to illustrate how this works. However, to use that approach, I’d need to enter the expected winning percentage for switching, which is under contention. So, in the spirit of not assuming the 66% value to be true, I’ll continue using the simulation software and have it play the game and create the sampling distributions using the game outcomes. To do this, I’ve modified the script that they supply to run samples of different sizes many different times. This process allows us to see the distribution of sample winning percentages, similar to the process I’ve used for my posts about bootstrapping and the central limit theorem.
These distributions are sampling distributions and highlight the spread of sample winning percentages you expect for samples of different sizes. I’ll use sample sizes of 10, 25, 50, 100, and 400. Keep in mind that I’m running each sample size 100,000 times. For instance, the software will run the experiment with 10 trials per sample, calculate the winning percentage over those 10 trials, save the winning percentage, and then repeat that process for a total of 100,000 times. Then, the software graphs the distribution of winning percentages for all 100,000 samples. Then, I modify the script to have it do the same with samples that contain 25 trials, and so on.
With all that in mind, the following graphs show the sampling distributions of winning percentages for the Monty Hall game as it is actually played in the simulator and for an alternative game where the expected winning percentage is 50%. The graphs show how the outcomes can overlap, which makes distinguishing the correct winning percentage difficult. This problem is particularly evident with smaller sample sizes where the distribution spreads are wider. As you increase the sample size, the spreads narrow and it becomes easier determining which process produces an observed outcome.
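The sampling procedure described above can be sketched in Python as follows. This is an illustration rather than the modified Statistics 101 script. The names `switch_wins`, `coin_flip_wins`, and `sampling_distribution` are hypothetical, and the demo uses 1,000 samples per size instead of 100,000 just to keep the run time short.

```python
import random
from collections import Counter


def switch_wins(rng):
    """Play one Monty Hall game with a contestant who always switches."""
    prize, pick = rng.randrange(3), rng.randrange(3)
    # Monty opens a door that is neither the pick nor the prize.
    opened = rng.choice([d for d in range(3) if d not in (pick, prize)])
    final = next(d for d in range(3) if d not in (pick, opened))
    return final == prize


def coin_flip_wins(rng):
    """The comparison process: a plain 50% chance of winning."""
    return rng.random() < 0.5


def sampling_distribution(game, n_trials, n_samples, rng):
    """Map each observed winning percentage to its relative frequency."""
    counts = Counter()
    for _ in range(n_samples):
        wins = sum(game(rng) for _ in range(n_trials))
        counts[100 * wins / n_trials] += 1
    return {pct: c / n_samples for pct, c in sorted(counts.items())}


if __name__ == "__main__":
    rng = random.Random(0)
    for n_trials in (10, 25, 50, 100, 400):
        monty = sampling_distribution(switch_wins, n_trials, 1_000, rng)
        coin = sampling_distribution(coin_flip_wins, n_trials, 1_000, rng)
        # Report the most frequent winning percentage for each process.
        print(n_trials, max(monty, key=monty.get), max(coin, key=coin.get))
```

Each returned dictionary holds the bar heights for one process at one sample size, so it can be passed to any plotting tool to draw graphs like the ones below.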
Understanding the Sampling Distribution Graphs
In the following graphs, each bar represents a winning percentage. These are discrete distributions because there are a limited number of possible winning percentages. For example, with a sample size of 10, winning percentages can be only (1/10) 10%, (2/10) 20%, (3/10) 30%, and so on. The height of the bar represents the probability of obtaining that particular winning percentage.
The grey bars represent the 50% chance of winning process I’ve added to the script, which is essentially a coin toss. The red bars represent the simulated Monty Hall game for a player who always switches. Notice how the grey bars center on 50% while the red bars center on 66%.
The goal of these graphs is to show how large of a sample size we need to conclude that switching in the Monty Hall game causes you to win more than 50% of the time. If you were performing an experiment, you would set the number of trials you’ll conduct in advance, run all the trials, and calculate one winning percentage. You can then locate the percentage you obtain on these charts and determine which distribution is more likely to have produced the result you observe. It’s starting to become kind of like a hypothesis test—which we’ll get to later in this post!
Please note that the axes’ scaling changes. I’ve been unable to find a way to keep it consistent for easier comparisons!
Monty Hall Problem Sample size = 10
This graph shows how the two distributions overlap substantially, which indicates that you’d frequently obtain similar results for expected winning percentages of 50% and 66%. In fact, the probability of winning 60% of the time is nearly equal for both distributions. It’s too small of a sample to be able to distinguish which one is correct unless you get a very low or very high value.
If you had a winning percentage of 80% or higher, it’s improbable that the 50% process produced it because the grey bar is so short at that percentage. Conversely, it’s implausible that the 66% Monty Hall process would yield a 30% winning percentage because the red bar is so short at that percentage.
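As a cross-check on the 10-trial graph, the exact binomial probabilities can be computed directly. Note the assumption: unlike the simulation, this sketch plugs in the theoretical 2/3 chance of winning by switching, which is the very value the doubters dispute. The helper name `binom_pmf` is my own.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent trials with win chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10
for k in range(n + 1):
    coin = binom_pmf(k, n, 0.5)      # 50% process
    switch = binom_pmf(k, n, 2 / 3)  # assumed Monty Hall switching process
    print(f"{k / n:>4.0%}  coin={coin:.4f}  switch={switch:.4f}")
```

For 6 wins out of 10, this gives roughly 0.205 for the coin toss and 0.228 for switching, which is why a 60% result from only 10 games tells you very little.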
Monty Hall Problem Sample size = 25
As the sample sizes increase, the sampling distributions start to narrow. While there is still a significant degree of overlap, we’re starting to see two distinct distributions. However, the two distributions are roughly equally likely to produce winning percentages near 60%. You’d need a winning percentage greater than 68% or less than 48% to decide.
Monty Hall Problem Sample size = 50
The sampling distributions continue to narrow as the sample size increases. However, there is still a moderate amount of overlap. Sample percentages greater than 62% are very unlikely to occur if the expected winning percentage for switching is truly 50%. Notably, the sample size is becoming large enough to determine that the Monty Hall distribution centered on 66% is more likely to have produced a percentage even when it is smaller than the expected winning percentage (62% vs 66%). Large sample sizes are great! On the other hand, winning percentages less than 56% are unlikely to occur if switching in the Monty Hall game has an expected winning percentage of 66%. The zone of uncertainty between those two values continues to shrink.
Monty Hall Problem Sample size = 100
With this large of a sample, there is a small overlap. There’s only a 4.4% chance that a 50% process would have a winning percentage higher than 58% with a sample size of 100 trials while the Monty Hall process centered on 66% only has a 4.3% chance of having a winning percentage less than 58%.
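The tail probabilities for 100 trials can also be approximated with exact binomial sums. This is a sketch under assumptions rather than the method behind the figures above, which came from the simulation. It uses exactly 0.5 for the coin process and the theoretical 2/3 for switching, and the helper names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent trials with win chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """P(wins >= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def prob_at_most(k, n, p):
    """P(wins <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


n = 100
coin_above_58 = prob_at_least(59, n, 0.5)      # 50% process wins more than 58%
switch_below_58 = prob_at_most(57, n, 2 / 3)   # switching wins less than 58%
print(f"coin   > 58%: {coin_above_58:.4f}")
print(f"switch < 58%: {switch_below_58:.4f}")
```

The coin-toss tail comes out at about 4.4%, in line with the simulated value. The switching tail depends on whether you plug in 2/3 or 0.66 and on simulation noise, so don’t expect it to match the simulated figure exactly.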
Monty Hall Problem Sample size = 400
Finally, with a huge sample size of 400 trials, you can clearly see the separate distributions. It’s virtually impossible for a 50/50 coin flip to have a winning percentage greater than 55% with a sample size of 400. Meanwhile, it’s almost impossible for the Monty Hall distribution to have a winning percentage lower than 62%. Neither distribution produces many outcomes in this no-man’s land.
Power and Sample Size Analysis for the Monty Hall Problem
Ok, we’ve had our fun playing with the sampling distributions. Now, let’s get ready to perform a hypothesis test to answer this question. Hypothesis tests use sample data to draw conclusions about a population.
We’re performing an experiment, and our strategy will always be to switch. We’ll use a one-sample proportion test to determine whether we can reject the notion that the Monty Hall game follows the distribution that centers on 50%.
Before we perform this experiment, let’s do a power and sample size analysis to determine a sample size that will give us sufficient power to detect this difference. I’ll estimate the sample size necessary to produce 80%, 90%, and 95% power.
I’ve told the software that I want to perform a One Proportion test and entered estimates for our expected proportion (0.67) and the comparison proportion (0.5). If this difference exists in the population, statistical power indicates the probability that our experiment will detect it. Power analyses are beneficial because they help you collect a large enough sample to detect a difference if it exists, but stop you from collecting an overly large sample because that can cost time and money.
The power and sample size results are below.
The output indicates that to obtain statistical power of 0.8, 0.9, and 0.95, we’d need sample sizes of 66, 87, and 107, respectively.
Statistical power of 80% is a standard benchmark. However, because obtaining a larger sample doesn’t cost more money, just a bit more time, I’m looking into higher levels of power. In the end, I’ll go with an even 100 trials, which produces a power of 93.7% (not shown).
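If you don’t have statistical software handy, the calculation can be sketched in Python using the standard normal-approximation formula for a one-proportion test. The assumptions here are mine: a two-sided test at a significance level of 0.05, and the function names `sample_size` and `power_at`. The software’s exact method isn’t stated, although this approach happens to reproduce the same numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def sample_size(p0, p1, power, alpha=0.05):
    """Trials needed to detect p1 against the null value p0 (two-sided test)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / abs(p1 - p0)) ** 2)


def power_at(n, p0, p1, alpha=0.05):
    """Approximate power with n trials, ignoring the negligible far tail."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z = (abs(p1 - p0) * sqrt(n) - z_alpha * sqrt(p0 * (1 - p0))) / sqrt(p1 * (1 - p1))
    return Z.cdf(z)


if __name__ == "__main__":
    for target in (0.80, 0.90, 0.95):
        print(f"power {target:.2f}: n = {sample_size(0.5, 0.67, target)}")
    print(f"power with 100 trials: {power_at(100, 0.5, 0.67):.3f}")
```

With a comparison proportion of 0.5 and an expected proportion of 0.67, this yields 66, 87, and 107 trials for the three power levels and a power of about 0.937 for 100 trials.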
Performing the Monty Hall Experiment at Home
My daughter and I used three playing cards to play the Monty Hall game at home. One card represented the prize door, and the other two were non-prize doors. My daughter was Monty, and I was the contestant. We played it 100 times.
For each trial, she’d randomize the three cards and then place them face down in a row while noting the location of the prize. I’d pick my card, and she turned over one of the other two cards while taking care not to reveal the prize. I’d always switch to the other card and record the results, which you can download in this CSV data file: MontyHallExperiment.
In our experiment, I won 64 times out of 100. I performed the One Proportion test by telling the software to compare the sample results (0.64) to the comparison proportion of 0.5.
The hypotheses for this test are the following:
- Null: The winning percentage for always switching equals 50%.
- Alternative: The winning percentage for switching does not equal 50%.
If we obtain a p-value that is less than our significance level (0.05), we can reject the null hypothesis and conclude that the expected winning percentage for switching in the Monty Hall game does not equal 50%.
The low p-value indicates that we can reject the null hypothesis and conclude that the expected winning percentage for switching in the Monty Hall Problem is not 50%. That’s consistent with what we know about the probabilities in the underlying game.
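To rerun the test on the experiment’s result, here is a minimal sketch of an exact two-sided binomial test in Python. It assumes an exact test, whereas the software may have used a different method such as a normal approximation, so its p-value may differ slightly. The function name `binomial_test_two_sided` is hypothetical.

```python
from math import comb


def binomial_test_two_sided(wins, n, p0=0.5):
    """Exact two-sided p-value for `wins` out of `n` under null proportion p0.

    Sums the probabilities of every outcome that is no more likely
    than the observed one.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[wins]
    # Small tolerance so ties are not dropped because of floating-point error.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))


p_value = binomial_test_two_sided(64, 100)
print(f"p-value = {p_value:.4f}")
print("reject null at 0.05" if p_value < 0.05 else "fail to reject null at 0.05")
```

For 64 wins in 100 games against a null proportion of 0.5, the p-value falls well below the 0.05 significance level.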
In closing, I do not doubt that the expected winning percentage is genuinely 66%. You can prove it using probabilities and logic. I think most of us accept this truth even though it’s admittedly a head scratcher at first! This problem was a useful way to illustrate various principles involved in hypothesis testing and a great way to showcase the proportions hypothesis test.
If you still don’t accept it, grab a friend, three playing cards, and play the game 100 times!