What is Monte Carlo Simulation?
Monte Carlo simulation uses random sampling to produce simulated outcomes of a process or system. This method uses random sampling to generate simulated input data and enters them into a mathematical model that describes the system. The simulation produces a distribution of outcomes that analysts can use to derive probabilities.
Performing a Monte Carlo simulation requires the following information:
- A function or equation that takes inputs and produces outcomes.
- Probability distributions for all inputs.
To use the Monte Carlo method, analysts need to supply an equation that describes how inputs produce specific outcomes in a process. These mathematical functions were discovered through prior research and represent known physical properties, the results of experimental designs, and regression equations. Additionally, the researchers need to understand the real-world variability of the inputs.
The Monte Carlo model itself is deterministic. Imagine a regression equation that describes a system. For any given set of input values, the model produces a single outcome. However, repeated random sampling introduces natural input variability into the function to create a distribution of simulated results. Monte Carlo simulations typically draw hundreds of thousands of values from the probability distributions for each input.
When researchers perform Monte Carlo analysis correctly, the random sampling process accurately produces combinations of input values, ranging from common to rare, that match the real world. The simulation then determines the relative frequencies of the outcomes these input combinations produce. Analysts can use these results to calculate the probabilities of important outcomes.
In this post, learn why you would use a Monte Carlo simulation, how it can help decision-making, and work through a hands-on example using Excel. Additionally, you can use the step-by-step instructions to create your own simulations!
Why Use Monte Carlo Simulations?
As a seasoned data analyst, the thought of relying on simulated data seemed unconventional initially. How can you enhance the performance of a process or improve decision-making using simulated data in a Monte Carlo simulation?
To improve a process, you could collect a massive amount of data to understand the relationship between input and output variability under different conditions.
However, with a good understanding of the inputs and a model representing the process, you can generate ample simulated input values and apply them to the model to produce a simulated output distribution.
With simulated data, it might sound like you’re getting something for nothing. But you’re actually incorporating a lot of real-world knowledge. Rigorous Monte Carlo simulations require accurate process equations and input variability information—in other words, empirical knowledge that grounds them in reality.
Analysts typically use simulated data when real data collection is impossible or too costly. Stanislaw Ulam developed the Monte Carlo simulation method for the Manhattan project when Uranium was too rare to use for experiments. So, they simulated the data!
One of the beauties of Monte Carlo simulation lies in its ability to change input distributions effortlessly to answer “what-if” questions. In the upcoming example, we’ll manipulate simulated inputs to enhance product quality.
Monte Carlo simulations enable analysts to:
- Account for input variability in product results.
- Optimize process parameters.
- Pinpoint critical quality factors.
- Reduce adverse outcomes.
- Assess risk.
Who Uses Monte Carlo Simulation?
Monte Carlo simulation has become an integral tool in decision-making for companies like General Motors, Proctor and Gamble, Pfizer, Bristol-Myers Squibb, and Eli Lilly. These companies use simulations to estimate both the average return and risk factor of new products, helping determine which ones go to market.
GM uses Monte Carlo simulations to forecast net income, predict costs, and manage risk. Sears uses this method to determine inventory needs, while financial planners use it to optimize investment strategies for their clients’ retirement. Monte Carlo simulation is a versatile and valuable tool in the business world.
I’ve used the process to simulate student retention and new enrollment at a private school. The results helped us understand the range of possibilities for total enrollment and the potential for changing the number of classrooms for each grade level.
Monte Carlo Simulation Using Excel
Performing a Monte Carlo simulation example will bring it to life! I’ll use Excel for this example because everyone can access it, and I want to provide you with a hands-on feel for it. Typically, you should run at least 100,000 iterations of the simulation for the best results. Unfortunately, Excel can’t manage that. I’ll run 2,000 iterations, which is enough to understand how it works.
Download the Excel file that contains the working simulation: MonteCarloSimulation.
A building insulation manufacturer is developing a new product. The engineers have performed experiments relating process inputs to insulation effectiveness.
Designed experiments have produced the following regression equation that describes the relationships between three inputs and the outcomes.
During this process, they recorded input variability under normal conditions. The inputs follow the following probability distributions:
Injection Pressure: Normal Distribution
- Mean: 112
- Standard Deviation: 40
Injection Temperature: Normal Distribution
- Mean: 92.5
- Standard Deviation: 12
Cooling Temperature: Normal Distribution
- Mean: 35
- Standard Deviation: 11
Learn more about the Mean, Standard Deviation, and Normal Distribution.
Setting Up Excel
For the Monte Carlo simulation example in Excel, I set up a section describing the input distributions.
The Simulation section draws one random value from each input distribution and calculates the effectiveness. I use Excel’s NORM.INV function with RAND() to draw a random value from each normal distribution.
The Effectiveness calculation uses the regression equation.
That leaves us with one iteration of the Monte Carlo simulation.
What about the other 1,999?
For that, I’ll use a Data Table.
- Create the Iteration section with the 2000 rows.
- In D9, I reference the Effectiveness equation in G7.
- Select the range B9:C2008. That’s the entire Iteration section minus the headings.
- Choose Data > What If Analysis > Data Table.
- In Column input cell, enter a blank cell (e.g., I10) and click OK.
At this point, Excel populates all 2,000 iterations. Snippet below.
Assessing the Monte Carlo Simulation Results in Excel
Because Monte Carlo simulations use random sampling, your results will differ slightly from mine. As the number of iterations increases, they tend to converge on a stable value. But with only 2000, the differences are a bit more noticeable.
For this example, our Monte Carlo simulation outcomes are in the iteration table. We need to tell Excel how to analyze them.
I’ll assess the results by creating a histogram of the outcomes and calculating the percentage of times that insulation effectiveness falls below the lower spec limit (LSL) of 10 or above the upper spec limit (USL) of 30. Those outcomes are out of spec. (I hand-drew the reference lines in the histogram because Excel can’t do that easily.)
To calculate the out-of-spec percentages, I use Excel’s COUNTIF function and divide by the 2000 iterations. For example, for the LSL cell, I use: =COUNTIF(C9:C2008,”<10″)/2000. Then format the cells as percentages.
Interestingly, the histogram indicates that our outcomes are right-skewed. Additionally, 1.6% are below the LSL and 0.80% are above the USL. A total of 2.40% are out of spec. I also calculate the mean (~17.3), but the out-of-spec statistics are more important for this example.
Monte Carlo Simulation and What If Scenarios
Imagine that the 2.4% defect rate is unacceptable. Let’s use the Monte Carlo simulation to see if we can lower it.
In manufacturing, changing input means is generally much easier than reducing the dispersion (standard deviation). Altering the mean typically involves changing settings, such as the pressure or heating settings. Reducing the variability is usually more expensive because it requires improving the technology. Think of your oven at home. You can change the mean heat by adjusting the temperature knob. However, if the variability of the heat around that mean is too high, you’ll need to upgrade the oven.
Of course, changing the input distributions requires using subject-area knowledge to ensure you only use feasible values.
Next, I’ll change the input means to see if we can reduce the defects.
New Input Values
In Excel, I enter the following new mean values in the Inputs section:
- InjPress: 169.5
- InjTemp: 75.5
- CoolTemp: 21.4
That’s all I need to run an entirely new, “what if” scenario!
These new values produce the following Monte Carlo simulation results. Again, yours will be slightly different.
The distribution is better centered between the spec limits and produces only 1.15% defects, almost all are below the LSL.
We can continue this process until we obtain an acceptable defect rate. If we can’t sufficiently reduce defects by only changing the means, we might need to reduce their variability. That would be a crucial finding itself. In that case, the Monte Carlo simulation tells us that our current technology is inadequate for this product and gives us an idea of how much better it must be.
Monte Carlo simulations allow us to find the initial estimate and then try out different scenarios to guide our decision-making without needing to collect data for all scenarios. Of course, these are simulated outcomes, and we’d need to verify the top scenarios with actual manufacturing runs.
As you saw, Monte Carlo simulation creates simulated data. Bootstrapping is a statistical procedure that uses real data to create simulated samples. Learn more by reading, Introduction to Bootstrapping.
Bradley says
How could you account for potential correlations between the input variables youโre simulating? In your example the InjPress, InjTemp, and CoolTemp. Can you somehow generate them through a defined covariance matrix?
Mehmet Yakut says
Hi, Jim,
Thank you for the article. It is very good and useful.
Can I use the Monte Carlo Simulation Method to estimate the uncertainity of measurement?
Best regards
Jim Frost says
Hi Mehmet,
Thanks so much for your kind words!
No, Monte Carlo Simulations don’t assess the uncertainty of measurements. They don’t even use real measurements because all measurements are simulated. You’d need to perform some type of Measurement Systems Analysis to understand measurement uncertainty. I’ve written about several, such as Gage R&R.
Federico says
Wonderful article! In statistics and machine learning, historical data is used to train/build a specific model. Once created, the model is used to a) make predictions on new unseen inputs or b) do inference.
I am still not sure when to choose to perform a MC simulation and when to collect historical data to train a specific ML model and then use that model for forecasting and inference.
Maybe it is not an either or situation: any MC simulation needs, beforehand, an established model which gets fed with a multitude of random inputs from a particular statistical probability distribution. The generated output data is then either analyzed using a histogram or averaged together, correct?
I guess that, once we build a trustworthy ML model, we can perform a MC simulation using that model to explore the what-if scenarios… Once trained and built, a reliable ML model is either deployed or tested using a MC simulation to see what the potential outputs will be given different sets of random inputs…Hope I am on the right track ๐ Thank YOU
Jim Frost says
Hi Federico,
In your last paragraph, you really hit the nail on the head when talking about what-if scenarios.
All Monte Carlo simulations work with some sort of underlying model. I’ve frequently used regression models for mine, but it can be ML, equations that describe physical properties, or other things. When you have a model, you can use it to understand how things work and to make predictions. Frequently, you develop these models using historical data. These models by themselves allow you to predict future outcomes by entering specific values into the model.
Monte Carlo simulations expand on the underlying model in several ways by using random sampling. For instance, it can help you see the effects of real-world variability on the outcomes. After all, in the real world, you won’t have just one set of inputs for your model but a whole range. If you have the underlying model and you know the input variability, how does that work its way through the outputs? What sort of pattern or distribution of outcomes does it produce? Then of course you can work on those what-if scenarios by changing the inputs. In short, MC allows you to explore a wide range of scenarios and uncertainties with a probabilistic approach to outcomes rather than providing a single prediction.
So, think of MC as a way to further explore a model, ML or otherwise. Of course, the quality of the results depends on the quality of the model.
As for interpreting the results, there are a variety of ways. Graphical ways such as histograms are very common. You might also evaluate the proportion of out of spec values. How many are less than or greater than particular key outcomes values. Or whatever is important to your subject area. It really depends on what you need to know about the outcome. The MC simulation gives you all the set of outcomes, but you as the analyst needs to know how to evaluate them.
Yes, you’re definitely on the right track!
Gilberto says
Very nice article. What software should I use if I want to run 100.000 simulations?
Jim Frost says
Hi Gilberto,
Thanks, I’m so glad to hear that the article was helpful!
I’m sure that the usual suspects of R, Python, etc. can manage it. Additionally, statistical software companies make applications dedicated to Monte Carlo simulations. For example, the place where I used to work has an app for that. Of course, that’s paid but does simplify the process.
You really just need the abilities to resample from probability distributions, input those values into an equation, and able to work with large datasets.
For a nice free solution, I recommend using Statistics 101. That’s freeware (although they like donations) that is a powerful resampling tool. I’ve used it specifically for Monte Carlo simulations (along with other things) and it can easily handle 500,000 iterations.