What is Monte Carlo Simulation?
Monte Carlo simulation uses random sampling to produce simulated outcomes of a process or system. This method uses random sampling to generate simulated input data and enters them into a mathematical model that describes the system. The simulation produces a distribution of outcomes that analysts can use to derive probabilities.
Performing a Monte Carlo simulation requires the following information:
- A function or equation that takes inputs and produces outcomes.
- Probability distributions for all inputs.
To use the Monte Carlo method, analysts need to supply an equation that describes how inputs produce specific outcomes in a process. These mathematical functions were discovered through prior research and represent known physical properties, the results of experimental designs, and regression equations. Additionally, the researchers need to understand the real-world variability of the inputs.
The Monte Carlo model itself is deterministic. Imagine a regression equation that describes a system. For any given set of input values, the model produces a single outcome. However, repeated random sampling introduces natural input variability into the function to create a distribution of simulated results. Monte Carlo simulations typically draw hundreds of thousands of values from the probability distributions for each input.
When researchers perform Monte Carlo analysis correctly, the random sampling process accurately produces combinations of input values, ranging from common to rare, that match the real world. The simulation then determines the relative frequencies of the outcomes these input combinations produce. Analysts can use these results to calculate the probabilities of important outcomes.
In this post, learn why you would use a Monte Carlo simulation, how it can help decision-making, and work through a hands-on example using Excel. Additionally, you can use the step-by-step instructions to create your own simulations!
Why Use Monte Carlo Simulations?
As a seasoned data analyst, the thought of relying on simulated data seemed unconventional initially. How can you enhance the performance of a process or improve decision-making using simulated data in a Monte Carlo simulation?
To improve a process, you could collect a massive amount of data to understand the relationship between input and output variability under different conditions.
However, with a good understanding of the inputs and a model representing the process, you can generate ample simulated input values and apply them to the model to produce a simulated output distribution.
With simulated data, it might sound like you’re getting something for nothing. But you’re actually incorporating a lot of real-world knowledge. Rigorous Monte Carlo simulations require accurate process equations and input variability information—in other words, empirical knowledge that grounds them in reality.
Analysts typically use simulated data when real data collection is impossible or too costly. Stanislaw Ulam developed the Monte Carlo simulation method for the Manhattan project when Uranium was too rare to use for experiments. So, they simulated the data!
One of the beauties of Monte Carlo simulation lies in its ability to change input distributions effortlessly to answer “what-if” questions. In the upcoming example, we’ll manipulate simulated inputs to enhance product quality.
Monte Carlo simulations enable analysts to:
- Account for input variability in product results.
- Optimize process parameters.
- Pinpoint critical quality factors.
- Reduce adverse outcomes.
- Assess risk.
Who Uses Monte Carlo Simulation?
Monte Carlo simulation has become an integral tool in decision-making for companies like General Motors, Proctor and Gamble, Pfizer, Bristol-Myers Squibb, and Eli Lilly. These companies use simulations to estimate both the average return and risk factor of new products, helping determine which ones go to market.
GM uses Monte Carlo simulations to forecast net income, predict costs, and manage risk. Sears uses this method to determine inventory needs, while financial planners use it to optimize investment strategies for their clients’ retirement. Monte Carlo simulation is a versatile and valuable tool in the business world.
I’ve used the process to simulate student retention and new enrollment at a private school. The results helped us understand the range of possibilities for total enrollment and the potential for changing the number of classrooms for each grade level.
Monte Carlo Simulation Using Excel
Performing a Monte Carlo simulation example will bring it to life! I’ll use Excel for this example because everyone can access it, and I want to provide you with a hands-on feel for it. Typically, you should run at least 100,000 iterations of the simulation for the best results. Unfortunately, Excel can’t manage that. I’ll run 2,000 iterations, which is enough to understand how it works.
Download the Excel file that contains the working simulation: MonteCarloSimulation.
A building insulation manufacturer is developing a new product. The engineers have performed experiments relating process inputs to insulation effectiveness.
Designed experiments have produced the following regression equation that describes the relationships between three inputs and the outcomes.
During this process, they recorded input variability under normal conditions. The inputs follow the following probability distributions:
Injection Pressure: Normal Distribution
- Mean: 112
- Standard Deviation: 40
Injection Temperature: Normal Distribution
- Mean: 92.5
- Standard Deviation: 12
Cooling Temperature: Normal Distribution
- Mean: 35
- Standard Deviation: 11
Setting Up Excel
For the Monte Carlo simulation example in Excel, I set up a section describing the input distributions.
The Simulation section draws one random value from each input distribution and calculates the effectiveness. I use Excel’s NORM.INV function with RAND() to draw a random value from each normal distribution.
The Effectiveness calculation uses the regression equation.
That leaves us with one iteration of the Monte Carlo simulation.
What about the other 1,999?
For that, I’ll use a Data Table.
- Create the Iteration section with the 2000 rows.
- In D9, I reference the Effectiveness equation in G7.
- Select the range B9:C2008. That’s the entire Iteration section minus the headings.
- Choose Data > What If Analysis > Data Table.
- In Column input cell, enter a blank cell (e.g., I10) and click OK.
At this point, Excel populates all 2,000 iterations. Snippet below.
Assessing the Monte Carlo Simulation Results in Excel
Because Monte Carlo simulations use random sampling, your results will differ slightly from mine. As the number of iterations increases, they tend to converge on a stable value. But with only 2000, the differences are a bit more noticeable.
For this example, our Monte Carlo simulation outcomes are in the iteration table. We need to tell Excel how to analyze them.
I’ll assess the results by creating a histogram of the outcomes and calculating the percentage of times that insulation effectiveness falls below the lower spec limit (LSL) of 10 or above the upper spec limit (USL) of 30. Those outcomes are out of spec. (I hand-drew the reference lines in the histogram because Excel can’t do that easily.)
To calculate the out-of-spec percentages, I use Excel’s COUNTIF function and divide by the 2000 iterations. For example, for the LSL cell, I use: =COUNTIF(C9:C2008,”<10″)/2000. Then format the cells as percentages.
Interestingly, the histogram indicates that our outcomes are right-skewed. Additionally, 1.6% are below the LSL and 0.80% are above the USL. A total of 2.40% are out of spec. I also calculate the mean (~17.3), but the out-of-spec statistics are more important for this example.
Monte Carlo Simulation and What If Scenarios
Imagine that the 2.4% defect rate is unacceptable. Let’s use the Monte Carlo simulation to see if we can lower it.
In manufacturing, changing input means is generally much easier than reducing the dispersion (standard deviation). Altering the mean typically involves changing settings, such as the pressure or heating settings. Reducing the variability is usually more expensive because it requires improving the technology. Think of your oven at home. You can change the mean heat by adjusting the temperature knob. However, if the variability of the heat around that mean is too high, you’ll need to upgrade the oven.
Of course, changing the input distributions requires using subject-area knowledge to ensure you only use feasible values.
Next, I’ll change the input means to see if we can reduce the defects.
New Input Values
In Excel, I enter the following new mean values in the Inputs section:
- InjPress: 169.5
- InjTemp: 75.5
- CoolTemp: 21.4
That’s all I need to run an entirely new, “what if” scenario!
These new values produce the following Monte Carlo simulation results. Again, yours will be slightly different.
The distribution is better centered between the spec limits and produces only 1.15% defects, almost all are below the LSL.
We can continue this process until we obtain an acceptable defect rate. If we can’t sufficiently reduce defects by only changing the means, we might need to reduce their variability. That would be a crucial finding itself. In that case, the Monte Carlo simulation tells us that our current technology is inadequate for this product and gives us an idea of how much better it must be.
Monte Carlo simulations allow us to find the initial estimate and then try out different scenarios to guide our decision-making without needing to collect data for all scenarios. Of course, these are simulated outcomes, and we’d need to verify the top scenarios with actual manufacturing runs.
As you saw, Monte Carlo simulation creates simulated data. Bootstrapping is a statistical procedure that uses real data to create simulated samples. Learn more by reading, Introduction to Bootstrapping.