Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics. Bootstrap methods are alternative approaches to traditional hypothesis testing and are notable for being easier to understand and valid for more conditions.

In this blog post, I explain bootstrapping basics, compare bootstrapping to conventional statistical methods, and explain when it can be the better method. Additionally, I’ll work through an example using real data to create bootstrapped confidence intervals.

## Bootstrapping and Traditional Hypothesis Testing Are Inferential Statistical Procedures

Both bootstrapping and traditional methods use samples to draw inferences about populations. To accomplish this goal, these procedures treat the single sample that a study obtains as only one of many random samples that the study could have collected.

From a single sample, you can calculate a variety of sample statistics, such as the mean, median, and standard deviation—but we’ll focus on the mean here.

Now, suppose an analyst repeats their study many times. In this situation, the mean will vary from sample to sample and form a distribution of sample means. Statisticians refer to this type of distribution as a sampling distribution. Sampling distributions are crucial because they place the value of your sample statistic into the broader context of many other possible values.

While performing a study many times is infeasible, both methods can estimate sampling distributions. Using the larger context that sampling distributions provide, these procedures can construct confidence intervals and perform hypothesis testing.

**Related posts**: Differences between Descriptive and Inferential Statistics

## Differences between Bootstrapping and Traditional Hypothesis Testing

A primary difference between bootstrapping and traditional statistics is how they estimate sampling distributions.

Traditional hypothesis testing procedures require equations that estimate sampling distributions using the properties of the sample data, the experimental design, and a test statistic. To obtain valid results, you’ll need to use the proper test statistic and satisfy the assumptions. I describe this process in more detail in other posts—links below.

The bootstrap method uses a very different approach to estimate sampling distributions. This method takes the sample data that a study obtains, and then resamples it over and over to create many simulated samples. Each of these simulated samples has its own properties, such as the mean. When you graph the distribution of these means on a histogram, you can observe the sampling distribution of the mean. You don’t need to worry about test statistics, formulas, and most of the distributional assumptions.

The bootstrap procedure uses these sampling distributions as the foundation for confidence intervals and hypothesis testing. Let’s take a look at how this resampling process works.

**Related posts**: How t-Tests Work and How the F-test Works in ANOVA

## How Bootstrapping Resamples Your Data to Create Simulated Datasets

Bootstrapping resamples the original dataset with replacement many thousands of times to create simulated datasets. This process involves drawing random samples from the original dataset. Here’s how it works:

- The bootstrap method has an equal probability of randomly drawing each original data point for inclusion in the resampled datasets.
- The procedure can select a data point more than once for a resampled dataset. This property is the “with replacement” aspect of the process.
- The procedure creates resampled datasets that are the same size as the original dataset.

The process ends with your simulated datasets having many different combinations of the values that exist in the original dataset. Each simulated dataset has its own set of sample statistics, such as the mean, median, and standard deviation. Bootstrapping procedures use the distribution of the sample statistics across the simulated samples as the sampling distribution.
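The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the script used later in this post; the function name `bootstrap_distribution` and the eight data values are made up for the example.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_resamples=10_000, seed=None):
    """Compute `statistic` on each of `n_resamples` bootstrap samples of `data`."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_resamples):
        # Draw n values with replacement; every original point is equally likely.
        resample = rng.choices(data, k=n)
        results.append(statistic(resample))
    return results


# Hypothetical measurements, for illustration only.
sample = [12.1, 15.3, 9.8, 20.4, 14.7, 11.2, 18.9, 13.5]
boot_means = bootstrap_distribution(sample, statistics.fmean, n_resamples=2_000, seed=1)
print(len(boot_means), min(boot_means), max(boot_means))
```

The list `boot_means` plays the role of the sampling distribution of the mean. Passing `statistics.median` or `statistics.stdev` instead of `statistics.fmean` gives the corresponding distribution for those statistics.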

## Example of Bootstrap Samples

Let’s work through an easy case. Suppose a study collects five data points and creates four bootstrap samples, as shown below.

This simple example illustrates the properties of bootstrap samples. The resampled datasets are the same size as the original dataset and only contain values that exist in the original set. Furthermore, these values can appear more or less frequently in the resampled datasets than in the original dataset. Finally, the resampling process is random and could have created a different set of simulated datasets.
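To see these properties for yourself, here is a small sketch that draws four bootstrap samples from five data points. The five values are hypothetical and are not the ones from the example above; the seed is fixed only so that reruns give the same draws.

```python
import random

rng = random.Random(42)

# Five hypothetical data points, for illustration only.
original = [3, 7, 8, 12, 15]

# Four bootstrap samples, each the same size as the original,
# drawn with replacement so values can repeat or be left out.
bootstrap_samples = [rng.choices(original, k=len(original)) for _ in range(4)]

for i, resample in enumerate(bootstrap_samples, start=1):
    print(f"Bootstrap sample {i}: {resample}")
```

Changing the seed produces a different set of simulated datasets, which is the randomness described above.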

Of course, in a real study, you’d hope to have a larger sample size, and you’d create thousands of resampled datasets. Given the enormous number of resampled data sets, you’ll always use a computer to perform these analyses.

## How Well Does Bootstrapping Work?

Resampling involves reusing your one dataset many times. It almost seems too good to be true! In fact, the term “bootstrapping” comes from the phrase for an impossible feat: pulling yourself up by your own bootstraps! However, using the power of computers to randomly resample your one dataset to create thousands of simulated datasets actually produces meaningful results.

The bootstrap method has been around since 1979, and its usage has increased. Various studies over the intervening decades have determined that bootstrap sampling distributions approximate the correct sampling distributions.

To understand how it works, keep in mind that bootstrapping does not create new data. Instead, it treats the original sample as a proxy for the real population and then draws random samples from it. Consequently, the central assumption for bootstrapping is that the original sample accurately represents the actual population.

The resampling process creates many possible samples that a study could have drawn. The various combinations of values in the simulated samples collectively provide an estimate of the variability between random samples drawn from the same population. The range of these potential samples allows the procedure to construct confidence intervals and perform hypothesis testing. Importantly, as the sample size increases, bootstrapping converges on the correct sampling distribution under most conditions.
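One common way to put a number on that variability is the bootstrap standard error, which is simply the standard deviation of the statistic across the resamples. The notation below is introduced here for illustration: $B$ is the number of bootstrap samples and $\hat{\theta}^{*}_{b}$ is the statistic (for example, the mean) calculated from the $b$-th bootstrap sample.

$$
\widehat{\mathrm{SE}}_{\text{boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*}_{b} - \bar{\theta}^{*}\right)^{2}},
\qquad
\bar{\theta}^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^{*}_{b}
$$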

Now, let’s see an example of this procedure in action!

## Example of Using Bootstrapping to Create Confidence Intervals

For this example, I’ll use bootstrapping to construct a confidence interval for the mean of a dataset that contains the body fat percentages of 92 adolescent girls. I used this dataset in my post about identifying the distribution of your data. These data do not follow the normal distribution. Because they do not meet the normality assumption of traditional statistics, they’re a good candidate for bootstrapping, although the large sample size might let us bypass this assumption. The histogram below displays the distribution of the original sample data.

Download the CSV dataset to try it yourself: body_fat.

### Performing the bootstrap procedure

To create the bootstrapped samples, I’m using Statistics101, which is a giftware program. This is a great simulation program that I’ve also used to tackle the Monty Hall Problem!

Using its programming language, I’ve written a script that takes my original dataset and resamples it with replacement 500,000 times. This process produces 500,000 bootstrapped samples with 92 observations in each. The program calculates the mean of each sample and plots the distribution of these 500,000 means in the histogram below. Statisticians refer to this type of distribution as the sampling distribution of means. Bootstrapping methods create these distributions using resampling while traditional methods use equations for probability distributions.

To create the bootstrapped confidence interval, we simply use percentiles. For a 95% confidence interval, we need to identify the middle 95% of the distribution. To do that, we use the 97.5th percentile and the 2.5th percentile (97.5 – 2.5 = 95). In other words, if we order all sample means from low to high, and then chop off the lowest 2.5% and the highest 2.5% of the means, the middle 95% of the means remain. That range is our bootstrapped confidence interval!

For the body fat data, the program calculates a 95% bootstrapped confidence interval of the mean of [27.16, 30.01]. We can be 95% confident that the population mean falls within this range.
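If you don’t have Statistics101, the percentile method is easy to sketch in Python. The code below is an illustration under stated assumptions, not the script behind the results above: the helper names are invented, the data are synthetic right-skewed values generated for the demo (not the body fat dataset), and it uses 5,000 resamples rather than 500,000 to keep it quick.

```python
import random
import statistics


def percentile(sorted_values, p):
    """Percentile p (0 to 100) of an already sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def bootstrap_percentile_ci(data, statistic=statistics.fmean, confidence=0.95,
                            n_resamples=10_000, seed=None):
    """Percentile bootstrap confidence interval for `statistic`."""
    rng = random.Random(seed)
    n = len(data)
    # Order the bootstrapped statistics from low to high.
    stats = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Chop off an equal share from each tail (2.5% each for a 95% interval).
    tail = (1 - confidence) / 2 * 100
    return percentile(stats, tail), percentile(stats, 100 - tail)


# Synthetic right-skewed data with 92 observations, for illustration only.
data_rng = random.Random(0)
synthetic = [data_rng.gammavariate(4.0, 7.0) for _ in range(92)]

low, high = bootstrap_percentile_ci(synthetic, n_resamples=5_000, seed=1)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

To try it on the real data, load the body fat values from the CSV file into a list and pass that list in place of `synthetic`.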

This interval has the same width as the traditional confidence interval for these data, and it is different by only several percentage points. The two methods are very close.

Notice how the sampling distribution in the histogram approximates a normal distribution even though the underlying data distribution is skewed. This approximation occurs thanks to the central limit theorem. As the sample size increases, the sampling distribution converges on a normal distribution regardless of the underlying data distribution (with a few exceptions). For more information about this theorem, read my post about the Central Limit Theorem.

Compare this process to how traditional statistical methods create confidence intervals.
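For reference, the traditional interval for a mean is usually the t-based formula. It is sketched here in standard textbook notation, which is not taken from the analysis above: $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $t_{1-\alpha/2,\,n-1}$ is the critical value from the t-distribution with $n-1$ degrees of freedom.

$$
\bar{x} \;\pm\; t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
$$

The bootstrap replaces the $t_{1-\alpha/2,\,n-1}\, s/\sqrt{n}$ term, which comes from a probability distribution, with percentiles read directly from the resampled means.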

## Benefits of Bootstrapping over Traditional Statistics

Readers of my blog know that I love intuitive explanations of complex statistical methods. And, bootstrapping fits right in with this philosophy. This process is much easier to comprehend than the complex equations required for the probability distributions of the traditional methods. However, bootstrapping provides more benefits than just being easy to understand!

Bootstrapping does not make assumptions about the distribution of your data. You merely resample your data and use whatever sampling distribution emerges. Then, you work with that distribution, whatever it might be, as we did in the example.

Conversely, the traditional methods often assume that the data follow the normal distribution or some other distribution. For the normal distribution, the central limit theorem might let you bypass this assumption for sample sizes that are larger than ~30. Consequently, you can use bootstrapping for a wider variety of distributions, unknown distributions, and smaller sample sizes. Sample sizes as small as 10 can be usable.

In this vein, all traditional methods use equations that estimate the sampling distribution for a specific sample statistic when the data follow a particular distribution. Unfortunately, formulas for all combinations of sample statistics and data distributions do not exist! For example, there is no known sampling distribution for medians, which makes bootstrapping a perfect analysis for them. Other analyses have assumptions such as equality of variances. However, none of these issues are problems for bootstrapping.
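As a sketch of the median case, the code below bootstraps a 95% percentile interval for a median using only Python’s standard library. The function name and the data values are hypothetical, chosen to be right-skewed for illustration. It relies on `statistics.quantiles` with `n=40`, whose first and last cut points are the 2.5th and 97.5th percentiles.

```python
import random
import statistics


def bootstrap_median_ci_95(data, n_resamples=5_000, seed=None):
    """95% percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(data)
    medians = [statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)]
    # n=40 splits the distribution into 2.5% steps: the first cut point is
    # the 2.5th percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(medians, n=40)
    return cuts[0], cuts[-1]


# Hypothetical right-skewed values, for illustration only.
values = [21, 23, 24, 25, 27, 28, 30, 31, 33, 36, 40, 47, 58, 75, 120]

low, high = bootstrap_median_ci_95(values, seed=7)
print(f"Sample median: {statistics.median(values)}")
print(f"95% bootstrap CI for the median: [{low}, {high}]")
```

With a small sample like this one, the bootstrapped medians can only take a limited set of values, so the interval endpoints tend to land on or near the original data points.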

## For Which Sample Statistics Can I Use Bootstrapping?

While this blog post focuses on the sample mean, the bootstrap method can analyze a broad range of sample statistics and properties. These statistics include the mean, median, mode, standard deviation, analysis of variance, correlations, regression coefficients, proportions, odds ratios, variance in binary data, and multivariate statistics among others.

There are several, mostly esoteric, conditions when bootstrapping is not appropriate, such as when the population variance is infinite, or when the population values are discontinuous at the median. And, there are various conditions where tweaks to the bootstrapping process are necessary to adjust for bias. However, those cases go beyond the scope of this introductory blog post.

Stan Alekman says

Resampling simply runs a Monte Carlo simulation on existing data to give some idea about the influence of extreme values. It does not ask why extreme values or outliers are present. It does not test for outliers and cull them. It simply tries to average out their effects. But in non-experimental settings, outliers are critical. They are the signals that tell us of the presence of assignable causes. Resampling sidesteps the assumption of independent and identically distributed random variables without having to deal with outliers. The emphasis is completely upon estimation of parameters, not process characterization or improvement. Given this difference in emphasis, it works.

If I see appreciably different results between the usual tests and resampling, I would suspect the data of having come from an unpredictable process. In that case the resampling results would provide estimates with less variation, but the question of whether or not those estimates were estimates of one parameter or many different parameters would remain unanswered. Resampling works with data that are mostly homogeneous with only a few outliers.

Stan Alekman

Dwasch says

Hello.

Thanks for this helpful summary. Am I correct in understanding bootstrapping doesn’t rely on either a normality assumption or (for group comparison) a homogeneity of variances assumption? If so, could you point to a reference without too much hassle? It would be helpful for a revise and resubmit.

Thanks!

Stan Alekman says

The question then remains: is the bootstrap confidence interval more reliable (closer to the truth) than is the confidence interval by traditional means? Without an answer or consensus, decisions based on analysis will not necessarily be the best we can make. We strive to make the best evidence based decisions.

Jim Frost says

Hi Stan,

This being statistics, the answer is a definite, “it depends.” I know that’s not helpful but a blanket answer isn’t possible. There are some cases where your data just don’t fit an existing analysis. It might deviate from the assumptions too much. Or, perhaps the appropriate test does not exist. In those cases, bootstrapping is clearly superior.

However, in other cases where your data completely satisfy the assumptions of a proven test, it’s harder to make the case that either method is superior. I’d say that bootstrapping is more flexible in terms of the conditions and tests that it can handle. I also haven’t thoroughly researched bootstrapping and might be unaware of how it compares to traditional methods (such as t-tests and CIs) when your data do satisfy the assumptions. I wouldn’t be surprised if someone performed a simulation study to look into this question. If this is a question you face for a study, it would probably be wise to research it.

I also don’t know the properties of your data. My sense is that the more closely your data follow the normal distribution the more equivalent the two approaches become. However, as your data diverge from the normal distribution, I’d expect bootstrapping to become the better analysis. However, I’m not familiar enough with that literature to give you practical advice for making that decision.

In statistics, knowing which test is better typically depends on understanding the characteristics of your data and the stringency of the relevant requirements. This holds true for deciding between traditional vs. bootstrapping methods.

Stan Alekman says

I wonder. I collect a sample and estimate a mean and confidence interval by the traditional t-distribution.

Then I re-estimate the mean and confidence interval by bootstrapping and find a somewhat different mean and a narrower interval.

Is it appropriate to report the bootstrap estimates? Will bootstrap estimates be acceptable for journal publications? Are bootstrap estimates superior?

Jim Frost says

Hi Stan,

Unfortunately, I don’t have concrete answers for your questions. In terms of what journal publications will accept, that will vary by field and journal. Most journal articles I’ve read use the traditional t-distribution, tests, and CIs. I think that’s mainly due to familiarity and tradition rather than it being better. Most people are more familiar with the traditional hypothesis tests. However, that’s not to say that journals won’t accept bootstrap results. I’d look into what the journal has published as well as others in your field. There is a good case to be made for bootstrap methods.

Where I think the bootstrap method really shines is for cases where you don’t satisfy the assumptions for a traditional test. Or, perhaps there isn’t even a traditional method for what you want to accomplish. That’s where I’d say that bootstrapping is superior. If you have data that satisfy the assumptions, my sense is that both methods are similarly good.

Sorry for the vague answer. But, I don’t think a concrete one exists!

Fizza says

Hey. How can I use bootstrapping for multiple regression?

Shashank Garg says

Thank you, Jim, for such a simple explanation of bootstrapping. I had been trying to get the basics of the design for a long time but was not able to figure it out. Now it will be easier for me to understand further details of it. I was also not a supporter of the theory that all phenomena are normally distributed. Although bootstrapping also makes assumptions, we still have something new to ponder.

Jim Frost says

Hi Shashank,

You’re very welcome! As someone who “grew up” on traditional hypothesis testing procedures, learning about bootstrapping was very interesting.

In traditional hypothesis testing, it’s true that not all distributions are normal. However, the central limit theorem is our friend in that regard because, with a large enough sample, the sampling distributions approximate the normal distribution, which satisfies the assumption for those tests.

saroja says

I love the way you make the concept clear.

Debanjan says

So, should my original mean fall within the bootstrap confidence interval or not?

Jim Frost says

For a 95% confidence interval, you can be 95% confident that the interval contains the population mean. The population mean is the unknowable parameter that we’re estimating with a sample. So, yes, the process typically produces intervals that contain the population parameter. However, occasionally it won’t because of an unusual sample. Of course, this assumes that you’re drawing a random, independent sample.

Karan Desai says

Hi Jim,

This is the first time I read about bootstrapping and loved the concept. No wonder the name of the method is bootstrapping. You have explained it really well. Your blog is a gem.

Stanley Alekman says

Thanks for the explanation. I failed to understand that earlier.

Stan Alekman

Jim Frost says

You bet. And, it probably means I didn’t explain it clearly enough!

Stan Alekman says

Not to be argumentative, inference from a single non-representative sample as opposed to a hundred thousand resamples from a single non-representative sample seems like the wrong direction to take.

Jim Frost says

It actually works out to be fairly equivalent. The traditional approach uses the sample to calculate a sampling distribution, such as the t-distribution. That distribution is calculated from your one sample and it is equivalent to the distribution you’d obtain after performing the analysis (e.g., t-test) an infinite number of times. If your sample is not representative, that distribution will not be correct.

I’m not trying to convince you to use bootstrapping by any means. But a non-representative sample will affect the sampling distribution for both approaches because both use a single sample to estimate a sampling distribution. The methodology to produce that sampling distribution is different (resampling vs. formulas), but the end results are similar.

I haven’t used bootstrapping methods extensively myself. My training and experience has been with the traditional methods. However, the research that supports the validity of bootstrap methodology is very strong.

Stan Alekman says

Thanks for the info re bootstrapping regression coefficients, etc. Frankly, I am distrustful of bootstrap estimates. The underlying assumption is that the original sample mimics the population. It is very difficult to collect truly random samples in industrial settings.

Jim Frost says

I think in general it’s harder than commonly recognized to get a truly random, representative sample. The only thing that I’d add is that the traditional statistical methods also assume representative samples. So, if that’s a problem, it’ll affect both bootstrap and traditional methods.

Aijaz Ahmad Dar says

I am interested in bootstrapping and I am using it. But I have a question that I have asked many people without getting an answer. My question is how to find the confidence interval (C.I.) for the support parameter (I mean the situation where the MLE is the first-order or nth-order statistic), for example in the Pareto distribution or the power distribution.

Stan Alekman says

Thank you. Look forward to it.

Stan Alekman says

Can you reply to my specific question regarding tolerance intervals by bootstrapped mean and bootstrapped standard deviation? This would be an excellent procedure, if valid, to generate precise tolerance intervals.

Thank you.

Stan Alekman

Jim Frost says

Hi Stan,

You can create bootstrapped tolerance intervals. I don’t know enough about it right now to give you an intelligent response about it. That’s forthcoming after I learn more!

Stan Alekman says

Can you prepare an article describing how to bootstrap regression coefficients, and regression coefficient confidence intervals?

Can bootstrap estimates of means and standard deviations (as in your example) be used to estimate tolerance intervals using the bootstrapped mean +-k*bootstrapped sigma where k is the smallest value in the table since hundreds of thousands of bootstrap sampling steps are used to estimate the bootstrapped sigma?

Regards,

Stan Alekman

Jim Frost says

Hi Stan,

I was wondering what the reaction would be to bootstrapping. I had hopes there would be interest in it. I think it’s safe to say that there will be more articles about it!

Matt says

Jim, great article, generating lots of discussion among my peers. Thanks.

“An Introduction to Statistical Learning with Applications in R” by Gareth James et al has a short section (5.2, pages 187-190) on bootstrapping, with an example on regression coefficients. Essentially the bootstrapped samples draw the X and Y data from the original, then you figure the regression coefficient for each bootstrapped sample. Across all bootstrapped samples, figure your statistic of the coefficient.

Sampath says

It’s a really interesting post. Thank you, Jim.

Jim Frost says

Thank you, Sampath! I’m glad you enjoyed it.

Mcpheson says

Nicely intuitive.

Jim Frost says

Thank you!

محمد عبدالله محمد احمد says

Thanks a lot Mr. Frost

Jim Frost says

You’re very welcome!

ihsanullah says

please examples

Jim Frost says

Hi, I include a great example right in this post! 🙂