Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics. Bootstrap methods are alternative approaches to traditional hypothesis testing and are notable for being easier to understand and valid for more conditions.

In this blog post, I explain bootstrapping basics, compare bootstrapping to conventional statistical methods, and explain when it can be the better method. Additionally, I’ll work through an example using real data to create bootstrapped confidence intervals.

## Bootstrapping and Traditional Hypothesis Testing Are Inferential Statistical Procedures

Both bootstrapping and traditional methods use samples to draw inferences about populations. To accomplish this goal, these procedures treat the single sample that a study obtains as only one of many random samples that the study could have collected.

From a single sample, you can calculate a variety of sample statistics, such as the mean, median, and standard deviation—but we’ll focus on the mean here.

Now, suppose an analyst repeats their study many times. In this situation, the mean will vary from sample to sample and form a distribution of sample means. Statisticians refer to this type of distribution as a sampling distribution. Sampling distributions are crucial because they place the value of your sample statistic into the broader context of many other possible values.

While performing a study many times is infeasible, both methods can estimate sampling distributions. Using the larger context that sampling distributions provide, these procedures can construct confidence intervals and perform hypothesis testing.

**Related posts**: Differences between Descriptive and Inferential Statistics

## Differences between Bootstrapping and Traditional Hypothesis Testing

A primary difference between bootstrapping and traditional statistics is how they estimate sampling distributions.

Traditional hypothesis testing procedures require equations that estimate sampling distributions using the properties of the sample data, the experimental design, and a test statistic. To obtain valid results, you’ll need to use the proper test statistic and satisfy the assumptions. I describe this process in more detail in other posts—links below.

The bootstrap method uses a very different approach to estimate sampling distributions. This method takes the sample data that a study obtains, and then resamples it over and over to create many simulated samples. Each of these simulated samples has its own properties, such as the mean. When you graph the distribution of these means on a histogram, you can observe the sampling distribution of the mean. You don’t need to worry about test statistics, formulas, and assumptions.

The bootstrap procedure uses these sampling distributions as the foundation for confidence intervals and hypothesis testing. Let’s take a look at how this resampling process works.

**Related posts**: How t-Tests Work and How the F-test Works in ANOVA

## How Bootstrapping Resamples Your Data to Create Simulated Datasets

Bootstrapping resamples the original dataset with replacement many thousands of times to create simulated datasets. This process involves drawing random samples from the original dataset. Here’s how it works:

- The bootstrap method has an equal probability of randomly drawing each original data point for inclusion in the resampled datasets.
- The procedure can select a data point more than once for a resampled dataset. This property is the “with replacement” aspect of the process.
- The procedure creates resampled datasets that are the same size as the original dataset.

The process ends with your simulated datasets having many different combinations of the values that exist in the original dataset. Each simulated dataset has its own set of sample statistics, such as the mean, median, and standard deviation. Bootstrapping procedures use the distribution of the sample statistics across the simulated samples as the sampling distribution.

## Example of Bootstrap Samples

Let’s work through an easy case. Suppose a study collects five data points and creates four bootstrap samples, as shown below.

This simple example illustrates the properties of bootstrap samples. The resampled datasets are the same size as the original dataset and only contain values that exist in the original set. Furthermore, these values can appear more or less frequently in the resampled datasets than in the original dataset. Finally, the resampling process is random and could have created a different set of simulated datasets.

Of course, in a real study, you’d hope to have a larger sample size, and you’d create thousands of resampled datasets. Given the enormous number of resampled data sets, you’ll always use a computer to perform these analyses.

## How Well Does Bootstrapping Work?

Resampling involves reusing your one dataset many times. It almost seems too good to be true! In fact, the term “bootstrapping” comes from the impossible phrase of pulling yourself up by your own bootstraps! However, using the power of computers to randomly resample your one dataset to create thousands of simulated datasets actually produces meaningful results.

The bootstrap method has been around since 1979, and its usage has increased. Various studies over the intervening decades have determined that bootstrap sampling distributions approximate the correct sampling distributions.

To understand how it works, keep in mind that bootstrapping does not create new data. Instead, it treats the original sample as a proxy for the real population and then draws random samples from it. Consequently, the central assumption for bootstrapping is that the original sample accurately represents the actual population.

The resampling process creates many possible samples that a study could have drawn. The various combinations of values in the simulated samples collectively provide an estimate of the variability between random samples drawn from the same population. The range of these potential samples allows the procedure to construct confidence intervals and perform hypothesis testing. Importantly, as the sample size increases, bootstrapping converges on the correct sampling distribution under most conditions.

Now, let’s see an example of this procedure in action!

## Example of Using Bootstrapping to Create Confidence Intervals

For this example, I’ll use bootstrapping to construct a confidence interval for a dataset that contains the body fat percentages of 92 adolescent girls. I used this dataset in my post about identifying the distribution of your data. These data do not follow the normal distribution. Because it does not meet the normality assumption of traditional statistics, it’s a good candidate for bootstrapping. Although, the large sample size might let us bypass this assumption. The histogram below displays the distribution of the original sample data.

Download the CSV dataset to try it yourself: body_fat.

### Performing the bootstrap procedure

To create the bootstrapped samples, I’m using Statistics101, which is a giftware program. Using its programming language, I’ve written a script that takes my original dataset and resamples it with replacement 500,000 times. This process produces 500,000 bootstrapped samples with 92 observations in each. The program calculates the mean of each sample and plots the distribution of these 500,000 means in the histogram below.

To create the bootstrapped confidence interval, we simply use percentiles. For a 95% confidence interval, we need to identify the middle 95% of the distribution. To do that, we use the 97.5^{th} percentile and the 2.5^{th} percentile (97.5 – 2.5 = 95). In other words, if we order all sample means from low to high, and then chop off the lowest 2.5% and the highest 2.5% of the means, the middle 95% of the means remain. That range is our bootstrapped confidence interval!

For the body fat data, the program calculates a 95% bootstrapped confidence interval of the mean [27.163043478260878 30.00652173913044].

This interval has the same width as the traditional confidence interval for these data, but it is shifted up by several percentage points. I’m assuming that the upward shift occurs because the original data are so nonnormally distributed. Even the large number of samples, working with the central limit theorem, didn’t let our skewed data bypass the normality assumption!

Compare this process to how traditional statistical methods create confidence intervals.

## Benefits of Bootstrapping over Traditional Statistics

Readers of my blog know that I love intuitive explanations of complex statistical methods. And, bootstrapping fits right in with this philosophy. This process is much easier to comprehend than the complex equations required for the probability distributions of the traditional methods. However, bootstrapping provides more benefits than just being easy to understand!

Bootstrapping does not make assumptions about the distribution of your data. You merely resample your data and use whatever sampling distribution emerges. Then, you work with that distribution, whatever it might be, as we did in the example.

Conversely, the traditional methods often assume that the data follow the normal distribution or some other distribution. For the normal distribution, the central limit theorem might let you bypass this assumption for samples sizes that are larger than ~30—but sometimes it does not, as we saw. Consequently, you can use bootstrapping for a wider variety of distributions, unknown distributions, and smaller sample sizes. Sample sizes as small as 10 can be usable.

In this vein, all traditional methods use equations that estimate the sampling distribution for a specific sample statistic when the data follow a particular distribution. Unfortunately, formulas for all combinations of sample statistics and data distributions do not exist! For example, there is no known sampling distribution for medians, which makes bootstrapping the perfect analyses for it. Other analyses have assumptions such as equality of variances. However, none of these issues are problems for bootstrapping.

## For Which Sample Statistics Can I Use Bootstrapping?

While this blog post focuses on the sample mean, the bootstrap method can analyze a broad range of sample statistics and properties. These statistics include the mean, median, mode, standard deviation, analysis of variance, correlations, regression coefficients, proportions, odds ratios, variance in binary data, and multivariate statistics among others.

There are several, mostly esoteric, conditions when bootstrapping is not appropriate, such as when the population variance is infinite, or when the population values are discontinuous at the median. And, there are various conditions where tweaks to the bootstrapping process are necessary to adjust for bias. However, those cases go beyond the scope of this introductory blog post.

ihsanullah says

please examples

Jim Frost says

Hi, I include a great example right in this post! ðŸ™‚

Ù…ØÙ…Ø¯ Ø¹Ø¨Ø¯Ø§Ù„Ù„Ù‡ Ù…ØÙ…Ø¯ Ø§ØÙ…Ø¯ says

Thanks a lot Mr. Frost

Jim Frost says

You’re very welcome!

Mcpheson says

Nicely intuitive.

Jim Frost says

Thank you!

Sampath says

It’s really interesting post. Thank you Jim.

Jim Frost says

Thank you, Sampath! I’m glad you enjoyed it.

Stan Alekman says

Can you prepare an article describing how to bootstrap regression coefficients, and regression coefficient confidence intervals?

Can bootstrap estimates of means and standard deviations (as in your example) be used to estimate tolerance intervals using the bootstrapped mean +-k*bootstrapped sigma where k is the smallest value in the table since hundreds of thousands of bootstrap sampling steps are used to estimate the bootstrapped sigma?

Regards,

Stan Alekman

Jim Frost says

Hi Stan,

I was wondering what the reaction would be to bootstrapping. I had hopes there would be interest in it. I think it’s safe to say that there will be more articles about it!

Matt says

Jim, great article, generating lots of discussion among my peers. Thanks.

“An Introduction to Statistical Learning with Applications in R” by Gareth James et al has a short section (5.2, pages 187-190) on bootstrapping, with an example on regression coefficients. Essentially the bootstrapped samples draw the X and Y data from the original, then you figure the regression coefficient for each bootstrapped sample. Across all bootstrapped samples, figure your statistic of the coefficient.

Stan Alekman says

Can you reply to my specific question regarding tolerance intervals by bootstrapped mean and bootstrapped standard deviation? This would be an excellent procedure, if valid, to generate precise tolerance intervals.

Thank you.

Stan Alekman

Jim Frost says

Hi Stan,

You can create bootstrapped tolerance intervals. I don’t know enough about it right know to give you an intelligent response about it. That’s forthcoming after I learn more!

Stan Alekman says

Thank you. Look forward to it.

Aijaz Ahmad Dar says

I am interested in bootstrapping and I am using it. But I am having a question that i asked to many but I don’t get the answer. My question is how to find the Confidence interval (C.I) for the support parameter (I mean the situation where MLE is the first order and nth order statistics). example in Pareto distribution, power distribution.

Stan Alekman says

Thanks for the info re bootstrapping regression coefficients, etc. Frankly, I am distrustful of bootstrap estimates. The underlying assumption is that the original sample mimics the population. It is very difficult to collect truly random samples in industrial settings.

Jim Frost says

I think in general it’s harder than commonly recognized to get a truly random, representative sample. The only thing that I’d add is that the traditional statistical methods also assume representative samples. So, if it’s that’s a problem, it’ll affect both bootstrap and traditional methods.

Stan Alekman says

Not to be argumentative, inference from a single non-representative sample as opposed to a hundred thousand resamples from a single non-representative sample seems like the wrong direction to take.

Jim Frost says

It actually works out to be fairly equivalent. The traditional approach uses the sample to calculate a sampling distribution, such as the t-distribution. That distribution is calculated from your one sample and it is equivalent to the distribution you’d obtain after performing the analysis (e.g., t-test) an infinite number of times. If your sample is not representative, that distribution will not be correct.

I’m not trying to convince you to use bootstrapping by any means. But a non-representative sample will affect the sampling distribution for both approaches because both use a single sample to estimate a sampling distribution. The methodology to produce that sampling distribution is different (resampling vs. formulas), but the end results are similar.

I haven’t used bootstrapping methods extensively myself. My training and experience has been with the traditional methods. However, the research that supports the validity of bootstrap methodology is very strong.

Stanley Alekman says

Thanks for the explanation. I failed to understand that earlier.

Stan Alekman

Jim Frost says

You bet. And, it probably means I didn’t explain it clearly enough!