The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable’s distribution in the population.

Unpacking the meaning from that complex definition can be difficult. That’s the topic for this post! I’ll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is so important in the field of statistics.

## Distribution of the Variable in the Population

Part of the definition for the central limit theorem states, “regardless of the variable’s distribution in the population.” This part is easy! In a population, values of a variable can follow different probability distributions. These distributions include normal, left-skewed, right-skewed, and uniform distributions, among others.

**Normal**

**Right-Skewed**

**Left-Skewed**

**Uniform**

This part of the definition refers to the distribution of the variable’s values in the population from which you draw a random sample.

The central limit theorem applies to almost all types of probability distributions, but there are exceptions. For example, the population must have a finite variance. That restriction rules out the Cauchy distribution because it does not have a finite variance.

Additionally, the central limit theorem applies to independent, identically distributed variables. In other words, the value of one observation does not depend on the value of another observation. And, the distribution of that variable must remain constant across all measurements.
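To see why the finite variance requirement matters, here is a minimal Python sketch (standard library only) that compares sample means from a Cauchy population with sample means from an exponential population. The function names, the seed, and the choice of the interquartile range (IQR) as the spread measure are assumptions made for this illustration. I use the IQR because a standard deviation is not meaningful for the Cauchy case.

```python
import math
import random
import statistics

def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` sample means, each from a sample of size n."""
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def draw_cauchy(rng):
    # Standard Cauchy via the inverse CDF; its variance is not finite.
    return math.tan(math.pi * (rng.random() - 0.5))

def draw_exponential(rng):
    # Exponential with mean 1 and standard deviation 1 (finite variance).
    return rng.expovariate(1.0)

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (1, 10, 100):
    cauchy = iqr_of_sample_means(draw_cauchy, n, 2000, rng)
    expo = iqr_of_sample_means(draw_exponential, n, 2000, rng)
    print(f"n={n:>3}  IQR of means: Cauchy={cauchy:.2f}  exponential={expo:.2f}")
```

The spread of the exponential sample means should shrink as n grows, while the spread of the Cauchy sample means should stay roughly the same no matter how large the samples are.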

**Related Post**: Understanding Probability Distributions

## Sampling Distribution of the Mean

The definition for central limit theorem also refers to “the sampling distribution of the mean.” What’s that?

Typically, you perform a study once, and you might calculate the mean of that one sample. Now, imagine that you repeat the study many times and collect the same sample size for each one. Then, you calculate the mean for each of these samples and graph them on a histogram. The histogram displays the distribution of sample means, which statisticians refer to as the sampling distribution of the mean.

Fortunately, we don’t have to repeat studies many times to estimate the sampling distribution of the mean. Statistical procedures can estimate that from a single random sample.
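Here is a rough Python sketch of that idea. It compares the standard error estimated from one sample (s/√n) with the spread of sample means across many simulated repeats of the same study. The exponential population with a mean of 10, the sample size of 50, and the function names are all assumptions chosen for illustration.

```python
import math
import random
import statistics

POP_MEAN = 10.0  # assumed exponential population: mean = standard deviation = 10

def draw_sample(n, rng):
    return [rng.expovariate(1.0 / POP_MEAN) for _ in range(n)]

def estimated_standard_error(sample):
    """Standard error of the mean estimated from one sample: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def simulated_standard_error(n, studies, rng):
    """Standard deviation of the sample means across many repeated studies."""
    means = [statistics.fmean(draw_sample(n, rng)) for _ in range(studies)]
    return statistics.stdev(means)

rng = random.Random(42)
n = 50
one_study = draw_sample(n, rng)
print("estimate from a single sample:", round(estimated_standard_error(one_study), 3))
print("spread across 5,000 studies:  ", round(simulated_standard_error(n, 5000, rng), 3))
print("theoretical sigma / sqrt(n):  ", round(POP_MEAN / math.sqrt(n), 3))
```

The single-sample estimate will bounce around from one sample to the next, but it targets the same quantity that the repeated studies reveal directly.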

The shape of the sampling distribution depends on the sample size. In other words, if you perform the study using the same procedure but change only the sample size, the shape of the sampling distribution will differ for each sample size. And, that brings us to the next part of the CLT definition!

## Central Limit Theorem and a Sufficiently Large Sample Size

The shape of the sampling distribution changes with the sample size. And, the definition of the central limit theorem states that when you have a sufficiently large sample size, the sampling distribution starts to approximate a normal distribution. How large does the sample size have to be for that approximation to occur?

It depends on the shape of the variable’s distribution in the underlying population. The more the population distribution differs from being normal, the larger the sample size must be. Typically, statisticians say that a sample size of 30 is sufficient for most distributions. However, strongly skewed distributions can require larger sample sizes. We’ll see the sample size aspect in action during the empirical demonstration below.
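One way to put rough numbers on this: for independent, identically distributed data, the skewness of the sampling distribution of the mean equals the population skewness divided by √n. The sketch below uses that fact to show how much skew remains at n = 30 for a few populations. The populations listed, the lognormal parameters, and the 0.25 cut-off for "close enough to symmetric" are assumptions for illustration, and skewness is only one aspect of how normal a distribution looks.

```python
import math

def lognormal_skewness(sigma):
    """Population skewness of a lognormal distribution with log-scale sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)

def skewness_of_mean(population_skewness, n):
    """Skewness of the sampling distribution of the mean for i.i.d. data."""
    return population_skewness / math.sqrt(n)

def sample_size_for(population_skewness, target):
    """Smallest n for which the skewness of the mean is at most `target`."""
    return math.ceil((population_skewness / target) ** 2)

populations = {
    "uniform": 0.0,
    "exponential": 2.0,
    "lognormal, sigma=0.25": lognormal_skewness(0.25),
    "lognormal, sigma=1.0": lognormal_skewness(1.0),
}
TARGET = 0.25  # assumed cut-off, not a standard threshold
for name, skew in populations.items():
    n_needed = max(1, sample_size_for(skew, TARGET))
    print(f"{name:<22} skew={skew:5.2f}  skew at n=30: "
          f"{skewness_of_mean(skew, 30):4.2f}  n needed: {n_needed}")
```

The more skewed the population, the more of that skew survives at n = 30, which is why a single rule of thumb cannot cover every distribution.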

## Central Limit Theorem and Approximating the Normal Distribution

To recap, the central limit theorem links the following two distributions:

- The distribution of the variable in the population.
- The sampling distribution of the mean.

Specifically, the CLT states that regardless of the variable’s distribution in the population, the sampling distribution of the mean will tend to approximate the normal distribution.

In other words, the population distribution can look like the following:

But, the sampling distribution can appear like below:

It’s not surprising when you start with a normally distributed variable that the sampling distribution will also be normally distributed. But, it is surprising that nonnormal population distributions can also produce normal sampling distributions.

**Related Post**: Normal Distribution in Statistics

## Properties of the Central Limit Theorem

Let’s get more specific about the normality features of the central limit theorem. Normal distributions have two parameters, the mean and standard deviation. What values do these parameters converge on?

As the sample size increases, the sampling distribution converges on a normal distribution where the mean equals the population mean, and the standard deviation equals σ/√n, where:

- σ = the population standard deviation
- n = the sample size

As the sample size (n) increases, the standard deviation of the sampling distribution becomes smaller because the square root of the sample size is in the denominator. In other words, the sampling distribution clusters more tightly around the mean as sample size increases.

Let’s put all of this together. As sample size increases, the sampling distribution more closely approximates the normal distribution, and the spread of that distribution tightens. These properties have essential implications in statistics that I’ll discuss later in this post.
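As a quick worked example of the σ/√n formula, the snippet below computes the standard deviation of the sampling distribution for several sample sizes. The population standard deviation of 12 is a made-up value for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

sigma = 12.0  # hypothetical population standard deviation
for n in (4, 16, 64, 256):
    print(f"n={n:>3}  standard error = {standard_error(sigma, n):.2f}")
```

Each fourfold increase in the sample size cuts the spread in half (6.00, 3.00, 1.50, 0.75), so tightening the sampling distribution gets progressively more expensive.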

**Related Posts**: Measures of Central Tendency and Measures of Variability

## Empirical Demonstration of the Central Limit Theorem

Now the fun part! There is a mathematical proof for the central limit theorem, but that is beyond the scope of this blog post. However, I will show how it works empirically by using statistical simulation software. I’ll define population distributions and have the software draw many thousands of random samples from it. The software will calculate the mean of each sample and then graph these sample means on a histogram to display the sampling distribution of the mean.

For the following examples, I’ll vary the sample size to show how that affects the sampling distribution. To produce the sampling distribution, I’ll draw 500,000 random samples because that creates a fairly smooth distribution in the histogram.

Keep this critical difference in mind. While I’ll collect a consistent 500,000 samples per condition, the size of those samples will vary, and that affects the shape of the sampling distribution.

Let’s test this theory! To do that, I’ll use Statistics101, which is a giftware computer program.
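If you'd like to try the same procedure yourself, here is a minimal Python sketch of it. This is not the Statistics101 script behind the graphs in this post. The exponential population, the function names, and the text-based histogram are assumptions for illustration, and it draws 20,000 samples per condition rather than 500,000 to keep the run short.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Draw `reps` samples of size n and return the mean of each one."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

def text_histogram(values, bins=15, width=50):
    """Return a list of strings forming a crude horizontal histogram."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / step), bins - 1)  # keep the maximum in the last bin
        counts[index] += 1
    tallest = max(counts)
    return [f"{lo + i * step:8.2f} | " + "#" * round(width * c / tallest)
            for i, c in enumerate(counts)]

def draw_exponential(rng):
    return rng.expovariate(1.0)  # right-skewed stand-in population (assumed)

rng = random.Random(2024)
for n in (2, 30):
    print(f"\nSample size n = {n}")
    print("\n".join(text_histogram(sample_means(draw_exponential, n, 20000, rng))))
```

Swap in a different `draw` function to test other population distributions with the same two helpers.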

## Testing the Central Limit Theorem with Three Probability Distributions

I’ll show you how the central limit theorem works with three different distributions: moderately skewed, severely skewed, and a uniform distribution. The first two distributions skew to the right and follow the lognormal distribution. The probability distribution plot below displays the population’s distribution of values. Notice how the red dashed distribution is much more severely skewed. It actually extends quite a way off the graph! We’ll see how this makes a difference in the sampling distributions.

Let’s see how the central limit theorem handles these two distributions and the uniform distribution.

## Moderately Skewed Distribution and the Central Limit Theorem

The graph below shows the moderately skewed lognormal distribution. This distribution fits the body fat percentage dataset that I use in my post about identifying the distribution of your data. These data correspond to the blue line in the probability distribution plot above. I use the simulation software to draw random samples from this population 500,000 times for each sample size (5, 20, 40).

In the graph above, the gray color shows the skewed distribution of the values in the population. The other colors represent the sampling distributions of the means for different sample sizes. The red color shows the distribution of means when your sample size is 5. Blue denotes a sample size of 20. Green is 40. The red curve (n=5) is still skewed a bit, but the blue and green (20 and 40) are not visibly skewed.

You can see that as the sample size increases, the sampling distributions more closely approximate the normal distribution and become more tightly clustered around the population mean, just as the central limit theorem states!
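The sketch below reruns this kind of experiment in Python for sample sizes of 5, 20, and 40. The lognormal parameters (3.3 and 0.25) are hypothetical values that give a moderately skewed population. They are not the fitted values for the body fat data, and the run uses 10,000 samples per condition instead of 500,000.

```python
import math
import random
import statistics

MU, SIGMA = 3.3, 0.25  # hypothetical lognormal parameters, not the post's fitted values

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps, rng):
    return [statistics.fmean([rng.lognormvariate(MU, SIGMA) for _ in range(n)])
            for _ in range(reps)]

# Exact mean and standard deviation of the lognormal population.
pop_mean = math.exp(MU + SIGMA ** 2 / 2)
pop_sd = pop_mean * math.sqrt(math.exp(SIGMA ** 2) - 1)

rng = random.Random(11)
for n in (5, 20, 40):
    means = sample_means(n, 10000, rng)
    print(f"n={n:>2}  mean={statistics.fmean(means):6.2f} (population {pop_mean:.2f})  "
          f"sd={statistics.pstdev(means):5.2f} (sigma/sqrt(n) {pop_sd / math.sqrt(n):.2f})  "
          f"skewness={skewness(means):.2f}")
```

You should see the center stay near the population mean while both the spread and the skewness shrink as n increases.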

## Very Skewed Distribution and the Central Limit Theorem

Now, let’s try this with the very skewed lognormal distribution. These data follow the red dashed line in the probability distribution plot above. I follow the same process but use larger sample sizes of 40 (grey), 60 (red), and 80 (blue). I do not include the population distribution in this one because it is so skewed that it messes up the X-axis scale!

The population distribution is extremely skewed. It’s probably more skewed than real data tend to be. As you can see, even with the largest sample size (blue, n=80), the sampling distribution of the mean is still skewed right. However, it is less skewed than the sampling distributions for the smaller sample sizes. Also, notice how the peaks of the sampling distribution shift to the right as the sample size increases. Eventually, with a large enough sample size, the sampling distributions will become symmetric, and the peak will stop shifting and center on the actual population mean.

If your population distribution is extremely skewed, be aware that you might need a substantial sample size for the central limit theorem to kick in and produce sampling distributions that approximate a normal distribution!
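Here is a hedged Python sketch of the same effect. It uses a lognormal population with parameters 0 and 1, which is an assumed stand-in for a strongly skewed population and not the distribution plotted above. As a rough indicator of leftover right skew, it reports the gap between the mean and the median of the sample means, which is positive when the right tail is longer.

```python
import math
import random
import statistics

MU, SIGMA = 0.0, 1.0  # hypothetical, strongly right-skewed lognormal population

def sample_means(n, reps, rng):
    return [statistics.fmean([rng.lognormvariate(MU, SIGMA) for _ in range(n)])
            for _ in range(reps)]

def mean_minus_median(values):
    """Rough skew indicator: positive when the right tail is longer."""
    return statistics.fmean(values) - statistics.median(values)

pop_mean = math.exp(MU + SIGMA ** 2 / 2)  # exact mean of the lognormal population

rng = random.Random(8)
gaps = {}
for n in (40, 60, 80):
    means = sample_means(n, 6000, rng)
    gaps[n] = mean_minus_median(means)
    print(f"n={n}  median of sample means={statistics.median(means):.3f}  "
          f"population mean={pop_mean:.3f}  mean - median={gaps[n]:.3f}")
```

The median of the sample means should creep up toward the population mean as n grows, and the gap should narrow without disappearing at n = 80.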

## Uniform Distribution and the Central Limit Theorem

Now, let’s change gears and look at an entirely different type of distribution. Imagine that we roll a die and take the average value of the rolls. The probabilities for rolling the numbers on a die follow a uniform distribution because all numbers have the same chance of occurring. Can the central limit theorem work with discrete numbers and uniform probabilities? Let’s see!

In the graph below, I follow the same procedure as above. In this example, the sample size refers to the number of times we roll the die. The process calculates the mean for each sample.

In the graph above, I use sample sizes of 5, 20, and 40. We’d expect the average to be (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. The sampling distributions of the means center on this value. Just as the central limit theorem predicts, as we increase the sample size, the sampling distributions more closely approximate a normal distribution and have a tighter spread of values.
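The die experiment is easy to reproduce. The following Python sketch rolls a simulated fair die, averages the rolls for each sample, and compares the center and spread of those averages with 3.5 and σ/√n. The function name, the seed, and the 10,000 samples per condition are assumptions for illustration.

```python
import math
import random
import statistics

FACES = (1, 2, 3, 4, 5, 6)
pop_mean = sum(FACES) / len(FACES)   # (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
pop_sd = statistics.pstdev(FACES)    # about 1.71

def mean_of_rolls(n, rng):
    """Average of n rolls of a fair six-sided die."""
    return statistics.fmean([rng.choice(FACES) for _ in range(n)])

rng = random.Random(6)
for n in (5, 20, 40):
    means = [mean_of_rolls(n, rng) for _ in range(10000)]
    print(f"n={n:>2}  center={statistics.fmean(means):.3f}  "
          f"spread={statistics.pstdev(means):.3f}  "
          f"sigma/sqrt(n)={pop_sd / math.sqrt(n):.3f}")
```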

You could perform a similar experiment using the binomial distribution with coin flips and obtain the same types of results when it comes to, say, the probability of getting heads. All thanks to the central limit theorem!
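For the coin flip case, you don't even need a simulation because the exact binomial probabilities are easy to compute. This sketch compares the exact probability of getting at most k heads with the normal approximation that the central limit theorem suggests. The continuity correction (adding 0.5) is a standard refinement that I've added here, and the specific n and k values are arbitrary choices.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """CLT-based approximation to P(X <= k), with a continuity correction."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

p = 0.5  # fair coin
for n, k in ((10, 6), (40, 23), (100, 55)):
    print(f"n={n:>3}  P(heads <= {k}): exact={binomial_cdf(k, n, p):.4f}  "
          f"normal approximation={normal_approx_cdf(k, n, p):.4f}")
```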

## Why is the Central Limit Theorem Important?

The central limit theorem is vital in statistics for two main reasons—the normality assumption and the precision of the estimates.

### Central limit theorem and the normality assumption

The fact that sampling distributions can approximate a normal distribution has critical implications. In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the t-test. Consequently, you might think that these tests are not valid when the data are nonnormally distributed. However, if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are nonnormally distributed—as long as your sample size is large enough.

You might have heard that parametric tests of the mean are robust to departures from the normality assumption when your sample size is sufficiently large. That’s thanks to the central limit theorem!
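You can check this robustness claim with a small simulation. The sketch below runs many 1-sample t-tests on skewed (exponential) data where the null hypothesis is true and counts how often the test rejects at the 5% level. The exponential population, the sample sizes of 5 and 40, and the seed are assumptions for illustration. The critical values are hard-coded from a standard t table because the Python standard library does not include the t-distribution.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t, copied from a standard t table.
T_CRITICAL = {4: 2.776, 39: 2.023}

def t_statistic(sample, hypothesized_mean):
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - hypothesized_mean) / se

def rejection_rate(n, reps, rng):
    """Share of 1-sample t-tests that reject a true null hypothesis."""
    critical = T_CRITICAL[n - 1]
    rejections = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # skewed, true mean = 1
        if abs(t_statistic(sample, 1.0)) > critical:
            rejections += 1
    return rejections / reps

rng = random.Random(5)
rates = {n: rejection_rate(n, 10000, rng) for n in (5, 40)}
for n, rate in rates.items():
    print(f"n={n:>2}  observed type I error rate = {rate:.3f}  (nominal 0.050)")
```

With the tiny sample, the observed error rate should sit well above the nominal 5%. With n = 40, it should move much closer, although for data this skewed it may not match exactly.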

For more information about this aspect, read my post that compares parametric and nonparametric tests.

### Precision of estimates

In all of the graphs, notice how the sampling distributions of the mean cluster more tightly around the population mean as the sample sizes increase. This property of the central limit theorem becomes relevant when you are using a sample to estimate the mean of an entire population. When you have a larger sample size, your sample mean is more likely to be close to the real population mean. In other words, your estimate is more precise.

Conversely, the sampling distributions of the mean for smaller sample sizes are much broader. For small sample sizes, it’s not unusual for sample means to be further away from the actual population mean. You obtain less precise estimates.
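To make "more precise" concrete, this sketch uses the normal approximation from the central limit theorem to estimate the probability that a sample mean lands within a chosen distance of the population mean. The population standard deviation of 10 and the tolerance of 2 are hypothetical values, and the approximation itself is rougher for the smallest sample sizes.

```python
import math
from statistics import NormalDist

def prob_mean_within(tolerance, sigma, n):
    """Approximate P(|sample mean - population mean| <= tolerance) under the CLT."""
    z = tolerance / (sigma / math.sqrt(n))
    return 2 * NormalDist().cdf(z) - 1

sigma, tolerance = 10.0, 2.0  # hypothetical values
for n in (5, 20, 40, 100):
    print(f"n={n:>3}  P(sample mean within {tolerance} of the population mean) "
          f"~ {prob_mean_within(tolerance, sigma, n):.3f}")
```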

In closing, understanding the central limit theorem is crucial when it comes to trusting the validity of your results and assessing the precision of your estimates. Use large sample sizes to satisfy the normality assumption even when your data are nonnormally distributed and to obtain more precise estimates!

EDE WILLIAMS says

Hi Jim

Really appreciate your efforts, making CLT simple for me to understand that I don’t need anybody to explain any further.

Williams from Federal Polytechnic nekede, studying STATISTICS

Patrick says

Hi Jim,

thank you, I greatly appreciate your detailed answer to my question!

Best regards,

Patrick

Patrick says

Hi Jim,

thanks for this excellent post!

I was just wondering about the following: you are saying “In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the t-test. (…) if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are nonnormally distributed—as long as your sample size is large enough.”

Does that basically apply to all parametric hypothesis tests, including linear regression analysis? I once discussed this with a statistician, who objected that the normality assumption does not apply to the distribution of the dependent variable (and this is also true for the t-test which is just a special case of linear regression) but rather to the distribution of the residuals. He then argued that if I have a poor set of predictors, the model will most likely not achieve normality of the residuals, regardless of sample size.

This left me wondering whether or not I can use linear regression with large sample sizes without having to worry about distributional assumptions. Do I need normally distributed residuals at all with a large sample size – or does the CLT also apply to the residuals?

I’d greatly appreciate your view on this aspect.

Thank you and best regards,

Patrick

Jim Frost says

Hi Patrick,

Off hand, I’d say that it applies to most types of parametric hypothesis tests. Even populations that follow the binomial and Poisson distributions will have sampling distributions that follow the normal distribution with a large enough sample size. Consequently, it applies to proportions tests and Poisson rates of occurrence tests. However, I haven’t thought it through enough whether it applies to all parametric tests.

As for regression analysis, that gets a bit complicated! First, yes, the normality assumption applies to the distribution of the residuals rather than the dependent variable. However, that assumption is an optional one that applies only if you want to use hypothesis testing and confidence intervals, as you can read about in my post about OLS Assumptions.

As for whether the hypothesis test results are valid in regression analysis when residuals are nonnormally distributed with a sufficiently large sample size, I’d say the answer is both yes and no! How’s that for covering my bases?!

Here’s the rationale for both answers.

Yes, I do believe the central limit theorem kicks in with the sampling distributions for the coefficient estimates. With a large enough sample size, these sampling distributions should follow a normal distribution even when the residuals are nonnormal. It’s for those sampling distributions of the coefficients estimates where the CLT would come into play. The p-values for the coefficients are, of course, based on those coefficient sampling distributions.

However, the answer can also be no! Oftentimes the residuals won’t follow a normal distribution because you’re specifying an incorrect model. You might not be including all the relevant variables, not modeling the curvature correctly, not including interaction terms, etc. Model specification errors can produce nonnormal residuals. In that case, I don’t think having a sufficiently large sample size fixes the problem. Chances are your coefficients are biased and not meaningful because the model is just wrong.

Consequently, the answer depends on what is causing the nonnormal residuals. Also, I don’t have time to thoroughly research this issue, but if you’re doing this for a paper or report, I’d find some article to support this just to be sure. I’d also imagine (again, check) that it’s really the sample size per number of model terms that is important. You’d need many observations per model term. And, of course, you’d have to be certain that you’re specifying the correct model!

I hope this helps!

Nicole Paschal says

For the histogram with the line over it, are you saying that the line is the actual or the estimated data that the collected histogram data fits into? Thank you.

Jim Frost says

Hi, which section is this graph in? I’m not exactly sure which one you’re referring to.

JOHN HAROLD says

I’ll look out for your maiden book on Regression Analysis next year.

JOHN HAROLD says

“Making complex concepts simpler”. That is your trademark.

I often refer my students to read some of your posts especially after I have introduced them to the topic. They thank “me” for showing them another perspective. But the credit is rightfully yours. Thanks.

Jim Frost says

Hi John, thanks so much for your kind words. They mean a lot to me because that quote is what I strive for with this blog. Thanks for sharing with your students too!

Linda says

Hi Jim,

Thanks for a great post. I just have one quick question about the application of the CLT.

When we use the CLT we can find the probability of a certain event but I am wondering how that probability works with a skewed population.

If a population has for example a large right skew with the most common values around 0 minutes but we know that in the normally distributed sample distribution, the most common values center around the mean (which is the same mean as in the population). I.e. having 68% of the values around the mean in the sample vs most of the values around 0 in the population makes me feel as if the probabilities from the sample do not apply to the population. Would you be willing to explain how this works?

Thanks,

Linda

Jim Frost says

Hi Linda,

I think I see where there might be a slight misunderstanding if I’m reading your comment correctly. In this post, I think the example graph I show in the section “Moderately Skewed Distribution and the Central Limit Theorem” roughly matches the scenario in your comment other than the fact that the most common values are not around zero. So, I’ll use the graph in that section to answer your question.

There are two different types of distribution in play here. There is the distribution of the data values in the population, which is the grey distribution in the graph. That’s the distribution of the actual data. That distribution estimates the probability of *individual values* occurring in the population.

Then, there is the sampling distribution of the mean, which I show using different colors to represent different sample sizes. This is a distribution of *sample means* rather than individual values. You’re correct that the probabilities are different. You can calculate the probabilities of individual values (or ranges technically for continuous data) from the data distribution. And, you can calculate probabilities associated with sample means using the sampling distribution of the means.

For example, if your sample size is large enough so that the sampling distribution approximates a normal distribution, then roughly 68% of *sample means* from that population will fall within +/- 1 standard deviation of the sampling distribution. The standard deviation for the sampling distribution of the means is called the standard error of the mean and it equals the population standard deviation divided by the square root of the sample size.

Point being that the sampling distribution has different properties (particularly the standard deviation) than the data distribution. Hence, you’ll obtain different probabilities for a particular individual value versus a sample mean of the same value, which makes sense when you think about it. An individual value is very different from a sample mean even when they have the same numeric value. The graph in the section that I reference shows this visually.

I hope this answers your question. Please let me know if there’s anything else I can clarify!

Sandeep Ray Chaudhuri says

Hi Jim,

Your Blog is really helpful in brushing up/learning afresh key concepts in Statistics. I have a suggestion that if the key topics are structured then it will help people who are new to learn in a structured manner.

Keep up the awesome work!

Jim Frost says

Hi Sandeep,

Thanks for the kind words! I’ll be writing various books that present these topics in an organized manner. The first one on regression analysis will be available in early 2019, and there will be more to follow!

Surya says

Jim…you are a Gem of a person :-)…I have suggested my Friend as well to subscribe to your posts

Jim Frost says

Thank you so much, Surya! And, thanks for sharing! 🙂

Janna Beckerman says

It totally helps. Thank you!

Janna Beckerman says

Thank you so much for your blog. My question: You said “Typically, statisticians say that a sample size of 30 is sufficient for most distributions. ” I, too, was taught to obtain a sample size of ~30, but I can’t figure out where we all came up with that number. I’ve asked colleagues. No one knows. Do you? And will you share how this number came about?

Jim Frost says

Hi Janna,

You’re very welcome! This number is what emerges as a good rule of thumb for sample sizes that will generally produce an approximately normal sampling distribution for most types of probability distributions. The idea is that when you meet this threshold, you don’t have to worry about whether your data are normally distributed when you use a parametric test that otherwise assumes normal data. (Of course, depending on your study area, you might need a larger sample size to have adequate statistical power, but that’s a different matter.)

There is a fudge factor around this number in several ways. For one thing, how closely does the sampling distribution need to approximate the normal distribution to be good enough? And, the degree to which the population distribution differs from the normal distribution affects this number. In this blog post, some of the examples illustrate how sometimes n=20 is sufficient while in the extremely skewed distribution, a sample size of 80 was not sufficient. That example is probably more skewed than most real data. But, it illustrates the point that the number depends on the shape of the distribution in the underlying population.

Not all statisticians agree. I think 30 is the most common number that I hear. But, others say 40 just to be safe. And, I used to work at Minitab statistical software and a group there did a study about this where they assessed what sample size is required for nonnormal data so that the actual type I error rate matches the significance level for various parametric tests. That ultimately links back to the CLT and the ideas discussed in this post.

They developed a table of sample sizes based on the type of analysis. You can see this table in my post about parametric vs. nonparametric analysis. In that post, there is also a link to the white paper that they developed. For example, they conclude that a 1-sample t-test requires at least a sample size of 20. Indeed, the moderately skewed example in this post produces a fairly normal looking sampling distribution with a sample size of 20.

I think it’s hard to find a concrete reference to this number because it’s more a rule of thumb. There’s no formula or calculation that spits out this number. Both a researcher’s notion of how closely the sampling distribution needs to approximate the normal distribution and how different their distribution is from the normal distribution affect the number they will use! 20-40 should be good for most distributions.

I hope this helps!

Debashis Dalai says

Thank you so much Jim for all the efforts put into simplifying a complex subject like Statistics! You help us a lot. Thanks again!

MG says

Great job Jim.

Jim Frost says

Thanks, MG! 🙂