The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable’s distribution in the population.
Unpacking the meaning from that complex definition can be difficult. That’s the topic for this post! I’ll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is vital in statistics.
Distribution of the Variable in the Population
Part of the definition for the central limit theorem states, “regardless of the variable’s distribution in the population.” This part is easy! In a population, the values of a variable can follow different probability distributions. These distributions include normal, left-skewed, right-skewed, and uniform distributions, among others.
This part of the definition refers to the distribution of the variable’s values in the population from which you draw a random sample.
The central limit theorem applies to almost all types of probability distributions, but there are exceptions. For example, the population must have a finite variance. That restriction rules out the Cauchy distribution because it has infinite variance.
Additionally, the central limit theorem applies to independent, identically distributed variables. In other words, the value of one observation does not depend on the value of another observation. And, the distribution of that variable must remain constant across all measurements.
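If you’d like to see that finite-variance restriction in action, here is a rough Python sketch (the demonstrations later in this post use Statistics101, not Python, and the sample size and number of draws below are arbitrary choices). It compares sample means from an exponential population, which has a finite variance, with sample means from a Cauchy population, which does not.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, draws = 100, 50_000  # sample size and number of simulated samples (arbitrary choices)

# Exponential population: skewed, but finite variance, so the CLT applies.
exp_means = rng.exponential(scale=1.0, size=(draws, n)).mean(axis=1)

# Cauchy population: infinite variance, so the CLT does not apply.
cauchy_means = rng.standard_cauchy(size=(draws, n)).mean(axis=1)

# The exponential sample means cluster tightly around the population mean of 1.0...
print("Exponential means, 1st and 99th percentiles:",
      np.percentile(exp_means, [1, 99]).round(3))

# ...while the Cauchy sample means stay wildly dispersed no matter how large n grows.
print("Cauchy means, 1st and 99th percentiles:",
      np.percentile(cauchy_means, [1, 99]).round(3))
```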
Related Post: Understanding Probability Distributions and Independent and Identically Distributed Variables
Sampling Distribution of the Mean
The definition for the central limit theorem also refers to “the sampling distribution of the mean.” What’s that?
Typically, you perform a study once, and you might calculate the mean of that one sample. Now, imagine that you repeat the study many times and collect the same sample size for each one. Then, you calculate the mean for each of these samples and graph them on a histogram. The histogram displays the distribution of sample means, which statisticians refer to as the sampling distribution of the mean.
Fortunately, we don’t have to repeat studies many times to estimate the sampling distribution of the mean. Statistical procedures can estimate that from a single random sample.
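To make the repeated-study idea concrete, here is a rough Python sketch of that process. The stand-in population, sample size, and number of simulated studies are all arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)

# Stand-in population of individual values (assumed parameters).
population = rng.normal(loc=50, scale=10, size=1_000_000)

sample_size = 30        # observations per "study"
num_studies = 10_000    # how many times we imagine repeating the study

# Each row is one simulated study; each row mean is one sample mean.
sample_means = rng.choice(population, size=(num_studies, sample_size)).mean(axis=1)

# The histogram of these means is the sampling distribution of the mean.
plt.hist(sample_means, bins=60, edgecolor="white")
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.title("Sampling distribution of the mean")
plt.show()
```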
The shape of the sampling distribution depends on the sample size. If you perform the study using the same procedure and change only the sample size, the shape of the sampling distribution will differ for each sample size. And, that brings us to the next part of the CLT definition!
Central Limit Theorem and a Sufficiently Large Sample Size
As the previous section states, the shape of the sampling distribution changes with the sample size. And, the definition of the central limit theorem states that when you have a sufficiently large sample size, the sampling distribution starts to approximate a normal distribution. How large does the sample size have to be for that approximation to occur?
It depends on the shape of the variable’s distribution in the underlying population. The more the population distribution differs from being normal, the larger the sample size must be. Typically, statisticians say that a sample size of 30 is sufficient for most distributions. However, strongly skewed distributions can require larger sample sizes. We’ll see the sample size aspect in action during the empirical demonstration below.
Central Limit Theorem and Approximating the Normal Distribution
To recap, the central limit theorem links the following two distributions:
- The distribution of the variable in the population.
- The sampling distribution of the mean.
Specifically, the CLT states that regardless of the variable’s distribution in the population, the sampling distribution of the mean will tend to approximate the normal distribution.
In other words, the population distribution can look like the following:
But, the sampling distribution can appear like below:
It’s not surprising that a normally distributed variable produces a sampling distribution that also follows the normal distribution. But, surprisingly, nonnormal population distributions can also create normal sampling distributions.
Related Post: Normal Distribution in Statistics
Properties of the Central Limit Theorem
Let’s get more specific about the normality features of the central limit theorem. Normal distributions have two parameters, the mean and standard deviation. What values do these parameters converge on?
As the sample size increases, the sampling distribution converges on a normal distribution where the mean equals the population mean, and the standard deviation equals σ/√n. Where:
- σ = the population standard deviation
- n = the sample size
As the sample size (n) increases, the standard deviation of the sampling distribution becomes smaller because the square root of the sample size is in the denominator. In other words, the sampling distribution clusters more tightly around the mean as sample size increases.
Let’s put all of this together. As sample size increases, the sampling distribution more closely approximates the normal distribution, and the spread of that distribution tightens. These properties have essential implications in statistics that I’ll discuss later in this post.
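As a quick sanity check on those two properties, the following sketch (with an assumed population mean and standard deviation chosen purely for illustration) compares the simulated standard deviation of the sample means against σ/√n.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mu, sigma = 100.0, 12.0  # assumed population mean and standard deviation
population = rng.normal(mu, sigma, size=1_000_000)

for n in (5, 20, 40):
    # Draw 100,000 samples of size n and compute each sample's mean.
    means = rng.choice(population, size=(100_000, n)).mean(axis=1)
    print(f"n={n:>2}  mean of sample means={means.mean():6.2f}  "
          f"SD of sample means={means.std():5.2f}  sigma/sqrt(n)={sigma / np.sqrt(n):5.2f}")
```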
Related Posts: Measures of Central Tendency and Measures of Variability
Empirical Demonstration of the Central Limit Theorem
Now the fun part! There is a mathematical proof for the central limit theorem, but that goes beyond the scope of this blog post. However, I will show how it works empirically by using statistical simulation software. I’ll define a population distribution and have the software draw many thousands of random samples from it. The software will calculate the mean of each sample and then graph these sample means on a histogram to display the sampling distribution of the mean.
For the following examples, I’ll vary the sample size to show how that affects the sampling distribution. To produce the sampling distribution, I’ll draw 500,000 random samples because that creates a fairly smooth distribution in the histogram.
Keep this critical difference in mind. While I’ll collect a consistent 500,000 samples per condition, the size of those samples will vary, and that affects the shape of the sampling distribution.
Let’s test this theory! To do that, I’ll use Statistics101, which is a freeware computer program. This is a great simulation program that I’ve also used to tackle the Monty Hall Problem!
Testing the Central Limit Theorem with Three Probability Distributions
I’ll show you how the central limit theorem works with three different distributions: moderately skewed, severely skewed, and a uniform distribution. The first two distributions skew to the right and follow the lognormal distribution. The probability distribution plot below displays the population’s distribution of values. Notice how the red dashed distribution is much more severely skewed. It actually extends quite a way off the graph! We’ll see how this makes a difference in the sampling distributions.
Let’s see how the central limit theorem handles these two distributions and the uniform distribution.
Moderately Skewed Distribution and the Central Limit Theorem
The graph below shows the moderately skewed lognormal distribution. This distribution fits the body fat percentage dataset that I use in my post about identifying the distribution of your data. These data correspond to the blue line in the probability distribution plot above. I use the simulation software to draw random samples from this population 500,000 times for each sample size (5, 20, 40).
In the graph above, the gray color shows the skewed distribution of the values in the population. The other colors represent the sampling distributions of the means for different sample sizes. The red color shows the distribution of means when your sample size is 5. Blue denotes a sample size of 20. Green is 40. The red curve (n=5) is still skewed a bit, but the blue and green (20 and 40) are not visibly skewed.
As the sample size increases, the sampling distributions more closely approximate the normal distribution and become more tightly clustered around the population mean—just as the central limit theorem states!
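If you want to recreate this type of graph without Statistics101, a rough Python equivalent is sketched below. The lognormal parameters are illustrative guesses for a moderately right-skewed population, not the exact parameters behind the body fat dataset, and it draws 100,000 samples per sample size rather than 500,000 to keep the run time short.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=7)

# Assumed lognormal parameters for a moderately right-skewed population.
population = rng.lognormal(mean=3.3, sigma=0.4, size=1_000_000)

# Population distribution of individual values (gray).
plt.hist(population, bins=100, density=True, alpha=0.3, color="gray",
         label="Population values")

# Sampling distributions of the mean for each sample size.
for n, color in [(5, "red"), (20, "blue"), (40, "green")]:
    means = rng.choice(population, size=(100_000, n)).mean(axis=1)
    plt.hist(means, bins=100, density=True, histtype="step",
             color=color, label=f"Sample means, n={n}")

plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()
```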
Very Skewed Distribution and the Central Limit Theorem
Now, let’s try this with the very skewed lognormal distribution. These data follow the red dashed line in the probability distribution plot above. I follow the same process but use larger sample sizes of 40 (grey), 60 (red), and 80 (blue). I do not include the population distribution in this one because it is so skewed that it messes up the X-axis scale!
The population distribution is extremely skewed. It’s probably more skewed than real data tend to be. As you can see, even with the largest sample size (blue, n=80), the sampling distribution of the mean is still skewed right. However, it is less skewed than the sampling distributions for the smaller sample sizes. Also, notice how the peaks of the sampling distributions shift to the right as the sample size increases. Eventually, with a large enough sample size, the sampling distributions will become symmetric, and the peak will stop shifting and center on the actual population mean.
If your population distribution is extremely skewed, be aware that you might need a substantial sample size for the central limit theorem to kick in and produce sampling distributions that approximate a normal distribution!
Uniform Distribution and the Central Limit Theorem
Now, let’s change gears and look at an entirely different type of distribution. Imagine that we roll a die and take the average value of the rolls. The probabilities for rolling the numbers on a die follow a uniform distribution because all numbers have the same chance of occurring. Can the central limit theorem work with discrete numbers and uniform probabilities? Let’s see!
In the graph below, I follow the same procedure as above. In this example, the sample size refers to the number of times we roll the die. The process calculates the mean for each sample.
In the graph above, I use sample sizes of 5, 20, and 40. We’d expect the average to be (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. The sampling distributions of the means center on this value. Just as the central limit theorem predicts, as we increase the sample size, the sampling distributions more closely approximate a normal distribution and have a tighter spread of values.
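Here is a rough Python sketch of the die-rolling simulation (using 100,000 simulated samples per sample size as an arbitrary choice, rather than the 500,000 used for the graphs above).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=3)

for n, color in [(5, "red"), (20, "blue"), (40, "green")]:
    # Each row is n rolls of a fair die; the row mean is one sample mean.
    rolls = rng.integers(1, 7, size=(100_000, n))
    plt.hist(rolls.mean(axis=1), bins=50, density=True, histtype="step",
             color=color, label=f"n={n}")

plt.axvline(3.5, linestyle="--", color="black")  # expected value (1+2+...+6)/6
plt.xlabel("Mean of die rolls")
plt.ylabel("Density")
plt.legend()
plt.show()
```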
You could perform a similar experiment using the binomial distribution with coin flips and obtain the same types of results when it comes to, say, the probability of getting heads. All thanks to the central limit theorem!
Why is the Central Limit Theorem Important?
The central limit theorem is vital in statistics for two main reasons—the normality assumption and the precision of the estimates.
Central limit theorem and the normality assumption
The fact that sampling distributions can approximate a normal distribution has critical implications. In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the t-test. Consequently, you might think that these tests are not valid when the data are nonnormally distributed. However, if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are nonnormally distributed—as long as your sample size is large enough.
You might have heard that parametric tests of the mean are robust to departures from the normality assumption when your sample size is sufficiently large. That’s thanks to the central limit theorem!
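One way to see that robustness is to simulate the type I error rate of a t-test on skewed data. The sketch below (with an exponential population and arbitrary sample sizes chosen only for illustration) tests a true null hypothesis many times; with the larger sample size, the rejection rate stays close to the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=11)
true_mean = 1.0  # the mean of an exponential(scale=1) population, so the null is true

for n in (10, 100):
    rejections = 0
    for _ in range(20_000):
        sample = rng.exponential(scale=1.0, size=n)
        _, p = stats.ttest_1samp(sample, popmean=true_mean)
        if p < 0.05:
            rejections += 1
    print(f"n={n:>3}: observed type I error rate = {rejections / 20_000:.3f}")
```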
For more information about this aspect, read my post that compares parametric and nonparametric tests.
Precision of estimates
In all of the graphs, notice how the sampling distributions of the mean cluster more tightly around the population mean as the sample sizes increase. This property of the central limit theorem becomes relevant when using a sample to estimate the mean of an entire population. With a larger sample size, your sample mean is more likely to be close to the real population mean. In other words, your estimate is more precise.
Conversely, the sampling distributions of the mean for smaller sample sizes are much broader. For small sample sizes, it’s not unusual for sample means to be further away from the actual population mean. You obtain less precise estimates.
In closing, understanding the central limit theorem is crucial when it comes to trusting the validity of your results and assessing the precision of your estimates. Use large sample sizes to satisfy the normality assumption even when your data are nonnormally distributed and to obtain more precise estimates!
Dr Dilip Raj says
very informative sir.
Todd says
Hello Jim:
First of all, terrific site so keep up the good work. I will certainly visit frequently given my upcoming coursework.
I’ve been trying to find software or a website that would allow me to play w/ data and see how the histogram changes. What would be your recommendations?
For example, I’m looking at a histogram that is left skewed w/ n of 12K. I want to see how the distribution shape changes if n is significantly less, significantly more, and then around 10K. How would this histogram change in appearance with these varying sample sizes?
Thanks
Jim Frost says
Hi Todd,
I’ve used freeware called Statistics101 for that. In fact, that’s what I’ve used in this post. You can sample data from various distributions and create histograms from them. You can do that for the distribution of individual values as well as for sampling distributions with samples of whichever size you set. You can see both in this post.
I’d suspect that you won’t notice a significant visual difference between 10k and 12k. Both are very large sample sizes. With small samples (n < 100), you'll get blocky distributions that don't represent the shape of the parent distribution well. As the sample size grows from there, the histograms should consistently reflect the shape of the parent distribution but still have some "blockiness." As the sample size increases further, the histogram bars become narrower and narrower and start to closely follow the smooth curve of the parent distribution. Eventually, as the sample size grows, the distribution of bars almost looks like a continuous distribution.
Jakob says
Hi Jim,
Sorry for replying to a previous comment; I can’t figure out how to create a new comment.
Is it possible for you to have a sample size that is large enough that the central limit theorem no longer applies? If the population distribution is skewed, but a sample size of sufficient size is not skewed, wouldn’t a sample size that is too large start to look like the skewed population distribution?
Many thanks,
Jakob
Jim Frost says
Hi Jakob,
You should just be able to click in the comment box at the bottom to post a new one.
I think I understand what you’re asking. The thing to remember is that you can have large sample sizes, but you’re plotting the sample means (or another statistic) for those samples. That’s the distribution in question. And the distribution of sample means will converge on the normal distribution as sample size increases.
So, the parent distribution can be skewed, and your large samples might be similarly skewed, but the distribution of their means won’t be skewed with a sufficiently large sample size.
Ary Agrawal says
Hey Jim, I enjoy reading your articles. I am working on an assignment that deals with the topic of the central limit theorem, and I was curious to know what simulation software you are using. I wish to calculate the probability of each of the mean value that a sample size of a certain discrete distribution will give me, and even writing a program in Java doesn’t seem to help me for large values where the sample size exceeds 20.
Jim Frost says
The simulation software I used was Statistics 101. I mention it and have a link to it in the “Empirical Demonstration . . . ” section in this post. It’s giftware . . . meaning they hope you donate some money to use it.
Harald Foidl says
Hi Jim!
Great post, well explained!
Although I understand the theorem and also can prove it with simulations I am struggling with an intuitive explanation why the CLT works?
I am aware that the theorem was mathematically proven – but what should I say to my grandma when she asks “why as sample sizes increase, the sampling distribution of the mean approaches a normal distribution”?
Jim Frost says
Hi Harald,
That’s a great question! As you might know, I love presenting intuitive explanations whenever possible on my website. I’ll have to admit, I had to think about this one for a while! Here’s what I came up with. I hope it works. It’s a tough one to explain intuitively!
Imagine you have a large basket of fruit that’s quite varied—some apples, a bunch of bananas, a few oranges, etc. This basket represents a population with a non-normal distribution, where each type of fruit represents different values in your population.
Now, let’s say you start making smoothies. Each smoothie is like a sample from your population. In each smoothie (sample), you put a random assortment of fruit (data points). If you make a small smoothie (a small sample), the taste of that smoothie might be heavily influenced by the particular fruits you picked. Maybe you got more bananas or mostly apples, so the flavor (sample mean) is skewed towards those fruits.
However, as you start making larger and larger smoothies (increasing your sample size), something interesting happens. The chance of getting a skewed combination of fruits decreases. You’re more likely to get a more balanced mix of all the fruits in your basket in each smoothie. This is because, with a larger sample, the peculiarities of any single type of fruit (outliers in your population) are averaged out by the presence of all the other types.
As you keep increasing the size of your smoothies, the flavor (sample mean) of each smoothie begins to converge towards a consistent, ‘average’ taste, regardless of the original fruit distribution in the basket. This average flavor is akin to the normal distribution in the Central Limit Theorem. It represents the mean of all your smoothies (sample means) becoming more predictable and less variable as your sample size increases.
Harald Foidl says
Hey Jim! Thank you very much for your time thinking about my question. You really were able to explain it intuitively to me! Great example!
Best,
Harald
Alex F says
Hi Jim,
Just discovered your website and love your content.
Suppose we took a sample consisting of a whole population of size n. Let X be the random variable for the value of an observation and sd(X) be the standard deviation of X across the whole population. The standard deviation of the distribution of sample means for such a sample would be exactly zero (rather than sd(X)/sqrt(N) as indicated by the CLT), since each time you repeated the sample across the whole population, the sample mean would be exactly the same, because the sample would be the entire population every time.
Does this indicate that the standard deviation of the sample means is only approximately sd(X)/sqrt(N), rather than exactly sd(X)/sqrt(N)?
Many thanks,
Alex
Jim Frost says
Hi Alex, I’m so glad my website has been helpful!
As for your question. Remember that the central limit theorem applies to samples, and you’re talking about the entire population. By definition, that’s not a sample. The CLT is not applicable in that scenario.
David Connell says
Hi Jim
I think I am right in summarising that when you use the CLT you get the probability of an event ( for the population) based on the sample. As sample size increases the accuracy of this probability increases. This is independent of the underlying distribution type. My question is whether the probability calculated is an estimate or not. I read that if the underlying distribution is Normal it is not an estimate and I wonder is this also the case if the underlying distribution is not Normal eg Poisson, geometric.
Jim Frost says
Hi David,
That’s sort of it. The CLT applies to sampling distributions of the mean. Consequently, it doesn’t apply to individual events but to sample means. You can use sampling distributions of the mean to find the probability of obtaining a particular sample mean with a specified sample size in populations with specified properties.
The blog post explains this in more detail but, in summary, the CLT states that when the distribution of values in a population is nonnormal, the sampling distribution of the mean will approach a normal distribution with specific properties as sample size grows.
So, if you have a nonnormal distribution of values, then, yes, if you use a normal sampling distribution, it is an approximation of the real sampling distribution. As your sample size grows, the difference between the true sampling distribution and your normal approximation becomes smaller. Frequently, statisticians say that a sample size of 30 will often produce a sampling distribution that closely approximates a normal distribution. However, some severely skewed distributions might require much larger sample sizes. I use simulations in this blog post to show the approximation process.
In one sense, if you start with a normal distribution of values in the population, then the normal sampling distribution is correct and not an approximation of the real one. However, assuming you’re working with a sample and not the full population, then you’re by definition working with estimates. The sample estimates the distribution of values in the population, which in turn estimates the sampling distribution. But, yes, if the population distribution of values is normal, then the sampling distribution is normal even with small sample sizes. However, keep in mind that small samples produce relatively imprecise estimates of the population distribution, and in turn an imprecise estimate of the sampling distribution. So, a normal distribution of population values doesn’t get you off the hook for needing a good sample size!
Prachi says
Hi Jim! I really really love your blogs, they are really helpful and well explained.
It would be even more great if this website had an option for switching to dark mode. Please do consider it 🙂
Love from India <3
Ben says
Hi Jim,
That was a really interesting read, very well explained, thank you for that.
I just have one hopefully simple question. When calculating confidence intervals for a large non-normal distribution, can the central limit theorem be applied so that the confidence interval is calculated using the same method as for a normal distribution (with Z values)? Or does the confidence interval need to be calculated using t-values instead or another method maybe?
Sorry if this has already been explained!
Thanks in advance,
Ben
Jim Frost says
Hi Ben,
In the sense that the sampling distribution converges on the normal distribution, yes, you can use z-values to calculate the confidence intervals. However, CIs based on z-scores will always be narrower than those based on t-values to some degree. T-values account for the changes in precision for different sample sizes, whereas z-scores assume you know the standard deviation of the population or have an infinitely large sample!
The size of that difference between the two methods depends on your DF. With larger sample sizes, the difference becomes trivial. I show this difference between t-value and Z-scores in my post about Confidence Intervals. Read that for more details!
Generally, I’d just use t-values because they are designed with the sample size in mind.
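To get a feel for the size of that difference, here is a sketch that compares the two-sided 95% critical values for z and t at several arbitrary sample sizes; the ratio shows how much wider a t-based interval would be.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # two-sided 95% critical value, about 1.96

for n in (5, 15, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    # The ratio shrinks toward 1 as the sample size (and degrees of freedom) grows.
    print(f"n={n:>4}: z={z_crit:.3f}  t={t_crit:.3f}  t/z ratio={t_crit / z_crit:.3f}")
```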
Ekaveera Kumar says
Thanks Jim. Can i ask any doubts in statistics here?
Jim Frost says
Hi Ekaveera,
Yes, you can! Please post your questions in the comment section of the relevant post. That helps keep the questions and answers organized in the right posts. If you need help finding the relevant post, use the search box near the top of the right margin.
Jeson says
Hi Jim
May i have your insight on below question:
Since you state in the post that “the sampling distribution converges on a normal distribution where the mean equals the population mean, and the standard deviation equals σ/√n”, why does SPC use a different estimator for the population standard deviation? For example, in an X-bar-S chart, the population standard deviation is estimated by the sum of the sample standard deviations divided by the number of samples.
Appreciated your comments.
Faith says
You’ve saved my stats grade! Thank you so much. I have a question.
If the question says to find the probability if the average duration lies within 0.8 of the population mean. Let’s assume the average is um 2.5 in this case. How will that be solved. I got a question like this and the word “within” is confusing.
Jim Frost says
Hi Faith,
Yay! I’m so glad my website was helpful!
“Within” in this context refers to the mean +/- 0.8. However, there should be some units with that 0.8. Is it within +/- 0.8 standard deviations of the mean? Or is it the mean 2.5 +/- 0.8? If the data follow the normal distribution, you will presumably have some way to convert it to a z-score (or it is the z-score) and use that to find the probability. My z-table article not only has the z-table but shows you how to use it for various purposes.
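If it turns out to be the mean +/- 0.8 in the original units, the calculation is just a difference of two normal CDF values once you know the standard error. Here’s a sketch with made-up numbers for the population standard deviation and sample size, since your question doesn’t state them.

```python
import math
from scipy import stats

mu = 2.5      # stated population mean
sigma = 1.2   # assumed population standard deviation (made up for illustration)
n = 36        # assumed sample size (made up for illustration)

se = sigma / math.sqrt(n)  # standard error of the mean

# P(mu - 0.8 <= sample mean <= mu + 0.8) under the CLT normal approximation
prob = (stats.norm.cdf(mu + 0.8, loc=mu, scale=se)
        - stats.norm.cdf(mu - 0.8, loc=mu, scale=se))
print(f"P(sample mean within +/- 0.8 of the mean) ~= {prob:.4f}")
```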
gabrielle says
Hi Jim – thank you so much for the post!
I have been wondering what does the sample size mean in the central limit theorem, as it could mean one of the two things 1) the # of observations in each sample 2) # of samples repeatedly draw from population. After several readings & simulation myself, I think I would want to agree it is the 2nd: the # of samples repeatedly draw from population, although the # of the observations also plays some roles in determine the sampling distribution of the sample means. Here is what i found & please lmk if otherwise.
The sampling distribution of sample means will approach to normal distribution, regardless of underlying population distribution, if repeatedly draw infinite N times. However, if the # of observations are large (say, >30), the sampling distribution will be tighter and more normal, compare to smaller sample, given the same # of repeatedly draws.
Does it sound right to you?
Jim Frost says
Hi Gabrielle,
I write about it in this post. Sample size is the number of observations in each sample. If you have a skewed population distribution and you use a small sample size (e.g., n = 5), the sampling distribution will be skewed even if you draw hundreds of thousands of random samples from the population.
Please read the examples I show in the post more closely to understand this issue. Pay particular attention to the moderately skewed distribution example where I use different samples sizes (n = 5, 20, & 40) but draw each sample size the same number of times (500,000) from the population.
Mihoby says
Indeed, thank you!
Mihoby Razafinimanana says
Hello,
First, Thank you for your super clear and helpful post, it helped me a lot with my data!
Second, if I remember well, you were providing a table with a miminum sample size depending on the statistical test we want to apply, and If I’m not mistaken, isn’t present anymore in your post.
Is there a reason for that? Or else do you have a reference to give to get a clear sample size number ?
Thanks a lot!
MR
Jim Frost says
Hi Mihoby,
I know the table you’re referring to. I actually include it in a different post about parametric vs. nonparametric tests. Click the link to see it!
Kewal says
Hi Jim, amazing post. I’m new to stats, accept my apologies in advance if this is a stupid question.
Let’s say i have data of 500000 bank withdrawals, i take a sample of 100 withdrawals, find its mean and repeat this process 100000 times. The distribution of this mean would be normal and the mean of this distribution would be the mean of my 500000 withdrawals right?
Secondly, assume on one fine day a customer withdraws $100, can I use the CLT and the normal distribution i plotted above to find the probability of a $100 withdrawal? As in how likely is it that a customer would withdraw $100 from bank?
Jim Frost says
Hi Kewal,
Thanks for the kind words! And your questions definitely are not stupid!
You’re correct about the things in your paragraph about repeating a sample size of 100 for 100,000 times. That sampling distribution should follow a normal distribution and center on the mean of the 500,000 withdrawals. The only caveat is if the full population of 500,000 withdrawals is extremely skewed, then a sample size of 100 might not be large enough. But usually a sample size of 100 would be sufficient.
Unfortunately, your next paragraph isn’t quite right. Sampling distributions apply to sample means, not individual observations. So, you wouldn’t use the sampling distribution to assess an individual withdrawal of $100 and the CLT only applies to sampling distributions. Instead, you’d take your sample and use it to estimate the probability distribution that the population follows. After determining that, you can use that probability distribution function to calculate the probability associated with that withdrawal. The important difference is that you’re using the probability distribution for the population rather than the sampling distribution because you’re looking at an individual value.
The following two posts provide details about the above:
Identifying the distribution of your data
Understanding probability distributions
I hope that helps!
Niloufar says
Thank you for the answer.
I didn’t use ANOVA because variance of different groups are not same and also, the data don’t have normal distribution. So, I was thinking z-test for a large data set is appropriate because I can apply CLT.
Jim Frost says
Hi Niloufar,
The CLT also applies to ANOVA. So, that’s not a problem. And, read my post about post hoc tests and you’ll see why running a series of Z-tests is not a good idea!
Unequal variance can be a problem. In that case, you should use Welch’s ANOVA with a post hoc test. It can handle the unequal variances. Read the link for more details.
Niloufar says
Hi Jim. Thank you so much for such a great explanation for CLT. I’m not a statistician so my question may be ridiculous.
I have different experiment and I want to apply statistics to compare these experiments with a base experiment. For each experiment I have almost 300 data points, which are detected via sensors. With respect to the conditions, I want to use two sample z-test to compare each experiment with the base one, statistically using python. So, my question is that how does python calculate z and report p-value? Does it select a proper sample from my data and then calculate the average of that sample and then report p-value? OR, I have to select a sample from my data and then give the sample to python to calculate p-value?
Regards,
Nilou
Jim Frost says
Hi Niloufar,
I’m not an expert with Python, so I can’t answer that part. However, from a statistical standpoint, if you have three or more experiments (including the base experiment), then you’ll need to perform one-way ANOVA and follow that up with a post hoc test. There are post hoc tests specifically designed for comparing each experiment to a baseline. Read my article about using post hoc tests with ANOVA and you’ll see why you should NOT perform a series of z-tests to do this but instead use ANOVA with a post hoc test.
Chris Gibbons says
Hi jim, thank you for your kindness in helping a global community of learners.
In hypothesis testing when exploring differences, we compare the means of the control and exp conditions and we look at the size of the difference in the means and how likely the size of that difference could have occurred by chance.
But from a purely statistical point of view, the calculus just looks at how likely the observed mean could have occurred by chance ie where it would sit in a distribution of multiple means, so why have a control condition?
I understand the obvious reason from a research standpoint, but my query is from the stats standpoint?
Jim Frost says
Hi Chris,
Technically, the hypothesis test tells you the likelihood of obtaining your result, or one more extreme, under the assumption that the null hypothesis is correct. It’s not quite accurate to say, “occurring by chance.” It’s occurring by chance when you assume the null is true. So, a conditional probability. It actually makes an important difference in how you interpret the p-value. It’s not just a minor semantics difference. For more information, read my post about interpreting p-values.
Now, on to your question! Why do you have a control group or condition? Without a control group, you wouldn’t have any basis of comparison for the outcome in the treatment group. You’d know what the outcome was for that group, but is it better than if you didn’t administer the treatment? There’s no way to know. For more information, read my post about control groups, where I discuss this issue specifically. You say you know this from a research standpoint, but I’d say it’s the same from a statistics standpoint. Unless I’m misunderstanding what you’re asking.
However, comparing groups isn’t required for hypothesis testing. You can perform a 1-sample test and estimate its CI if you don’t want to make a comparison. So, it’s not required. You can do that if you just want to use a sample to estimate the population value of, say, the mean. Alternatively, you can supply a reference value in a 1-sample test to determine whether the sample value is significantly different from that value. You’d need to choose a reference value that is meaningful to your study area.
I’m not sure if I’m answering your question. Let me know if I’m not!
De says
Hi Jim,
How do you know if a given distribution is normally distributed? How do you ‘proof’ that, and therefore justify the fact that you are e.g., using a parametric test on a small sample size.
Regards,
De
Jim Frost says
Hi De,
That can be a tricky situation! When you have a small sample size, a normality test might produce a high p-value (indicating a normal distribution) because of the small sample size rather than because the data are consistent with a normal distribution. With a small sample size, the test has low power for detecting deviations from the normal distribution.
Typically, you’d use subject-area knowledge and previous research to establish that your measurements follow a normal distribution. If you can’t do that, then you really need to consider a nonparametric test!
accioyugen says
I learned a lot. Thanks
Can I know what app/site you used in making the graph in “Moderately Skewed Distribution and the Central Limit Theorem” section?
Thank you so much
Gihan Ragab says
Hi Jim,
Thanks for the illustration.
What is a z statistic and what qualifies a statistic to be z statistic based on the central limit theorem and the basic properties of normal distributions?
Jim Frost says
Hi Gihan,
Please read my post about z-scores. And you can also read about the normal distribution. They should answer some of your questions.
I’m not exactly sure what you’re asking. Hopefully, those article help. And then if you have more specific questions, you can post them in the appropriate place.
Marq Piper says
Hi Jim,
Thank you for the excellent post. I have a question regarding the iid assumption for the CLT to hold.
I want to do a t-test on two means from a distribution that is not normally distributed. According to the CLT I can do this given a large sample (which I have). However, it may be that the underlying data is not independently distributed. Is it sufficient that the sampling distribution of means is iid, or does the actual underlying data have to be iid ?
Thank you very much for your help.
Jim Frost says
Hi Marq,
Independent and identically distributed (IID) combines two broad characteristics of a sample. Click the link to learn more. I’d need to know more about how you suspect your data violate IID. There is no independent distribution assumption on its own, so I’m not sure what you mean exactly. IID means that the observations are independent events and that they all follow an identical distribution. Are your data not independent events, do they not all follow the same distribution, or both?
ps says
HI Jim ….can you please tell whether this theorem holds for only mean or sampling distribution of variances also
Jim Frost says
Hi, the central limit theorem applies only to the sampling distribution of the means–not variances.
Mohd says
What can you do with the outputs of normalized data?
Jim Frost says
Hi Mohd,
That’s a fairly broad question. I’ll start by referring you to my post about the normal distribution. For your question, focus on the portion about standardization and z-scores. If you have a more specific question about this topic, please post it in the comments section of that post. Thanks.
Anderson Andrade says
Hi Jim,
if the population distribution is normal, then the distribution of sample means will always be normal, regardless of the size of the samples, right? Even if the “sample size” is 1, the sampling distribution will be normal, as long as I take several replicates (in this case, we are reproducing the original distribution, right?).
Well, If I know that a given variable is normally distributed in a population (e.g., IQ) can I always use parametric tests to analyse samples, regardless of the sample size and distribution of the sample?
best,
Anderson
Jim Frost says
Hi Anderson,
Yes, that’s correct. As long as you are taking representative samples from the population, the sampling distribution should be normally distributed regardless of size. The trick is to really know that the population is normally distributed!
Mohit Kumar says
Hi Jim, thanks for the excellent article. I think when estimating a mean, doing multiple iterations (let us say 20) of a fixed sample size (let us say 50) is equivalent of doing one iteration of 1000 samples. Is that correct? Does this mean we never need to do multiple iterations?
Jim Frost says
Hi Mohit,
In the context of the central limit theorem, it’s different. If we draw 50 random samples of size 20 from the same population, they will have a range of characteristics. You can plot the distributions of those characteristics (i.e., means and standard deviations) on histograms. On the whole, you’d expect the overall mean of the samples to equal the population mean. The standard deviation of the distribution of the means (known as the standard error of the mean) would be a particular value. Now, if you have one large random sample of 1000 from the same population, you’d expect its mean to fall near the population mean. However, because the sample size is so much larger, you’d expect the precision of the estimate to be much higher.
In reality, a study would never do multiple iterations. Well, almost never. Statistical process control does use repeated sampling. But generally no. Just collect a large sample size! The reason I show multiple iterations here is so you can see how the central limit theorem works. It works out mathematically. But it also works out when you use simulations to draw random samples from the same population. But, in most real studies, just collect one good sized sample! From the one sample, your statistical software will estimate the sampling distribution using the appropriate formulae. You want those calculations to be based on the largest single sample you can reasonably collect!
Amanda Muller says
Can you explain how the central limit theorem relates to sample numbers for categorical variables please?
Evangelist Ndifreke Akpan says
Please for the right top above can you give me references, I need them. Thanks
Jim Frost says
Hi, I’m not sure what you’re referring to when you say “the right top above.” Please be more specific so I can help you. Thanks!
Praise Ololade Farayola says
Wow. Nice article. please how does the equation change when the sample size (n) is not the same for all sample?
Thant Ku Tay Aung says
Hello , Jim
I don’t get that ..” Fortunately, we don’t have to repeat studies many times to estimate the sampling distribution of the mean. Statistical procedures can estimate that from a single random sample.”
how can we estimate the sampling distribution of the mean if we have only a single random sample ?
Jim Frost says
Hi Thant,
That’s the magic of statistics: statisticians had to develop the distributions that estimate sampling distributions from a single sample! These distributions form the basis of many hypothesis tests, such as t-tests, F-tests, and chi-square tests. The tests are named after the distributions that estimate their sampling distributions. The bootstrapping method uses an entirely different technique to estimate sampling distributions. For more information about how it works, click the links.
Mark says
Thank you. Very helpful blog.
Junaid Ali says
What is the relationship of CLT with inferential statistics?
Steve Montgomery says
I am running a Monte Carlo analysis on a project schedule and I am interested in both realistic duration and cost. If I understand your blog correctly, as the number of Monte Carlo iterations increases the more the Central Limit Theorem (CLT) causes the random value outputs to form a narrow normal distribution. So the random numbers cluster in the middle and the range of min/max is narrower than reality. If this is the case, would it be best to reduce the number of iterations in the Monte Carlo analysis?
–Steve
Jim Frost says
Hi Steve,
There are two things to distinguish between.
The first is the number of iterations. Increasing the iterations does two main things: it creates smoother curves and produces more consistent results if you repeat the analysis. Increasing the iterations does not narrow the distribution.
The other aspect is the sample size for each iteration. And, it’s the sample size that reduces the spread of the distribution. Larger samples produce tighter sampling distributions of the mean, and those sampling distributions will more closely approximate the normal distribution. The sample size you’d use depends on your scenario. In this post, I show different sample sizes to demonstrate how that affects the spread of the sample means. It illustrates why large samples produce more precise estimates and more closely approximate the normal distribution.
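A quick way to convince yourself of that distinction is to vary the iterations and the sample size separately, as in the sketch below (it uses an arbitrary skewed population rather than anything from your schedule model).

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def spread_of_means(iterations, sample_size):
    """Standard deviation of the simulated sample means."""
    means = rng.lognormal(mean=0.0, sigma=1.0,
                          size=(iterations, sample_size)).mean(axis=1)
    return means.std()

# More iterations: smoother, more stable estimates, but roughly the same spread.
print("10k vs 100k iterations, n=30:",
      round(spread_of_means(10_000, 30), 3), round(spread_of_means(100_000, 30), 3))

# Larger sample size per iteration: the spread of the sample means shrinks.
print("n=30 vs n=300, 10k iterations:",
      round(spread_of_means(10_000, 30), 3), round(spread_of_means(10_000, 300), 3))
```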
I hope that helps!
ayushi says
Hello,
I have used z value for descriptive methods and graphical methods for establishing normality till now, There were slight discrepancies but normality was established by referring to research papers to help cover the discrepancy.
I have read that it is helpful to use a combo of plots and tests for large samples and thats why I decided to use Shapiro-Wilk test. If normality is established in the graphical and descriptive methods, do I still need to use the normality tests?
Jim Frost says
Hi Ayushi,
You should be using normality tests and normal probability plots to assess normality, and the distribution of continuous data in general. I have two posts that will help you: Identifying the distribution of your data and Using normal probability plots to assess distributions. Those posts should answer your questions.
Ayushi says
Hello
My question : I am working on my dissertation and establishing normality for a sample of 200 respondents. there was a slight deviation in normality which was worked upon in descriptive statistics. But when i caluclated the Shapiro-wilk test, both my variables ended up being less than the standard p-value. Can I skip using Shapiro and solely rely on graphs and descriptive stats for establishing normality?
Jim Frost says
Hi Ayushi,
Which descriptive statistics are you using to assess normality? The top two things you should be using to assess normality are normality tests and normal probability plots. I cover both of those in my post about identifying the distribution of your data. That doesn’t focus entirely on normality tests. It includes them but also tests for other distributions.
Also read my post about using normal probability plots for assessing normality. There are cases where you’d actually use the graph results over the hypothesis test results.
Finally, with 200 observations, you might not need to worry so much about whether your data follow the normal distribution. I write about that in my post about parametric vs. nonparametric tests.
I hope that all helps!
Anita says
Hi Jim,
I have some very basic stats questions.
I am having trouble determining what question #5 below is asking me for. Would possible answers be “normal, poisson, uniform” or is the question more asking me to come up with the population mean and std dev from the sample and use CLT?
5. You conduct an experiment where you collect the data in the table below. What population distribution does this sample come from? How confident are you of your conclusion?
14.79911 28.23858 22.7928 24.51667 22.50702
35.50089 21.75057 28.0851…..23.47746…..17.71201
etc. there are 100 data points (20 columns and 5 rows)
Here is the output when all the data is put in one column:
Column1
Mean 25.1957477
Standard Error 0.486145044
Median 25.213175
Mode #N/A
Standard Deviation 4.86145044
Sample Variance 23.63370038
Kurtosis -0.332161421
Skewness -0.02021877
Range 23.44958
Minimum 13.42521
Maximum 36.87479
Sum 2519.57477
Count 100
Confidence Level(95.0%) 0.964617237
Also, I need a push in the right direction with question #1 below. Would this be a z-test?
1. You receive a number of complaints from your employees that this year’s promotions were not assigned fairly (in that some VPs favored different hair colors), so you decide to determine if the distribution of promotions differed by region. You conduct a hypothesis test to this effect. (There are 5 different regions).
What is the critical value of your test statistic if you are willing to reject your null hypothesis at the α = .05 level of significance? (Ensure you identify what type of statistic it is.)
Your articles are so helpful, they are clear and concise and much appreciated!
Thanks so much for your time!
Leyla Depret-Bixio says
Dear Jim,
Thank you very much for this useful post.
Sometimes as statisticians we face situations where we have to explain the CLT to a non-statistical audience in an easy and understandable way! Your explanation really helps.
Thanks
Jim Frost says
Hi Leyla, you’re very welcome! I’m glad it was helpful! 🙂
Ben says
If a sample size is large enough to rely on CLT and assume normality for the purpose of conducting a t-test, is it necessary to conduct normality tests (such as the Shapiro Wilk test) to establish that the population is normally distributed or is this essentially irrelevant to this hypothesis test?
Jim Frost says
Hi Ben,
If you have a large enough sample size, you don’t need to check for normality. If you do check and the data are nonnormal, you don’t have to worry because the sampling distribution for the means will still be normally distributed, which is what counts.
However, with smaller sample sizes, satisfying the normality assumption is important.
Luana says
Hello
Does the CLT also work with statistics other than the mean, e.g. with the median? And is there any distribution that is not affected by the CLT?
sincerely
Luana
Jim Frost says
Hi Luana,
As I point out in this post, the CLT applies to most but not all distributions. First, the distribution must be for independent, identically distributed variables. And, the population must have a finite variance.
The CLT can work for the median with certain distributions. But it must satisfy more conditions than for the mean. These are too complex to discuss here. So, the short answer is to not expect that the CLT applies to the sampling distribution of the medians without checking the properties of that distribution specifically.
I hope this helps!
Rizza says
Can you give me an example
Jim Frost says
You’ll need to be more specific. I provide multiple examples throughout this post.
SHRINIVAS SHIVAJIRAO JADHAV says
Hi…… I have a 30pcs dimensional data which is non-normal, How to apply CLT to this data to calculate subgroup sample size <30 and the resulting output will be normal & remains within specification limits?
Jerome says
Greetings!
I truly enjoyed reading your post! but why didn’t you plot your sampling distributions on a normal probability plot? it would help visualizing how much they approximate a strict normal distribution: not all bell-shaped distributions are indeed normal (e.g. the t with low d.f., the Cauchy).
I believe that your example with the dice toss is not a good example of a uniform distribution, because the middle value sums can be obtained from a greater number of dice combinations than the extreme value sums.
Besides, if your point is to show that the sampling distribution of a statistic is continuously distributed and approximates (I would emphasize this word!) a normal distribution even if the observed data does not come from a continuous distribution, why don’t you sample from a discrete distribution? I am especially curious to see the results from sampling a negative binomial or a multinomial, and from a mixture distribution such as a zero-inflated Poisson: the central limit theorem does not work with the zero component of the latter distribution.
Respectfully yours,
Jerome
Jim Frost says
Hi Jerome,
That’s a good idea to plot the sample means on a normal probability plot to show how closely they follow the normal distribution. Unfortunately, it’s a software issue on my end. The software I used for the random sampling can’t create normal probability plots. I might be able to export those sample means, but I’m calculating hundreds of thousands of sample means, which my statistical software might balk at importing. I might give it some more thought. Perhaps just using fewer sample means. I use such a large number of means because it produces the nice smooth curves!
The uniform distribution is defined as a distribution where all possible values have an equal probability of occurring. As such, the outcomes of rolling a die do follow the uniform distribution. In my example, I roll only one die, not a pair of dice. So, we’re using the values of 1 to 6 with a probability of 1/6 for each possible outcome. Consequently, they follow the uniform distribution.
As for discrete outcomes, I do show the die example, which is a discrete distribution. Also, if you want to see an example that uses the binary distribution, read my post about Revisiting the Monty Hall Problem. That post is not about the central limit theorem, but I do show the sampling distributions for two different binary distributions (i.e., different probabilities of events occurring) for sample sizes ranging from 10 to 400. And, you’ll see that as the sample size increases, it more closely approximates the normal distribution.
I think the additional distributions you mention are interesting possibilities. The central limit theorem does not apply to the Cauchy distribution because that distribution does not have a finite variance–which is one of the assumptions of the CLT. I could include only so many distributions in a blog post before it become too long. But, the software is free and easy for anyone to use.
Thanks for writing!
M.Khidhir says
awesome content! Just to clarify, if I have a right-skewed histogram using a large sample data. Will it meet the CLT requirements?
Olga says
Hi Jim,
How do I determine a sufficient sample size?
Thanks,
Jim Frost says
Hi Olga,
I’m assuming you mean how large of a sample size you need to be sure that you can use a parametric test with a nonnormal distribution? For most cases, sample sizes of 20-30 will be sufficient. If you have groups in your data for, say, ANOVA or a 2-sample t-test, the sample size per group depends on the number of groups. I have a table that summarizes this property in my post about parametric vs. nonparametric analyses. These sample sizes were determined by a thorough simulation study.
I hope this helps!
F.Dali says
Hi Jim,
Is CLT applicable for large sample size derived from a non-probability sampling method (i.e convenience sampling)? I noticed that CLT carries a statement saying that the draw of the samples for mean must be random.
Jim Frost says
Hi, yes, it must be a random sample. Convenience sampling won’t necessarily produce a sampling distribution that approximates the normal distribution.
Aditya Kumar says
Every time you say “the sampling distributions more closely approximate a normal distribution as sample size increases”, it means to say sampling distribution of means approximate to normal distribution, is it so?
Jim Frost says
Yes, it includes that. But, it’s not restricted to the sampling distribution of the means. Even the sampling distribution for a binomial distribution will approximate the normal distribution with a large enough sample size.
Omar Albatayneh says
Hello, Jim,
Do you believe the sampling distributions for the slope coefficients are at least approximately Normal (necessary for validity of, for example, the p-values)? Explain.
Jim Frost says
Hi Omar,
When the residuals are normally distributed, you expect the sampling distributions for the slope coefficients to be normally distributed.
EDE WILLIAMS says
Hi Jim
Really appreciate your efforts, making CLT simple for me to understand that I don’t need anybody to explain any further.
Williams from Federal Polytechnic nekede, studying STATISTICS
Patrick says
Hi Jim,
thank you, I greatly appreciate your detailed answer to my question!
Best regards,
Patrick
Patrick says
Hi Jim,
thanks for this excellent post!
I was just wondering about the following: you are saying “In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the t-test. (…) if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are nonnormally distributed—as long as your sample size is large enough.”
Does that basically apply to all paramteric hypothesis tests, including linear regression analysis? I once discussed this with a statistician, who objected that the normality assumption does not apply to the distribution of the dependent variable (and this is also true for the t-test which is just a special case of linear regression) but rather to the distribution of the residuals. He then argued that if I have a poor set of predictors, the model will most likely not achieve normality of the residuals, regardless of sample size.
This left me wondering whether or not I can use linear regression with large sample sizes without having to worry about distributional assumptions. Do I need normally distributed residuals at all with a large sample size – or does the CLT also apply to the residuals?
I’d greatly appreciate your view on this aspect.
Thank you and best regards,
Patrick
Jim Frost says
Hi Patrick,
Off hand, I’d say that it applies to most types of parametric hypothesis tests. Even populations that follow the binomial and Poisson distributions will have sampling distributions that follow the normal distribution with a large enough sample size. Consequently, it applies to proportions tests and Poisson rates of occurrence tests. However, I haven’t thought it through enough whether it applies to all parametric tests.
As for regression analysis, that gets a bit complicated! First, yes, the normality assumption applies to the distribution of the residuals rather than the dependent variable. However, that assumption is an optional one that applies only if you want to use hypothesis testing and confidence intervals, as you can read about in my post about OLS Assumptions.
As for whether the hypothesis test results are valid in regression analysis when residuals are nonnormally distributed with a sufficiently large sample size, I’d say the answer is both yes and no! How’s that for covering my bases?!
Here’s the rationale for both answers.
Yes, I do believe the central limit theorem kicks in with the sampling distributions for the coefficient estimates. With a large enough sample size, these sampling distributions should follow a normal distribution even when the residuals are nonnormal. It’s for those sampling distributions of the coefficient estimates where the CLT would come into play. The p-values for the coefficients are, of course, based on those coefficient sampling distributions.
However, the answer can also be no! Oftentimes the residuals won’t follow a normal distribution because you’re specifying an incorrect model. You might not be including all the relevant variables, not modeling the curvature correctly, not including interaction terms, etc. Model specification errors can produce nonnormal residuals. In that case, I don’t think having a sufficiently large sample size fixes the problem. Chances are your coefficients are biased and not meaningful because the model is just wrong.
Consequently, the answer depends on what is causing the nonnormal residuals. Also, I don’t have time to thoroughly research this issue, but if you’re doing this for a paper or report, I’d find some article to support this just to be sure. I’d also imagine (again, check) that it’s really the sample size per number of model terms that is important. You’d need many observations per model term. And, of course, you’d have to be certain that you’re specifying the correct model!
I hope this helps!
Nicole Paschal says
For the histogram with the line over it, are you saying that the line is the actual or the estimated data that the collected histogram data fits into? Thank you.
Jim Frost says
Hi, which section is this graph in? I’m not exactly sure which one you’re referring to.
JOHN HAROLD says
I’ll look out for your maiden book on Regression Analysis next year.
JOHN HAROLD says
“Making complex concepts simpler”. That is your trademark.
I often refer my students to read some of your posts especially after I have introduced them to the topic. They thank “me” for showing them another perspective. But the credit is rightfully yours. Thanks.
Jim Frost says
Hi John, thanks so much for your kind words. They mean a lot to me because that quote is what I strive for with this blog. Thanks for sharing with your students too!
Linda says
Hi Jim,
Thanks for a great post. I just have one quick question about the application of the CLT.
When we use the CLT we can find the probability of a certain event but I am wondering how that probability works with a skewed population.
Suppose a population has a large right skew, with the most common values around 0 minutes, but we know that in the normally distributed sampling distribution the most common values center around the mean (which is the same mean as in the population). Having 68% of the values around the mean in the sampling distribution vs. most of the values around 0 in the population makes me feel as if the probabilities from the sample do not apply to the population. Would you be willing to explain how this works?
Thanks,
Linda
Jim Frost says
Hi Linda,
I think I see where there might be a slight misunderstanding if I’m reading your comment correctly. In this post, I think the example graph I show in the section “Moderately Skewed Distribution and the Central Limit Theorem” roughly matches the scenario in your comment other than the fact that the most common values are not around zero. So, I’ll use the graph in that section to answer your question.
There are two different types of distribution in play here. There is the distribution of the data values in the population, which is the grey distribution in the graph. That’s the distribution of the actual data. That distribution estimates the probability of individual values occurring in the population.
Then, there is the sampling distribution of the mean, which I show using different colors to represent different sample sizes. This is a distribution of sample means rather than individual values. You’re correct that the probabilities are different. You can calculate the probabilities of individual values (or ranges technically for continuous data) from the data distribution. And, you can calculate probabilities associated with sample means using the sampling distribution of the means.
For example, if your sample size is large enough so that the sampling distribution approximates a normal distribution, then roughly 68% of sample means from that population will fall within +/- 1 standard deviation of the sampling distribution. The standard deviation for the sampling distribution of the means is called the standard error of the mean and it equals the population standard deviation divided by the square root of the sample size.
Point being that the sampling distribution has different properties (particularly the standard deviation) than the data distribution. Hence, you’ll obtain different probabilities for a particular individual value versus a sample mean of the same value–which makes sense when you think about it. An individual value is very different from a sample mean even when they have the same numeric value. The graph in the section that I reference shows this visually.
I hope this answers your question. Please let me know if there’s anything else I can clarify!
Sandeep Ray Chaudhuri says
Hi Jim,
Your blog is really helpful for brushing up on key concepts in statistics and for learning them afresh. I have a suggestion: if the key topics were organized into a structured sequence, it would help newcomers learn in a structured manner.
Keep up the awesome work!
Jim Frost says
Hi Sandeep,
Thanks for the kind words! I’ll be writing various books that present these topics in an organized manner. The first one on regression analysis will be available in early 2019, and there will be more to follow!
Surya says
Jim…you are a gem of a person :-)…I have suggested to my friend as well that they subscribe to your posts
Jim Frost says
Thank you so much, Surya! And, thanks for sharing! 🙂
Janna Beckerman says
It totally helps. Thank you!
Janna Beckerman says
Thank you so much for your blog. My question: You said “Typically, statisticians say that a sample size of 30 is sufficient for most distributions. ” I, too, was taught to obtain a sample size of ~30, but I can’t figure out where we all came up with that number. I’ve asked colleagues. No one knows. Do you? And will you share how this number came about?
Jim Frost says
Hi Janna,
You’re very welcome! This number is what emerges as a good rule of thumb for sample sizes that will generally produce an approximately normal sampling distribution for most types of probability distributions. The idea is that when you meet this threshold, you don’t have to worry about whether your data are normally distributed when you use a parametric test that otherwise assumes normal data. (Of course, depending on your study area, you might need a larger sample size to have adequate statistical power, but that’s a different matter.)
There is a fudge factor around this number in several ways. For one thing, how closely does the sampling distribution need to approximate the normal distribution to be good enough? And, the degree to which the population distribution differs from the normal distribution affects this number. In this blog post, some of the examples illustrate that sometimes n = 20 is sufficient, while in the extremely skewed distribution a sample size of 80 was not. That example is probably more skewed than most real data, but it illustrates the point that the number depends on the shape of the distribution in the underlying population.
Not all statisticians agree. I think 30 is the most common number that I hear, but others say 40 just to be safe. And, I used to work at Minitab statistical software, and a group there did a study about this where they assessed what sample size is required for nonnormal data so that the actual type I error rate matches the significance level for various parametric tests, and that ultimately links back to the central limit theorem and the ideas discussed in this post.
They developed a table of sample sizes based on the type of analysis. You can see this table in my post about parametric vs. nonparametric analysis. In that post, there is also a link to the white paper that they developed. For example, they conclude that a 1-sample t-test requires at least a sample size of 20. Indeed, the moderately skewed example in this post produces a fairly normal looking sampling distribution with a sample size of 20.
I think it’s hard to find a concrete reference for this number because it’s more a rule of thumb. There’s no formula or calculation that spits out this number. Both a researcher’s notion of how closely the sampling distribution needs to approximate the normal distribution and how different their population distribution is from the normal distribution affect the number they will use! 20-40 should be good for most distributions.
I hope this helps!
Debashis Dalai says
Thank you so much Jim for all the efforts put into simplifying a complex subject like Statistics! You help us a lot. Thanks again!
MG says
Great job Jim.
Jim Frost says
Thanks, MG! 🙂