What is the T Distribution?
The t distribution is a continuous probability distribution that is symmetric and bell-shaped like the normal distribution but with a shorter peak and thicker tails. It was designed to factor in the greater uncertainty associated with small sample sizes.
The t distribution describes the standardized distances between sample means and the population mean when the population standard deviation is unknown and the data approximately follow the normal distribution. Because the sample standard deviation stands in for the unknown population value, these distances vary more than they would under the normal distribution. This distribution has only one parameter, the degrees of freedom, which is based on (but not equal to) the sample size.
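To make that distance concrete, here is a minimal sketch of how a t-value is commonly computed from a sample: the difference between the sample mean and a hypothesized population mean, divided by the estimated standard error. The function name `t_value` and the measurements are hypothetical and exist only for illustration.

```python
import math
import statistics

def t_value(sample, population_mean):
    """Distance between the sample mean and a hypothesized population
    mean, expressed in units of the estimated standard error."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)       # sample SD (n - 1 denominator)
    standard_error = sample_sd / math.sqrt(n)  # estimated SD of the sample mean
    return (sample_mean - population_mean) / standard_error

# Hypothetical measurements, invented for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.7, 10.2]
t = t_value(sample, population_mean=10.0)
df = len(sample) - 1  # degrees of freedom for a 1-sample design
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The t distribution with the matching degrees of freedom then tells you how unusual a t-value of that size would be.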
The t distribution, also known as Student's t distribution, was developed by William Sealy Gosset in 1908 for use with small sample sizes. Back then, the Z distribution and the corresponding Z-test were available to test means, but they are valid only for large sample sizes. There was no distribution designed for small samples.
Gosset was a brewer at the Guinness Brewery in Dublin and was dedicated to applying the scientific method to beer production. He needed a procedure for statistically analyzing small batches of barley. After he developed the t distribution for this purpose, the brewery wanted him to publish using a pen name so competitors would not learn about their methods. Hence, he published using the pseudonym of Student. That’s why we have “Student's t-test” today!
When to Use the T Distribution
The essential uses for the t distribution are for finding:
- P-values for t-tests when testing the mean and for the coefficients in regression analysis.
- Critical values that define the upper and lower bounds of a confidence interval.
Use the t distribution when you need to assess the mean and do not know the population standard deviation. It’s particularly important to use it when you have a small (n < 30) sample size. More about this aspect below!
In the context of a t-test, it represents the sampling distribution of t-values for your design when the null hypothesis is true. Learn more about sampling distributions.
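As a rough sketch of both uses, the code below computes a two-sided p-value and a confidence-interval critical value using only the Python standard library. It integrates the t density numerically with Simpson's rule and finds the critical value by bisection. The function names, the step count, and the search range are assumptions chosen for illustration; statistical software typically uses more refined routines.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry around zero."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def two_sided_p_value(t, df):
    """Probability of a t-value at least this extreme when the null is true."""
    return 2 * (1 - t_cdf(abs(t), df))

def critical_value(df, confidence=0.95):
    """Bisection search for t* with P(-t* <= T <= t*) = confidence.
    The upper search bound of 200 is an assumption that covers common cases."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"critical t (95%, 10 DF): {critical_value(10):.3f}")
print(f"two-sided p for t = 2.5, 10 DF: {two_sided_p_value(2.5, 10):.4f}")
```

For 10 degrees of freedom and 95% confidence, the critical value should land close to the 2.228 listed in a standard t-table.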
For more detailed information, read about using it to Find P values and Confidence Intervals.
To find the critical t-values using a table, see my T-table. It includes instructions and examples of how to use it.
Related post: How to Do T-Tests in Excel
Parameter – Degrees of Freedom
The t distribution has only one parameter, the degrees of freedom (DF). In t-tests, DF are linked to the sample size. For 1-sample and paired t-tests, DF = N – 1, where N is the number of observations (or pairs). For 2-sample t-tests that assume equal variances, DF = N – 2, where N is the total number of observations across both groups. Hence, as the sample size increases, the DF also increase. Learn more about degrees of freedom.
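The DF rules can be sketched as a small helper. The function name and the test labels are hypothetical, and the 2-sample case here reads N as the combined size of both groups and assumes the equal-variance form of the test.

```python
def degrees_of_freedom(test, n1, n2=None):
    """Degrees of freedom for common t-test designs.

    test: "one-sample", "paired", or "two-sample" (labels are illustrative).
    n1:   number of observations (or number of pairs for a paired test).
    n2:   size of the second group, needed only for "two-sample".
    """
    if test in ("one-sample", "paired"):
        return n1 - 1
    if test == "two-sample":
        if n2 is None:
            raise ValueError("a two-sample test needs both group sizes")
        return n1 + n2 - 2  # N - 2, with N = n1 + n2 (equal variances assumed)
    raise ValueError(f"unknown test type: {test}")

print(degrees_of_freedom("one-sample", 20))      # 20 observations
print(degrees_of_freedom("paired", 12))          # 12 pairs
print(degrees_of_freedom("two-sample", 15, 15))  # two groups of 15
```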
Let’s see how changing the degrees of freedom affects it.
This graph illustrates how Gosset designed the t distribution to handle the greater uncertainty inherent with smaller samples. As the degrees of freedom increase, the curve pulls in tighter around zero—the tails become thinner and the peak becomes taller. The blue curve has the fewest DF (3) and it has the thickest tails. Conversely, the green curve has the most DF (20) and the thinnest tails.
The changing shapes are how the t distribution factors in the greater uncertainty when you have a smaller sample. With fewer degrees of freedom, the tails are thicker because small samples are more likely than large samples to produce unusual t-values: both the sample mean and the sample standard deviation are estimated less precisely. As the sample size increases, extreme t-values become rarer, and the tails thin out.
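The same pattern can be checked numerically. This sketch evaluates the standard t density formula at the peak (t = 0) and out in the tail (t = 3) for 3 and 20 degrees of freedom. The helper name `t_pdf` and the choice of t = 3 as a tail point are illustrative assumptions.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

for df in (3, 20):
    peak = t_pdf(0.0, df)  # height of the curve at its center
    tail = t_pdf(3.0, df)  # height of the curve three units from the center
    print(f"DF = {df:2d}: peak density = {peak:.4f}, density at t = 3: {tail:.4f}")
```

With more degrees of freedom, the peak density rises and the tail density falls, which is the tightening around zero described above.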
Because the t distribution is a probability distribution, t-tests can use it to calculate probabilities like the p-value while factoring in the sample size.
At around 30 degrees of freedom, the t distribution closely approximates the standard normal distribution (Z-distribution), as shown below. Consequently, when your sample size exceeds ~30, t-tests and Z-tests provide very similar results.
In this graph, the blue curve is the standard normal distribution, while the red dashed curve is the t distribution with 30 degrees of freedom.
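One hedged way to quantify how close the two curves are is to compare their densities over a grid of points. The sketch below reports the largest gap between the t density and the standard normal density on the range from -4 to 4. The function names, the grid, and the range are illustrative assumptions.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def normal_pdf(z):
    """Density of the standard normal (Z) distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def max_density_gap(df, low=-4.0, high=4.0, points=801):
    """Largest absolute difference between the two densities on a grid."""
    step = (high - low) / (points - 1)
    grid = (low + i * step for i in range(points))
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in grid)

for df in (3, 10, 30):
    print(f"DF = {df:2d}: largest density gap = {max_density_gap(df):.4f}")
```

The gap shrinks as the degrees of freedom grow, which matches the near overlap of the two curves at 30 DF.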