T-tests are statistical hypothesis tests that you use to analyze one or two sample means. Depending on the t-test that you use, you can compare a sample mean to a hypothesized value, the means of two independent samples, or the difference between paired samples. In this post, I show you how t-tests use t-values and t-distributions to calculate probabilities and test hypotheses.

As usual, I’ll provide clear explanations of t-values and t-distributions using concepts and graphs rather than formulas! If you need a primer on the basics, read my hypothesis testing overview.

## What Are t-Values?

The term “t-test” refers to the fact that these hypothesis tests use t-values to evaluate your sample data. T-values are a type of test statistic. Hypothesis tests use the test statistic that is calculated from your sample to compare your sample to the null hypothesis. If the test statistic is extreme enough, this indicates that your data are so incompatible with the null hypothesis that you can reject the null.

Don’t worry. I find these technical definitions of statistical terms are easier to explain with graphs, and we’ll get to that!

When you analyze your data with any t-test, the procedure reduces your entire sample to a single value, the t-value. These calculations factor in your sample size and the variation in your data. Then, the t-test compares your sample mean(s) to the null hypothesis condition in the following manner:

- If the sample data equal the null hypothesis precisely, the t-test produces a t-value of 0.
- As the sample data become progressively dissimilar from the null hypothesis, the absolute value of the t-value increases.
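If you'd like to see those two bullet points in action, here is a minimal sketch for the 1-sample case. It assumes the standard 1-sample formula (sample mean minus the null value, divided by the standard error of the mean). The data and the `one_sample_t` helper are hypothetical, made up purely for illustration.

```python
import math
import statistics

def one_sample_t(sample, null_value):
    """t = (sample mean - null value) / (sample std dev / sqrt(n))."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (mean - null_value) / std_err

# Hypothetical data whose sample mean is exactly 10
sample = [8.0, 9.0, 10.0, 11.0, 12.0]

t_equal = one_sample_t(sample, null_value=10.0)  # sample mean equals the null value
t_near = one_sample_t(sample, null_value=9.5)    # sample mean slightly off the null value
t_far = one_sample_t(sample, null_value=8.0)     # sample mean further from the null value

print(t_equal, round(t_near, 3), round(t_far, 3))
```

The same sample produces a t-value of 0 when the null value matches the sample mean, and the t-value moves away from 0 as the null value moves away from the sample mean.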

Read the companion post where I explain how t-tests calculate t-values.

The tricky thing about t-values is that they are a unitless statistic, which makes them difficult to interpret on their own. Imagine that we performed a t-test and it produced a t-value of 2. What does this t-value mean exactly? We know that the sample mean doesn’t equal the null hypothesis value because this t-value doesn’t equal zero. However, we don’t know how exceptional our value is if the null hypothesis is correct.

To be able to interpret individual t-values, we have to place them in a larger context. T-distributions provide this broader context so we can determine the unusualness of an individual t-value.

## What Are t-Distributions?

A single t-test produces a single t-value. Now, imagine the following process. First, let’s assume that the null hypothesis is true for the population. Now, suppose we repeat our study many times by drawing many random samples of the same size from this population. Next, we perform t-tests on all of the samples and plot the distribution of the t-values. This distribution is known as a sampling distribution, which is a type of probability distribution.

**Related post**: Understanding Probability Distributions

If we follow this procedure, we produce a graph that displays the distribution of t-values that we obtain from a population where the null hypothesis is true. We use sampling distributions to calculate probabilities for how unusual our sample statistic is if the null hypothesis is true.
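The thought experiment above can be sketched as a small simulation. This is only an illustration under assumptions I'm adding: a normally distributed population, a made-up mean of 100 and standard deviation of 15, samples of 21 observations, and 10,000 repeated studies. The null hypothesis is true by construction because each sample is tested against the real population mean.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

NULL_MEAN = 100.0  # hypothetical population mean; the null hypothesis is true
SIGMA = 15.0       # hypothetical population standard deviation
N = 21             # sample size for each simulated study
STUDIES = 10_000   # number of repeated studies

def one_sample_t(sample, null_value):
    """1-sample t-value: (mean - null value) / standard error of the mean."""
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - null_value) / std_err

# Draw many random samples and keep the t-value from each one
t_values = []
for _ in range(STUDIES):
    sample = [random.gauss(NULL_MEAN, SIGMA) for _ in range(N)]
    t_values.append(one_sample_t(sample, NULL_MEAN))

# Share of simulated t-values at least as far from zero as 2
share_beyond_2 = sum(abs(t) >= 2 for t in t_values) / STUDIES

print(f"average t-value: {statistics.mean(t_values):.3f}")
print(f"share with |t| >= 2: {share_beyond_2:.3f}")
```

A histogram of `t_values` would approximate the sampling distribution described above. The t-values should cluster around zero, and only a small share of them should land beyond ±2.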

Luckily, we don’t need to go to the trouble of collecting numerous random samples to create this graph! Statisticians understand the properties of t-distributions so we can estimate the sampling distribution using the t-distribution and our sample size.

The degrees of freedom (DF) for the statistical design define the t-distribution for a particular study. The DF are closely related to the sample size. For t-tests, there is a different t-distribution for each sample size.

**Related post**: Degrees of Freedom in Statistics

## Use the t-Distribution to Compare Your Sample Results to the Null Hypothesis

T-distributions assume that the null hypothesis is correct for the population from which you draw your random samples. To evaluate how compatible your sample data are with the null hypothesis, place your study’s t-value in the t-distribution and determine how unusual it is.

The sampling distribution below displays a t-distribution with 20 degrees of freedom, which equates to a sample size of 21 for a 1-sample t-test. The t-distribution centers on zero because it assumes that the null hypothesis is true. When the null is true, your study is most likely to obtain a t-value near zero and less liable to produce t-values further from zero in either direction.

On the graph, I’ve displayed the t-value of 2 from our hypothetical study to see how our sample data compares to the null hypothesis. Under the assumption that the null is true, the t-distribution indicates that our t-value is not the most likely value. However, there still appears to be a realistic chance of observing t-values from -2 to +2.

We know that our t-value of 2 is rare when the null hypothesis is true. How rare is it exactly? Our final goal is to evaluate whether our sample t-value is so rare that it justifies rejecting the null hypothesis for the entire population based on our sample data. To proceed, we need to quantify the probability of observing our t-value.

## t-Tests Use t-Values and t-Distributions to Calculate Probabilities

Hypothesis tests work by taking the observed test statistic from a sample and using the sampling distribution to calculate the probability of obtaining a test statistic at least that extreme if the null hypothesis is correct. In the context of how t-tests work, you assess the likelihood of a t-value using the t-distribution. If a t-value is sufficiently improbable when the null hypothesis is true, you can reject the null hypothesis.

I have two crucial points to explain before we calculate the probability linked to our t-value of 2.

Because I’m showing the results of a two-tailed test, we’ll use the t-values of +2 and -2. Two-tailed tests allow you to assess whether the sample mean is greater than or less than the target value in a 1-sample t-test. A one-tailed hypothesis test can only determine statistical significance for one or the other.

Additionally, it is possible to calculate a probability only for a range of t-values. On a probability distribution plot, probabilities are represented by the shaded area under a distribution curve. Without a range of values, there is no area under the curve and, hence, no probability.
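For readers who do want the notation, the idea can be written compactly. The symbols are my own shorthand rather than anything from the graphs: $T$ is the t-value treated as a random variable, and $f(t)$ is the height of the t-distribution curve.

$$
P(a \le T \le b) = \int_a^b f(t)\,dt,
\qquad
P(T = t_0) = \int_{t_0}^{t_0} f(t)\,dt = 0
$$

For a two-tailed test with an observed t-value of 2, the range of interest is everything at least as far from zero as 2. Because the t-distribution is symmetric around zero, the two tails have equal areas:

$$
P(|T| \ge 2) = P(T \le -2) + P(T \ge 2) = 2\int_{2}^{\infty} f(t)\,dt
$$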

**Related post**: One-Tailed and Two-Tailed Tests Explained

## t-Test Results for Our Hypothetical Study

Taking these points into consideration, the graph below finds the probability associated with t-values less than -2 and greater than +2 using the area under the curve. This graph is specific to our t-test design (1-sample t-test with N = 21).

The probability distribution plot indicates that each of the two shaded regions has a probability of 0.02963—for a total of 0.05926. This graph shows that t-values fall within these areas almost 6% of the time when the null hypothesis is true.

There is a chance that you’ve heard of this type of probability before—it’s the P value! While the likelihood of t-values falling within these regions seems small, it’s not quite unlikely enough to justify rejecting the null under the standard significance level of 0.05.
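If you want to check these shaded areas yourself, here is a rough sketch that integrates the t-distribution curve numerically using only Python's standard library. The function names `t_pdf` and `two_tailed_p` are hypothetical helpers I made up for this example, and Simpson's rule is simply one convenient way to approximate the area. In practice, statistical software does this calculation for you.

```python
import math

def t_pdf(t, df):
    """Height of Student's t-distribution curve with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=2000):
    """Area in both tails beyond +/-|t_value| (steps must be even)."""
    b = abs(t_value)
    if b == 0:
        return 1.0
    h = b / steps
    # Simpson's rule for the area under the curve between 0 and |t_value|
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    # The curve is symmetric, so the two tails hold whatever area is left over
    return 1 - 2 * central_half

ALPHA = 0.05                      # standard significance level
p_value = two_tailed_p(2, df=20)  # 1-sample t-test with N = 21

print(f"each tail: {p_value / 2:.5f}, both tails: {p_value:.5f}")
print("reject the null" if p_value <= ALPHA else "fail to reject the null")
```

The combined tail area is compared to the significance level: only a value at or below 0.05 would justify rejecting the null.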

Learn how to interpret the P value correctly and avoid a common mistake!

**Related post**: Types of Errors in Hypothesis Testing

## t-Distributions and Sample Size

The sample size for a t-test determines the degrees of freedom (DF) for that test, which specifies the t-distribution. The overall effect is that as the sample size decreases, the tails of the t-distribution become thicker. Thicker tails indicate that t-values are more likely to be far from zero even when the null hypothesis is correct. The changing shapes are how t-distributions factor in the greater uncertainty that is present when you have a smaller sample.

You can see this effect in the probability distribution plot below, which displays t-distributions for 5 and 30 DF.

Sample means from smaller samples tend to be less precise. In other words, with a smaller sample, it’s less surprising to have an extreme t-value, which affects the probabilities and p-values. A t-value of 2 has a P value of 10.2% and 5.4% for 5 and 30 DF, respectively. Use larger samples!
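To see how the same t-value becomes less surprising as the degrees of freedom shrink, here is a small sketch that computes the two-tailed probability for a t-value of 2 across several DF. As before, `t_pdf` and `two_tailed_p` are hypothetical helper names, the numerical integration is an approximation I chose for illustration, and the extra DF values (10, 20, 100) are my additions.

```python
import math

def t_pdf(t, df):
    """Height of Student's t-distribution curve with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=2000):
    """Area in both tails beyond +/-|t_value| via Simpson's rule (steps even)."""
    b = abs(t_value)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Same t-value, different degrees of freedom
p_by_df = {df: two_tailed_p(2, df) for df in (5, 10, 20, 30, 100)}

for df, p in p_by_df.items():
    print(f"DF = {df:>3}: two-tailed P value for t = 2 is {p:.3f}")
```

The P value shrinks steadily as the DF increase, which is the thinner-tails effect in numeric form.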

If you like this approach and want to learn about the F-test, read my post: How the F-test Works in ANOVA.

To see an alternative to traditional hypothesis testing that does not use probability distributions and test statistics, learn about bootstrapping in statistics!

madan verma says

Hello Jim, I find this statement in this excellent write-up contradictory:

1) This graph shows that t-values fall within these areas almost 6% of the time when the null hypothesis is true

I mean if this is true the t-value =0 hypothesis is rejected.

Thanks.

Jim Frost says

Hi Madan,

I can see how that statement sounds contradictory, but I can assure you that it is quite accurate. It’s often forgotten, but the underlying assumption for the calculations surrounding hypothesis testing, significance levels, and p-values is that the null hypothesis is true.

So, the probabilities shown in the graph that you refer to are based on the assumption that the null hypothesis is true. Further, t-values for this study design have a 6% chance of falling in those critical areas assuming the null is true (a false positive).

Significance levels are defined as the maximum acceptable probability of a false positive. Usually, we set that as 5%. In the example, there’s a large probability of a false positive (6%), so we fail to reject the null hypothesis. In other words, we fail to reject the null because false positives will happen too frequently–where the significance level defines the cutoff point for too frequently.

Keep in mind that when you have statistically significant results, you’re really saying that the results you obtained are improbable enough assuming that the null is true that you can reject the notion that the null is true. But, the math and probabilities are all based on the assumption that the null is true because you need to determine how unlikely your results are under the null hypothesis.

Even the p-value is defined in terms of assuming the null hypothesis is true. You can read about that in my post about interpreting p-values correctly.

I hope this clarifies things!

Glenn Dowell says

Jim, I was involved in a free SAT/ACT tutoring program that I need to analyze for effectiveness.

I have pre-test scores for a number of students and their post-test scores after they were tutored (the treatment).

Glenn Dowell

Jim Frost says

Hi Glenn,

It sounds like you need to perform a paired t-test assuming.