What is a Test Statistic?
A test statistic assesses how consistent your sample data are with the null hypothesis in a hypothesis test. Test statistic calculations take your sample data and boil them down to a single number that quantifies how much your sample diverges from the null hypothesis. As a test statistic value becomes more extreme, it indicates larger differences between your sample data and the null hypothesis.
When your test statistic indicates a sufficiently large incompatibility with the null hypothesis, you can reject the null and state that your results are statistically significant—your data support the notion that the sample effect exists in the population. To use a test statistic to evaluate statistical significance, you either compare it to a critical value or use it to calculate the p-value.
Statisticians named the hypothesis tests after their test statistics because the test statistic is the quantity each test actually evaluates. For example, t-tests assess t-values, F-tests evaluate F-values, and chi-square tests use, you guessed it, chi-square values.
In this post, learn about test statistics: how to calculate and interpret them, and how to evaluate statistical significance using the critical value and p-value methods.
How to Find Test Statistics
Each test statistic has its own formula. Below, I present several common examples. For worked examples of each one, see my more detailed articles.
Formulas for Test Statistics
| Test statistic | How to calculate it |
|---|---|
| T-value for 1-sample t-test | Take the sample mean, subtract the hypothesized mean, and divide by the standard error of the mean. |
| T-value for 2-sample t-test | Take one sample mean, subtract the other, and divide by the standard error of the difference between the means (for the equal-variance version, the pooled standard deviation multiplied by √(1/n₁ + 1/n₂)). |
| F-value for F-tests and ANOVA | Calculate the ratio of two variances. |
| Chi-squared value (χ²) for a chi-squared test | For each cell, square the difference between the observed and expected values and divide by the expected value, then sum these terms. |
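To make these formulas concrete, here is a minimal Python sketch of each one using only the standard library. It assumes the equal-variance (pooled) form of the 2-sample t-test; Welch’s version, which doesn’t pool the variances, uses a different standard error. The function names are illustrative, not from any particular package, and the data are made up.

```python
import math
from statistics import mean, stdev, variance

def one_sample_t(sample, hypothesized_mean):
    """(sample mean - hypothesized mean) / standard error of the mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error

def two_sample_t_pooled(a, b):
    """(mean of a - mean of b) / pooled standard error.

    Assumption: equal population variances (the pooled, equal-variance t-test).
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error

def f_value(numerator_sample, denominator_sample):
    """Ratio of two sample variances."""
    return variance(numerator_sample) / variance(denominator_sample)

def chi_squared(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up data, purely for illustration.
print(one_sample_t([2, 4, 6, 8, 10], hypothesized_mean=5))
print(two_sample_t_pooled([1, 2, 3], [4, 5, 6]))
print(f_value([1, 2, 3], [2, 4, 6]))
print(chi_squared([10, 20, 30], [20, 20, 20]))
```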
Understanding the Null Values and the Test Statistic Formulas
In the formulas above, it’s helpful to understand the null condition and the test statistic value that occurs when your sample data match that condition exactly. It’s also worth knowing what causes the test statistics to move further away from the null value, potentially becoming significant. Test statistics are statistically significant when they fall beyond a critical value.
All these test statistics are ratios (the chi-squared value is a sum of ratios), which helps you understand their null values.
T-Tests, Null = 0
When a t-value equals 0, it indicates that your sample data match the null hypothesis exactly.
For a 1-sample t-test, when the sample mean equals the hypothesized mean, the numerator is zero, which causes the entire t-value ratio to equal zero. As the sample mean moves away from the hypothesized mean in either the positive or negative direction, the test statistic moves away from zero in the same direction.
A similar case exists for 2-sample t-tests. When the two sample means are equal, the numerator is zero, and the entire test statistic ratio is zero. As the two sample means become increasingly different, the absolute value of the numerator increases, and the t-value becomes more positive or negative.
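A quick way to see this behavior is to shift a small sample up and down around a hypothesized mean of 5 and recompute the 1-sample t-value. Shifting every value by the same amount leaves the spread (and therefore the standard error) unchanged, so only the numerator moves. This is a sketch with made-up data, and the helper function is illustrative.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, hypothesized_mean):
    """(sample mean - hypothesized mean) / standard error of the mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error

base = [4.0, 5.0, 6.0, 5.0, 5.0]  # made-up data whose mean is exactly 5
for shift in (-1.0, -0.5, 0.0, 0.5, 1.0):
    shifted = [x + shift for x in base]  # same spread, different mean
    print(f"sample mean {mean(shifted):.1f} -> t = {one_sample_t(shifted, 5.0):+.3f}")
```

The t-value is exactly zero when the sample mean equals the hypothesized mean, and its sign follows the direction of the shift.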
Related post: How T-tests Work
F-tests (Including ANOVA), Null = 1
When an F-value equals 1, it indicates that the two variances in the numerator and denominator are equal, matching the null hypothesis.
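As the numerator and denominator become less and less similar, the F-value moves away from 1: above 1 when the numerator is larger and below 1 when the denominator is larger. In ANOVA, the numerator reflects the variation between group means, so only F-values well above 1 provide evidence against the null hypothesis.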
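Here is a minimal sketch with made-up numbers: doubling every value in a sample multiplies its variance by 4, which pushes the F-value from 1 to 4 or 0.25 depending on which sample sits in the numerator. The function name is illustrative.

```python
from statistics import variance

def f_value(numerator_sample, denominator_sample):
    """Ratio of two sample variances."""
    return variance(numerator_sample) / variance(denominator_sample)

a = [2.0, 4.0, 6.0, 8.0]      # made-up data
doubled = [2 * x for x in a]  # scaling by 2 multiplies the variance by 4

print(f_value(a, a))        # equal variances: F equals 1
print(f_value(doubled, a))  # larger numerator variance: F above 1
print(f_value(a, doubled))  # larger denominator variance: F below 1
```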
Related post: The F-test in ANOVA
Chi-squared Tests, Null = 0
When a chi-squared value equals 0, it indicates that every observed value exactly matches its expected value. This condition makes each squared difference in the numerators equal zero, so the chi-squared value equals zero.
As the observed values progressively fail to match the expected values, the squared differences increase, causing the test statistic to rise from zero.
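The following sketch uses hypothetical counts for 100 observations spread over four categories, with 25 expected in each. A perfect match gives 0, and larger mismatches push the statistic up quickly because the differences are squared. The function name is illustrative.

```python
def chi_squared(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [25, 25, 25, 25]  # hypothetical: 100 observations split evenly over 4 categories
for observed in ([25, 25, 25, 25], [30, 20, 25, 25], [40, 10, 25, 25]):
    print(observed, chi_squared(observed, expected))
```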
Related post: How a Chi-Squared Test Works
In practice, you’ll rarely see a test statistic that equals the null value exactly. However, trivial differences between sample values and the null value are not uncommon.
Interpreting Test Statistics
Test statistics are unitless. This fact can make them difficult to interpret on their own. You know they evaluate how well your data agree with the null hypothesis. If your test statistic is extreme enough, your data are so incompatible with the null hypothesis that you can reject it and conclude that your results are statistically significant. But how does that translate to specific values of your test statistic? Where do you draw the line?
For instance, t-values of zero match the null value. But how far from zero should your t-value be to be statistically significant? Is 1 enough? 2? 3? If your t-value is 2, what does it mean anyway? In this case, we know that the sample mean doesn’t equal the null value, but how exceptional is it? To complicate matters, the dividing line changes depending on your sample size and other study design issues.
Similar types of questions apply to the other test statistics too.
To interpret individual values of a test statistic, we need to place them in a larger context. Towards this end, let me introduce you to sampling distributions for test statistics!
Sampling Distributions for Test Statistics
Performing a hypothesis test on a sample produces a single test statistic. Now, imagine you carry out the following process:
- Assume the null hypothesis is true in the population.
- Repeat your study many times by drawing many random samples of the same size from this population.
- Perform the same hypothesis test on all these samples and save the test statistics.
- Plot the distribution of the test statistics.
This process produces the distribution of test statistic values that occurs when the effect does not exist in the population (i.e., the null hypothesis is true). Statisticians refer to this type of distribution as a sampling distribution, a kind of probability distribution.
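The thought experiment above is easy to run as a simulation. This sketch assumes a normally distributed population with mean 0 and standard deviation 1 (so the null hypothesis “mean = 0” is true), uses samples of size 21 to match the example later in this post, and fixes an arbitrary random seed. The function names and settings are illustrative choices.

```python
import math
import random
from statistics import mean, stdev

def one_sample_t(sample, hypothesized_mean):
    """(sample mean - hypothesized mean) / standard error of the mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error

def simulate_null_t_values(n=21, reps=10_000, null_mean=0.0, sigma=1.0, seed=1):
    """Repeat a 1-sample t-test on `reps` samples drawn from a population where H0 is true."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        # Assumption: the population is normal with its mean equal to the null value.
        sample = [rng.gauss(null_mean, sigma) for _ in range(n)]
        t_values.append(one_sample_t(sample, null_mean))
    return t_values

t_values = simulate_null_t_values()
share_extreme = sum(abs(t) >= 2 for t in t_values) / len(t_values)
print(f"average t-value: {mean(t_values):.3f}")
print(f"share of t-values at or beyond +/-2: {share_extreme:.3f}")
```

The simulated t-values pile up around zero, and only a small share land at or beyond ±2, which is the same pattern the theoretical sampling distribution describes.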
Why would we need this type of distribution?
It provides the larger context required for interpreting a test statistic. More specifically, it allows us to compare our study’s single test statistic to values likely to occur when the null is true. We can quantify our sample statistic’s rareness while assuming the effect does not exist in the population. Now that’s helpful!
Fortunately, we don’t need to collect many random samples to create this distribution! Statisticians have derived formulas for these sampling distributions. For a t-test, for example, the null distribution depends only on the degrees of freedom, which come from your sample size.
To evaluate your data’s compatibility with the null hypothesis, place your study’s test statistic in the distribution.
Related post: Understanding Probability Distributions
Example of a Test Statistic in a Sampling Distribution
Suppose our t-test produces a t-value of two. That’s our test statistic. Let’s see where it fits in.
The sampling distribution below shows a t-distribution with 20 degrees of freedom, equating to a 1-sample t-test with a sample size of 21. The distribution centers on zero because it assumes the null hypothesis is correct. When the null is true, your analysis is most likely to obtain a t-value near zero and less likely to produce t-values further from zero in either direction.
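If you’d like to reproduce the shape of this curve, the t-distribution’s density has a closed-form formula. This minimal sketch evaluates the textbook formula with the standard library’s `math.lgamma`; it is not code from any statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

for x in (-3, -2, -1, 0, 1, 2, 3):
    print(f"t = {x:+d}: density {t_pdf(x, df=20):.4f}")
```

The density is highest at 0 and falls off symmetrically, which is why t-values near zero are the most common outcome when the null is true.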
The sampling distribution indicates that our test statistic is somewhat rare when we assume the null hypothesis is correct. However, t-values at least as extreme as -2 or +2 are not inconceivable. We need a way to quantify the probability.
From this point, we need to use the sampling distribution’s ability to calculate probabilities for test statistics.
Related post: Sampling Distributions Explained
Test Statistics and Critical Values
The significance level uses critical values to define how far the test statistic must be from the null value to reject the null hypothesis. When the test statistic falls beyond a critical value, the results are statistically significant.
The shaded percentage of the area under the sampling distribution curve represents the probability that the test statistic falls in those regions when the null is true. Consequently, to depict a significance level of 0.05, I’ll shade the 5% of the sampling distribution that lies furthest from the null value.
The two shaded areas are equidistant from the null value in the center. Each region has a probability of 0.025, and together they sum to our significance level of 0.05. These shaded areas are the critical regions for a two-tailed hypothesis test. Let’s return to our example t-value of 2.
Related post: What are Critical Values?
In this example, the critical values are -2.086 and +2.086. Our test statistic of 2 is not statistically significant because it does not exceed the critical value.
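Statistical software and t-tables supply these critical values, but you can approximate them with a short sketch: integrate the t density numerically (Simpson’s rule here) to get the upper-tail area, then use bisection to find the t-value whose upper tail equals α/2. The helper names and step counts are my own illustrative choices, not a library API.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even `steps`)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_tailed_critical_value(alpha, df):
    """Positive t whose upper tail holds alpha / 2, found by bisection."""
    target = alpha / 2
    lo, hi = 0.0, 1.0
    while upper_tail(hi, df) > target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > target:  # tail still too big: answer lies further out
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit = two_tailed_critical_value(0.05, df=20)
print(f"critical values: -{crit:.3f} and +{crit:.3f}")
print("is t = 2 significant?", abs(2.0) > crit)
```

With 20 degrees of freedom, this approximation should agree with the ±2.086 quoted above to three decimal places.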
Other hypothesis tests have their own test statistics and sampling distributions, but their processes for critical values are generally similar.
Related post: Understanding Significance Levels
Using Test Statistics to Find P-values
P-values are the probability of observing an effect at least as extreme as your sample’s effect if you assume no effect exists in the population.
Test statistics capture the effect in your sample because they quantify the difference between your sample effect and no effect (the null hypothesis), scaled by the sampling variability. Consequently, you use the test statistic to calculate the p-value for your hypothesis test.
The above p-value definition is a bit tortuous. Fortunately, it’s much easier to understand how test statistics and p-values work together using a sampling distribution graph.
Let’s use our hypothetical test statistic t-value of 2 for this example. However, because I’m displaying the results of a two-tailed test, I need to use t-values of +2 and -2 to cover both tails.
Related post: One-tailed vs. Two-Tailed Hypothesis Tests
The graph below displays the probability of t-values less than -2 and greater than +2 using the area under the curve. This graph is specific to our t-test design (1-sample t-test with N = 21).
The sampling distribution indicates that each of the two shaded regions has a probability of 0.02963—for a total of 0.05926. That’s the p-value! The graph shows that the test statistic falls within these areas almost 6% of the time when the null hypothesis is true in the population.
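The same tail-area calculation gives the p-value directly. This minimal sketch uses the textbook t density and a Simpson’s-rule integral (both illustrative choices, not a library API) to compute the two-tailed p-value for t = 2 with 20 degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even `steps`)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_tailed_p_value(t_stat, df):
    """Probability of a t-value at least as extreme as t_stat in either direction."""
    return 2 * upper_tail(abs(t_stat), df)

p = two_tailed_p_value(2.0, df=20)
print(f"each tail: {p / 2:.5f}, two-tailed p-value: {p:.5f}")
print("significant at alpha = 0.05?", p <= 0.05)
```

The result should land close to the 0.05926 shown in the graph (the printed values are rounded), and it exceeds 0.05.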
While this probability seems small, it’s not low enough to reject the null hypothesis at the standard significance level of 0.05. For the same test and significance level, the p-value method always reaches the same conclusion as the critical value method.
While test statistics are a crucial part of hypothesis testing, you’ll probably let your statistical software calculate the p-value for the test. However, understanding test statistics will boost your comprehension of what a hypothesis test actually assesses.
Related post: Interpreting P-values