What is a Test Statistic?
A test statistic assesses how consistent your sample data are with the null hypothesis in a hypothesis test. Test statistic calculations take your sample data and boil them down to a single number that quantifies how much your sample diverges from the null hypothesis. As a test statistic value becomes more extreme, it indicates larger differences between your sample data and the null hypothesis.
When your test statistic indicates a sufficiently large incompatibility with the null hypothesis, you can reject the null and state that your results are statistically significant—your data support the notion that the sample effect exists in the population. To use a test statistic to evaluate statistical significance, you either compare it to a critical value or use it to calculate the p-value.
Statisticians named the hypothesis tests after the test statistics because they’re the quantity that the tests actually evaluate. For example, t-tests assess t-values, F-tests evaluate F-values, and chi-square tests use, you guessed it, chi-square values.
In this post, learn what test statistics are, how to calculate and interpret them, and how to evaluate statistical significance using the critical value and p-value methods.
How to Find Test Statistics
Each test statistic has its own formula. I present several common test statistics examples below. To see worked examples for each one, click the links to my more detailed articles.
Formulas for Test Statistics
| Test Statistic | Formula |
| --- | --- |
| T-value for 1-sample t-test | Take the sample mean, subtract the hypothesized mean, and divide by the standard error of the mean. |
| T-value for 2-sample t-test | Take one sample mean, subtract the other, and divide by the standard error of the difference (which uses the pooled standard deviation). |
| F-value for F-tests and ANOVA | Calculate the ratio of two variances. |
| Chi-squared value (χ²) for a Chi-squared test | Sum the squared differences between the observed and expected values, each divided by its expected value. |
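In symbols, these are the standard textbook forms. Here, $\bar{x}$ denotes a sample mean, $\mu_0$ the hypothesized mean, $s$ a sample standard deviation, $s_p$ the pooled standard deviation, $n$ a sample size, and $O_i$ and $E_i$ the observed and expected counts:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad F = \frac{s_1^2}{s_2^2} \qquad \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$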
Understanding the Null Values and the Test Statistic Formulas
In the formulas above, it’s helpful to understand the null condition and the test statistic value that occurs when your sample data match that condition exactly. Also, it’s worthwhile knowing what causes the test statistics to move further away from the null value, potentially becoming significant. Test statistics are statistically significant when they exceed a critical value.
All these test statistics are ratios, which helps you understand their null values.
T-Tests, Null = 0
When a t-value equals 0, it indicates that your sample data match the null hypothesis exactly.
For a 1-sample t-test, when the sample mean equals the hypothesized mean, the numerator is zero, which causes the entire t-value ratio to equal zero. As the sample mean moves away from the hypothesized mean in either the positive or negative direction, the test statistic moves away from zero in the same direction.
A similar case exists for 2-sample t-tests. When the two sample means are equal, the numerator is zero, and the entire test statistic ratio is zero. As the two sample means become increasingly different, the absolute value of the numerator increases, and the t-value becomes more positive or negative.
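To make this concrete, here's a quick 1-sample calculation with made-up numbers. Suppose the hypothesized mean is 50, and a sample of $n = 25$ observations yields a mean of 52 with a standard deviation of 5:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{52 - 50}{5/\sqrt{25}} = \frac{2}{1} = 2$$

If the sample mean had been exactly 50, the numerator, and therefore the t-value, would be 0.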
Related post: How T-tests Work
F-tests including ANOVA, Null = 1
When an F-value equals 1, it indicates that the two variances in the numerator and denominator are equal, matching the null hypothesis.
As the numerator and denominator become less and less similar, the F-value moves away from one in either direction.
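For instance, with hypothetical sample variances of 12.5 and 10.0, the F-value is:

$$F = \frac{s_1^2}{s_2^2} = \frac{12.5}{10.0} = 1.25$$

That value sits close to the null value of 1, suggesting the two variances are similar.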
Related post: The F-test in ANOVA
Chi-squared Tests, Null = 0
When a chi-squared value equals 0, it indicates that the observed values exactly match the expected values. This condition causes the numerator to equal zero, making the chi-squared value equal zero.
As the observed values progressively fail to match the expected values, the numerator increases, causing the test statistic to rise from zero.
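For example, suppose you expect 50 heads and 50 tails in 100 hypothetical coin flips but observe 55 heads and 45 tails:

$$\chi^2 = \frac{(55 - 50)^2}{50} + \frac{(45 - 50)^2}{50} = 0.5 + 0.5 = 1$$

The larger the mismatches, the further the chi-squared value climbs above zero.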
Related post: How a Chi-Squared Test Works
In practice, you’ll never see a test statistic that equals the null value precisely. However, trivial differences between sample values and the null value are not uncommon.
Interpreting Test Statistics
Test statistics are unitless. This fact can make them difficult to interpret on their own. You know they evaluate how well your data agree with the null hypothesis. If your test statistic is extreme enough, your data are so incompatible with the null hypothesis that you can reject it and conclude that your results are statistically significant. But how does that translate to specific values of your test statistic? Where do you draw the line?
For instance, t-values of zero match the null value. But how far from zero should your t-value be to be statistically significant? Is 1 enough? 2? 3? If your t-value is 2, what does it mean anyway? In this case, we know that the sample mean doesn’t equal the null value, but how exceptional is it? To complicate matters, the dividing line changes depending on your sample size and other study design issues.
Similar types of questions apply to the other test statistics too.
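To see how that dividing line shifts with sample size, here's a short sketch using Python's scipy library. The 0.05 significance level and the sample sizes are just illustrative assumptions:

```python
from scipy import stats

# Two-tailed critical t-values at a 0.05 significance level.
# The cutoff for "extreme enough" shrinks as the sample size grows.
for n in [5, 10, 30, 100]:
    df = n - 1  # degrees of freedom for a 1-sample t-test
    critical_t = stats.t.ppf(0.975, df)  # upper critical value
    print(f"n = {n:3d}: reject when |t| > {critical_t:.3f}")
```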
To interpret individual values of a test statistic, we need to place them in a larger context. Towards this end, let me introduce you to sampling distributions for test statistics!
Sampling Distributions for Test Statistics
Performing a hypothesis test on a sample produces a single test statistic. Now, imagine you carry out the following process:
- Assume the null hypothesis is true in the population.
- Repeat your study many times by drawing many random samples of the same size from this population.
- Perform the same hypothesis test on all these samples and save the test statistics.
- Plot the distribution of the test statistics.
This process produces the distribution of test statistic values that occurs when the effect does not exist in the population (i.e., the null hypothesis is true). Statisticians refer to this type of distribution as a sampling distribution, a kind of probability distribution.
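If you'd like to see this process in action, here's a minimal simulation sketch in Python; the population mean, standard deviation, and sample size are arbitrary assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
null_mean = 50  # assume the null hypothesis is true in the population

t_values = []
for _ in range(10_000):  # repeat the study many times
    # Draw a random sample of the same size from the null population.
    sample = rng.normal(loc=null_mean, scale=5, size=21)
    # Perform the same hypothesis test and save the test statistic.
    result = stats.ttest_1samp(sample, popmean=null_mean)
    t_values.append(result.statistic)

# A histogram of t_values approximates the sampling distribution of t
# under the null: bell-shaped and centered on zero.
```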
Why would we need this type of distribution?
It provides the larger context required for interpreting a test statistic. More specifically, it allows us to compare our study’s single test statistic to values likely to occur when the null is true. We can quantify our sample statistic’s rareness while assuming the effect does not exist in the population. Now that’s helpful!
Fortunately, we don’t need to collect many random samples to create this distribution! Statisticians have developed formulas allowing us to estimate sampling distributions for test statistics using the sample data.
To evaluate your data’s compatibility with the null hypothesis, place your study’s test statistic in the distribution.
Related post: Understanding Probability Distributions
Example of a Test Statistic in a Sampling Distribution
Suppose our t-test produces a t-value of two. That’s our test statistic. Let’s see where it fits in.
The sampling distribution below shows a t-distribution with 20 degrees of freedom, equating to a 1-sample t-test with a sample size of 21. The distribution centers on zero because it assumes the null hypothesis is correct. When the null is true, your analysis is most likely to obtain a t-value near zero and less likely to produce t-values further from zero in either direction.
The sampling distribution indicates that our test statistic is somewhat rare when we assume the null hypothesis is correct. However, observing t-values at least as extreme as ±2 is not totally inconceivable. We need a way to quantify that likelihood.
To do that, we'll use the sampling distribution's ability to calculate probabilities for test statistics.
Related post: Sampling Distributions Explained
Test Statistics and Critical Values
The significance level uses critical values to define how far the test statistic must be from the null value to reject the null hypothesis. When the test statistic exceeds a critical value, the results are statistically significant.
The shaded percentage of the area beneath the sampling distribution curve represents the probability that the test statistic will fall within those regions when the null is true. Consequently, to depict a significance level of 0.05, I’ll shade the 5% of the sampling distribution furthest from the null value.
The two shaded areas are equidistant from the null value in the center. Each region has a likelihood of 0.025, which sums to our significance level of 0.05. These shaded areas are the critical regions for a two-tailed hypothesis test. Let’s return to our example t-value of 2.
Related post: What are Critical Values?
In this example, the critical values are -2.086 and +2.086. Our test statistic of 2 is not statistically significant because it does not exceed the critical value.
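You can reproduce those critical values with scipy. This sketch assumes the same design as the example (1-sample t-test with N = 21, two-tailed, 0.05 significance level):

```python
from scipy import stats

alpha = 0.05
df = 20  # N - 1 for a 1-sample t-test with N = 21

lower = stats.t.ppf(alpha / 2, df)      # about -2.086
upper = stats.t.ppf(1 - alpha / 2, df)  # about +2.086
print(lower, upper)  # our t-value of 2 falls inside (-2.086, +2.086)
```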
Other hypothesis tests have their own test statistics and sampling distributions, but their processes for critical values are generally similar.
Learn how to find critical values for test statistics using tables.
Related post: Understanding Significance Levels
Using Test Statistics to Find P-values
P-values are the probability of observing an effect at least as extreme as your sample’s effect if you assume no effect exists in the population.
Test statistics represent effect sizes in hypothesis tests because they denote the difference between your sample effect and no effect (the null hypothesis). Consequently, you use the test statistic to calculate the p-value for your hypothesis test.
The above p-value definition is a bit tortuous. Fortunately, it’s much easier to understand how test statistics and p-values work together using a sampling distribution graph.
Let’s use our hypothetical test statistic t-value of 2 for this example. However, because I’m displaying the results of a two-tailed test, I need to use t-values of +2 and -2 to cover both tails.
Related post: One-tailed vs. Two-Tailed Hypothesis Tests
The graph below displays the probability of t-values less than -2 and greater than +2 using the area under the curve. This graph is specific to our t-test design (1-sample t-test with N = 21).
The sampling distribution indicates that each of the two shaded regions has a probability of 0.02963—for a total of 0.05926. That’s the p-value! The graph shows that the test statistic falls within these areas almost 6% of the time when the null hypothesis is true in the population.
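You can verify that p-value with scipy under the same assumed design (t-value of 2, 20 degrees of freedom, two-tailed test):

```python
from scipy import stats

t_value = 2
df = 20

# sf() gives the area in the upper tail; double it for a two-tailed test.
p_value = 2 * stats.t.sf(t_value, df)
print(p_value)  # about 0.0593
```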
While this likelihood seems small, it’s not low enough to justify rejecting the null under the standard significance level of 0.05. P-value results are always consistent with the critical value method. Learn more about using test statistics to find p-values.
While test statistics are a crucial part of hypothesis testing, you’ll probably let your statistical software calculate the p-value for the test. However, understanding test statistics will boost your comprehension of what a hypothesis test actually assesses.
Related post: Interpreting P-values
Dr Dilip Raj says
“As the observed values progressively fail to match the observed values, the numerator increases, causing the test statistic to rise from zero”.
Sir, this sentence is written in the Chi-squared Test heading. There the observed value is written twice. I think the second one to be replaced with ‘expected values’.
Jim Frost says
Thanks so much, Dr. Raj. You’re correct about the typo and I’ve made the correction.
Anna Bálint says
Thank you very much (great page on one and two-tailed tests)!
A
Anna Bálint says
I would like to ask a question. If only positive numbers are possible values in a sample (e.g., absolute values without 0), is it meaningful to test whether the sample is significantly different from zero (using, for example, a one-sample t-test or a Wilcoxon signed-rank test)? Or can I assume that, given a large enough sample, the result will by definition be significant (even if a small or very variable sample produces a non-significant hypothesis test)?
Thank you very much,
Anna
Jim Frost says
Hi Anna,
If you’re talking about the raw values you’re assessing with a one-sample t-test, it doesn’t make sense to compare them to zero given your description of the data. You know that the mean can’t possibly equal zero. The mean must be some positive value. Yes, in this scenario, if you have a large enough sample size, you should get statistically significant results. So, that t-test isn’t telling you anything that you don’t already know!
However, you should be aware of several things. The 1-sample t-test can compare your sample mean to values other than zero. Typically, you’ll need to specify the value of the null hypothesis for your software. This value is the comparison value. The test determines whether your sample data provide enough evidence to conclude that the population mean does not equal the null hypothesis value you specify. You’ll need to specify the value because there is no obvious default to use. Every 1-sample t-test has its own subject-area context with a null hypothesis value that makes sense for it, and that value is frequently not zero.
I suspect that you’re getting tripped up by the fact that t-tests use a t-value of zero for their null hypothesis value. That doesn’t mean your 1-sample t-test is comparing your sample mean to zero. The test converts your data to a single t-value and compares the t-value to zero. But your actual null hypothesis value can be something else. The test is just converting your sample to a standardized value to use for testing. So, while the t-test compares your sample’s t-value to zero, you can actually compare your sample mean to any value you specify. You need to use a value that makes sense for your subject area.
I hope that makes sense!
Anna Bálint says
Thank you very much Jim, this helps a lot! Actually, the value I would like to compare my sample to is zero, but apparently I just couldn’t find the right way to test it (it’s about EEG data). The original data were a sample of numbers between -1 and +1, with the question being whether they differ significantly from zero in either direction (in which case a one-sample t-test makes sense, I guess, since the sample mean can in fact be zero). However, since a sample mean of 0 can also occur if half of the sample differs in the negative direction and the other half in the positive direction, I also wanted to test whether there is a divergence from 0 in ‘absolute’ terms – that’s how the absolute-valued numbers came about (I know that absolute values can also be zero, but in this specific case, they were all positive numbers). And a special thanks for the last paragraph – I will definitely keep it in mind, as it is a potential point of confusion.
Jim Frost says
Hi Anna,
You can use a 1-sample t-test for both cases, but you’ll need to set them up slightly differently. To detect a positive or negative difference from zero, use a two-tailed test. For the case with absolute values, use a one-tailed test with the critical region in the positive tail. To learn more, read about One- and Two-Tailed Tests Explained. Use zero for the comparison value in both cases.
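In Python, that setup might look like the following sketch. The data here are hypothetical stand-ins, and the `alternative` argument requires scipy 1.6 or later:

```python
import numpy as np
from scipy import stats

signed_values = np.array([-0.4, 0.1, 0.3, -0.2, 0.5])  # hypothetical EEG-like data
abs_values = np.abs(signed_values)

# Two-tailed test: does the mean of the signed data differ from zero?
two_sided = stats.ttest_1samp(signed_values, popmean=0, alternative='two-sided')

# One-tailed test: is the mean of the absolute values greater than zero?
one_sided = stats.ttest_1samp(abs_values, popmean=0, alternative='greater')

print(two_sided.pvalue, one_sided.pvalue)
```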
Trevor says
Very helpful and well articulated! Thanks Jim 🙂
mekdes says
Thank you for the brief explanation.
ruth says
The content was helpful to me. Thank you.