What is a T Test?
A t test is a statistical hypothesis test that assesses sample means to draw conclusions about population means. Frequently, analysts use a t test to determine whether the population means for two groups are different. For example, it can determine whether the difference between the treatment and control group means is statistically significant.
There are three types of t tests. They all evaluate sample means using t-values, t-distributions, and degrees of freedom to calculate statistical significance. Each one is a parametric analysis that compares one or two group means.
The following are the standard t tests:
- One-sample: Compares a sample mean to a reference value.
- Two-sample: Compares two sample means.
- Paired: Compares the means of matched pairs, such as before and after scores.
In this post, you’ll learn about the different types of t tests, when you should use each one, and their assumptions. Additionally, I interpret an example of each type.
Which T Test Should I Use?
To choose the correct t test, you must know whether you are assessing one or two group means. If you’re working with two group means, do the groups have the same or different items/people? Use the table below to choose the proper analysis.
Number of Group Means | Group Type | T Test |
One | (not applicable) | One sample t test |
Two | Different items in each group | Two sample t test |
Two | Same items in both groups | Paired t test |
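The table can also be written as a small decision function. This is only an illustrative sketch: the function name `choose_t_test` and its parameters are hypothetical, not part of any statistics package.

```python
def choose_t_test(num_group_means: int, same_items: bool = False) -> str:
    """Return the t test suggested by the table above (hypothetical helper)."""
    if num_group_means == 1:
        return "One sample t test"
    if num_group_means == 2:
        # Same items/people in both groups means the observations are paired.
        return "Paired t test" if same_items else "Two sample t test"
    raise ValueError("t tests compare one or two group means")


print(choose_t_test(1))
print(choose_t_test(2, same_items=False))
print(choose_t_test(2, same_items=True))
```

For more than two group means, the function raises an error because a t test no longer applies.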
Now, let’s review each t test to see what it can do!
Imagine we’ve developed a drug that supposedly boosts your IQ score. In the following sections, we’ll address the same research question, and I’ll show you how the various t tests can help you answer it.
One Sample T Test
Use a one-sample t test to compare a sample mean to a reference value. It allows you to determine whether the population mean differs from the reference value. The reference value is usually highly relevant to the subject area.
For example, a coffee shop claims their large cup contains 16 ounces. A skeptical customer takes a random sample of 10 large cups of coffee and measures their contents to determine if the mean volume differs from the claimed 16 ounces using a one-sample t test.
One-Sample T Test Hypotheses
- Null hypothesis (H0): The population mean equals the reference value (µ = µ0).
- Alternative hypothesis (HA): The population mean DOES NOT equal the reference value (µ ≠ µ0).
Reject the null when the p-value is less than or equal to your significance level (e.g., 0.05). This condition indicates the difference between the sample mean and the reference value is statistically significant. Your sample data support the idea that the population mean does not equal the reference value.
Learn more about the One-Sample T-Test.
The above hypotheses are two-sided analyses. Alternatively, you can use one-sided hypotheses to find effects in only one direction. Learn more in my article, One- and Two-Tailed Hypothesis Tests Explained.
Related posts: Null Hypothesis: Definition, Rejecting & Examples and Understanding Significance Levels
Example
We want to evaluate our IQ boosting drug using a one-sample t test. First, we draw a single random sample of 15 participants and administer the medicine to all of them. Then we measure all their IQs and calculate a sample average IQ of 109.
In the general population, the average IQ is defined as 100. So, we’ll use 100 as our reference value. Is the difference between our sample mean of 109 and the reference value of 100 statistically significant? The t test output is below.
In the output, we see that our sample mean is 109. The procedure compares the sample mean to the reference value of 100 and produces a p-value of 0.036. Consequently, we can reject the null hypothesis and conclude that the population mean for those who take the IQ drug is higher than 100.
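To make the calculation concrete, here is a minimal sketch of a one-sample t test using only Python's standard library. The IQ scores are invented for illustration and are not the data behind the output discussed above. The helper names (`t_pdf`, `two_sided_p`, `one_sample_t`) are my own, and the p-value comes from a simple numerical integration of the t-distribution rather than from a statistics package.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 10_000) -> float:
    """Two-sided p-value P(|T| >= |t|) via Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| <= T <= |t|), using symmetry
    return max(0.0, 1.0 - central_area)


def one_sample_t(sample, reference):
    """Return (t-value, degrees of freedom, two-sided p-value)."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    t = (statistics.fmean(sample) - reference) / std_err
    df = n - 1
    return t, df, two_sided_p(t, df)


# Hypothetical IQ scores for 15 participants (invented, illustration only).
iq_scores = [96, 104, 111, 99, 118, 107, 123, 92, 115, 109, 101, 126, 105, 113, 98]

t_value, df, p_value = one_sample_t(iq_scores, reference=100)
print(f"t = {t_value:.3f}, df = {df}, two-sided p = {p_value:.4f}")
```

The t-value is the distance between the sample mean and the reference value, measured in standard errors. The p-value is the area in both tails of the t-distribution beyond that t-value.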
Two-Sample T Test
Use a two-sample t test to compare the sample means for two groups. It allows you to determine whether the population means for these two groups are different. For the two-sample procedure, the groups must contain different sets of items or people.
For example, you might compare averages between males and females or treatment and controls.
Two-Sample T Test Hypotheses
- Null hypothesis (H0): Two population means are equal (µ1 = µ2).
- Alternative hypothesis (HA): Two population means are not equal (µ1 ≠ µ2).
Again, when the p-value is less than or equal to your significance level, reject the null hypothesis. The difference between the two means is statistically significant. Your sample data support the theory that the two population means are different. Learn more about the Null Hypothesis: Definition, Rejecting & Examples.
Learn more about the two-sample t test.
Related posts: How to Interpret P Values and Statistical Significance
Example
For our IQ drug, we collect two random samples, a control group and a treatment group. Each group has 15 subjects. We give the treatment group the medication and the control group a placebo.
We’ll use a two-sample t test to evaluate if the difference between the two group means is statistically significant. The t test output is below.
In the output, you can see that the treatment group (Sample 1) has a mean of 109 while the control group’s (Sample 2) average is 100. The p-value for the difference between the groups is 0.112. We fail to reject the null hypothesis. There is insufficient evidence to conclude that the IQ drug has an effect.
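The sketch below computes a two-sample t-value and its degrees of freedom with Python's standard library. It assumes Welch's form of the test, which does not require equal variances. The output discussed above may have come from a different variant, such as the pooled-variance version. The group scores are invented for illustration, and `welch_t` is a hypothetical helper name.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t-value, degrees of freedom) for Welch's two-sample t test."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # squared standard error, group 1
    v2 = statistics.variance(sample2) / n2  # squared standard error, group 2
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical IQ scores (invented, illustration only): different people in each group.
treatment = [112, 98, 121, 105, 117, 93, 126, 109, 101, 118, 96, 114, 107, 123, 95]
control = [101, 94, 108, 99, 113, 88, 104, 97, 110, 92, 106, 100, 115, 90, 103]

t_value, df = welch_t(treatment, control)
print(f"t = {t_value:.3f}, df = {df:.1f}")
```

The p-value then comes from a t-distribution with these degrees of freedom, which statistical software looks up for you.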
Paired Sample T Test
Use a paired t-test when you measure each subject twice, such as before and after test scores. This procedure determines if the mean difference between paired scores differs from zero, where zero represents no effect. Because researchers measure each item in both conditions, the subjects serve as their own controls.
For example, a pharmaceutical company develops a new drug to reduce blood pressure. They measure the blood pressure of 20 patients before and after administering the medication for one month. Analysts use a paired t-test to assess whether there is a statistically significant difference in pressure measurements before and after taking the drug.
Paired T Test Hypotheses
- Null hypothesis: The mean difference between pairs equals zero in the population (µD = 0).
- Alternative hypothesis: The mean difference between pairs does not equal zero in the population (µD ≠ 0).
Reject the null when the p-value is less than or equal to your significance level (e.g., 0.05). Your sample provides sufficiently strong evidence to conclude that the mean difference between pairs does not equal zero in the population.
Learn more about the paired t test.
Example
Back to our IQ boosting drug. This time, we’ll draw one random sample of 15 participants. We’ll measure their IQ before taking the medicine and then again afterward. The before and after groups contain the same people. The procedure subtracts each Before score from the corresponding After score (After − Before) to calculate the individual differences. Then it calculates the average difference.
If the drug increases IQs effectively, we should see a positive difference value. Conversely, a value near zero indicates that the IQ scores didn’t improve between the Before and After scores. The paired t test will determine whether the difference between the pre-test and post-test is statistically significant.
The t test output is below.
The mean difference between the pre-test and post-test scores is 9 IQ points. In other words, the average IQ increased by 9 points between the before and after measurements. The p-value, which the output displays as 0.000 (that is, below 0.001), causes us to reject the null. We conclude that the difference between the pre-test and post-test population means does not equal zero. The drug appears to increase IQs by an average of 9 IQ points in the population.
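A paired t test is equivalent to a one-sample t test on the individual differences, which the following standard-library sketch illustrates. The before and after scores are invented. I chose them so the mean gain is 9 points, like the example, but they are not the data behind the output above. For contrast, the sketch also computes a two-sample (Welch-style) t-value that ignores the pairing. Both helper names are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, t-value, degrees of freedom) for paired scores."""
    diffs = [a - b for b, a in zip(before, after)]  # After - Before per subject
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1


def unpaired_t(sample1, sample2):
    """Welch-style t-value that ignores the pairing (for contrast only)."""
    std_err = math.sqrt(statistics.variance(sample1) / len(sample1)
                        + statistics.variance(sample2) / len(sample2))
    return (statistics.fmean(sample1) - statistics.fmean(sample2)) / std_err


# Hypothetical IQ scores for the same 15 people (invented, illustration only).
before = [88, 95, 102, 110, 97, 120, 105, 93, 115, 99, 108, 126, 91, 104, 112]
after = [95, 105, 110, 121, 106, 126, 117, 102, 123, 109, 115, 137, 100, 114, 120]

mean_diff, t_paired, df = paired_t(before, after)
t_unpaired = unpaired_t(after, before)
print(f"mean difference = {mean_diff:.1f}, paired t = {t_paired:.2f}, df = {df}")
print(f"t-value if the pairing is ignored = {t_unpaired:.2f}")
```

With these made-up numbers, the paired t-value is much larger than the unpaired one. The pairing removes the person-to-person variation in IQ, so only the variation in the gains remains. That is the practical benefit of subjects serving as their own controls.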
T Test Assumptions
For your t test to produce reliable results, your data should meet the following assumptions:
You have a random sample
Drawing a random sample from your target population helps ensure it represents the population. Representative samples are crucial for accurately inferring population properties. The t test results are invalid if your data do not reflect the population.
Related posts: Random Sampling and Representative Samples
Continuous data
A t test requires continuous data. Continuous variables can take on all numeric values, and the scale can be divided meaningfully into smaller increments, such as fractional and decimal values. For example, weight, height, and temperature are continuous.
Other analyses can assess additional data types. For more information, read Comparing Hypothesis Tests for Continuous, Binary, and Count Data.
Your sample data follow a normal distribution, or you have a large sample size
A t test assumes your data follow a normal distribution. However, due to the central limit theorem, you can waive this assumption when your sample is large enough.
The following sample size guidelines specify when normality becomes less of a restriction:
- One-Sample and Paired: 20 or more observations.
- Two-Sample: At least 15 in each group.
Related posts: Central Limit Theorem and Skewed Distributions
Population standard deviation is unknown
A t test assumes you have a sample estimate of the standard deviation. In other words, you don’t know the precise value of the population standard deviation. This assumption is almost always true. If you do know the population standard deviation, use the Z test instead. That said, when n > 30, the difference between the t and Z tests becomes trivial.
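One way to see how the t and Z tests converge is to compare their two-sided critical values at a 0.05 significance level. The sketch below does this with Python's standard library. The Z value comes from `statistics.NormalDist`. The t values come from a hand-rolled numerical integration plus bisection, so treat `t_pdf`, `t_cdf`, and `t_critical` as illustrative helpers rather than a library API.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df: float, alpha: float = 0.05) -> float:
    """Two-sided critical t-value, found by bisection on the CDF."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_critical = NormalDist().inv_cdf(0.975)
for df in (5, 10, 30, 100):
    print(f"df = {df:>3}: t critical = {t_critical(df):.3f}, Z critical = {z_critical:.3f}")
```

The t critical value is always larger than the Z critical value, and it shrinks toward 1.96 as the degrees of freedom grow. Around 30 degrees of freedom it is about 2.04.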
Related post: Standard Deviations
Geri says
Hi Jim,
Your books have been a great resource for me and I often refer to them.
I do have questions about which could be used, ANOVA or MANOVA, in the following situation. I have been comparing the two types but not sure if I’m meeting the criteria for either one and which would be most appropriate in this case.
I am wanting to examine whether there is a significant interaction effect related to change in stress levels when comparing groups. I’m looking at different teacher types (i.e. special education, general education, veteran, and new in-service teachers).
Also, I want to determine whether a statistically significant difference exists between teachers who participate in the treatment program versus those in the wait-list control group prior to and after the implementation of the treatment program. Could I use the paired t-test for this?
Thank you!
Jim Frost says
Hi Geri,
I’m not 100% sure that I understand your study but here is what I gather.
It sounds like you need to perform a two-way ANOVA with an interaction effect. Teacher Type and Treatment Program Status are the two factors while the stress levels are the outcome. Include an interaction effect between the two factors to see if the relationship between teacher type and stress level depends on the treatment program. The Treatment Program factor will tell you whether the overall means for those in the program and those not in it are different. The interaction effect tells you whether the treatment effect differs between teacher types.
You’d use MANOVA if you have multiple dependent variables that are correlated. It doesn’t sound like you have more than one from what you write.
A paired t-test would be unable to handle the two factors or the interaction effect. Additionally, a paired t-test can only assess the before and after scores for one group. You have more than one group.
jim mcloughlin says
Hello Jim, and thank you on behalf of the thousands you have helped.
Question about which t test to use:
20 members of a committee are asked to interview and rate two candidates for a position – one candidate on Monday, the other candidate on Tuesday. So, one group of 20 committee members interviews 2 separate candidates one day after the other on the same variables. Would this scenario use a paired or independent application? Thank you, js
Jim Frost says
Hi Jim,
This would be a case where you’d potentially use a paired t-test. You’re determining whether there’s a significant difference between the two candidates as given by the same 20 committee members. The two observations are paired because it’s the same 20 members giving the two ratings.
The only wrinkle in that, which is why I say “potentially use,” is that ratings are often ordinal. If you have ordinal rankings, you might need to use a nonparametric test.
Shane McDonald says
Question about determining tails:
when determining the P values, this is what I am told:
“You draw a t curve and plot the t value on the horizontal axis, then you check the sign in Ha. If it is >, such as in our case, you shade the right-hand side (if Ha has a < sign, then shade the left-hand side). II) Determine if the shaded side is a tail or not (the smaller side is called a tail). If it is, P = sig/2; if it is not a tail, then P = 1 − (sig/2).”
When emailing the instructor, this is all I was told: For p of t test, if the shaded area according to your Ha is small, it is a tail (which is half of the two tails); if it is large, then 1 − a tail.
So, when determining P of T test, how do I know whether to perform 1 − (P/2) or just P/2?
We use the software SPSS so P=sig in the instructions.
Jim Frost says
Hi Shane,
From your description, I can’t tell what you’re saying.
Tails are just the thin, extreme parts of the distribution. In this hypothesis testing context, shaded areas are called critical regions or rejection regions. You need to determine whether your t-value (or other test statistic) falls within a critical region. If it does, your results are significant and you reject the null. However that process doesn’t tell you the p-value. I think you’re mixing two different things. Here are a couple of posts I’ve written that will clarify the issues you asked about.
Finding the P-value
One and Two Tailed Hypothesis Tests Explained
Charlotte says
Hi, Jim.
Happy New Year!
I have a few questions I was hoping you’d be able to help me with please?
In the case of a t-test, I know one assumption is that the DV should be the scale variable and the IV should be the categorical variable. I wondered if it mattered whether it was the other way around – so the scale variable was the IV and the categorical variable the DV. Would it make much difference? When I’ve done a t-test like this before, it doesn’t seem to, but I may be missing something.
Would it be better to recode the scale variable to a categorical variable and do a chi-square test?
Or does it just depend on what I am aiming to do. So whether I want to examine relationships or compare means?
Any advice would be appreciated.
Charlotte
Jim Frost says
Hi Charlotte
Yes, you can do that in the opposite direction but you’ll need to use a different analysis.
If you have two groups based on a categorical variable and a continuous variable, you have a couple of choices:
You can use the 2-sample t-test as you suggest to determine whether the group means are different.
Or, you can use something like binary logistic regression to use the continuous variable to predict the outcome of the binary variable.
Typically, you’ll choose the one that makes the most sense for your subject area. If you think group assignment affects the mean outcome, use the t-test. However, if you think the continuous value of a variable predicts the outcome of the binary variable, use binary logistic regression.
I hope that helps!
Stephen Ritchie says
Jim,
When the input variable is continuous (such as speed) and the output variable is categorical (pass/ fail) I know that logistic regression should be done. However can a standard 2-sample t-test be done to determine if the mean input level is independent of result (pass or fail)? Can a standard deviations test also be done to determine if the spread on values for the input variable is independent of result?
Mark says
This was really helpful. After reading it, conducting a T test analysis is almost like a walk in the park. Thanks!
Jim Frost says
Thanks so much, Mark!
Peach says
Thank you for your awesome work.
Daniel says
Your explanation is comprehensive even to non-statisticians
Jim Frost says
Thanks so much, Daniel. So glad my blog post could help!