Excel can perform various statistical analyses, including t-tests. It is an excellent option because nearly everyone can access Excel. This post is a great introduction to performing and interpreting t-tests even if Excel isn’t your primary statistical software package.

In this post, I provide step-by-step instructions for using Excel to perform t-tests. Importantly, I also show you how to select the correct form of t-test, choose the right options, and interpret the results. I also include links to additional resources I’ve written, which present clear explanations of relevant t-test concepts that you won’t find in Excel’s documentation. And, I use an example dataset for us to work through and interpret together!

T-tests are hypothesis tests that assess the means of one or two groups. Hypothesis tests use sample data to infer properties of entire populations. To be able to use a t-test, you need to obtain a random sample from your target populations. Depending on the t-test and how you configure it, the test can determine whether:

- Two group means are different.
- Paired means are different.
- One mean is different from a target value.

For more information about the types of t-tests you can use, read my post about 1-sample, 2-sample, and Paired t-Tests.

## Install the Data Analysis ToolPak in Excel

The Data Analysis ToolPak must be installed on your copy of Excel to perform t-tests. To determine whether you have this ToolPak installed, click *Data* in Excel’s menu across the top and look for *Data Analysis* in the *Analyze* section. If you don’t see Data Analysis, you need to install it. Don’t worry. It’s free!

To install Excel’s Analysis Tookpak, click the *File* tab on the top-left and then click *Options* on the bottom-left. Then, click *Add-Ins*. On the *Manage* drop-down list, choose *Excel Add-ins,* and click *Go*. On the popup that appears, check *Analysis ToolPak* and click *OK*.

After you enable it, click *Data Analysis* in the Data menu to display the analyses you can perform. Among other options, the popup presents three types of t-test, which we’ll cover next.

## Two-Sample t-Tests in Excel

Two-sample t-tests compare the means of precisely two groups—no more and no less! Typically, you perform this test to determine whether two population means are different. For example, do students who learn using Method A have a different mean score than those who learn using Method B? This form of the test uses independent samples. In other words, each group contains a unique set of people or items.

Statisticians consider differences between group means to be an unstandardized effect size because these values indicate the strength of the relationship using values that retain the natural units of the dependent variable. Effect sizes help you understand how important the findings are in a practical sense. To learn more about unstandardized and standardized effect sizes, read my post about Effect Sizes in Statistics.

The standard form tests the following hypotheses:

- Null: The two population means are equal.
- Alternative: The two population means are not equal.

If the p-value is less than your significance level (e.g., 0.05), you can reject the null hypothesis. The difference between the two means is statistically significant. Your sample provides strong enough evidence to conclude that the two population means are different.

For more information about the null and alternative hypotheses and other hypothesis testing terms, see my Hypothesis Testing Overview.

Also, learn about the difference between descriptive statistics and inferential statistics.

## t-Tests for Equal and Unequal Variances

You’ll notice that Excel has two forms of the two-sample t-test. One that assumes equal variances and the other that assumes unequal variances. Variances and the closely related standard deviation are measures of variability. All t-tests assume you obtained data from normally distributed populations. However, the conventional t-test also assumes the standard deviations/variances for both groups are equal. Another form of the test, known as Welch’s t-test, does not assume equal variances.

As an aside, thanks to the central limit theorem, you can safely use t-tests to analyze nonnormal data when have ~20 or more observations per group.

### Which One to Use?

Advice for using either the equal or unequal variances form of the 2-sample t-test varies because this issue is more complicated than it first appears. Some analysts advise using an F-test to determine whether the variances are unequal. And, Excel does offer the F-test Two-Sample for Variances. However, using additional tests always increases the probability of both false positives and false negatives (a.k.a, Type I and Type II errors).

Additionally, if you have a large sample size, the f-test has more statistical power. This condition can cause the test to identify an inconsequential difference as being statistically significant. That’s the difference between practical significance and statistical significance. Conversely, small sample sizes can fail to detect a substantial difference between variances.

When you have an equal, or nearly equal, number of observations in both groups and a moderate sample size, t-tests are robust to differences between variances. If you find one group has twice the variance of another group, it might be time to worry! However, you don’t need to worry about smaller differences.

Other analysts suggest always using the form of the t-test that assumes unequal variances. If you use this approach when the variances are equal, you lose a trivial amount of statistical power, but you’ll be better off when the variances are not equal.

If you have unequal variances *and* unequal samples sizes, it’s vital to use the unequal variances version of the 2-sample t-test!

## Step-by-Step Instructions for Running the Two-Sample t-Test in Excel

Let’s conduct a two-sample t-test! Our hypothetical scenario is that we are comparing scores from two teaching methods. We drew two random samples of students. One sample comprises students who learned using Method A while the other sample learned using Method B. These samples contain entirely different students. Now, we want to determine whether the two means are different. Download the CSV file that contains all data for both t-test examples in this post: t-TestExamples.

To perform a 2-sample t-test in Excel, arrange your data in two columns, as shown below.

Let’s assume that the variances are equal and use the Assuming Equal Variances version. If we had chosen the unequal variances form of the test, the steps and interpretation are the same—only the calculations change.

- In Excel, click
*Data Analysis*on the Data tab. - From the Data Analysis popup, choose
*t-Test: Two-Sample Assuming Equal Variances*. - Under
*Input*, select the ranges for both*Variable 1*and*Variable 2*. - In
*Hypothesized Mean Difference*, you’ll typically enter zero. This value is the null hypothesis value, which represents no effect. In this case, a mean difference of zero represents no difference between the two methods, which is no effect. - Check the
*Labels*checkbox if you have meaningful variable names in row 1. This option makes the output easier to interpret. Ensure that you include the label row in step #3. - Excel uses a default Alpha value of 0.05, which is usually a good value. Alpha is the significance level. Change this value only when you have a specific reason for doing so.
- Click OK.

For the example data, your popup should look like the image below:

After Excel creates the output, I autofit the width of column A to display all text in it.

## Interpreting the Two-Sample t-Test Results

The output indicates that mean for Method A is 71.50362 and for Method B it is 84.74241. Looking in the Variances row, we can see that they are not exactly equal, but they are close enough to assume equal variances. The p-value is the most important statistic. If you want to learn about the other statistics, you can read my posts about the t Stat (i.e., the t-value), df (degrees of freedom), and the t Critical values.

If the p-value is less than your significance level, the difference between means is statistically significant. Excel provides p-values for both one-tailed and two-tailed t-tests.

One-tailed t-tests can detect differences between means in only one direction. For example, a one-tailed test might determine only whether Method B is greater than Method A. Two-tailed tests can detect differences in either direction—greater than or less than. There are additional drawbacks for using one-tailed tests—so I’ll stick with the standard two-tailed results. To learn more, read my post about one-tailed and two-tailed tests.

For our results, we’ll use P(T<=t) two-tail, which is the p-value for the two-tailed form of the t-test. Because our p-value (0.000336) is less than the standard significance level of 0.05, we can reject the null hypothesis. Our sample data support the hypothesis that the population means are different. Specifically, Method B’s mean is greater than Method A’s mean.

## Paired t-Tests in Excel

Paired t-tests assess paired observations, which are often two measurements on the same person or item. Statisticians call these dependent samples. Suppose you gather a random sample of people. You give them all a pretest, administer a treatment, and then perform a posttest. Each subject has a pretest and posttest score. Or, perhaps you have a sample of wood boards, and you paint half of each board with one paint and the other half with different paint. Then, you measure the paint durability for both types of paint on all the boards. Each board has two paint durability scores.

In both cases, you can use a paired t-test to determine whether the difference between the means of the two sets of scores is statistically significant.

Unlike independent t-tests, paired t-tests use the same people or items in both groups. One way to determine whether a paired t-test is appropriate for your data is if each row in the dataset corresponds to one person or item. For our pretest/posttest example, we measure each subject before and after the experiment and placed the measurements for an individual on one row.

**Related post**: Independent and Dependent Samples

## Step-by-Step Instructions for Running the Paired t-Test in Excel

For this example, imagine that we have a training program, and we need to determine whether the difference between the mean pretest score and the mean posttest score is significantly different.

To perform a paired t-test in Excel, arrange your data into two columns so that each row represents one person or item, as shown below. Note that the analysis does not use the subject’s ID number.

- In Excel, click
*Data Analysis*on the Data tab. - From the Data Analysis popup, choose
*t-Test: Paired Two Sample for Means*. - Under
*Input*, select the ranges for both*Variable 1*and*Variable 2*. - In
*Hypothesized Mean Difference*, you’ll typically enter zero. This value is the null hypothesis value, which represents no effect. In this case, a mean difference of zero represents no difference between the two methods, which is no effect. - Check the
*Labels*checkbox if you have meaningful variables labels in row 1. This option helps make the output easier to interpret. Ensure that you include the label row in step #3. - Excel uses a default Alpha value of 0.05, which is usually a good value. Alpha is the significance level. Change this value only when you have a specific reason for doing so.
- Click OK.

For the example data, your popup should look like the image below:

## Interpreting Excel’s Paired t-Test Results

The output indicates that mean for the Pretest is 97.06223 and for the Posttest it is 107.8346.

If the p-value is less than your significance level, the difference between means is statistically significant. Again, Excel provides p-values for both one-tailed and two-tailed t-tests—and we’ll stick with the two-tailed result. For information about the other statistics, click the links in the 2-sample t-test section.

For our results, we’ll use P(T<=t) two-tail, which is the p-value for the two-tailed form of the t-test. Because our p-value (0.002221) is less than the standard significance level of 0.05, we can reject the null hypothesis. Our sample data support the hypothesis that the population means are different. Specifically, the Posttest mean is greater than the Pretest mean.

## What Excel’s t-Tests Do Not Include

As nice as it is to be able to perform t-tests in Excel, it leaves out some essential features. Notably, Excel cannot create confidence intervals. The means in these analyses are the point estimates for the population means. However, thanks to random error, the sample means never precisely equal the population mean. There is a margin of error around the estimates. Confidence intervals use a margin of error to calculate a range of values that is likely to contain the actual population mean for each group. Learn more about confidence intervals.

Excel also doesn’t calculate the estimated difference between the means. The difference between the means is the effect size for the analysis—an important value to know. By using a formula in Excel, you can easily calculate the difference between means. However, it would be nice to have a confidence interval for this difference too. For more information, read my post about using confidence intervals to assess differences between means.

Finally, Excel, strangely, does not provide a one-sample t-test! In some cases, you might have a single sample of data and need to determine whether it is different from a target value. For example, you might measure the strength of a product and use a one-sample t-test to determine whether it is significantly different from an important strength value.

t-Tests can compare up to two groups. If you have three or more groups, you’ll need to use ANOVA. For more information, see my posts about how to do one-way ANOVA in Excel and how to do two-way ANOVA in Excel!

If you’re learning about hypothesis testing and like the approach I use in my blog, check out my eBook!

Nathan says

Very descriptive and easy to understand. The examples really helped. Thanks Jim!

Jim Frost says

You’re very welcome, Nathan! So glad it was helpful!

Frank says

Jim, happy to see that you’ve added video to your content!

I hope you continue to add more of it 🙂

Jim Frost says

Hi Frank,

Thanks so much for the positive feedback! I will keep adding more videos!

Balraj says

Hi Jim, I realised that when I conducted a t-test two sample assuming equal variance, my t-stat value is negative, what could that suggest. The question is linked to “ Is there statistical evidence that the average price of houses with one bathroom is different than the average price of houses with more than one bathroom?”

Jim Frost says

Hi Balraj,

That’s not a problem and the conclusion is still the same. When you’re performing a 2-sample t-test you are subtracting the mean of one group from another like this:

Group A – Group B = Difference

If Group B has a higher mean than group A, you’ll obtain a negative value. I bet for your data, Group B is the set of homes with more than one bathroom, which presumably will have a higher mean price.

To learn more about the t-value, how it is calculated, and what it is for, read my post about T-values and T-distributions.

Balraj says

I understand now. Thank you very much. I appreciate the support you have given me.

Balraj says

Oh I did not know that.I would have to reject the null hypothesis as that value is less than the 5% significance level. Would that mean there is evidence that both averages are different? Based on the question about “Is there statistical evidence…” I stated in my previous comment, I also did a F-test Two Sample for Variances.Is that still fine to do based on the question.

Jim Frost says

Hi Balraj,

Yes, when you have a p-value less than your significance level, you reject the null hypothesis. That’s usually a good thing! Your p-value is lower than any reasonable significance level, which means that your results are statistically significant. To answer your question, it indicates that there is sufficient statistical evidence to conclude that the average housing prices between the two groups are different.

The Variance F-test answers a different type of question. It assesses whether the variability (rather than the mean) is different between the two groups. For more information, read my post about using Excel to analyze variability. If the question you’re answering specifically asks about the means prices, then you don’t want to use the Variances test because it’s answering a different question.

Balraj Sangha says

Hi, I am seeking some assistance,

A question states “Is there statistical evidence that the average price of houses with one bathroom is different

than the average price of houses with more than one bathroom?”. There are 321 observations and I conducted a t-test two sample assuming equal and unequal variances. Both test shows a negative P value(two tailed) for example, was “1.3502E-14”. Is there a solution you could provide?

Thanks.

Jim Frost says

Hi Balraj,

There’s no such thing as a negative p-value. P-values range from 0 to 1. Oh, I see the problem. What you’re seeing is not a negative value, it is scientific notation, which is used for very large and very small numbers. In your case, it’s a very small number, which is good because it’s a p-value!

The number to the right of the E (-14) tells you how many places to move the decimal point. The negative value indicates you need to move the decimal point to the left by 14 places.

It’s equivalent to: 0.000000000000013502. In other words, it is very significant!

It sounds like you’re performing the correct analysis. You need to perform a 2-sample t-test (independent samples), two-tailed. Assess the variation to determine whether to use the equal or unequal variances.

Yang says

Hi Jim,

I’m using 2-sample t-test, so your reply relieves my worry again.

Now I can decide what to do with my samples with confidence.

Thanks a lot! ^_^

Jim Frost says

You’re very welcome! Glad to help!

Best of luck with your analysis!

Yang says

Thanks a lot, Jim. However, there’s another situation confusing me. Consider two groups of observations, each comprising of more than 50 observations, but the exact numbers of observations in each group are not equal. If t-Test is to be carried out in such situation, what kind of additional work should be done? Or that t Test isn’t capable of such situation at all?

Thank you.

Jim Frost says

Hi Yang,

2-sample t-tests with independent samples are perfectly able to handle unequal numbers of observations per group. Paired t-tests use dependent samples and assess the differences between paired observations, so the groups must be equal. I’m not sure which type of t-test you’re using.

If you’re using a paired t-test, you’ll need to remove the items/subjects that have only one observed value. Include only the items with two observations.

If you’re using 2-sample t-tests with independent samples, you don’t need to do anything when you have unequal sample sizes. The test performs fine with unequal sample sizes. However, if you’re planning to conduct a 2-sample t-test, you’d prefer to have equal sample sizes to maximize statistical power. For example, if your limited to 100 observations, you’d like to split those 50/50 to get the most power. If you split it 60/40, the test performs just fine without adjustments, but the power will be slightly less than if it was 50/50. But there is no requirement for equal sample sizes and there is nothing to do if you find yourself with unequal sample sizes.

Yang says

Hi Jim,

You tutorials help me a lot, thank you.

As you explained, t-Test works well with samples comprising of 20 or more observations. However, I wonder would t-Test still be a good choice for samples comprising of more than one hundred observations? Or if there would be other fine technics suits such situation?

Jim Frost says

Hi Yang,

I’m so glad to hear that they’ve been helpful!

No, there’s not really a better method with even more observations. Having more observations increases the statistical power of t-tests as well as increasing its robustness to departures from normality. Also, the width of the confidence interval will be narrower as you increase the sample size, which means you’ll have a more precise estimate of the effect size. So, you don’t need a different test for larger sample sizes, but larger samples improve the t-test and CI!

riya says

Hi Jim, i had a question about reporting the independent sample t-test results. I have reported the significance and reported the differences in the group means, But are we also suppose to report the assumptions as well? like “Levene’s Test of Equality of Error Variance showed a non-significant result, thus the assumption of homogeneity of variance was found and equal variances were assumed”? is this important to report ?

Jim Frost says

Hi Riya,

Yes, that’s entirely acceptable. You can report the Levene test results as support for your decision to use equal variances.

Just as a caution, when you have a small sample size, it’s possible to obtain insignificant Levene’s test results even when the variances are not equal. That occurs because with a small sample size, Levene’s test loses statistical power (along with the t-test). Just something to be aware of with small sample sizes!

Also note that they don’t have to be exactly equal. Problems don’t start to occur until one variance is about twice the other.

rehana s says

Hi Jim,

Adding onto this question I am also facing a similar issue – I’ve got 2 groups (n=50 & n=6) but the output on excel states df=7. I have used your calculation above and got an answer of 29. In this case is it still appropriate to be trusting excel? and if excel has got the df wrong, does this affect the t-stat or p-2-tailed value.

Jim Frost says

Hi Rehana, it depends if you’re assuming equal variances or not. And, it depends on what the actual variances are too. Consequently, I can’t tell from what you write whether Excel calculated it correctly. The degrees of freedom definitely affect the t-value and p-value!

Rehana S says

Hi Jim,

If I am comparing 2 variables with very different sample sizes (n=50 and n=6), should I do a test assuming unequal variances even if the variances are similar (0.058 vs 0.032)?

Jim Frost says

Hi Rehana,

You don’t need to assume unequal variances because of the difference in sample size. However, one your variances is nearly double the other. That’s when you start to worry about the effect of unequal variances. Consequently, I would assume unequal variances for that reason.

Oliver Nyström says

Hi,

I am using this very helpful tool – thank you for your explanation. However, I am a bit worried about the degrees of freedom (df) outputted. I am conducting a 2-tailed t-test assuming unequal variances for 2 samples; “High” and “Low”. Each group has 130 observations included which I assume implies 258 degrees of freedom. However, the output I get for each test (20+ different pairs) varies as I have received df-values from everything between 223 a 258. Should I disregard this or? I believe you also show an output of a t-test of 2 groups, each with 15 observations, but with a df = 14….

Best regards,

Oliver

Jim Frost says

Hi Oliver,

The df gets more complicated for unequal variances. The equation is below for a 2-sample t-test with unequal variances. But I would just trust the output!

Rose says

Hi, i am seeking some advise.

I have done SEC on two differently treated samples, with 20 fractions analysed via NTA. so 20 fractions from method 1 and 20 fractions from method 2.

I am wanting to compare the significant between a specific fractions and the different methods, So say fraction 6 from method 1 and fraction 6 from method 2.

Is the correct thing to do is a one sample T test ?

Thanks

Jim Frost says

Hi Rose, if I understand correctly, you’re comparing two samples, so you’d need a 2-sample t-test. However, if you’re comparing multiple groups, you’ll need to use a post hoc test to control the family error rate.

Jeremy says

Yes, that data is just one column with years 2010-2019, and a second column labeled “Difference in Vehicle Accident Rates: Flip flops Minus All other Footwear”. Sorry my phone made it “slip shoes” in the last post and I didn’t notice. By range, I was just meaning the rate differences in the second column are from 3% to 9%. For more context, we are conducting a very basic multi-discipline case analysis. A lawsuit involving a vehicle collision where the plaintiff is suing the other driver, who was wearing flip flops. The plaintiff has obtained this survey data and there is a question as to whether it can be meaningfully analyzed to support a claim of negligence against plaintiff or possibly manufacturer. So the business law side is a separate component obviously, but I don’t see how to do hypothesis testing with this data. The assignment prompt is pretty vague with regards to statistical methods. The only thing I can think of is maybe regression analysis to show that more flip flop driving accidents are occurring each year, and therefore the manufacturer may need to consider product warnings to avoid strict liability.

Data is:

YEAR/ ACC. DIFF

2010 4%

2011 5%

2012 3%

2012 4%

2014 6%

2015 5%

2016 7%

2017 8%

2018 7%

2019 9%

I can derive a mean, standard deviation, and t-stat, and probability from this info, but would they really be valid given how the data is presented to us?

Jim Frost says

Hi Jeremy,

That’s a tricky dataset given the lack of details. We don’t know the raw counts and sample sizes for both groups for any of those years. If you had more details about the raw data, you might be able to determine whether flip flops had more accidents. This is really more of a proportions problem than a t-test problem. However, using a normal approximation, you might been able to do something along those lines with a t-test. However, proportions close to 0 or 100 are less likely to approximate a normal distribution. A large sample size can help, but we don’t know that either.

Also, whether you’re using a t-test or proportions test, the parameter in question (mean or proportion) has to remain constant over time. It doesn’t look like that’s the case.

It does appear that the difference between flip flop accidents and other shoes is increasing over the years. However, again, not knowing how the data were collected, sample sizes, locations, methodology etc., it’s not safe to draw those conclusions.

Finally, even if we could somehow trust the data and conclude the difference is growing, that still leaves us with a key unanswered question. Are flip flops causing the accidents? The data might be showing a correlation, but that doesn’t imply causation.

Of course, if the data are presented by the plaintiff, he’s not unbiased. We have no idea if the data are valid. Even if he didn’t intentionally bias the data, there are so many questions about the specifics, I’d say it’s impossible to draw any sound conclusions. If you trust the data, you might be able to say that the difference is growing over the years. But, without the raw data, you can’t apply any hypothesis test. Sample sizes might change over the years. Data collection locations might change. Etc. I’d imagine that if this was used in court, it would be ripped to shreds unless it was part of a published study. Any published study would have all those necessary details.

Maybe the point of the assignment is to know when not to use data because it can’t answer your question? Or, maybe you need to access the survey that you mention to verify all of those data and methodology questions?

Jeremy says

Hi Jim,

I just ordered your ebooks and I really appreciate you taking time to help others with their questions. Regarding determining the the appropriate statistical analysis, I am a bit stumped by a problem I am working on. For reference, I am a college sophomore studying business applications, so advanced analyses won’t apply. I am unsure how to approach a statistical analysis when very limited data is given. For example, I am provided only a table with two columns: years (n=10) and “difference in accident rates: slip shoes vs all other shoes”. I’m being asked to interpret the data. The rates are given as percentages, so I know the data is continuous; the range is (9-6=3), so I know there are no outliers and we can expect normal distribution. This seems like a paired t-test would be appropriate, except I’m confused because instead of being given separate data for slip shoes and all other shoes, I’m only given the differences. I’m not sure how I could model this in excel, although I have made basic calculations such as the standard deviation and mean. Any advice to point me in the right direction on analysis methods would be appreciated. Thanks for your time.

Jim Frost says

Hi Jeremy,

I’d guess that the original data were binary, accidents versus no accidents. From that you’d get the proportions/percentages.

When you say the range is 3, I don’t know that’s referring to. Range of what?

Sounds like you need to do a 2-sample proportions test. Although, that would be hard to do without the underlying data. They’re showing you the difference in accident rates over time? And, you have one column for each year that is the difference in accidents as a percentage?

It’s hard for me to picture, so I’m having a difficult time coming up with recommendations. Do you think they’re asking you to look for trends over time?

Dylan Jacklin says

Hi Jim, thanks for the info. I might have a pretty straightforward question but can’t find any specific answer and was hoping you could help. I have a number of groups that I am comparing with a t-test assuming equal variances, due to the multiple comparisons a Bonferroni adjustment is required. Here I know for my data it is alpha/45 = 0.00111.

My question to you is, in using the data analysis package for the t-test in excel the alpha is 0.05 (no problem there) but it can be changed (here you stated that we should try and avoid changing it), when I do this with my 0.00111 bonferroni adjusted p-value the results are the same as with an an alpha of 0.05, so what is the point in changing alpha here?

In addition, is it possible to do the bonferroni like this, or would it be better to two the t-test under alpha = 0.05 and then do the bonferroni adjustment ad hoc?

I look forward to hearing from you.

Kind regards,

Dylan

Jim Frost says

Hi Dylan,

If you’re changing your alpha based on the Bonferroni correction, it makes sense that the rest of the results stay the same. That’s because the significance level is something that you as the researcher sets. It doesn’t enter into the calculations at all. To learn why, read my article about understanding significance levels. I’m not really sure why Excel asks for alpha.

What you need to do is compare the p-values to your alpha. With a lower alpha, the p-values must be lower to be significant. That can change which group comparisons end up being statistically significant. So, it is potentially changing the results in that sense. But, Excel’s output doesn’t change. And, when I say that an alpha of 0.05 is generally a good one to use, I mean for a single test and not multiple comparisons.

45 groups is a large number! As you’re seeing, that dramatically lowers your effective alpha for each comparison. Additionally, Bonferroni is a conservative procedure in that it plays it extra safe, but that causes you to lose statistical power.

Your case is one where Excel’s limitations might be a problem. I’d recommend reading my article about using post hoc tests. There are other comparison procedures that aren’t as conservative and if you don’t need to compare all pairwise groups, that can help too. I discuss all that in that other post. It might give you some ideas. On the other hand, if you’re happy with results you’re getting using Bonferroni for so many groups, it might not be a problem! I write about these post hoc tests in the context of ANOVA, but the ideas apply for multiple comparisons even if you don’t perform the ANOVA test.

I hope this helps!

Jacob says

Hi Jim,

I’ve collected electrical conductivity values of water samples at a site I’m doing a study at and would like to know which statistical test would be appropriate to use please. I have taken readings from 3 sites, and at 4 times in the day corresponding to low tide, high tide, and two intervening periods equally spread out within that. At each site and time of the day, I took two readings which I averaged to get a mean. I would like to know whether the values significantly differ at a certain site (where I have 4 mean readings), so I can say that the tidal impact on electrical conductivity is negligible because this test shows that at site 1, 2 and 3, they don’t vary significantly. I hope this makes sense!

Thanks.

Cerys says

Thank you

Cerys says

Hi Jim, Could you help me on my data interpretation for a t-test. I am confused to whether I have one or two tailed distributions and then whether my results are paired, homoscedastic or heteroscedastic data.

I am evaluating pollution change from 2019 to 2020 due to the covid lockdown. I have pollution data from the same time period 23/3 – 30/4 for 2019 and then 2020, for one pollution type for example nitrogen dioxide data.

Thank you in advance

Cerys

Jim Frost says

Hi Cerys,

I have several posts about whether you should use a one or two-tailed test. I’d recommend reading those and then if you have more detailed questions, post them there. Typically, use a two-tailed test unless you have a strong rationale for using a one-tailed test. The posts describe what constitutes a “strong rationale.”

One-tailed and Two-tailed Tests Explained

When Can I Use One-Tailed Tests?

As for the equal variances assumption, you can do a formal variances test in Excel. However, if you have a very large sample size, the test will detect unsubstantial differences in variances that won’t affect your results. The general rule of thumb is to look at the standard deviations and if one is twice (or more) the size of the other, use Welch’s t-test which does not assume unequal variances. In fact, if you’re not sure, just use the unequal variances version and know that you can trust the results. You only lose a little bit of statistical power.

I hope that helps!

PRASANTH VASANTH PAI says

Excellent! Thank you, Jim. I will check out the pages suggested by you.

PRASANTH VASANTH PAI says

Jim, please do check the question during your free time.

Jim Frost says

Hi Prasanth, sorry for the delay. I’ve replied to your earlier comment!

Sid says

Hey, Jim!

Just got a quick questions cus im kinda confused right now. Im doing a study on The Effectiveness of Face-to-face Classes and Remote Learning among University Students. I honestly have no idea what to do nor what my Ho and Ha should be. I did a descriptive survey with questions about how effective is face to face learning and another set of questions for how effective is remote learning to gather data. I was wondering what the variables should be for x and y and what kind of test i should do cus im tasked to do a hypothesis testing on one population mean.

Thank you

Jim Frost says

Hi Sid,

It’s hard for me to answer some of your questions because I don’t know the type of data you are gathering. However, here’s one possibility.

If you measured effectiveness for in-person and remote classes using a continuous outcome variable, you could compare the mean effectiveness between the two groups. Your hypotheses would be the following:

H0: the mean in-person and remote effectiveness are equal.

HA: the mean in-person and remote effectiveness are not equal.

You could then use a two-sample t-test to compare the means of the two groups. If your p-value is less than your significance level, you can reject H0 and conclude that the results are statistically significant. Significant results would indicate that your sample provides sufficient evidence to conclude the population means are different.

I hope that helps!

REMYA R says

sir if i have 2 groups one with diet and one with diet and yoga, and i want to compare their blood pressure and bmi before and after then what statitical tool should i use

Jim Frost says

Hi Remya, sounds like you need to use repeated measures ANOVA.

AP says

Thank you so much Jim for taking the time to reply to me.

Let me try to explain what I aming for. Let’s assume I have a group of 500,000 customers who hold a specific financial instrument. Out of those customers, 100,000 also hold another financial instrument. I want to test that the 100,000 customer group holding the 2 instruments (hence the uneveness of the two sample sizes), have a higher income than the rest 400,000 customers.

Is the Two-Sample t-Test the right test for this study?

Should I perform an F-test before proceeding with the Two-Sample t-Test?

Or, should I randomly select a 100,000 sample from customers with one instrument only to make the two groups even in sizes?

Thank you very much in advance Jim.

Jim Frost says

Hi,

Be sure that your groups are independent (different people, no people in both groups). For your scenario, you have 500,000 people. You’d split them so that the 100,000 with the other instrument are in one group and the people who don’t have it (400,000) are in the other. The 2-sample t-test is valid.

If you’re unsure about have equal variability between the groups, I actually would not recommend the F-test for such a large sample size. That test will have so much power with such a large sample size that even a trivial difference in variability will be statistically significant. Basically, a false alarm. Instead, just compare the standard deviations. If one standard deviation is twice (or more) the size of the other, then you have problems. And, if you’re unsure, you can perform the form of the t-test which doesn’t require equal variances.

Finally, your theoretically supposed to draw a random sample from a defined population. The results apply to the population that you drew the sample from. I’m guessing from what you wrote that is not how you acquired your data. Existing customers. You’ll have to evaluate how well it approximates a random sample from some population.

I hope that helps!

AP says

Hi Jim, this is extremely useful.

Is there an issue with the reliability of the results when performing a Two-Sample t-Tests with uneven sample sizes? i.e. 87,000 vs 400,000.

Thanks

Jim Frost says

HI AP,

It’s fine to use different sample sizes. The results are still valid. If you’re planning a study and want to maximize your statistical power, you’ll get the maximum power if you split the total sample size in half. For example, if you want a total sample size of 100, you’ll get maximum power for a 2-sample t-test if you split it 50/50 between the two groups.

However, if you’re working with an already uneven split, that’s no problem. And, with such large sample sizes that you have, you probably won’t have problems with low power.

There’s one potential snag you should think about. Why are the sample sizes so uneven? If it’s because subjects dropped out unevenly, you have the potential for selection bias if they dropped out in some manner related to the outcome. That’s not an issue with the hypothesis test but rather the underlying data.

I hope that helps!

Scarlet says

Hi Jim,

Thank you for such handy explanations! I am a fairly amateur with stats, with just a single university unit under my belt. I am unsure how to approach the following data.

We have a collection of data with 4 different grocery store chains, measuring the brix (sugar level) and pressure of peaches form each shop over 8 weeks of the season. How do I go about testing the change data from the 4 different shops over 8 weeks. We would also like to see if there is a relationship between sugar and pressure, but I assume this is too many variables for a single test route, so would split into two tests.

Please note, my head is in a rabbit hole at this point, so I may be blind to some very obvious errors i’m making!

Hope you can help,

cheers.

Kumar says

Hello, Jim, first thank you for creating such a valuable resource. My question is this: Lets say I have three variables in each column: name, gender and height (in stacked column format, so i have one row of data for each unique value: e.g Dave, Male, 23 and then next column Sara, Female 31 and so on).

I want to compare if the mean height of men and women are statistically different in my data. How can i do that using excel. Can this be done? How?

Thank you and wishes for 2021.

Noah

Jim Frost says

Hi Kumar,

There are several things to consider about t-tests. For one thing, you can have at most only one categorical variable, such as gender. And, that categorical variable can only contain two groups. For gender, you can compare the mean of some characteristic between males and females. When using t-tests, you are not able to include a continuous variables, such as age.

It sounds like you need to use regression analysis, which will allow you to include both gender and age in the statistical analysis. Read my post about when to use regression analysis for more information. Also, I’ve written a post about how to perform regression analysis using Excel.

Karine Green says

Great! Thanks so much for your help

PRASANTH VASANTH PAI says

Hey Jim, thank you for the page – I have a doubt – Why can’t we use Confidence.t function in Excel to calculate margin of error for differences of two means of independent populations.

As I understand, confidence.t function can be used to calculate margin of error for difference of two means for matched pair samples.

Jim Frost says

Hi Prasanth,

Sorry for the delay, I looked into the function. The Confidence.t function is designed to calculate the confidence interval for a single sample. In other words, you have one column of values and you want to calculate the CI for that column.

Yes, you can use this function to calculate the CI for the differences between paired samples. The reason this works is because a paired t-test is equivalent to a 1-sample t-test on the paired differences. The calculations are the same, which means it is ok to use the same functions.

However, it is correct that you CANNOT use it for the difference in means between independent groups. That’s because, unlike the example above, a 1-sample t-test is not equivalent to a 2-sample t-test. The calculations are different and you, therefore, can’t use the same commands.

For more details, look at my other post about t-tests and t-values, which illustrates the equivalence between paired t-tests and 1-sample t-tests. Note how 2-sample t-tests (independent groups) are different.

I hope this helps!

Karine Green says

Hi Jim. Im getting a letter in my t-test result: 2.84997767514918E-06

What does this mean please?

Jim Frost says

Hi Karine,

That is scientific notation, which is used for very large and very small numbers. In your case, it’s a very small number, which is good if it’s a p-value!

The number to the right of the E (-06) tells you how many places to move the decimal point. The negative value indicates you need to move the decimal point to the left by 6 places.

It’s equivalent to: 0.000002849. . . .

I hope that helps!

Kelly says

What do I put in the Hypothesized Mean Difference field though if the claim is that there is a difference in the means (the first one is greater than the second)?

Jim Frost says

Hi Kelly, I cover this in step 4 of the procedures.

Dang says

Hi Jim,

I have a problem with my data that is i only have the sum of the samples without specific value for each sample, how should i perform it in excel? Can you help me, please? My data: 6241 worms / 30 samples. Thank you!

Jim Frost says

Hi Dang,

You’d need to tell me more about the goal of your analysis. If you data comprise counts of worms for each sample, you might need to perform a Poisson rate test or Poisson regression. But, I’m not sure what you want to learn from your data.

Stacey says

Hi Jim!!!!

Thank you for the easy to understand explanation. I have used the t-test before but never truly understood it or when to use it the way I do now!

Jim Frost says

Hi Stacey, thanks for writing! I’m so glad to hear that this post was helpful!

Valodia says

Hi Jim. This is an excellent explanation. Thank you so much.

I was wondering if there is a way to calculate the standard deviation from the information that results from the excel output?

Thanks.

Valodia

Jim Frost says

Hi Valodia,

Yes, you can! The standard deviation is just the square root of the variance. So, all you need to do is look for variance in the output and take the square root of it to find the standard deviation. For example, if you look in Excel’s output for the 2-sample t-test, you’ll see variances for each group and the pooled variance. Take the square root of those variances to find the standard deviation for each group and the pooled standard deviation, which is for all the data.

I hope that helps!

Aniket says

Thanks for your comments Jim!! I ended up doing a 3 way ANOVA. I thought to share my results with you. Might interest you. Worked like a charm!!

Jim Frost says

Hi Aniket,

I’m so glad it worked like a charm!! I had to remove the output because it was so long. But, that’s great!

Jeyasushma Veeriah says

Hi Jim, I am a newbie in discovering the wonders of Excel in calculations. I am in a dilemma now. I have this questionnaire with 4 dimensions – so lets say for Dimension A, I have 7 questions, Dimension B (11 questions), Dimension C (10 questions) and Dimension D (7 questions). Now I would like to see the relationship between gender and dimension as a whole. Now, my questions are (1) How do I group the 7 questions in Dimension A as one variable? (2) How do I group all the questions in the 4 dimensions as 1 variable? Thank you Jim.

Aniket says

Holy heck Jim no one has ever replied to me on any other stats message board. Thanks.

I want to pick your brain a little bit more. Below will be the structure of the data. Such data will be collected for 4 weeks at two temperature each. Can i do a one factor anova for one week data and compare if the formulations are different? At the end of the study I will take each formulation at different weeks and do another anova to check how formulations behave over time. Is there a simpler way?

Week0

F1 F2 F3 F4 F5 F6 F7 F8

Outcome-Repeat1 x x x x x x x x

Outcome-Repeat2 x x x x x x x x

Outcome-Repeat3 x x x x x x x x

Jim Frost says

Hi Aniket,

You’re very welcome!

I don’t see why you couldn’t do it that way. However, you’re throwing away valuable information by not including the week 0 (i.e. pretest) information. That would be my concern. If you have that data, you might as well use it in a repeated measures design.

If you go the route of a one-way ANOVA at week one, that would tell if the formulations have different group means at that point. However, that test will have less power than the repeated measures design including the pretest scores.

So, can it be done? Yes. It’s a simpler model too. But, you lose some statistical power. My upcoming post is about independent and dependent groups, which gets into repeated measures designs and the issues involved.

Aniket says

Hi Jim!! Big Fan. I have the following problem. Would like to know what you think

I am screening different liquid formulations for a drug. There are 8 unique formulations studied at two different temperatures. The aim is to identify which formulation is best with respect to stability. I was planning to do an ANOVA for the 8 different formulations to see if the formulations are different from each other and then do a pairwise t test at the two temperatures as it is essentially the same sample at two different temperatures. Is this a good approach?

Jim Frost says

Hi Aniket,

It sounds like you need to perform a repeated measures ANOVA, which is an extension of the paired t test. You’d include Formulation, Temperature, and Time in your model. You’d be particularly interested in the interactions between Formulation*Time (does the relationship between formulation and your DV depend on time). If that term is significant, then you know the formulations behave differently over time. Also, you could include temperature in that interaction. Three-way interactions are rare in analyses, but your model might call for it. It would answer the question whether the relationship between Formulation and the DV depends on both time and temperature.

Additionally, you can perform post hoc analyses on these interactions terms so you can control the familywise error rate for all these comparison. You don’t get that benefit if you use a series of paired t-tests.

I hope that helps!

Anum says

how we will tell that mean of one variable is greater than the other variable in two tailed t test

Jim Frost says

Hi Anum,

There are two ways. One is just to look at the two mean values. The output should display the mean for each group. That’s the easiest way. For example, if the mean of group 1 is greater than group two, and the results are significant, then you know that the mean 1 is significantly larger than mean 2. You can also look at the t-values and then link that to critical areas and how that relates to your hypothesis test.

But, I’d just look at the group means!

Natalia K says

Hi Jim. Could you please clarify if t-test can be used if data have a number of outliers, i.e. distributions are heavily skewed?

Many thanks

Jim Frost says

Hi Natalia,

T-tests are not recommended when you have outliers that skew the distributions. A few values can dramatically change your results. Consider using a nonparametric test instead. I talk about this in my post about parametric vs. nonparametric tests. It’s under one of the benefits of using nonparametric tests.

Geeta Pradeep says

Hello Jim,

I am trying to assess the impact of an intervention in a sample of 30 participants pre and post intervention using a set of clinical outcome measures sample before and after the intervention.

As the intervention was applied at different times for each participant, I have collected different number of pre and post intervention outcome measures.

To clarify: I have 2 pre-intervention outcome measures and only one post intervention measures for participant X while I have 3 pre-intervention and post intervention measures for participant Y.

What is the best way to do a paired sample T test in such a case?

For some participants, there is no corresponding post-intervention outcome recorded as the prescriber did not request for the clinical test.

Do I just choose the last recorded pre and post intervention values or do I calculate an average of pre and post intervention measures for each participant? If I am to calculate the average, then how do I account for a single or no outcome value?

I’d appreciate your advice on this.

Thank you

Geeta

Geeta Pradeep says

Hi Jim,

Your website has been a heaven sent for me! My knowledge of statistics is rudimentary and I cannot thank you enough for the simple way in which you explain concepts.

I am currently writing up my thesis and I had a question about selection of variables in a paired sample t-test. My research looks at the impact ( if any) of having a pharmacist involved in the care of people with type 2 diabetes and I have clinical parameters measured pre and post intervention. I also have a control group who have been exposed to usual care.

My question is – does it matter in what order I choose my sample variable when I do the paired t-test ( using XLSTAT). I have added pre-intervention means as variable 1 and post-intervention as variable 2.

Would the values change if I conduct the test choosing post intervention as variable 1 and pre as variable 2?

Apologies in advance if this is a silly question but I just want to be sure I am on the right path here.

Once again, many thanks.

Cheers

Geeta

Iqra says

Thnks jim

Iqra says

Hey sidra .you find the answer yet or not.i thnk tour student of b.ed i am also looking for ths answer

Jim Frost says

Hi, I answered Sidra’s question in the comments. You might need to scroll down.

Sidra says

Thank you Jim. your answer was really helpful.

Sidra says

Hi jim

i am a new student of statistics. I really need to solve question. Can you give me any idea which test will be suitable for this research. “A researcher conducted a research that majority of the people who died during pandemic bought a new phone during last year.If his assumption is correct which statistical test should be appropriate “

Jim Frost says

Hi Sidra,

It sounds like you need to perform a two proportions test. You need to determine whether the proportion of pandemic deaths is different between the group that bought a new phone within a year versions those who did not. Caution: For these data, you wouldn’t be proving that buying phones caused pandemic deaths. Instead, you’d be showing a correlation (not causation) between them.

In my new ebook about hypothesis testing, I cover proportions tests in great detail.

Brandi C Sturgis says

I am conducting a quantitative research study. The hypothesis is does Variable 1 (intervention) impact Variable 2 (posttest scores). I have a control group and an experimental group and pretest and posttest scores for all participants. I have entered all scores as 1 data set in excel and from my understanding should use the paired mean t-test? But then how do I compare the groups? For example how do I find the mean for the experimental group separately from the control group? Or the separate standard deviations? When reporting results would I use the one p-value calculated for the total group for the null hypothesis testing?

Jim Frost says

Hi Brandi,

Based on what you write, it sounds like you’ll need to use repeated measures ANOVA rather than a t-test. You have pre-test and post-test scores and you have two experimental groups. That’s four groupings based on two independent variables, time and experimental group. T-tests can’t analyze that type of design.

To answer your question about whether the mean difference between pre-test and post-test scores depend on group assignment, you’ll need to include an interaction term for time*group in your repeated measures ANOVA model.

Camryn Froerer says

Hi Jim,

I am attempting to run a t-Test comparing time to complete a task where the pre- and post array data each have 70 days. However, in the post data there are days when the task didn’t occur. Is the correct way to represent this within the array to include a 0 in each of the lines, or to leave it blank?

I have run it each way and in this instance the p-value becomes >0.05 if the 0’s are included for time measures, but become significant if the 0’s aren’t included. However, I want to represent the data correctly regardless of the resulting significance.

Thank you for your help.

Jim Frost says

Hi Camryn,

I’m assuming you’re using a pair t-test with a pre and post test time for each subject. In this case, if a subject doesn’t complete both the pre and post test, you need to exclude that subject from the analysis. You definitely don’t want to enter a zero for missing data because the analysis will interpret that as a time of zero.

I hope that helps. Best of of luck with your analysis!

Kevin says

Hi Jim! Your explanation helped me a lot to understand how a t-Test works! I have one question about multiple t-Tests. If I conducted a t-Test (4 groups 1,2,3,4 and tested 1&4 / 2&3) and ANOVA for all 4 together does this count as multiple Testing? Therefore a Bonferroni correction is needed?

Thanks a lot!

Kevin

Jim Frost says

Hi Kevin, yes, that definitely does count as multiple testing. You shouldn’t be using t-tests for that. Instead, conduct the ANOVA and then use a post hoc test. Click that link to read an article I wrote about them. Bonferroni is a type of post hoc test and that would be an acceptable one to use. However, I don’t usually recommend that particular test because it’s known to be more conservative (less statistical power) than other post hoc tests.

Jim Hardman says

Hi Jim (Great name btw!). I have a question about the one tailed P-Value that Excel reports. It specifically says “P[T <= t]" which if interpreted literally would be P[ T <= – 1.652] if the t test statistic were -1.652, and if there were, say, 13 df, then P[T <= -1.652] would be equal to 0.061235 (rounded off to the 5th decimal place). If, on the other hand, the t test statistic had been +1.652, then P[T <= 1.652] = 0.93877 (rounded off to the 5th decimal place), BUT Excel would still report, "P[T <= t"] as 0.061235. So my question is this: even though Excel prints "P[T <= t] one tailed", it's really calculating the left-hand tail probability if the test statistic is a negative number, and the right-hand tail probability if the test statistic is a positive number? Do I have that correct? Thanks (for the life of me I could not find the answer to this question out there on the "interwebs"!)

Jim Frost says

Hi Jim! Yes, it is indeed a great name!

Those labels in Excel are wonky. My take is that they’re missing symbols for absolute values. I noticed this weirdness before but your comment prompted me to think it through for both one- and two-tailed cases. T are t-values in the t-distribution while t is from your sample.

For both one-tailed tests and two-tailed tests, I believe it should be: P(|T| ≥ |t|).

This indicates the probability of obtaining absolute t-values at least as extreme as the absolute value of the t-value your study obtained. That covers all the cases I think for both positive and negative sample t-values and one-tailed tests in either direction. Excel would include “one-tailed” or “two-tailed” after that label because that tells you whether that probability is distributed between two tails or just one tail.

–The other Jim

Maja says

Hey Jim,

I would like to know if you happen to know which formula Excel are using when calculating the t-value in the unequal variance t-test?

So sorry for my “danglish” 🙂

Lauren Zeglio says

Hi Jim,

Super helpful, i need to create a box plot from my P value, how would i go about doing this.

Thanks

Jim Frost says

Hi Lauren, I don’t understand what you mean by creating a box plot from a p-value. I do have a post that explains box plots. Perhaps that will help. If that post doesn’t answer your question, please comment in the boxplot post and provide more details about what you want to do exactly. Thanks!

Mike H says

Hi Jim. Thank you for the work you are doing here. I read this article to help me better explain how to decide which excel tool to use when for a t-test. It helped much! I was disappointed to learn that excel doesn’t do one sample t-tests… but maybe we have a ray of hope? I read another article you wrote about one or two or paired t-tests (first link in this article) and there you wrote “Here’s a secret! Paired t-tests and 1-sample t-tests are the same hypothesis test incognito!”

So, a question – can one, using excel’s paired t-test function, put sample data in the first column and the null hypothesis value, repeatedly, in the second column and “trick” excel into running a one sample t-test using the paired t-test function?

Jim Frost says

Hi Mike,

That’s a great question. If you enter your 1-sample data in one column and then enter all zeroes in the other column, that

shoulddo the trick. I haven’t tested that to know for sure but will try it when I can. I suppose there’s a chance that Excel might choke on that.If that approach works, entering all zeroes for one column will test the hypothesis of whether your data are significantly different from zero. Entering a different value from zero should test whether your data are different from that value. For example, if you wanted to determine whether your data were significantly different from a mean of 5, you’d enter all 5s in the 2nd column.

As you’ve noted from the other post, paired t-tests simply subtract one observation from another for each subject/item and then performs a 1-sample t-test using those paired differences.

Penn Gao says

Thank you, Jim!

Your explanation is very clear and it helps me a lot.

Marty Shudak says

Jim, I know you are an expert with Minitab. I recently learned that the test of two proportions have two ways to calculate the z score depending upon whether you indicate it use a pooled or separate estimate of p. Can you explain how I would decide that as the differences in results can be substantial. Also, I’m not sure which one is used in Excel.

Mommy Bloggers Phil (@MommyBloggersPh) says

A rather technical post but I’m sure this will be helpful for those mastering Excel.

Jim Frost says

Hi Phil,

I do provide the step-by-step instructions for how to do this in Excel. However, I hope to have accomplished other goals in the process. For instance, there are several t-test to choose from–which one is appropriate. And, how to choose the correct p-value to use? What’s the correct interpretation? There’s also links to other resources of mine the explain the various other statistics you’ll see in the output. And, importantly, what can’t Excel do? Excel doesn’t elaborate on all of that. Hopefully, that information is useful to readers!

Ramesh Chandra Das says

Nice Explanation!!!!!!!!!!!

Muruhura Nickson says

yess… it was so helpful.. thank you