What is the Mann Whitney U Test?
The Mann Whitney U test is a nonparametric hypothesis test that compares two independent groups. Statisticians also refer to it as the Wilcoxon rank sum test. The Kruskal Wallis test extends this analysis so that you can compare more than two groups.
If you’re involved in data analysis or scientific research, you’re likely familiar with the t-test. But did you know there’s another method for comparing two independent samples? That method is the Mann-Whitney U Test.
It is a nonparametric analysis named after two statisticians, H.B. Mann and D.R. Whitney. Because it is nonparametric, it makes fewer assumptions about your data than its parametric counterparts.
Many analysts use the Mann Whitney U test to determine whether the difference between the medians of two groups is statistically significant. However, it’s important to note that it only tells us about the median in certain situations. Interpreting the test results can be tricky. More on this later!
If you need a nonparametric test for paired groups or a single sample, consider the Wilcoxon signed rank test.
Learn more about Parametric vs. Nonparametric Tests and Hypothesis Testing Overview.
What Does the Mann Whitney U Test Tell You?
If you search the Internet, you’ll find the following two common interpretations for a statistically significant Mann Whitney U test:
- The difference between the medians is significant.
- The groups come from populations with different distributions.
Unfortunately, neither of these interpretations is necessarily correct, although each can be true some of the time.
In the strictest technical sense, the Mann Whitney U test indicates whether one population tends to produce higher values than the other population. This correct interpretation relates to the two I list above but doesn’t directly translate to them.
Let’s quickly examine how this test works to understand why this is true.
The procedure ranks all the sample data from low to high. Then it sums the ranks for both groups. If the results are statistically significant, one group tends to have higher ranking values than the other.
This analysis doesn’t involve medians or other distributional properties—just the ranks.
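The ranking procedure can be sketched in a few lines of Python. This is a minimal illustration, not the code behind any particular statistics package: the function names and the data are hypothetical, and tied values receive the average of the ranks they span, which is the usual convention.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sums_and_u(group_a, group_b):
    """Pool both groups, rank them, and return rank sums and U statistics."""
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    n_a, n_b = len(group_a), len(group_b)
    r_a = sum(ranks[:n_a])           # rank sum for group A
    r_b = sum(ranks[n_a:])           # rank sum for group B
    u_a = r_a - n_a * (n_a + 1) / 2  # pairs where an A value beats a B value
    u_b = r_b - n_b * (n_b + 1) / 2  # pairs where a B value beats an A value
    return r_a, r_b, u_a, u_b


# Hypothetical data, for illustration only.
group_a = [12, 15, 11, 18, 14]
group_b = [16, 19, 17, 21, 20]
r_a, r_b, u_a, u_b = rank_sums_and_u(group_a, group_b)
print(r_a, r_b, u_a, u_b)  # 17.0 38.0 2.0 23.0
```

Notice that the medians never enter the calculation. Group B has the larger rank sum, and the two U values always add up to the number of cross-group pairs (here 5 × 5 = 25).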
Special Case for Same Shapes
However, when the shapes of the two distributions are similar, the Mann Whitney U test does tell us about the median. That conclusion follows from logic rather than from a property of the analysis itself. If two distributions have the same shape, but one is shifted higher, its median must be higher. But we can only draw that conclusion about the medians when the distributions have the same shapes.
In essence, the Mann Whitney U test rolls up both the location and shape parameters into a single evaluation of whether one distribution tends to produce higher values, preventing you from drawing conclusions about the location specifically (e.g., the medians). However, when you hold the shape constant, you can make inferences about the location.
These two distributions have the same shape, but the red one is shifted right to higher values. Wherever the median falls on the blue distribution, it’ll be in the corresponding position in the red distribution. In this case, the test can assess the medians.
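A small sketch can make the contrast concrete. It computes the proportion of cross-group pairs in which one group's value exceeds the other's, which equals the U statistic divided by the number of pairs. The helper name `prob_superiority` and all four datasets are hypothetical, and the samples are far too small for a real analysis; the point is only to show what the quantity responds to.

```python
from statistics import median


def prob_superiority(a, b):
    """Share of (a, b) pairs where a > b, counting ties as half a win."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))


# Hypothetical data, for illustration only.
# Case 1: same shape, with red shifted up by 2.
blue = [1, 2, 3, 4, 10]
red = [x + 2 for x in blue]
print(median(red) - median(blue))   # 2, exactly the shift
print(prob_superiority(red, blue))  # 0.76

# Case 2: different shapes (mirror images) but identical medians.
left_tail = [-6, -5, 0, 1, 2]
right_tail = [-2, -1, 0, 5, 6]
print(median(right_tail) - median(left_tail))  # 0
print(prob_superiority(right_tail, left_tail))  # 0.66
```

In the first case, the tendency to produce higher values and the difference in medians tell the same story. In the second case, the medians are equal, yet one group still tends to produce higher values, so a statement about the medians would be misleading.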
Analysis Assumptions
The Mann-Whitney U test has a set of assumptions like any other statistical analysis.
- Independent Groups: Each group has a distinct set of subjects or items.
- Independent Observations: Each observation should be independent of others. Essentially, what happens to one shouldn’t affect the others.
- Continuous or Ordinal Data: It can handle continuous or ordinal data because it works with ranks. However, it can’t use nominal (unordered categorical) data because those values can’t be ranked.
- Same Distribution Shape: This assumption applies only when you want to draw inferences about the medians. If this assumption holds, the test can provide insights about the medians.
Violating these assumptions can lead to incorrect conclusions.
When to Use this Test?
Consider using the Mann Whitney U test when your data follow a nonnormal distribution, and you have a small sample size. Learn more about the Normal Distribution.
Alternatively, use it if understanding the median is more pertinent to your subject area than the mean and the distribution shapes are the same.
If you have more than 15 observations in each group, you might want to use the t-test even when you have nonnormal data. The central limit theorem causes the sampling distributions to converge on normality, making the t-test an appropriate choice.
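To see the central limit theorem at work, here is a rough simulation sketch. It assumes, purely for illustration, that the raw data come from a strongly right-skewed exponential distribution and that each sample has 20 observations. The seed, sample counts, and the `skewness` helper are my own choices rather than part of any standard recipe.

```python
import random
from statistics import mean, pstdev


def skewness(xs):
    """Population skewness: the average cubed z-score."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Raw observations from a right-skewed (exponential) distribution.
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Means of many samples of size 20 drawn from the same distribution.
sample_means = []
for _ in range(5000):
    sample = [rng.expovariate(1.0) for _ in range(20)]
    sample_means.append(sum(sample) / len(sample))

print("skewness of raw data:    ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

The sample means should be far less skewed than the raw data, which is why the t-test can tolerate nonnormal data once each group is reasonably large.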
Independent samples t-tests have several advantages over the Mann Whitney U test, including the following:
- More statistical power to detect differences.
- Can handle distributions with different shapes (Use Welch’s t-test).
- Avoids the interpretation issues discussed above.
In short, use this nonparametric analysis when you’re specifically interested in the medians, have ordinal data, or can’t use the t-test because you have a small, nonnormal sample.
Mann Whitney U Test Example
Suppose you’re a paint supplier, and you’re evaluating the median number of months that two paints last.
Let’s perform the analysis! Download the CSV dataset to try it yourself: MannWhitney.
The best way to report values for a Mann Whitney U test is to include both the p-value and confidence interval.
For this analysis, the p-value is 0.0019 and it is statistically significant. The 95.5% confidence interval for the median difference is [-3.000, -0.901]. Because the confidence interval excludes zero, it further illustrates that the results are statistically significant.
Be aware that these interpretations about the median are valid only if the two distributions have the same shape.
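If you want to see roughly how numbers like these are produced, the sketch below works through the arithmetic on made-up data. It does not use the downloadable dataset, and it is not necessarily the method your statistical software applies. It assumes there are no tied values, uses the large-sample normal approximation with a continuity correction (exact tables are preferable for samples this small), and reports the Hodges-Lehmann estimate, which is the median of all pairwise differences between the groups. The function names and paint lifetimes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median


def mann_whitney_normal_approx(a, b):
    """Return U for group a and a two-sided p-value (assumes no ties)."""
    n_a, n_b = len(a), len(b)
    u_a = sum(1 for x in a for y in b if x > y)     # pairs where a beats b
    mu = n_a * n_b / 2                              # mean of U under the null
    sigma = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # SD of U under the null
    z = (abs(u_a - mu) - 0.5) / sigma               # continuity correction
    p = 2 * (1 - NormalDist().cdf(z))
    return u_a, min(p, 1.0)


def hodges_lehmann(a, b):
    """Median of all pairwise differences a_i - b_j."""
    return median(x - y for x in a for y in b)


# Hypothetical paint lifetimes in months, for illustration only.
paint_a = [8.1, 9.4, 7.6, 10.2, 8.8, 9.0, 7.9, 8.5]
paint_b = [10.5, 11.2, 9.8, 12.0, 10.9, 11.6, 10.1, 9.6]

u, p = mann_whitney_normal_approx(paint_a, paint_b)
shift = hodges_lehmann(paint_a, paint_b)
print(f"U = {u}, p = {p:.4f}, estimated shift (A - B) = {shift:.2f}")
```

A negative shift estimate means the first paint tends to last fewer months than the second. As with the confidence interval above, reading it as a difference in medians is valid only when the two distributions have the same shape.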
For another Mann Whitney U test example, read my post where I use it to analyze data from the MythBusters Battle of the Sexes TV episode.
Reference
Mann-Whitney test is not just a test of medians: differences in spread can be important,
Benson says
Hi Jim,
How should I report the results if the medians of the two groups are the same but the Mann-Whitney test shows that the difference is significant?
Jim Frost says
Hi Benson,
From what you write, I can infer that the shapes of the two distributions must be different. Remember, you can only draw conclusions about the median with the Mann-Whitney test when the two groups have the same shaped distributions. If that’s not true, then you can’t draw conclusions about the median.
I suggest graphing the data for your groups to confirm my suspicions. And read the portion of this blog post where I talk about the assumption that both groups have the same shaped distribution. The M-W test doesn’t inherently relate to the medians.
So, what can you conclude in this case? One group has an average rank that is higher than the other group. More specifically, the test indicates that the difference between the two average ranks is statistically significant. In other words, one group tends to produce higher values than the other. Unfortunately, you can’t draw any conclusions about the medians, but knowing that one group tends to have higher values can still be useful.
Nour says
Thanks for responding!
As I understood from you above, does the central limit theorem offer any benefit when using significance tests? Even if the sample is large but not normally distributed, and I want to compare something like the quality of life scores of males and females, I will be obliged to use the Mann Whitney test and not the t-test (because the median is more representative than the mean when the data are not normally distributed). In other words, I can’t benefit from the central limit theorem in this situation 🙁
Jim Frost says
Hi Nour,
You misunderstood. I indicated that your data appear to be normally distributed according to the QQ plot. Therefore, it’s fine for you to use the mean and a t-test!
I probably shouldn’t have included the other information but it did answer your other questions, such as when to use the mean or median.
Go ahead and use the t-test and assess means!
Nour says
Hi Jim,
I have a question. First, suppose I have a large dataset, like 1000 responses about something like anxiety levels, and the data are not normally distributed according to tests like the Shapiro–Wilk test but appear normally distributed on a QQ plot. Does that mean I can use tests that require normality, like the t-test? And if the data aren’t normal according to the QQ plot, am I still able to use the t-test?
Second, as I understood from you, we use the t-test when we want to assess the mean and the Mann Whitney test when we want to assess the median. The question is how to know which is suitable, the mean or the median. I thought we should assess normality to decide, but you say normality isn’t important with large samples (and I want to know whether that is because tests like Shapiro–Wilk will give us false results, or whether even graphical nonnormality won’t matter). Assume that I want to use the median because the data are not normally distributed and I used the Mann Whitney test, but the shapes of the distributions weren’t similar. What should I do then?
Jim Frost says
Hi Nour,
Thanks for the great question! It probably has many more aspects to it than you were thinking.
So, first off, I’m going to assume your response variable is continuous. If it’s Likert scale or something else, we have additional considerations to deal with.
You are likely correct that you can use a t-test for your data for several reasons. For one thing, even though the normality test indicates your data are nonnormally distributed, the QQ plot trumps that. Like all hypothesis tests, normality tests gain power with larger sample sizes. With a very large sample size, they can detect trivial departures from normality. Because your data appear to follow a normal distribution on your QQ plot, it is fairly safe to assume that the normality test is overpowered and your data follow a normal distribution. Read my post about QQ Plots to learn more about this aspect (I address it near the end of that post).
Furthermore, the central limit theorem is your friend. Even if your data were nonnormal, with such a large sample size, the sampling distribution of the mean will follow a normal distribution. You can waive the normality assumption when your sample size is sufficiently large. And yours is large enough!
So, it’s safe to say that you can use a t-test.
As for choosing between the median and mean, there are several considerations.
The general rule is that the median is better for skewed distributions while the mean is better for symmetric distributions. Your data seem to follow a normal distribution so probably use the mean.
Additionally, as I write about in this post, the Mann-Whitney U test only tests the median when the groups all have the same shaped distribution (which can be nonnormal). You’d need to assess that property to determine what the M-W test will tell you.
Given what you’ve said, using the mean and a t-test is probably a good choice.
Keisha says
I am trying to develop a project and I am struggling to find a statistical test to evaluate the effectiveness of education in improving the use of a treatment option among providers in a clinic. If I’m using retrospective chart reviews to gather data before and after the intervention on whether there was an increase in the use of a treatment option, would the Mann Whitney test or the 2 sample t-test be better?
Jim Frost says
Hi Keisha,
Unfortunately, from the limited information provided, I can’t determine which analysis is better.
Generally speaking, use nonparametric tests, such as the Mann Whitney test, when you have a small, nonnormal sample, have an ordinal outcome, or the median is a more relevant measure for your study area. Otherwise use a parametric test, such as the 2 sample t-test. Read my post about choosing between a nonparametric and parametric test for more information.
Because your study is a retrospective study, be sure to understand the strengths and weaknesses of retrospective studies.