What is a Trimmed Mean?
The trimmed mean is a statistical measure that calculates a dataset’s average after removing a certain percentage of extreme values from both ends of the distribution. By excluding outliers, this statistic can provide a more accurate representation of a dataset’s typical or central values. Usually, you’ll trim a percentage of values, such as 10% or 20%.
For example, a 10% trimmed mean excludes the highest 10% of values and the lowest 10%. In other words, it uses the middle 80%.
When summarizing a dataset, the mean is often the go-to statistic. It’s simple to calculate, giving us a quick idea of our data’s “average” value. However, outliers can significantly distort the mean, causing it to misrepresent the typical value.
The trimmed mean helps us tame outliers and obtain a robust measure of central tendency. By removing extreme values, this statistic can better represent typical dataset values.
A famous example of a trimmed mean occurs in Olympic figure skating, where officials remove the highest score and lowest score. This method helps limit the effects of a biased judge. Statisticians also refer to removing only the minimum and maximum values as the modified mean.
Trimming the lowest 25 percent and the highest 25 percent of the dataset produces the interquartile mean—the average of the middle half of the dataset.
The median is an extreme form of a trimmed mean because it removes all values except one or two, depending on whether the dataset contains an odd or even number of values. Learn more about the Median Definition and Uses.
For comparison, a winsorized mean takes extreme values and replaces them with less extreme values instead of removing them like a trimmed mean.
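To make the difference between trimming and winsorizing concrete, here is a minimal Python sketch. The function name and the sample values are hypothetical and chosen only for illustration. It assumes you already know how many observations, k, to handle at each end.

```python
def trimmed_and_winsorized_means(values, k):
    """Return (trimmed mean, winsorized mean) using k observations per end.

    Illustrative sketch only: k is the count handled at EACH end.
    """
    ordered = sorted(values)
    n = len(ordered)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must leave at least one observation")

    # Trimming: drop the k smallest and k largest values entirely.
    kept = ordered[k:n - k]
    trimmed = sum(kept) / len(kept)

    # Winsorizing: keep all n positions, but replace the k smallest values
    # with the smallest kept value and the k largest with the largest kept value.
    winsorized_values = [kept[0]] * k + kept + [kept[-1]] * k
    winsorized = sum(winsorized_values) / n

    return trimmed, winsorized


# Hypothetical sample with one low and one high extreme value.
sample = [1, 4, 5, 6, 7, 8, 10, 40]
trimmed, winsorized = trimmed_and_winsorized_means(sample, k=1)
print(trimmed, winsorized)
```

Both statistics reduce the pull of the value 40, but the winsorized version still divides by the full sample size because it replaces values rather than discarding them.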
In this post, learn how to calculate the trimmed mean, work through an example, see how it improves statistical analyses, and get practical usage tips.
Related post: What are Robust Statistics?
Step-by-Step Guide to Calculate the Trimmed Mean
Follow these steps to find the trimmed mean:
- Sort the dataset: Arrange the data in ascending order to facilitate trimming.
- Determine the percentage of values to trim: Choose the percentage of extreme values you want to exclude from each end of the dataset.
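- Calculate the number of observations to trim: Multiply the percentage by the total number of observations. Round the result to the nearest integer to determine how many observations you must discard from each end. Software conventions vary here. Some implementations round down instead, so results can differ slightly when the product is not a whole number.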
- Trim the dataset: Remove the designated number of observations from both ends of the sorted dataset.
- Calculate the trimmed mean: Add the values and divide by the number of remaining observations.
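The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. The function name trimmed_mean is my own, and the rounding rule (nearest integer, with halves rounded up) is an assumption based on the step list. Some statistical packages round down instead.

```python
def trimmed_mean(values, proportion):
    """Mean of `values` after removing `proportion` of them from EACH end.

    `proportion` follows the standard definition, e.g. 0.2 for a 20% trim.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")

    ordered = sorted(values)            # Step 1: sort the dataset
    n = len(ordered)

    # Steps 2-3: number of observations to trim per end.
    # Assumption: round to the nearest integer, halves rounded up.
    k = int(proportion * n + 0.5)
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")

    kept = ordered[k:n - k]             # Step 4: trim both ends
    return sum(kept) / len(kept)        # Step 5: average what remains


print(trimmed_mean([2, 3, 4, 5, 7, 8, 9, 10, 12, 15], 0.2))
```

The check on 2 * k guards against a trim so large that no data would remain.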
Excel provides a built-in formula for the trimmed mean: TRIMMEAN. To use this function, enter the range of cells containing your data and the percentage of values to trim. However, note that Excel’s trimming percentage definition differs from the standard statistical definition. The percentage in the standard definition relates to the amount of data removed from each individual side of the distribution. In contrast, Excel’s percentage refers to the total amount removed from both sides.
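To see how the two percentage definitions relate, here is a rough Python sketch that mimics an Excel-style "total percentage" argument. The function names are hypothetical. The rounding rule reflects my understanding of Microsoft's documentation, which says TRIMMEAN rounds the number of excluded points down to the nearest multiple of 2. Treat it as an approximation of Excel's behavior, not a guaranteed match.

```python
import math


def excel_style_trimmean(values, percent):
    """Trimmed mean where `percent` is the TOTAL fraction removed (both ends).

    Sketch of an Excel-like convention; not Excel's actual implementation.
    """
    if not 0 <= percent < 1:
        raise ValueError("percent must be at least 0 and less than 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")

    # Total points to exclude, rounded down (tiny tolerance for float error).
    total_to_drop = math.floor(n * percent + 1e-9)
    # Halving with integer division rounds the total down to an even count.
    k = total_to_drop // 2

    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


def to_excel_percent(per_side_proportion):
    """Convert a standard per-side trim (e.g. 0.2) to the Excel-style total."""
    return 2 * per_side_proportion


data = [2, 3, 4, 5, 7, 8, 9, 10, 12, 15]
print(excel_style_trimmean(data, to_excel_percent(0.2)))  # standard 20% trim
print(excel_style_trimmean(data, 0.2))                    # removes only 1 per end
```

Passing 0.2 directly removes just one value from each end of a ten-value dataset, which is why the percentage must be doubled to get a standard 20% trimmed mean.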
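For example, suppose your data are in cells A1:A10, and you want to calculate the 20% trimmed mean. In Excel, you need to double the percentage to 40% so that it takes 20% off each side of the distribution. The formula is the following:
=TRIMMEAN(A1:A10, 0.4)
Depending on your regional settings, Excel may expect a semicolon separator and a decimal comma instead: =TRIMMEAN(A1:A10;0,4)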
Example of Calculating a Trimmed Mean
Suppose we have the following dataset of 10 values:
2, 3, 4, 5, 7, 8, 9, 10, 12, 15
Let’s find the 20% trimmed mean. Because there are ten values, we need to remove the smallest two values (2 and 3) and the largest two values (12 and 15), leaving us with the following six values:
4, 5, 7, 8, 9, 10
Then we take the mean of these six values to find the trimmed mean, which is 7.1667.
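You can check this arithmetic with a short Python snippet. The second half is a hypothetical variation that is not part of the example above: it swaps the largest value for an extreme outlier to show how little the trimmed mean moves compared to the regular mean.

```python
from statistics import fmean

data = [2, 3, 4, 5, 7, 8, 9, 10, 12, 15]
k = 2  # 20% of 10 observations, removed from each end

trimmed = sorted(data)[k:-k]
regular_mean = fmean(data)
trimmed_mean_value = fmean(trimmed)

print(trimmed)                       # [4, 5, 7, 8, 9, 10]
print(regular_mean)                  # 7.5
print(round(trimmed_mean_value, 4))  # 7.1667

# Hypothetical variation: replace the largest value (15) with an outlier (150).
with_outlier = data[:-1] + [150]
outlier_mean = fmean(with_outlier)
outlier_trimmed_mean = fmean(sorted(with_outlier)[k:-k])

print(outlier_mean)                    # 21.0
print(round(outlier_trimmed_mean, 4))  # 7.1667
```

The regular mean jumps from 7.5 to 21.0, while the 20% trimmed mean is unchanged because the outlier falls in the trimmed portion.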
Practical Applications of the Trimmed Mean
Now that we understand its importance, let’s explore some real-world scenarios where the trimmed mean proves its mettle.
Financial Analysis: When examining stock returns, extreme values (such as unusually high or low returns) can skew the mean, potentially leading to misinterpretation. By applying the trimmed mean, we obtain a more reliable estimate of the central tendency of returns.
Education Assessment: In an examination where the scores of a few students are significantly higher or lower than the rest, the trimmed mean helps assess the overall performance by reducing the impact of outliers.
Retail Pricing Analysis: In the world of retail, pricing strategies play a crucial role in determining profitability and customer satisfaction. When analyzing price data, outliers can arise due to occasional promotions, errors, or unique product features. By applying the trimmed mean, retailers can obtain a more accurate representation of the typical price, enabling them to make informed decisions about setting competitive prices and maximizing revenue.
Climate Studies: Understanding climate patterns and trends is essential for predicting future weather conditions, assessing environmental impact, and formulating mitigation strategies. However, climate datasets often contain extreme values due to weather anomalies, rare events, or measurement errors. These outliers can distort statistical measures like the mean, hindering accurate trend analysis. Using a trimmed mean limits their influence on the estimate of typical conditions.
By harnessing the power of the trimmed mean in these diverse fields, we can delve deeper into our data, unraveling meaningful insights while reducing the influence of outliers.
When using the trimmed mean in statistical analyses, discarding between 5 and 25 percent of the dataset’s extreme values is common. But other percentages are possible. As you remove more of the dataset, the trimmed mean becomes more robust to outliers.
However, robustness comes with a tradeoff because you use less of the original data, reducing your effective sample size and information about the original data. While outliers can be problematic, they sometimes represent legitimate variability in your subject matter. Inappropriately removing data points can distort your understanding of the phenomenon you’re studying. For more information, read my Guidelines for Handling Outliers.
Typically, the goal of using a trimmed mean is to minimize the standard error of the estimate when a dataset contains outliers and small deviations from normality. Reducing the standard error increases the precision of the estimate and the statistical power of hypothesis tests.
In this context, trimmed means help you navigate a tradeoff between the regular mean and the median.
On the one hand, the regular mean provides optimal performance with normal distributions and no outliers. On the other hand, the median provides better performance for datasets with numerous outliers and highly skewed distributions.
Trimmed means provide an effective compromise for scenarios falling between these two conditions where neither the regular mean nor the median is optimal. Based on computer simulations, Wilcox and Keselman (2003) suggest that a 20% trim is a good default choice for minimizing the standard error in these cases.
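To get a feel for this tradeoff, you can run a small simulation of your own. The Python sketch below is not the design Wilcox and Keselman used. The function names and the contamination model are my own assumptions: mostly standard normal data, with about 10% of the values drawn from a much wider normal distribution. It estimates the standard error of each statistic as the standard deviation of that statistic across many simulated samples.

```python
import random
from statistics import fmean, median, stdev


def trimmed_mean(values, proportion=0.2):
    """Trimmed mean with the per-end count rounded down (sketch)."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    return fmean(ordered[k:len(ordered) - k])


def draw_sample(rng, n, outlier_rate=0.1, outlier_sd=10.0):
    """Assumed model: standard normal data with occasional wide-spread values."""
    return [
        rng.gauss(0, outlier_sd if rng.random() < outlier_rate else 1.0)
        for _ in range(n)
    ]


def simulated_standard_errors(n=20, reps=2000, seed=1):
    """Estimate each statistic's standard error by repeated sampling."""
    rng = random.Random(seed)
    estimates = {"mean": [], "20% trimmed mean": [], "median": []}
    for _ in range(reps):
        sample = draw_sample(rng, n)
        estimates["mean"].append(fmean(sample))
        estimates["20% trimmed mean"].append(trimmed_mean(sample, 0.2))
        estimates["median"].append(median(sample))
    return {name: stdev(vals) for name, vals in estimates.items()}


results = simulated_standard_errors()
for name, se in results.items():
    print(f"{name}: {se:.3f}")
```

With this much contamination, the trimmed mean should show a noticeably smaller standard error than the regular mean. Try changing outlier_rate and outlier_sd to see how the ranking of the three statistics shifts as the data move closer to or further from normality.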
Use Yuen’s t-test to evaluate trimmed means. This test can handle outliers and nonconstant variance.
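For readers who want to see what this test computes, here is a hedged Python sketch of the two-sample Yuen statistic as I understand it from Wilcox's descriptions. It combines trimmed means with winsorized variances. The function names are hypothetical, the per-end trim count is rounded down, and the code returns only the test statistic and degrees of freedom. You would still need a t-distribution to obtain a p-value. Recent versions of SciPy also offer a trim argument in scipy.stats.ttest_ind, which is likely the more practical route.

```python
import math


def _trimmed_parts(values, trim):
    """Return (trimmed mean, squared standard error term, kept count)."""
    ordered = sorted(values)
    n = len(ordered)
    g = int(trim * n)          # observations trimmed from each end (rounded down)
    h = n - 2 * g              # observations remaining after trimming
    if h < 2:
        raise ValueError("need at least two observations after trimming")

    kept = ordered[g:n - g]
    trimmed_mean = sum(kept) / h

    # Winsorized sample: trimmed values are replaced by the nearest kept value.
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(winsorized) / n
    w_var = sum((x - w_mean) ** 2 for x in winsorized) / (n - 1)

    d = (n - 1) * w_var / (h * (h - 1))
    return trimmed_mean, d, h


def yuen_t(a, b, trim=0.2):
    """Yuen's statistic and approximate degrees of freedom for two samples."""
    mean_a, d_a, h_a = _trimmed_parts(a, trim)
    mean_b, d_b, h_b = _trimmed_parts(b, trim)
    t = (mean_a - mean_b) / math.sqrt(d_a + d_b)
    df = (d_a + d_b) ** 2 / (d_a ** 2 / (h_a - 1) + d_b ** 2 / (h_b - 1))
    return t, df


# Hypothetical data for illustration only.
group_a = [2, 3, 4, 5, 7, 8, 9, 10, 12, 15]
group_b = [5, 6, 8, 9, 10, 11, 13, 14, 16, 40]
print(yuen_t(group_a, group_b, trim=0.2))
```

With trim=0, the calculation reduces to Welch's unequal-variance t-test, which is a handy sanity check on the formulas.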
Wilcox, R. R., Keselman, H. J. (2003), Modern Robust Data Analysis Methods: Measures of Central Tendency, Psychological Methods, Vol. 8, No. 3, 254–274.
I am still confused. Does the recommended 20% of trimmed mean involve the formula TRIMMEAN(A1:A10;0,4) or the formula TRIMMEAN(A1:A10;0,2)?
Or… Does Wilcox recommend disregarding 20% (>Q90 and <Q10) or 40% (>Q80 and <Q20) of the data?
Thank you and congrats Jim!
Jim Frost says
The recommended 20% corresponds to the standard definition of the trimmed mean, where you trim the upper 20% and the lower 20%. However, Microsoft confuses the issue (why oh why, Microsoft?!) with its nonstandard formula.
So, yes, the recommended 20% trim corresponds to Excel’s formula: TRIMMEAN(A1:A10;0,4)
I hope that helps!
Thank you for this highly informative article. I have 2 questions that the article deals with but I still have some perplexities.
Question #1: when should I prefer the trimmed mean rather than the median? From what I understand, the trimmed mean is a compromise between the mean, which is good for normal distributions and few outliers, and the median, which is good for highly skewed distributions and many outliers. How do I determine if my distribution is highly skewed, or it is just a little skewed? Does a measure of skewed-ness exist (pardon my terrible english)? And the same for outliers.
Question #2: you say that the default % to trim is 20%. Is there a rule of thumbs to determine the optimal % to trim? I suppose this % could depend on the skewness measure or an outlier-based measure.
Thank you very much and keep on the good work!
Jim Frost says
Good questions. If you have a highly skewed distribution and/or many outliers, use the median. If you have a smaller departure from the normal distribution and/or a smaller number of outliers, consider using the trimmed mean. You’ll need to graph your data and look for outliers to determine which case applies to your data.
As for the 20% trimming, that comes from simulation studies. See the reference I provide for more information about that. The author says that 20% is optimal for a wide range of cases. He also suggests that if you have a smaller dataset or more severe skew/outliers (but still not using the median), you might increase it to 25%. The source article that I reference discusses in a bit more detail and also references other studies if you really need a deep dive on it.
Ian Veldman says
TRIMMEAN(A1:A10;0,2) only excludes 20% of the numbers (in the example, two in total).
To exclude two low and two high, you need to use TRIMMEAN(A1:A10;0,4)
Jim Frost says
Hi Ian, thanks for catching that! Leave it to Microsoft to use a different definition for the percentage of a trimmed mean than everyone else. You’re correct, to get a 20% trimmed mean by the standard definition, you’d need to use the Excel formula you indicate. I’ll change the blog post to reflect that too.