In statistics, the median is the value that splits an ordered list of data values in half. Half the values are below it and half are above, so it sits right in the middle of the dataset. The median is the same as the second quartile, or the 50th percentile. It is one of several measures of central tendency.
Related post: Percentiles: Interpretations and Calculations
How to Find the Median
Finding the median is easy. Simply sort your data from smallest to largest. Then find the value that has the same number of data points above it and below it.
Suppose the heights of five trees are 5, 5, 6, 7, and 8. The median tree height is 6 because it is the middle value. There are two values above it and two below it.
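The two steps above can be sketched in a few lines of Python using the tree heights from this example. This sketch covers only the odd-count case, where the middle value sits at 0-based index n // 2 after sorting. The standard library's statistics.median is shown alongside as a cross-check.

```python
import statistics

# Tree heights from the example above.
heights = [5, 5, 6, 7, 8]

# Step 1: sort from smallest to largest.
ordered = sorted(heights)

# Step 2: with an odd count, the middle value is at 0-based index n // 2.
middle = ordered[len(ordered) // 2]
print(middle)  # 6

# The standard library gives the same answer.
print(statistics.median(heights))  # 6
```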
Finding the middle value differs somewhat depending on whether your data have an even or odd number of values. I’ll show you how to find it for both scenarios. In the following examples, I use integers for simplicity, but the data can have decimal places.
Odd number of data points
When a dataset has an odd number of observations, a middle value exists. This dataset has 13 values. Notice how the number 12 has six data points above it and six below. Consequently, 12 is the median.
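The counting rule for an odd number of values can be written as a position formula. Here x_(k) denotes the k-th value in sorted order (a notation assumed for this sketch, not used elsewhere in the post). For the 13-value dataset, the median is the 7th sorted value, which leaves six values on each side.

```latex
\tilde{x} = x_{\left(\frac{n+1}{2}\right)}, \qquad
n = 13 \;\Rightarrow\; \frac{13 + 1}{2} = 7, \qquad
\underbrace{7 - 1}_{\text{below}} = 6 = \underbrace{13 - 7}_{\text{above}}
```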
Even number of data points
When your data have an even number of values, there is no single data point precisely in the middle. Instead, there is a central pair. In this case, count to the two central values and average them.
The values 27 and 29 are the innermost pair in this dataset because six values lie above the pair and six lie below it. The average of these two values is 28. Consequently, 28 is the median.
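Both scenarios can be combined into one small function. This is a minimal sketch: the name median_of is hypothetical, and the six-value list in the usage lines is made-up data (not the 14-value dataset pictured above) chosen so that 27 and 29 form the central pair, giving (27 + 29) / 2 = 28.

```python
def median_of(values):
    """Return the median of a non-empty sequence of numbers (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the central pair (0-based indices mid - 1 and mid).
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical even-length data whose central pair is 27 and 29.
print(median_of([50, 10, 29, 40, 27, 20]))  # 28.0
# The odd-length tree heights from earlier.
print(median_of([5, 5, 6, 7, 8]))           # 6
```

Sorting inside the function means callers can pass unsorted data, as in the first usage line. For real work, Python's statistics.median applies the same odd/even rule.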
When Should You Use the Median?
The median is less sensitive to skewed data and outliers than the mean. Extreme values pull the mean away from the center of the distribution, so the mean can be misleading: it might not be near the most common values in the distribution.
For instance, the mean is not a good statistic for summarizing annual income because income follows a right-skewed distribution. A few highly affluent people can increase the mean dramatically, giving a misleading view of yearly incomes. For this type of data, the median better represents a typical income.
To understand why outliers and skewed data affect the median less, consider the dataset below. Its median is 46. However, imagine we discover data entry errors and correct four values, which I shaded in the fixed dataset. I’ll make them all significantly larger, making it a skewed distribution with severe outliers.
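As you can see, the median did not change. It's still 46. Unlike the mean, the median doesn't depend on the magnitude of every value in the dataset, only on which value (or central pair) falls in the middle of the sorted order. Therefore, as long as the changed values stay on the same side of the middle, making them more extreme doesn't move the median.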
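The same experiment can be sketched in Python. The 11 values below are hypothetical (they are not the dataset shown above), picked so that the median is also 46. Replacing the four largest values with much larger ones leaves the median untouched, while the mean jumps several-fold.

```python
import statistics

# Hypothetical data (not the pictured dataset): 11 sorted values, median 46.
original = [30, 35, 38, 41, 44, 46, 48, 51, 53, 57, 60]

# "Correct" the four largest values by making them far larger,
# creating a right tail with severe outliers. The middle value is unchanged.
corrected = original[:7] + [510, 530, 570, 600]

print(statistics.median(original), statistics.median(corrected))  # 46 46
print(round(statistics.mean(original), 1), round(statistics.mean(corrected), 1))
```

The mean moves because every value contributes to its sum. The median only asks which value sits in the middle position, and the outliers never cross that position.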
Comparing the Mean and Median
Let’s compare the mean and median using symmetrical and skewed distributions to see how they perform.
In this symmetric distribution, both statistics locate the center correctly. They are approximately equal.
In this skewed distribution, the extreme values in the tail pull the mean from the center. It’s outside the area that contains the most typical values. Conversely, the median is near the most common values, which is appropriate when measuring central tendency.
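A quick simulation can reproduce this comparison. This sketch assumes a normal distribution for the symmetric case and a lognormal distribution for the right-skewed case. Both are simulated stand-ins, not the data behind the graphs above. With a fixed seed and 10,000 draws each, the normal sample's mean and median nearly coincide, while the lognormal's long right tail pulls its mean well above its median (theoretically about 1.65 versus 1.0).

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Symmetric: normal distribution centered at 0.
symmetric = [random.gauss(0, 1) for _ in range(10_000)]
# Right-skewed: lognormal distribution with a long right tail.
skewed = [random.lognormvariate(0, 1) for _ in range(10_000)]

for name, data in [("symmetric", symmetric), ("right-skewed", skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:>12}: mean={mean:.3f}  median={median:.3f}")
```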
This dataset describes U.S. household incomes for 2006 and illustrates why the mean is not appropriate for incomes. The two statistics differ by over $9,000. The mean overestimates typical household incomes.
For these data, the median indicates that half of all incomes are above $27,581, and half are below.
Statisticians consider the median to be a robust statistic, while the mean is sensitive to skewed distributions and outliers. To learn more, read my post, What are Robust Statistics?
To learn more about the other measures of central tendency, how they compare, and when to use each one, read my post: Measures of Central Tendency: Mean, Median, and Mode.