A skewed distribution occurs when one tail is longer than the other. Skewness describes the asymmetry of a distribution. Unlike the familiar normal distribution with its bell-shaped curve, skewed distributions are asymmetric: the two halves are not mirror images because the data are not distributed equally on both sides of the distribution’s peak.
People are sometimes less comfortable with asymmetrical distributions, but they are a fact of life in some subject areas. They have logical reasons for occurring, such as when natural limits skew the results away from the boundary. We’ll get to that shortly.
In this post, learn about left and right-skewed distributions, how to tell the differences in histograms and boxplots, the implications of these distributions, why they occur, and how to analyze them.
How to Tell if a Distribution is Left Skewed or Right Skewed
Let’s start by contrasting characteristics of the symmetrical normal distribution with skewed distributions.
The normal distribution has a central peak where most observations occur, and the probability of events tapers off equally in both the positive and negative directions on the X-axis. Both halves contain equal numbers of observations. Unusual values are equally likely in both tails.
However, that’s not the case with asymmetrical distributions where probabilities decrease more slowly in one direction relative to the other. In other words, extreme values that fall further away from the peak are more likely to occur in one tail than the other. That’s why you’ll hear about left and right-skewed distributions, also known as negatively and positively skewed distributions.
Right skewed distributions occur when the long tail is on the right side of the distribution. Analysts also refer to them as positively skewed. This condition occurs because probabilities taper off more slowly for higher values. Consequently, you’ll find extreme values far from the peak on the high end more frequently than on the low.
Left skewed distributions occur when the long tail is on the left side of the distribution. Statisticians also refer to them as negatively skewed. This condition occurs because probabilities taper off more slowly for lower values. Therefore, you’ll find extreme values far from the peak on the low side more frequently than the high side.
The crucial point to keep in mind is that the direction of the long tail defines the skew because it indicates where you’ll find the majority of exceptional values.
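You can also check the direction of the skew numerically. Here is a minimal sketch that computes the moment coefficient of skewness, one common measure, using only Python's standard library. The function name `sample_skewness` and the small data sets are made up for illustration. A positive result points to a longer right tail, and a negative result points to a longer left tail.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form).

    Assumes the values are not all identical, otherwise m2 is zero.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data, invented for illustration only.
right_tailed = [1, 2, 2, 3, 3, 4, 6, 10, 20]
left_tailed = [21 - x for x in right_tailed]  # mirror image of the data above
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_tailed))  # positive: long tail on the right
print(sample_skewness(left_tailed))   # negative: long tail on the left
print(sample_skewness(symmetric))     # zero: no skew
```

Statistical packages often apply a small-sample correction to this formula, so their values can differ slightly from the ones this sketch prints, but the sign tells the same story.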
Related post: Normal Distribution
What Skewed Distributions Look Like in Graphs
Identifying asymmetric distributions is straightforward in graphs. It’s just a matter of finding the longer tail. Let’s see how to do that in histograms and boxplots.
The two histograms below display right and left-skewed distributions. Histograms make it easy to see the longer tails. You can also see these characteristics in the similar stem and leaf plot.
In boxplots, you’ll need to look more closely than in histograms, but you can still identify the asymmetry. I use the same data in the boxplots as I do for the histograms so you can compare them.
You have a symmetrical distribution when the box centers around the median line and the upper and lower whiskers have approximately equal lengths.
When the median is closer to the box’s lower values and the upper whisker is longer, it’s a right-skewed distribution. Notice how the longer tail extends into the higher values.
When the median is closer to the box’s higher values and the lower whisker is longer, it’s a left-skewed distribution. Notice that the longer tail extends towards the lower values.
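The boxplot rule about where the median sits inside the box can be turned into a number. The sketch below uses Bowley's quartile skewness, a standard measure that this post does not otherwise cover, so treat it as an illustrative add-on. The function name `quartile_skewness` and the data are hypothetical. It only looks at the box (the quartiles), not at the whiskers.

```python
from statistics import quantiles


def quartile_skewness(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1).

    Positive when the median sits nearer the bottom of the box,
    negative when it sits nearer the top, zero when it is centered.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Hypothetical data, invented for illustration only.
right_tailed = [1, 2, 2, 3, 3, 4, 6, 10, 20]
left_tailed = [21 - x for x in right_tailed]  # mirror image of the data above
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(quartile_skewness(right_tailed))  # 0.5  (median near the lower quartile)
print(quartile_skewness(left_tailed))   # -0.5 (median near the upper quartile)
print(quartile_skewness(symmetric))     # 0.0  (median centered in the box)
```

Different software computes quartiles in slightly different ways, so the exact value depends on the quartile method. The sign is what matters for reading the skew.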
Skewed Distributions and the Mean, Median, and Mode
The mean, median, and mode are all equal in the normal distribution and in other symmetric distributions that have a single peak.
However, when you have a skewed distribution, it affects the relationship between these measures of central tendency. The mean is sensitive to extreme values. Consequently, the longer tail in an asymmetrical distribution pulls the mean away from the most common values.
The graphs below show how these measures compare in different distributions.
Right-skewed: The mean is typically greater than the median. The mean overestimates the most common values.
Left-skewed: The mean is typically less than the median. The mean underestimates the most common values.
Because the mean overestimates or underestimates the most frequently occurring values in skewed distributions, analysts often use the median in these cases. The median is a more robust statistic in the presence of extreme values.
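To see the long tail pulling the mean, here is a small sketch using Python's standard `statistics` module. The two data sets are invented for illustration, and the second is simply the mirror image of the first.

```python
from statistics import fmean, median, mode

# Hypothetical data, invented for illustration only.
right_skewed = [2, 3, 3, 3, 4, 4, 5, 6, 9, 21]  # a few large values
left_skewed = [22 - x for x in right_skewed]    # mirror image: a few small values

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean:", fmean(data), "median:", median(data), "mode:", mode(data))

# Right-skewed: mode < median < mean (the high tail drags the mean upward).
# Left-skewed:  mean < median < mode (the low tail drags the mean downward).
```

In both cases, the median stays closer to the most common values than the mean does.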
Examples of Right-Skewed Distributions
Right-skewed distributions are the more common form. These distributions tend to occur when there is a lower limit, and most values are relatively close to the lower bound. Values can’t be less than this bound but can fall far from the peak on the high end, causing them to skew positively.
For example, right-skewed distributions can occur in the following cases:
- Time to failure cannot be less than zero, but there is no upper bound.
- Wait and response times cannot be less than zero, but there are no upper limits.
- Sales data cannot be less than zero but can have unusually large values.
- Humans have a minimum viable weight but can have large extreme values.
- Income cannot be less than zero, but there are some extremely high incomes.
Income and wealth are classic examples of right-skewed distributions. Most people earn a modest amount, but some millionaires and billionaires extend the right tail into very high values. Meanwhile, the left tail cannot be less than zero. This situation creates a positive skew. Consequently, reports frequently refer to median incomes because the mean overestimates the most common values.
These data are based on the U.S. household income for 2006. Notice how the mean is greater than the median.
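A quick simulation shows how a lower bound with no upper bound produces a right skew. This sketch draws wait times from an exponential distribution, which is one possible model and not a claim about any particular data set. The 5-minute average and the seed are assumed values chosen for illustration.

```python
import random
from statistics import fmean, median

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated wait times in minutes; the 5-minute average is an assumption.
wait_times = [rng.expovariate(1 / 5) for _ in range(10_000)]

print("minimum:", min(wait_times))   # never below the lower bound of zero
print("mean:", fmean(wait_times))    # close to 5
print("median:", median(wait_times)) # noticeably smaller than the mean
print("maximum:", max(wait_times))   # a long way above the typical value
```

The mean lands above the median because a small number of very long waits stretch the right tail.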
Examples of Left-Skewed Distributions
Left-skewed distributions occur less frequently than their right-skewed counterparts, but they exist. Frequently, they occur when there is an upper limit that values cannot exceed, and most scores are near that limit. Values can’t exceed the cap, but they can extend relatively far from the peak on the lower side, causing a negative skew.
For example, left-skewed distributions can occur in the following cases:
- Purity cannot exceed 100%, but there is room on the low side for extreme values.
- Test scores cannot exceed 100%, but low scores can fall far below the peak.
- Ages at death tend to cluster around 70-80. It’s possible to live a little longer, but extreme values are more likely to appear on the lower end.
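The same idea works in reverse when there is a cap. The sketch below simulates test scores that bunch up near 100% by scaling draws from a beta distribution. The choice of a Beta(5, 1) model, the seed, and the sample size are all assumptions made for illustration.

```python
import random
from statistics import fmean, median

rng = random.Random(7)  # fixed seed so the run is repeatable

# Simulated percentage scores; Beta(5, 1) piles values up near the cap.
scores = [100 * rng.betavariate(5, 1) for _ in range(10_000)]

print("maximum:", max(scores))   # never above the cap of 100
print("mean:", fmean(scores))    # pulled down by the low tail
print("median:", median(scores)) # higher than the mean
print("minimum:", min(scores))   # far below the typical score
```

Here the mean falls below the median, which is the signature of a left-skewed distribution.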
Skewed Probability Distributions and Hypothesis Tests
When data are skewed, they do not follow a normal distribution. You might need to use a distribution test to identify the distribution of your data. Several well-known probability distributions are skewed by nature, and each has its own reasons for the skew and its own set of properties it can model.
Many hypothesis tests assume your data follow the normal distribution. However, many are valid with non-normal distributions when your sample size is large enough. You can thank the central limit theorem!
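Here is a rough simulation of the central limit theorem at work on skewed data. It compares the skewness of raw exponential draws with the skewness of means taken from samples of 50. The helper `sample_skewness`, the exponential model, the sample size of 50, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import fmean


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# 5,000 individual observations from a strongly right-skewed distribution.
raw = [rng.expovariate(1.0) for _ in range(5_000)]

# 5,000 sample means, each computed from a fresh sample of 50 observations.
means = [fmean([rng.expovariate(1.0) for _ in range(50)]) for _ in range(5_000)]

print("skewness of raw data:    ", sample_skewness(raw))
print("skewness of sample means:", sample_skewness(means))
```

The sample means are far less skewed than the raw data, which is why tests based on the mean can remain valid for skewed data when the sample size is large enough.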
When you have a skewed distribution, though, the median might be a better measure of central tendency than the mean. To learn about hypothesis tests for the mean and median and when to use each type, read my post, Parametric vs. Nonparametric Tests.