What is a Skewed Distribution?
A skewed distribution occurs when one tail is longer than the other. Skewness measures the asymmetry of a distribution. Unlike the familiar normal distribution with its bell-shaped curve, skewed distributions are asymmetric. The two halves of the distribution are not mirror images because the data are not distributed equally on both sides of the distribution’s peak.
People are sometimes less comfortable with asymmetrical distributions, but they are a fact of life in some subject areas. They have logical reasons for occurring, such as when natural limits skew the results away from the boundary. We’ll get to that shortly.
In this post, learn about left and right skewed distributions, how to tell the differences in histograms and boxplots, the implications of these distributions, why they occur, and how to analyze them.
How to Tell if a Distribution is Left Skewed or Right Skewed
Let’s start by contrasting characteristics of the symmetrical normal distribution with skewed distributions.
Symmetric
The normal distribution has a central peak where most observations occur, and the probability of events tapers off equally in both the positive and negative directions on the X-axis. Both halves contain equal numbers of observations. Unusual values are equally likely in both tails.
However, that’s not the case with asymmetrical distributions where probabilities decrease more slowly in one direction relative to the other. In other words, extreme values that fall further away from the peak are more likely to occur in one tail than the other. That’s why you’ll hear about left and right skewed distributions, also known as negatively and positively skewed distributions.
Right Skewed (Positively Skewed)
Right skewed distributions occur when the long tail is on the right side of the distribution. Analysts also refer to them as positively skewed. This condition occurs because probabilities taper off more slowly for higher values. Consequently, you’ll find extreme values far from the peak on the high end more frequently than on the low.
Left Skewed (Negatively Skewed)
Left skewed distributions occur when the long tail is on the left side of the distribution. Statisticians also refer to them as negatively skewed. This condition occurs because probabilities taper off more slowly for lower values. Therefore, you’ll find extreme values far from the peak on the low side more frequently than the high side.
The crucial point to keep in mind is that the direction of the long tail defines the skew because it indicates where you’ll find the majority of exceptional values.
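To put a number on the direction of the skew, you can compute a skewness coefficient. Here is a minimal sketch in Python, assuming the simple moment-based formula (third central moment divided by the variance raised to the 1.5 power). The function name `sample_skewness` and the tiny datasets are made up for illustration, and statistical packages often apply a small-sample adjustment that this sketch omits.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data, chosen only to illustrate the sign of the coefficient.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 15]  # long tail toward high values
left_skewed = [-x for x in right_skewed]           # mirror image: long tail toward low values
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(sample_skewness(right_skewed))  # positive
print(sample_skewness(left_skewed))   # negative
print(sample_skewness(symmetric))     # 0.0
```

A positive value goes with a longer right tail, a negative value with a longer left tail, and a value near zero with rough symmetry.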
Related post: Normal Distribution
What Skewed Distributions Look Like in Graphs
Identifying asymmetric distributions is straightforward in graphs. It’s just a matter of finding the longer tail. Let’s see how to do that in histograms and boxplots.
Histograms
The two histograms below display asymmetric distributions. Histograms make it easy to see the longer tails. You can also see these positively and negatively skewed characteristics in the similar stem and leaf plot.
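If you want to check for the longer tail without graphing software, a rough text histogram does the job. The sketch below is only an illustration: the function name `text_histogram`, the bin width, and the small integer dataset are my own assumptions, not the data behind the histograms in this post.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per bin: the bin's lower edge and a bar of '#' marks."""
    # Assign each integer value to the lower edge of its bin.
    counts = Counter((x // bin_width) * bin_width for x in values)
    lines = []
    for edge in range(min(counts), max(counts) + bin_width, bin_width):
        lines.append(f"{edge:>3} | {'#' * counts[edge]}")
    return lines


# Hypothetical right skewed data: most values are low, a few are much higher.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 15]

for line in text_histogram(data, bin_width=2):
    print(line)
```

The bars pile up near the low end and a few isolated marks trail off toward the high values, which is the long right tail.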
Boxplots
In boxplots, you’ll need to look more closely than in histograms, but you can still identify the asymmetry. I use the same data in the boxplots as I do for the histograms so you can compare them.
You have a symmetrical distribution when the box centers around the median line and the upper and lower whiskers have approximately equal lengths.
When the median is closer to the box’s lower values and the upper whisker is longer, it’s a right skewed distribution. Notice how the longer tail extends into the higher values, making it positively skewed.
When the median is closer to the box’s higher values and the lower whisker is longer, it’s a left skewed distribution. Notice that the longer tail extends towards the lower values, making it negatively skewed.
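The boxplot rules above can be turned into a rough check. This is a sketch under a few assumptions: the helper names `boxplot_shape` and `describe` are hypothetical, quartiles come from Python’s `statistics.quantiles` with its default method (other software may calculate quartiles slightly differently), and whiskers follow the common convention of stopping at the most extreme observation within 1.5 times the interquartile range of the box.

```python
from statistics import quantiles


def boxplot_shape(values):
    """Measure the two halves of the box and the two whiskers."""
    q1, median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    # Whiskers stop at the most extreme observations within 1.5 * IQR of the box.
    lower_end = min(x for x in values if x >= q1 - 1.5 * iqr)
    upper_end = max(x for x in values if x <= q3 + 1.5 * iqr)
    return {
        "lower_box": median - q1,
        "upper_box": q3 - median,
        "lower_whisker": q1 - lower_end,
        "upper_whisker": upper_end - q3,
    }


def describe(values):
    """Apply the rules of thumb: compare the box halves and the whisker lengths."""
    s = boxplot_shape(values)
    if s["upper_box"] > s["lower_box"] and s["upper_whisker"] > s["lower_whisker"]:
        return "right skewed"
    if s["upper_box"] < s["lower_box"] and s["upper_whisker"] < s["lower_whisker"]:
        return "left skewed"
    return "roughly symmetric or mixed"


# Hypothetical data for illustration only.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 15]
left_skewed = [-x for x in right_skewed]
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(describe(right_skewed))  # right skewed
print(describe(left_skewed))   # left skewed
print(describe(symmetric))     # roughly symmetric or mixed
```

Real data are messier than these toy samples, so treat the result as a prompt to look at the graph rather than as a verdict.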
Related posts: Using Histograms to Understand Your Data and Box Plots Explained with Examples
Skewed Distributions and the Mean, Median, and Mode
The mean, median, and mode are all equal in the normal distribution and other symmetric distributions with a single peak.
However, when you have an asymmetric distribution, it affects the relationship between these measures of central tendency. The mean is sensitive to extreme values. Consequently, the longer tail in an asymmetrical distribution pulls the mean away from the most common values.
The graphs below show how these measures compare in different distributions.
Right skewed: The mean is typically greater than the median. The mean overestimates the most common values in a positively skewed distribution.
Left skewed: The mean is typically less than the median. The mean underestimates the most common values in a negatively skewed distribution.
Because the mean overestimates or underestimates the most frequently occurring values in asymmetric distributions, analysts often use the median in these cases. The median is a more robust statistic in the presence of extreme values.
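Here is a small sketch of how the mean and median respond to a long tail, using Python’s built-in `statistics` module. The wait times and test scores are made-up numbers chosen only to illustrate the pattern.

```python
from statistics import fmean, median, mode

# Hypothetical wait times (minutes): right skewed, with a long tail of high values.
waits = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 15]
print(round(fmean(waits), 2), median(waits), mode(waits))  # 4.64 3 3

# Hypothetical test scores: the mirror image, left skewed with a cap near 100.
scores = [100 - w for w in waits]
print(round(fmean(scores), 2), median(scores), mode(scores))  # 95.36 97 97

# Make the largest wait ten times longer: the mean jumps, the median stays put.
waits_with_outlier = waits[:-1] + [150]
print(round(fmean(waits_with_outlier), 2), median(waits_with_outlier))  # 16.91 3
```

In the right skewed sample the mean sits above the median, in the left skewed sample it sits below, and a single extreme value moves the mean a long way while the median does not change.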
Related post: What are Robust Statistics?
Examples of Right-Skewed Distributions
Right skewed distributions are the more common form. These distributions tend to occur when there is a lower limit, and most values are relatively close to the lower bound. Values can’t be less than this bound but can fall far from the peak on the high end, causing them to skew positively.
For example, right skewed distributions can occur in the following cases:
- Time to failure cannot be less than zero, but there is no upper bound.
- Wait and response times cannot be less than zero, but there are no upper limits.
- Sales data cannot be less than zero but can have unusually large values.
- Humans have a minimum viable weight but can have large extreme values.
- Income cannot be less than zero, but there are some extremely high incomes.
Income and wealth are classic examples of right skewed distributions. Most people earn a modest amount, but some millionaires and billionaires extend the right tail into very high values. Meanwhile, the left tail cannot be less than zero. This situation creates a positive skew. Consequently, reports frequently refer to median incomes because the mean overestimates the most common values.
These data are based on the U.S. household income for 2006. Notice how the mean is greater than the median.
To learn more about incomes and their right skewed distributions, read my post about Global Income Distributions.
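To see how a lower bound plus a long right tail separates the mean from the median, here is a simulation sketch. It does not use the 2006 U.S. household income data. It assumes, purely for illustration, that incomes follow a lognormal distribution with a median of about 50,000 and a spread parameter of 0.8, both of which are made-up values.

```python
import math
import random
from statistics import fmean, median

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameters, not estimates from real income data.
MU = math.log(50_000)  # log of the assumed median income
SIGMA = 0.8            # assumed spread on the log scale
incomes = [random.lognormvariate(MU, SIGMA) for _ in range(10_000)]

print(min(incomes) > 0)         # True: lognormal values cannot fall below zero
print(round(median(incomes)))   # close to the assumed median of 50,000
print(round(fmean(incomes)))    # higher than the median, pulled up by the right tail
```

The simulated values bunch up above zero while a small share of very large incomes stretches the right tail, so the mean lands above the median.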
Examples of Left-Skewed Distributions
Left skewed distributions occur less frequently than their right skewed counterparts, but they exist. Frequently, they occur when there is an upper limit that values cannot exceed, and most scores are near that limit. Values can’t exceed the cap, but they can extend relatively far from the peak on the lower side, causing a negative skew.
For example, left skewed distributions can occur in the following cases:
- Purity cannot exceed 100%, but there is room on the low side for extreme values.
- Test scores cannot exceed the maximum of 100%.
- Ages at death tend to cluster around 70-80. It’s possible to live a little longer, but extreme values are more likely to appear on the lower end.
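The upper-limit cases above can be mimicked with a quick simulation. This sketch assumes a hypothetical exam where each student loses a random number of points from a maximum of 100, with the points lost drawn from an exponential distribution averaging 8. Those choices are mine, made only to show how a cap produces a left skew.

```python
import random
from statistics import fmean, median

random.seed(0)  # fixed seed so the sketch is reproducible

MAX_SCORE = 100
AVERAGE_POINTS_LOST = 8  # hypothetical value

# Each score is the cap minus the points lost, floored at zero.
scores = [
    max(0.0, MAX_SCORE - random.expovariate(1 / AVERAGE_POINTS_LOST))
    for _ in range(10_000)
]

print(max(scores) <= MAX_SCORE)         # True: no score can exceed the cap
print(fmean(scores) < median(scores))   # True: the low tail pulls the mean down
```

Most scores sit close to the cap, while a few poor results stretch the tail toward low values and drag the mean below the median.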
Skewed Probability Distributions and Hypothesis Tests
When data are asymmetrical, they cannot follow a normal distribution. You might need to use a distribution test to identify the distribution of your data. The following probability distributions are skewed:
Click the links to learn more about why those distributions are asymmetrical and the properties they can model.
Many hypothesis tests assume your data follow the normal distribution. However, many are valid with non-normal distributions when your sample size is large enough. You can thank the central limit theorem!
However, when you have an asymmetrical distribution, the median might be a better measure. To learn about hypothesis tests for the mean and median and when to use each type, read my post, Parametric vs. Nonparametric Tests.
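The central limit theorem point above can be illustrated with a simulation. This is a sketch under stated assumptions: the raw data come from an exponential distribution (strongly right skewed), each sample has 50 observations, and skewness is measured with the simple moment-based coefficient. The names and settings are hypothetical choices for illustration.

```python
import random
from statistics import fmean


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


random.seed(1)  # fixed seed so the sketch is reproducible
SAMPLE_SIZE = 50    # observations per sample (assumed)
N_SAMPLES = 2_000   # number of repeated samples (assumed)

# Draw many samples from a right skewed (exponential) population.
samples = [
    [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    for _ in range(N_SAMPLES)
]
raw_values = [x for sample in samples for x in sample]
sample_means = [fmean(sample) for sample in samples]

raw_skew = sample_skewness(raw_values)
means_skew = sample_skewness(sample_means)
print(round(raw_skew, 2))    # strongly positive: the raw data are right skewed
print(round(means_skew, 2))  # much closer to zero: the sample means are nearly symmetric
```

The individual observations stay skewed no matter how many you collect, but the distribution of the sample means moves toward symmetry as the sample size grows, which is why tests for the mean can remain valid with non-normal data.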
WILLIAM K SLOUGH says
I’m a residential appraiser. Regarding residential site value, I picture the incremental value of additional square footage to look like a skewed distribution. The difference between 1 and 1000 is negligible since the property could not be developed. At some point (can be developed) additional square footage is highly valuable. Then as one moves out on the curve, the incremental value of surplus land diminishes. I’d like to apply a formula for comparing residential sites based on area. How do I do that?
Lucid Gen says
I have started learning statistics. Thanks for sharing.
Jim Frost says
You’re very welcome!
Victor E. Ismine says
Who in their right mind would use a box plot? In its attempt to simplify, it actually does the opposite.
Jim Frost says
Hi Victor,
Box plots are great graphs for summarizing general properties about distributions.
As with all tools, they’re better in some situations than others. The trick is to know when to use each tool.
Box plots are particularly useful when you have a large amount of data and want to compare distributions. If you have a small dataset or are looking at an individual distribution, a histogram would be better.
For more information about when and how to use them, read my post about Box Plots Explained with Examples. That should help clear things up.
Aliko Mwaigomole says
nice book
Mohamed says
Thank you.
Your way is simple and great.
Mohd Shehzoor Hussain says
Hi Jim, why is it recommended to use percentiles for skewed data? Can we use any other statistical methodology for skewed data?