Percentiles indicate the percentage of scores that fall below a particular value. They tell you where a score stands relative to other scores. For example, a person with an IQ of 120 is at the 91st percentile, which indicates that their IQ is higher than 91 percent of other scores.

Percentiles are a great tool when you need to know the relative standing of a value: where does a value fall within a distribution of values? While the concept behind percentiles is straightforward, there are different mathematical methods for calculating them. In this post, learn about percentiles, special percentiles and their surprisingly flexible uses, and the various procedures for calculating them.

## Using Percentiles to Understand a Score or Value

Percentiles tell you how a value compares to other values. The general rule is that if value X is at the kth percentile, then X is greater than k% of the values. Let’s see how this information can be helpful.
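To make the rule concrete, here is a minimal Python sketch that computes a percentile rank: the percentage of values strictly below X. The function name `percentile_rank` and the scores are hypothetical, chosen only for illustration, and the sketch assumes "greater than" means strictly greater, so values tied with X are not counted.

```python
def percentile_rank(data, x):
    """Percentage of values in `data` that are strictly less than `x`."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)

# Hypothetical 11-value dataset, for illustration only.
scores = [5, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]

# 8 of the 11 values are below 40, so 40 sits at about the 72.7th percentile.
print(percentile_rank(scores, 40))
```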

Often the units for raw test scores are not informative. When you obtain a score on the SAT, ACT, or GRE, the units are meaningless by themselves. A total SAT score of 1340 is not inherently meaningful. Instead, you really want to know the percentage of test takers that you scored better than. For the SAT, a total score of 1340 is approximately the 90th percentile. Congratulations, you scored better than 90% of the other test takers. Only 10% scored better than you. Now that’s helpful!

Sometimes measurement units are meaningful, but you still would like to know the relative standing. For example, if your one-month-old baby weighs five kilograms, you might wonder how that weight compares to other babies. For a one-month-old baby girl, that equates to the 77th percentile. Your little girl weighs more than 77% of other girls her age, while 23% weigh more than her. You know right where she fits in with her cohort!

## Special Names and Uses for Percentiles

We give names to special percentiles. The 50th percentile is the median. This value splits a dataset in half. Half the values are below the 50th percentile, and half are above it. The median is a measure of central tendency in statistics.

Quartiles are values that divide your data into quarters, and they are based on percentiles.

- The first quartile, also known as Q1 or the lower quartile, is the value of the 25th percentile. The bottom quarter of the scores fall below this value, while three-quarters fall above it.
- The second quartile, also known as Q2 or the median, is the value of the 50th percentile. Half the scores are above and half below.
- The third quartile, also known as Q3 or the upper quartile, is the value of the 75th percentile. The top quarter of the scores fall above this value, while three-quarters fall below it.

The interquartile range (IQR) is a measure of dispersion in statistics. This range corresponds to the distance between the first quartile and the third quartile (IQR = Q3 – Q1). Larger IQRs indicate that the data are more spread out. The interquartile range represents the middle half of the data: one quarter of the values fall below Q1, and another quarter fall above Q3.
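As a sketch, Python's standard `statistics` module can compute the median, the quartiles, and the IQR. The dataset is hypothetical. Note that `statistics.quantiles` uses its default `method="exclusive"`, which interpolates at rank p(n + 1); other software may use a different quartile convention and report slightly different values for the same data.

```python
import statistics

# Hypothetical 11-value dataset, for illustration only.
data = [5, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]

# n=4 returns the three cut points (Q1, Q2, Q3) that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)               # 12.0 25.0 40.0
print(statistics.median(data))  # 25, the same value as Q2
print(iqr)                      # 28.0
```

Passing `method="inclusive"` instead gives Q1 = 13.5 for the same data, a small illustration of why quartile values can differ between tools.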

Percentiles are surprisingly versatile because you can use them not only to obtain a relative standing, but also for dividing your dataset into portions, identifying the central tendency, and measuring the dispersion of a distribution.

**Related posts**: Measures of Central Tendency and Measures of Dispersion

## Calculating Percentiles Using Values in a Dataset

Percentile is a fairly common word. Surprisingly, there isn’t a single standard definition for it. Consequently, there are multiple methods for calculating percentiles. In this post, I cover four procedures. The first three are methods that analysts use to calculate percentiles from the actual data values in relatively small datasets. These three methods define the kth percentile in the following ways:

- The smallest value that is greater than k percent of the values.
- The smallest value that is greater than or equal to k percent of values.
- An interpolated value between the two closest ranks.

While the first two definitions might not seem drastically different, they can produce significantly different results, mainly when you are working with a small dataset. As you will see, this difference occurs because the first two definitions use different ranks that correspond to different scores. The third definition mitigates this concern by interpolating between two ranks to estimate a percentile value that falls between two values.

To calculate percentiles using these three approaches, start by ranking your dataset from the lowest to highest values.

Let’s use these three methods with the following dataset (n=11) to find the 70th percentile.

### Definition 1: Greater Than

Using the first definition, we need to find the value that is greater than 70% of the values, and there are 11 values. Take 70% of 11, which is 7.7. Then, round 7.7 up to 8. Using the first definition, the value for the 70th percentile must be greater than eight values. Consequently, we pick the 9th ranked value in the dataset, which is 40.
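Here is a minimal Python sketch of Definition 1. The original table is not reproduced here, so the dataset below is hypothetical; only its 8th and 9th ranked values (35 and 40) come from the text. The sketch follows the walkthrough's rounding: a value qualifies when it is strictly greater than at least k% of n values (7.7 rounds up to 8). The function name is hypothetical, and `k` is an integer percent so the comparison can use exact integer arithmetic.

```python
def percentile_greater_than(data, k):
    """Definition 1: smallest value strictly greater than at least k% of the values.

    k is an integer percent (e.g., 70). Comparing 100*count >= k*n with
    integers avoids floating-point rounding in "k% of n".
    """
    ranked = sorted(data)
    n = len(ranked)
    for v in ranked:
        below = sum(1 for x in ranked if x < v)  # values strictly below v
        if 100 * below >= k * n:
            return v
    raise ValueError("no value is greater than k% of the data")

# Hypothetical dataset: only the 8th (35) and 9th (40) ranked values match the post.
data = [5, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]
print(percentile_greater_than(data, 70))  # 40
```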

### Definition 2: Greater Than or Equal To

Using the second definition, we need to find the value that is greater than or equal to 70% of the values. Thanks to the “equal to” portion of the definition, we can use the 8th ranked value, which is 35.
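A matching sketch of Definition 2 changes only the comparison: a value now counts itself, so it qualifies when at least k% of the values are less than or equal to it. As before, the function name and all data values other than 35 and 40 are hypothetical.

```python
def percentile_greater_or_equal(data, k):
    """Definition 2: smallest value that is >= at least k% of the values (itself included).

    k is an integer percent between 0 and 100.
    """
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    ranked = sorted(data)
    n = len(ranked)
    for v in ranked:
        at_or_below = sum(1 for x in ranked if x <= v)  # v counts itself
        if 100 * at_or_below >= k * n:
            return v
    return ranked[-1]  # unreachable for valid k; kept so the function always returns a value

# Hypothetical dataset: only the 8th (35) and 9th (40) ranked values match the post.
data = [5, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]
print(percentile_greater_or_equal(data, 70))  # 35
```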

Using the first two definitions, we have found two values for the 70th percentile: 35 and 40.

### Definition 3: Using an Interpolation Approach

As you saw above, using either “greater” or “greater than or equal to” changes the results. Depending on the nature and size of your dataset, this difference can be substantial. Consequently, a third approach interpolates between two data values.

To calculate an interpolated percentile, do the following:

1. Calculate the rank to use for the percentile: rank = p(n + 1), where p is the percentile expressed as a proportion (0.70 for the 70th percentile) and n is the sample size. For our example, the rank for the 70th percentile is 0.7 × (11 + 1) = 8.4.
2. If the rank in step 1 is an integer, find the data value that corresponds to that rank and use it for the percentile.
3. If the rank is not an integer, interpolate between the two closest observations. For our example, 8.4 falls between ranks 8 and 9, which correspond to the data values 35 and 40.
4. Take the difference between these two observations and multiply it by the fractional portion of the rank. For our example, this is (40 - 35) × 0.4 = 2.
5. Take the lower-ranked value from step 3 and add the value from step 4 to obtain the interpolated value for the percentile. For our example, that value is 35 + 2 = 37.
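The interpolation steps above can be sketched in Python as follows. The function name and the dataset are hypothetical (only the ranked values 35 and 40 come from the text). Clamping ranks below 1 or above n to the smallest or largest value is an assumption the steps do not cover. `k` is an integer percent, and `divmod` splits k(n + 1)/100 into exact whole and fractional parts.

```python
import statistics

def percentile_interpolated(data, k):
    """Definition 3: interpolate at rank = p*(n + 1), where p = k/100.

    k is an integer percent. divmod gives k*(n+1) = 100*whole + rem,
    so rank = whole + rem/100 without floating-point error.
    """
    ranked = sorted(data)
    n = len(ranked)
    whole, rem = divmod(k * (n + 1), 100)
    # Assumption: ranks outside 1..n are clamped to the smallest/largest value.
    if whole < 1:
        return ranked[0]
    if whole >= n:
        return ranked[-1]
    lower, upper = ranked[whole - 1], ranked[whole]  # ranks are 1-based, lists 0-based
    return lower + (upper - lower) * rem / 100

# Hypothetical dataset: only the 8th (35) and 9th (40) ranked values match the post.
data = [5, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]
print(percentile_interpolated(data, 70))      # 37.0
# statistics.quantiles' default "exclusive" method uses the same p(n + 1) rank.
print(statistics.quantiles(data, n=100)[69])  # 37.0
```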

Using three common calculations for percentiles, we find three different values for the 70th percentile: 35, 37, and 40.

Next, I’ll show you one more method for calculating percentiles that does not directly use the values in the dataset.

## Using a Probability Distribution Function to Estimate Percentiles

If you know the probability distribution function (PDF) that a population of values follows, you can use it to calculate percentiles. For example, the population might follow the normal distribution, or you might have collected a sample and then identified the distribution that provides the best fit.

Read my post about identifying the distribution of your data. That approach identifies the population distribution with the highest probability (i.e., maximum likelihood) of producing the data that you observe in a random sample from that population.

After you identify the distribution for your sample, you can use your statistical software to calculate the percentage of values in the distribution that fall below a value. I’ll use graphs to show two examples to make the ideas crystal clear. I’m using Minitab statistical software to generate these graphs. The data for one example follow a normal distribution, while the other follows a skewed lognormal distribution. Both of these variables were collected from the same sample of middle school girls.

**Related post**: Understanding Probability Distribution Functions

### Using the Normal Distribution to Estimate Height Percentiles

Height tends to follow the normal distribution, which is the case for our sample data. The heights for this population follow a normal distribution with a mean of 1.512 meters and a standard deviation of 0.0741 meters. For normally distributed populations, you can use Z-scores to calculate percentiles. This method is convenient when you have only summary information about a sample and access to a table of Z-scores. I talk about Z-scores and show how to use them to calculate percentiles in my blog post about the Normal Distribution.
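As a minimal sketch of the Z-score method, Python's `statistics.NormalDist` (Python 3.8+) can stand in for a Z table: convert a height to a Z-score, then take the area under the standard normal curve below it. The mean and standard deviation come from this post; the height being ranked (1.551 m) is an illustrative choice.

```python
from statistics import NormalDist

# Population parameters for height from the post, in meters.
mean, sd = 1.512, 0.0741

height = 1.551                    # illustrative height to rank
z = (height - mean) / sd          # Z-score: standard deviations above the mean
percentile = NormalDist().cdf(z)  # proportion of the standard normal curve below z

print(round(z, 3), round(100 * percentile, 1))  # z is about 0.526, roughly the 70th percentile
```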

However, for this post, I’ll use the probability density function to calculate and graph the percentile. In this type of probability density plot, the proportion of the shaded area under the curve indicates the percentage of the distribution that falls within that range of values. For this graph, I shade the region that contains the lower 70% of the values, and the software calculates the height that corresponds with this percentage, which is the 70th percentile.

The plot above shows that a height of 1.551 meters is at the 70th percentile for this population of middle school girls.
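Going the other direction, from a percentage to a value, uses the inverse of the cumulative distribution. This Python sketch uses `statistics.NormalDist.inv_cdf` as a stand-in for the software calculation; it is not the Minitab procedure, but with the post's parameters it reproduces the reported value.

```python
from statistics import NormalDist

# Normal distribution of heights with the post's mean and standard deviation (meters).
heights = NormalDist(mu=1.512, sigma=0.0741)

p70 = heights.inv_cdf(0.70)  # height with 70% of the distribution below it
print(round(p70, 3))         # 1.551
```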

### Using the Lognormal Distribution to Estimate Body Fat Percentiles

Not all data follow the normal distribution. In this vein, the body fat percentage data for the same sample are skewed. In my post about identifying the distribution of your data, I determined that these data follow a lognormal distribution with a location of 3.32317 and a scale of 0.24188.

The graph below clearly shows the right skew. Below, I use the same process to calculate the 70th percentile for body fat percentage as I did for height; I only need to specify the correct distribution for the software. This approach ensures that the skewness of the data is factored into the percentiles.

The plot above shows that having 31.5% body fat is at the 70th percentile for this population of middle school girls.
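A lognormal percentile can be sketched in Python by working on the log scale. This assumes that the location and scale are the mean and standard deviation of ln(body fat), which is how Minitab parameterizes the lognormal distribution. Because exp is increasing, the 70th percentile of body fat is exp of the 70th percentile of ln(body fat).

```python
import math
from statistics import NormalDist

# Assumption: location and scale are the mean and standard deviation of ln(body fat).
location, scale = 3.32317, 0.24188

# 70th percentile on the log scale, then transform back to the original units.
log_p70 = NormalDist(mu=location, sigma=scale).inv_cdf(0.70)
p70 = math.exp(log_p70)

print(round(p70, 1))  # 31.5 (percent body fat)
```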

Percentiles are a very intuitive way to understand where a value falls within a distribution of values. However, if you need to calculate a percentile, you’ll need to decide which method to use!

Nat Kitaw says

How would you calculate percentile rank for something with an underlying exponential distribution?

Jim Frost says

Hi Nat,

You can use any of the methods I discuss in this post. You can calculate the percentiles based on the values in your dataset using one of those three methods. Or, find the distribution that best fits your data (presumably the exponential distribution in your case) and use that to calculate percentiles. In this post, I use a lognormal distribution to illustrate this method, but you can use the exponential distribution.
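For the distribution-based route with an exponential distribution, a minimal Python sketch follows. It fits the exponential by maximum likelihood (the estimated mean, 1/rate, is the sample mean) and then uses the closed-form CDF, 1 - exp(-x/mean), and its inverse, -mean·ln(1 - p). The sample values and function names are hypothetical, for illustration only.

```python
import math
import statistics

# Hypothetical waiting times, for illustration only.
sample = [0.8, 2.1, 0.3, 1.7, 4.2, 0.9, 1.1, 2.6, 0.5, 1.8]

# Maximum likelihood estimate of the exponential mean (1/rate) is the sample mean.
mean = statistics.fmean(sample)

def exp_percentile_rank(x, mean):
    """Proportion of the fitted exponential distribution below x (the CDF)."""
    return 1 - math.exp(-x / mean)

def exp_percentile(p, mean):
    """Value with proportion p of the fitted distribution below it (inverse CDF)."""
    return -mean * math.log(1 - p)

p70 = exp_percentile(0.70, mean)
print(p70)                             # 70th percentile of the fitted distribution
print(exp_percentile_rank(p70, mean))  # round-trips to about 0.70
```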

suttonfelty says

Why are there different percentile calculations?

Jim Frost says

Hi, there are several different reasons. For one thing, you’re starting with slightly different definitions. For whatever reason, there’s not one standard definition. The calculations depend on how you define it (greater than versus greater than or equal to). This problem is exacerbated with smaller datasets where the difference in definition has a larger impact on the end result. There’s also the fact that you can calculate percentiles for values in a dataset or you can use probability distributions to calculate percentiles based on estimates of the population parameters. In short, there are different calculations because of different definitions and different goals (i.e., for values in a dataset vs. for a population).

Appy says

How do I calculate the percentile ranks for data where a lower score means better performance?

Jim Frost says

Hi Appy,

What you need to do is start by ranking the scores accordingly: give the higher data values lower ranks and the lower data values higher ranks, which is the opposite of what I show in this post. Then, wherever I talk about values being “greater than,” substitute “less than.” I believe that with those changes you can proceed as I show in this post.
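As a sketch of that reversal in Python, the percentile rank counts the scores that are worse, meaning strictly higher when lower is better. The function name and the race times are hypothetical.

```python
def percentile_rank_lower_is_better(scores, x):
    """Percentage of scores that are worse than x when lower scores are better.

    "Worse" means strictly higher, the "greater than" rule with the direction reversed.
    """
    worse = sum(1 for s in scores if s > x)
    return 100 * worse / len(scores)

# Hypothetical race times in seconds (lower is better), for illustration only.
times = [52.1, 48.7, 55.0, 50.3, 47.9, 53.6, 49.5, 51.2, 54.4, 50.8]
print(percentile_rank_lower_is_better(times, 49.5))  # 70.0: 7 of 10 times are slower
```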

When you report the results, be sure to clarify how you’re using percentiles in this manner. For example, “70% of the scores are worse than X, where high values indicate worse performance.” Something like that because I think it would be easy to get confused given the normal usage of percentiles.

I hope this helps!

Takele says

nice and clear. tell us about logistic and Bayesian analysis

Jim Frost says

Thanks! I’d like to address those in future posts. So many potential topics to cover!