Descriptive and inferential statistics are two broad categories in the field of statistics. In this blog post, I show you how both types of statistics are important for different purposes. Interestingly, some of the statistical measures are similar, but the goals and methodologies are very different.

## Descriptive Statistics

Use descriptive statistics to summarize and graph the data for a group that you choose. This process allows you to understand that specific set of observations.

Descriptive statistics describe a sample. That’s pretty straightforward. You simply take a group that you’re interested in, record data about the group members, and then use summary statistics and graphs to present the group properties. With descriptive statistics, there is no uncertainty because you are describing only the people or items that you actually measure. You’re not trying to infer properties about a larger population.

The process involves taking a potentially large number of data points in the sample and reducing them down to a few meaningful summary values and graphs. This procedure gives us more insight into the data, and a clearer picture of it, than simply poring over row upon row of raw numbers!

### Common tools of descriptive statistics

Descriptive statistics frequently use the following statistical measures to describe groups:

**Central tendency**: Use the mean or the median to locate the center of the dataset. This measure tells you where most values fall.

**Dispersion**: How far out from the center do the data extend? You can use the range or standard deviation to measure the dispersion. A low dispersion indicates that the values cluster more tightly around the center. Higher dispersion signifies that data points fall further away from the center. We can also graph the frequency distribution.

**Skewness**: This measure tells you whether the distribution of values is symmetric or skewed.

You can present this summary information using both numbers and graphs. These are the standard descriptive statistics, but there are other descriptive analyses you can perform, such as assessing the relationships of paired data using correlation and scatterplots.
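As a rough illustration, the measures above can be computed in a few lines of Python using only the standard library. The values are made up for this sketch, and the skewness formula used here (the third standardized moment) is one common definition among several.

```python
import statistics

# Hypothetical values, made up purely for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)

# Dispersion
value_range = max(values) - min(values)
# pstdev divides by n, which fits describing exactly this group.
std_dev = statistics.pstdev(values)

# Skewness as the third standardized moment:
# 0 for a symmetric distribution, positive for a longer right tail.
skewness = sum(((x - mean) / std_dev) ** 3 for x in values) / len(values)

print(f"mean={mean}, median={median}, range={value_range}")
print(f"standard deviation={std_dev:.2f}, skewness={skewness:.2f}")
```

In this made-up group the mean sits above the median and the skewness is positive, which is what you would expect when a few larger values stretch the right tail.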

### Example of descriptive statistics

Suppose we want to describe the test scores in a specific class of 30 students. We record all of the test scores and calculate the summary statistics and produce graphs. Here is the CSV data file: Descriptive_statistics.

| Statistic | Class value |
|---|---|
| Mean | 79.18 |
| Range | 66.21 – 96.53 |
| Proportion >= 70 | 86.7% |

These results indicate that the mean score of this class is 79.18. The scores range from 66.21 to 96.53, and the distribution is symmetrically centered around the mean. A score of at least 70 on the test is acceptable. The data show that 86.7% of the students have acceptable scores.
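If you want to reproduce this kind of summary yourself, a minimal sketch might look like the following. The scores below are hypothetical and are not the values in the Descriptive_statistics file, so the output will not match the table above.

```python
import statistics

# Hypothetical scores for a small class; NOT the Descriptive_statistics data.
scores = [66.5, 71.0, 74.5, 78.0, 79.5, 81.0, 84.0, 88.5, 92.0, 96.5]
passing_score = 70  # the "acceptable" threshold used in the text

mean_score = statistics.mean(scores)
lowest, highest = min(scores), max(scores)
# Share of students at or above the threshold.
proportion_passing = sum(s >= passing_score for s in scores) / len(scores)

print(f"Mean: {mean_score:.2f}")
print(f"Range: {lowest} to {highest}")
print(f"Proportion >= {passing_score}: {proportion_passing:.1%}")
```

Because every student in the group is included, these numbers are simply facts about the group. No margin of error is attached to them.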

Collectively, this information gives us a pretty good picture of this specific class. There is no uncertainty surrounding these statistics because we gathered the scores for everyone in the class. However, we can’t take these results and extrapolate to a larger population of students.

We’ll do that later.

## Inferential Statistics

Inferential statistics takes data from a sample and makes inferences about the larger population from which the sample was drawn. Because the goal of inferential statistics is to draw conclusions from a sample and generalize them to a population, we need to have confidence that our sample accurately reflects the population. This requirement affects our process. At a broad level, we must do the following:

- Define the population we are studying.
- Draw a representative sample from that population.
- Use analyses that incorporate the sampling error.

We don’t get to pick a convenient group. Instead, random sampling allows us to have confidence that the sample represents the population. This process is a primary method for obtaining samples that mirror the population on average. Random sampling produces statistics, such as the mean, that do not tend to be too high or too low. Using a random sample, we can generalize from the sample to the broader population. Unfortunately, gathering a truly random sample can be a complicated process.

### Pros and cons of working with samples

You gain tremendous benefits by working with a random sample drawn from a population. In most cases, it is simply impossible to measure the entire population to understand its properties. The alternative is to gather a random sample and then use the methodologies of inferential statistics to analyze the sample data.

While samples are much more practical and less expensive to work with, there are tradeoffs. Typically, we learn about the population by drawing a relatively small sample from it. We are a very long way off from measuring all people or objects in that population. Consequently, when you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly.

For instance, your sample mean is unlikely to equal the population mean exactly. The difference between the sample statistic and the population value is the sampling error. Inferential statistics incorporate estimates of this error into the statistical results.
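A small simulation can make sampling error concrete. In this sketch, the population is simulated (its mean of 79 and standard deviation of 9 are assumptions chosen for illustration), which is the only reason we can know the population mean and compare each sample mean against it.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 test scores (simulated, not real data).
population = [random.gauss(79, 9) for _ in range(10_000)]
# Knowable here only because we simulated the whole population.
population_mean = statistics.mean(population)

# Draw several random samples of 100 and compare each sample mean
# to the population mean. The difference is the sampling error.
sampling_errors = []
for _ in range(5):
    sample = random.sample(population, 100)
    sample_mean = statistics.mean(sample)
    sampling_errors.append(sample_mean - population_mean)
    print(f"sample mean = {sample_mean:.2f}, "
          f"sampling error = {sampling_errors[-1]:+.2f}")
```

Each sample lands close to the population mean but not exactly on it, and the size and direction of the miss change from sample to sample. In a real study you never see these errors directly, which is why inferential procedures estimate them instead.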

In contrast, summary values in descriptive statistics are straightforward. The average score in a specific class is a known value because we measured all individuals in that class. There is no uncertainty.

### Standard analysis tools of inferential statistics

The most common methodologies in inferential statistics are hypothesis tests, confidence intervals, and regression analysis. Interestingly, these inferential methods can produce summary values similar to those in descriptive statistics, such as the mean and standard deviation. However, as I’ll show you, we use them very differently when making inferences.

### Hypothesis tests

Hypothesis tests use sample data to answer questions like the following:

- Is the population mean greater than or less than a particular value?
- Are the means of two or more populations different from each other?

For example, if we study the effectiveness of a new medication by comparing the outcomes in a treatment and control group, hypothesis tests can tell us whether the drug’s effect that we observe in the sample is likely to exist in the population. After all, we don’t want to use the medication if it is effective only in our specific sample. Instead, we need evidence that it’ll be useful in the entire population of patients. Hypothesis tests allow us to draw these types of conclusions about entire populations.
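To show the logic of a hypothesis test for a treatment and control comparison, here is a minimal sketch of a permutation test. This is one simple kind of hypothesis test that needs only the standard library. It is not necessarily the test you would choose for a real drug study, where a t-test or a more specialized procedure is common. The outcome values are invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical outcome measurements (made up for illustration).
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 15.8, 14.1]
control = [11.2, 12.5, 11.9, 13.1, 12.2, 12.8, 11.5, 12.9]

def mean(values):
    return sum(values) / len(values)

observed_diff = mean(treatment) - mean(control)

# Null hypothesis: the group labels don't matter. Shuffle the labels many
# times and count how often a difference at least as large as the observed
# one arises purely by chance.
pooled = treatment + control
n_treat = len(treatment)
n_shuffles = 10_000
extreme = 0
for _ in range(n_shuffles):
    random.shuffle(pooled)
    diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
    if abs(diff) >= abs(observed_diff) - 1e-12:  # tolerance for float rounding
        extreme += 1

p_value = extreme / n_shuffles  # two-sided permutation p-value
print(f"observed difference = {observed_diff:.2f}, p-value = {p_value:.4f}")
```

A small p-value says that a difference this large would rarely appear if the treatment had no effect, which is the evidence we need before generalizing beyond the sample.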

**Related post**: Statistical Hypothesis Testing Overview

### Confidence intervals (CIs)

In inferential statistics, a primary goal is to estimate population parameters. These parameters are the unknown values for the entire population, such as the population mean and standard deviation. These parameter values are not only unknown but almost always unknowable. Typically, it’s impossible to measure an entire population. The sampling error I mentioned earlier produces uncertainty, or a margin of error, around our estimates.

Suppose we define our population as all high school basketball players. Then, we draw a random sample from this population and calculate the mean height of 181 cm. This sample estimate of 181 cm is the best estimate of the mean height of the population. However, it’s virtually guaranteed that our estimate of the population parameter is not exactly correct.

Confidence intervals incorporate the uncertainty and sampling error to create a range of values that the actual population value is likely to fall within. For example, a confidence interval of [176, 186] indicates that we can be confident that the real population mean falls within this range.
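As a sketch of how such an interval can be computed, the function below uses the large-sample normal approximation: sample mean plus or minus a z-value times the standard error. The heights are simulated, and the function name `mean_confidence_interval` is just an illustrative choice. For small samples, an interval based on the t-distribution would be the more usual choice.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for a population mean (normal approximation)."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # stdev uses the n - 1 denominator because we are estimating
    # a population value from a sample.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

random.seed(42)
# Simulated, hypothetical sample of 50 player heights in cm.
heights_cm = [random.gauss(181, 8) for _ in range(50)]

lower, upper = mean_confidence_interval(heights_cm)
print(f"point estimate: {statistics.mean(heights_cm):.1f} cm")
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
```

The point estimate sits in the middle of the interval, and the width reflects how much sampling error we should expect for a sample of this size and variability.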

**Related post**: Understanding Confidence Intervals

### Regression analysis

Regression analysis describes the relationship between a set of independent variables and a dependent variable. This analysis incorporates hypothesis tests that help determine whether the relationships observed in the sample data actually exist in the population.

For example, the fitted line plot below displays the relationship in the regression model between height and weight in adolescent girls. Because the relationship is statistically significant, we have sufficient evidence to conclude that this relationship exists in the population rather than just our sample.
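For a sense of what fitting such a line involves, here is a minimal ordinary least squares sketch with a single predictor. The height and weight pairs are invented for illustration and are not the data behind the example above. The sketch only fits the line. It does not carry out the significance test for the slope, which statistical software normally reports alongside the fit.

```python
# Hypothetical height (m) and weight (kg) pairs, made up for illustration.
heights = [1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
weights = [38.0, 42.5, 45.0, 50.5, 53.0, 58.0]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Ordinary least squares for one predictor:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(height):
    """Predicted weight (kg) for a given height (m) on the fitted line."""
    return intercept + slope * height

print(f"weight = {intercept:.1f} + {slope:.1f} * height")
```

The slope describes the relationship in this sample. Whether that relationship also holds in the population is the inferential question that the hypothesis test on the slope answers.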

**Related post**: When Should I Use Regression Analysis?

### Example of inferential statistics

For this example, suppose we conducted our study on test scores for a specific class as I detailed in the descriptive statistics section. Now we want to perform an inferential statistics study for that same test. Let’s assume it is a standardized statewide test. By using the same test, but now with the goal of drawing inferences about a population, I can show you how that changes the way we conduct the study and the results that we present.

In descriptive statistics, we picked the specific class that we wanted to describe and recorded all of the test scores for that class. Nice and simple. For inferential statistics, we need to define the population and then draw a random sample from that population.

Let’s define our population as 8th-grade students in public schools in the State of Pennsylvania in the United States. We need to devise a random sampling plan to help ensure a representative sample. This process can actually be arduous. For the sake of this example, assume that we are provided a list of names for the entire population and draw a random sample of 100 students from it and obtain their test scores. Note that these students will not be in one class, but from many different classes in different schools across the state.
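Once a complete list exists, drawing the sample itself is the easy part. A minimal sketch, assuming a roster of placeholder student IDs (the roster size of 120,000 is an arbitrary assumption, not a real enrollment figure):

```python
import random

random.seed(2024)  # fixed seed so the draw is repeatable

# Hypothetical roster: placeholder IDs standing in for the full population list.
population_roster = [f"student_{i:06d}" for i in range(120_000)]

# Simple random sample without replacement: every student has the same
# chance of selection, and nobody is picked twice.
sample_ids = random.sample(population_roster, k=100)

print(sample_ids[:5])
```

The hard work in practice is obtaining a complete and accurate roster in the first place, not the draw.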

### Inferential statistics results

For inferential statistics, we can calculate the point estimate for the mean, standard deviation, and proportion for our random sample. However, it is staggeringly improbable that any of these point estimates are exactly correct, and there is no way to know for sure anyway. Because we can’t measure all subjects in this population, there is a margin of error around these statistics. Consequently, I’ll report the confidence intervals for the mean, standard deviation, and the proportion of satisfactory scores (>=70). Here is the CSV data file: Inferential_statistics.

| Statistic | Population Parameter Estimate (CIs) |
|---|---|
| Mean | 77.4 – 80.9 |
| Standard deviation | 7.7 – 10.1 |
| Proportion scores >= 70 | 77% – 92% |

Given the uncertainty associated with these estimates, we can be 95% confident that the population mean is between 77.4 and 80.9. The population standard deviation (a measure of dispersion) is likely to fall between 7.7 and 10.1. And, the population proportion of satisfactory scores is expected to be between 77% and 92%.
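One way to get intervals for all three statistics with nothing beyond the standard library is the percentile bootstrap, sketched below. This is a swapped-in technique for illustration and not necessarily the method behind the table above. The sample is simulated rather than taken from the Inferential_statistics file, so the numbers will differ.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# Simulated stand-in for the random sample of 100 scores
# (NOT the Inferential_statistics data).
sample = [random.gauss(79, 9) for _ in range(100)]

def proportion_satisfactory(scores, cutoff=70):
    return sum(s >= cutoff for s in scores) / len(scores)

def bootstrap_ci(data, statistic, confidence=0.95, n_resamples=2_000):
    """Percentile bootstrap interval for any statistic of the sample."""
    # Resample with replacement, recompute the statistic each time,
    # then read the interval off the sorted estimates.
    estimates = sorted(
        statistic(random.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    k = round((1 - confidence) / 2 * n_resamples)  # estimates cut from each tail
    return estimates[k], estimates[-k - 1]

intervals = {
    "Mean": bootstrap_ci(sample, statistics.fmean),
    "Standard deviation": bootstrap_ci(sample, statistics.stdev),
    "Proportion >= 70": bootstrap_ci(sample, proportion_satisfactory),
}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

Whatever method produces them, the intervals play the same role: they report a plausible range for each population parameter instead of a single number that is almost certainly off by some amount.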

## Differences between Descriptive and Inferential Statistics

As you can see, the difference between descriptive and inferential statistics lies in the process as much as in the statistics that you report.

For descriptive statistics, we choose a group that we want to describe and then measure all subjects in that group. The statistical summary describes this group with complete certainty (outside of measurement error).

For inferential statistics, we need to define the population and then devise a sampling plan that produces a representative sample. The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population.

A study using descriptive statistics is simpler to perform. However, if you need evidence that an effect or relationship between variables exists in an entire population rather than only your sample, you need to use inferential statistics.

Sol says

Many thanks for this post. You’re a godsend. Have you authored any books?

Jim Frost says

Hi Sol, You’re very welcome! 🙂 And, that’s a timely question. I’m working on my first book at the moment!

Carlo Lauro says

Very useful presentation of the topic. What about their use in big data analysis?

ANN MARY CHACKO says

Thank you Jim for making things simpler and better. I am Ann, PhD Scholar from India

Jim Frost says

Hi Ann, you’re very welcome! I’m so glad that you find my posts to be helpful! I love India! I’ve been there several times!

Jerry Tuttle says

I have seen definitions of sample standard deviation in social science textbooks using an n denominator for descriptive statistics and an n-1 for inferential statistics. I have never seen a math book using the n denominator for descriptive. Any comment on why the social science world goes off on a different direction here?

Jim Frost says

Hi Jerry, I don’t know why social science takes that route. I can tell you that in statistics the correct formula to use for standard deviation depends on whether the data are the entire group or population or a sample from a larger population.

When the data are the entire group (descriptive statistics), the denominator is n. However, if you are using a sample to estimate the value of a population (inferential), you use n-1. This is because you need to account for the degrees of freedom that you use for the estimate.

Aayush says

Hello sir, I want to know what the need is for interval estimation when we already have point estimation?

Jim Frost says

Hi Aayush, that is a great question! I talk about this in the Example of Inferential Statistics section. It is possible to calculate the point estimate for the population. However, it’s virtually guaranteed that this estimate is wrong by some amount. So, the question becomes, how far off is the point estimate likely to be?

Confidence intervals answer this question. The narrower the intervals, the more precise the estimate. With narrow intervals, you can be reasonably sure that the point estimate isn’t too far wrong. However, if the CI is wide, you know that you shouldn’t expect the point estimate to be too near the true value. In that case, don’t place too much confidence in the point estimate! Interval estimation provides additional information about the precision of the point estimate.

I hope this helps clarify things!

rama krishna reddy says

I am a data scientist, and I enjoy going through your articles. Thank you, Jim.

Jim Frost says

Hi Rama, I’m glad that you find my posts to be helpful!

daboo says

thank u so much continuously i need such brief explanation about statistics therefore i need another material specially about Bayesian distribution b/c i.m post graduate class a thesis on maternal mortality approach of bayesian model

Anandaraj says

Very good one. Explains the basics well. Thanks

Evelyn says

Just discovered this website today very helpful. Thank you Jim..

Jim Frost says

Hi Evelyn, thank you for your kind words! I’m glad you found it to be helpful!

Carlo Lauro says

Still waiting for your reply

Jim Frost says

Hi Carlo, that’s a very broad question–I could write an entire book about that topic. Is there something more specific you want to know?