Blog

Statistical Inference: Definition, Methods & Example

What is Statistical Inference?

Statistical inference is the process of using a sample to infer the properties of a population. Statistical procedures use sample data to estimate the characteristics of the whole population from which the sample was drawn.

Scientists typically want to learn about a population. When studying a phenomenon, such as the effects of a new medication or public opinion, understanding the results at a population level is much more valuable than understanding only the comparatively few participants in a study.

Unfortunately, populations are usually too large to measure fully. Consequently, researchers must use a manageable subset of that population to learn about it.

By using procedures that can make statistical inferences, you can estimate the properties and processes of a population. More specifically, sample statistics can estimate population parameters. Learn more about the differences between sample statistics and population parameters.

For example, imagine that you are studying a new medication. As a scientist, you’d like to understand the medicine’s effect in the entire population rather than just a small sample. After all, knowing the effect on a handful of people isn’t very helpful for the larger society!

Consequently, you are interested in making a statistical inference about the medicine’s effect in the population.

Read on to see how to do that! I’ll show you the general process for making a statistical inference and then cover an example using real data.

How to Make Statistical Inferences

In its simplest form, the process of making a statistical inference requires you to do the following:

Draw a sample that adequately represents the population.
Measure your variables of interest.
Use appropriate statistical methodology to generalize your sample results to the population while accounting for sampling error.

Of course, that’s the simple version. In real-world experiments, you might need to form treatment and control groups, administer treatments, and reduce other sources of variation. In more complex cases, you might need to create a model of a process. There are many details in the process of making a statistical inference! Learn how to incorporate statistical inference into scientific studies.

Statistical inference requires using specialized sampling methods that tend to produce representative samples. If the sample does not look like the larger population you’re studying, you can’t trust any inferences from the sample. Consequently, using an appropriate method to obtain your sample is crucial. The best sampling methods tend to produce samples that look like the target population. Learn more about Sampling Methods and Representative Samples.

After obtaining a representative sample, you’ll need to use a procedure that can make statistical inferences. While you might have a sample that looks similar to the population, it will never be identical to it. Statisticians refer to the differences between a sample and the population as sampling error. Any effect or relationship you see in your sample might actually be sampling error rather than a true finding. Inferential statistics incorporate sampling error into the results. Learn more about Sampling Error.
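To make sampling error concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the made-up population of 100,000 values, the sample sizes, and the helper name `sampling_errors` are assumptions for illustration only. It repeatedly draws random samples and measures how far each sample mean lands from the population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 roughly normal values (illustrative only).
population = [random.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sampling_errors(sample_size, repeats=500):
    """Return (sample mean - population mean) for repeated random samples."""
    return [
        statistics.fmean(random.sample(population, sample_size)) - population_mean
        for _ in range(repeats)
    ]


for n in (10, 100, 1000):
    errors = sampling_errors(n)
    # The spread of these errors is the typical size of the sampling error.
    print(f"n={n:>4}: typical sampling error = {statistics.pstdev(errors):.2f}")
```

No sample mean matches the population mean exactly, but the typical error shrinks as the sample size grows. Inferential procedures are built to account for that leftover error.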

Common Inferential Methods

The following are four standard procedures that can make statistical inferences.

Hypothesis Testing: Uses representative samples to assess two mutually exclusive hypotheses about a population. Statistically significant results suggest that the sample effect or relationship exists in the population after accounting for sampling error.
Confidence Intervals: A range of values likely containing the population value. This procedure evaluates the sampling error and adds a margin around the estimate, giving an idea of how wrong it might be.
Margin of Error: Comparable to a confidence interval but usually for survey results.
Regression Modeling: An estimate of the process that generates the outcomes in the population.
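As a rough sketch of how a margin of error works for a survey proportion, the snippet below uses the common normal-approximation formula. The survey numbers (52% of 1,000 respondents) and the function name `margin_of_error` are hypothetical, chosen only for illustration.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 52% of 1,000 respondents favor a proposal.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
print(f"Estimate: {p_hat:.0%} plus or minus {moe:.1%}")
```

The estimate plus and minus the margin of error gives the matching confidence interval, which is why the two procedures are so closely related.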

Example Statistical Inference

Let’s look at a real flu vaccine study for an example of making a statistical inference. The scientists for this study want to evaluate whether a flu vaccine effectively reduces flu cases in the general population. However, the general population is much too large to include in their study, so they must use a representative sample to make a statistical inference about the vaccine’s effectiveness.

The Monto et al. study* evaluates the 2007-2008 flu season and follows its participants from January to April. Participants are 18-49 years old. The researchers select about 1,100 participants and randomly assign them to the vaccine and placebo groups. After tracking them for the flu season, they record the number of flu infections in each group, as shown below.

Treatment	Flu count	Group size	Percent infections
Placebo	35	325	10.8%
Vaccine	28	813	3.4%
Effect			7.4%

Monto Study Findings

From the table above, 10.8% of the unvaccinated got the flu, while only 3.4% of the vaccinated caught it. The apparent effect of the vaccine is 10.8% – 3.4% = 7.4 percentage points. While that seems to show a vaccine effect, it might be a fluke due to sampling error. We’re assessing only about 1,100 people out of a population of millions. We need to use a hypothesis test and confidence interval (CI) to make a proper statistical inference.

While the details go beyond this introductory post, here are two statistical inferences we can make using a 2-sample proportions test and CI.

The p-value of the test is < 0.0005. The evidence strongly favors the hypothesis that the vaccine effectively reduces flu infections in the population after accounting for sampling error.
Additionally, the confidence interval for the effect size is 3.7% to 10.9%. Our study found a sample effect of 7.4%, but it is unlikely to equal the population effect exactly due to sampling error. The CI identifies a range that is likely to include the population effect.
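If you'd like to see roughly where those two results come from, here is a minimal sketch using the counts from the table. It assumes a normal-approximation z-test with a pooled standard error and a Wald-style 95% interval with an unpooled standard error. The post doesn't say which exact variant of the 2-sample proportions test was used, so treat those choices as assumptions.

```python
import math

# Counts from the Monto study table.
placebo_flu, placebo_n = 35, 325
vaccine_flu, vaccine_n = 28, 813

p_placebo = placebo_flu / placebo_n
p_vaccine = vaccine_flu / vaccine_n
effect = p_placebo - p_vaccine  # difference in infection proportions

# Hypothesis test (assumed variant): pooled standard error under the
# null hypothesis that both groups share one infection rate.
pooled = (placebo_flu + vaccine_flu) / (placebo_n + vaccine_n)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / placebo_n + 1 / vaccine_n))
z = effect / se_pooled
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

# 95% confidence interval (assumed variant): unpooled standard error.
se_unpooled = math.sqrt(
    p_placebo * (1 - p_placebo) / placebo_n
    + p_vaccine * (1 - p_vaccine) / vaccine_n
)
z_crit = 1.96  # approximate 97.5th percentile of the standard normal
ci_low = effect - z_crit * se_unpooled
ci_high = effect + z_crit * se_unpooled

print(f"Sample effect: {effect:.1%}")
print(f"z = {z:.2f}, p-value < 0.0005: {p_value < 0.0005}")
print(f"95% CI: {ci_low:.1%} to {ci_high:.1%}")
```

Under these assumptions, the interval rounds to the 3.7% to 10.9% reported above, and the p-value falls well below 0.0005. The unrounded sample effect is about 7.3 percentage points. The 7.4% in the table comes from subtracting the rounded percentages.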

For more information about this and other flu vaccine studies, read my post about Flu Vaccine Effectiveness.

In conclusion, by using a representative sample and the proper methodology, we made a statistical inference about vaccine effectiveness in an entire population.

Reference

Monto AS, Ohmit SE, Petrie JG, Johnson E, Truscon R, Teich E, Rotthoff J, Boulton M, Victor JC. Comparative efficacy of inactivated and live attenuated influenza vaccines. N Engl J Med. 2009;361(13):1260-7.

T Distribution: Definition & Uses

By Jim Frost Leave a Comment

What is the T Distribution?

The t distribution is a continuous probability distribution that is symmetric and bell-shaped like the normal distribution but with a shorter peak and thicker tails. It was designed to factor in the greater uncertainty associated with small sample sizes.

The t distribution describes the variability of the distances between sample means and the population mean when the population standard deviation is unknown and the data approximately follow the normal distribution. This distribution has only one parameter, the degrees of freedom, based on (but not equal to) the sample size. [Read more…] about T Distribution: Definition & Uses

Representative Sample: Definition, Uses & Methods

By Jim Frost Leave a Comment

What is a Representative Sample?

A representative sample is one where the individuals in the sample reflect the properties of an entire population. Use a representative sample when you want to generalize the results from the sample to a population. By studying a representative sample, you can approximate the properties of the population from which it was drawn. [Read more…] about Representative Sample: Definition, Uses & Methods

Difference Between Standard Deviation and Standard Error

By Jim Frost 13 Comments

The difference between a standard deviation and a standard error can seem murky. Let’s clear that up in this post!

Standard deviation (SD) and standard error (SE) both measure variability. High values of either statistic indicate more dispersion. However, that’s where the similarities end. The standard deviation is not the same as the standard error. [Read more…] about Difference Between Standard Deviation and Standard Error
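Here is a small sketch of the distinction, using the standard formula for the standard error of the mean (the sample SD divided by the square root of the sample size). The data values are hypothetical and exist only to make the code runnable.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7, 10.1, 12.4]

sd = statistics.stdev(sample)      # variability of individual values
se = sd / math.sqrt(len(sample))   # variability of the sample mean

print(f"SD = {sd:.2f}, SE of the mean = {se:.2f}")
```

The SD describes how spread out the individual values are, while the SE describes how precisely the sample mean estimates the population mean. Unlike the SD, the SE tends to shrink as the sample size increases.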

How to Find the P value: Process and Calculations

By Jim Frost 4 Comments

P values are everywhere in statistics. They’re in all types of hypothesis tests. But how do you calculate a p-value? Unsurprisingly, the precise calculations depend on the test. However, there is a general process that applies to finding a p value.

In this post, you’ll learn how to find the p value. I’ll start by showing you the general process for all hypothesis tests. Then I’ll move on to a step-by-step example showing the calculations for a p value. This post includes a calculator so you can apply what you learn. [Read more…] about How to Find the P value: Process and Calculations

Sampling Methods: Different Types in Research

By Jim Frost 2 Comments

What Are Sampling Methods?

Sampling methods are the processes by which you draw a sample from a population. When performing research, you’re typically interested in the results for an entire population. Unfortunately, populations are almost always too large to study fully. Consequently, researchers use samples to draw conclusions about a population, which is the process of making statistical inferences. [Read more…] about Sampling Methods: Different Types in Research

Beta Distribution: Uses, Parameters & Examples

By Jim Frost 6 Comments

The beta distribution is a continuous probability distribution that models random variables with values falling inside a finite interval. Use it to model subject areas with both an upper and lower bound for possible values. Analysts commonly use it to model the time to complete a task, the distribution of order statistics, and the prior distribution for binomial proportions in Bayesian analysis. [Read more…] about Beta Distribution: Uses, Parameters & Examples

Geometric Distribution: Uses, Calculator & Formula

By Jim Frost Leave a Comment

What is a Geometric Distribution?

The geometric distribution is a discrete probability distribution that calculates the probability of the first success occurring during a specific trial. In other words, during a series of attempts, what is the probability of success first occurring during each attempt? Use this distribution when you need to understand how many attempts are necessary to produce the first successful outcome. [Read more…] about Geometric Distribution: Uses, Calculator & Formula
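As a quick sketch, the probability that the first success happens on trial k is (1 - p)^(k - 1) * p, where p is the chance of success on each trial. The function name `geometric_pmf` and the die-rolling scenario are my own illustrative choices.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1 or not 0 < p <= 1:
        raise ValueError("k must be >= 1 and p must be in (0, 1]")
    # k - 1 failures in a row, followed by one success.
    return (1 - p) ** (k - 1) * p


# Hypothetical scenario: chance the first six appears on each roll of a fair die.
for k in range(1, 6):
    print(f"First six on roll {k}: {geometric_pmf(k, 1 / 6):.3f}")
```

The probabilities shrink with each additional trial because every extra attempt requires one more failure before the success.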

What is Power in Statistics?

By Jim Frost 1 Comment

Power in statistics is the probability that a hypothesis test can detect an effect in a sample when it exists in the population. It is the sensitivity of a hypothesis test. When an effect exists in the population, how likely is the test to detect it in your sample? [Read more…] about What is Power in Statistics?

Conditional Distribution: Definition & Finding

By Jim Frost Leave a Comment

What is a Conditional Distribution?

A conditional distribution is a distribution of values for one variable that exists when you specify the values of other variables. This type of distribution allows you to assess the dispersal of your variable of interest under specific conditions, hence the name. [Read more…] about Conditional Distribution: Definition & Finding

Marginal Distribution: Definition & Finding

By Jim Frost Leave a Comment

What is a Marginal Distribution?

A marginal distribution is a distribution of values for one variable that ignores a more extensive set of related variables in a dataset.

That definition sounds a bit convoluted, but the concept is simple. The idea is that when you have a larger set of related variables that you collected for a study, you might want to focus on one of them to answer a specific question. [Read more…] about Marginal Distribution: Definition & Finding

Content Validity: Definition, Examples & Measuring

By Jim Frost Leave a Comment

What is Content Validity?

Content validity is the degree to which a test or assessment instrument evaluates all aspects of the topic, construct, or behavior that it is designed to measure. Do the items fully cover the subject? High content validity indicates that the test fully covers the topic for the target audience. Lower content validity suggests that the test omits relevant facets of the subject matter. [Read more…] about Content Validity: Definition, Examples & Measuring

Parameter vs Statistic: Examples & Differences

By Jim Frost 3 Comments

Parameters are numbers that describe the properties of entire populations. Statistics are numbers that describe the properties of samples. [Read more…] about Parameter vs Statistic: Examples & Differences

Spurious Correlation: Definition, Examples & Detecting

By Jim Frost 5 Comments

What is a Spurious Correlation?

A spurious correlation occurs when two variables are correlated but don’t have a causal relationship. In other words, it appears like values of one variable cause changes in the other variable, but that’s not actually happening. [Read more…] about Spurious Correlation: Definition, Examples & Detecting

Contingency Table: Definition, Examples & Interpreting

By Jim Frost 8 Comments

What is a Contingency Table?

A contingency table displays frequencies for combinations of two categorical variables. Analysts also refer to contingency tables as crosstabulations and two-way tables. [Read more…] about Contingency Table: Definition, Examples & Interpreting

Permutation vs Combination: Differences & Examples

By Jim Frost 9 Comments

In mathematics and statistics, permutations and combinations are two different ways to take a set of items or options and create subsets. For example, if you have ten people, how many subsets of three can you make? While permutation and combination seem like synonyms in everyday language, they have distinct definitions mathematically.

Permutations: The order of outcomes matters.
Combinations: The order does not matter.

Let’s understand the difference between permutations and combinations in greater detail. And then you’ll learn how to calculate the total number of each. [Read more…] about Permutation vs Combination: Differences & Examples
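For the ten-people question, here is a short sketch using Python's built-in `math.perm` and `math.comb` functions (available in Python 3.8 and later). The variable names are just for illustration.

```python
import math

people, group_size = 10, 3

# Permutations: order matters, so (Ann, Bob, Cy) differs from (Cy, Bob, Ann).
ordered = math.perm(people, group_size)

# Combinations: order does not matter, so those count as the same group.
unordered = math.comb(people, group_size)

print(f"Permutations: {ordered}")    # 720
print(f"Combinations: {unordered}")  # 120
```

Each group of three can be arranged in 3! = 6 orders, which is why there are six times as many permutations as combinations.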

Cumulative Frequency: Finding & Interpreting

By Jim Frost Leave a Comment

What is Cumulative Frequency?

Cumulative frequency is the running total of frequencies in a table. Use cumulative frequencies to answer questions about how often a characteristic occurs above or below a particular value. It is also known as a cumulative frequency distribution.

For example, how many students are in the 4th grade or lower at a school? [Read more…] about Cumulative Frequency: Finding & Interpreting
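Here is a minimal sketch of a cumulative frequency calculation for a question like the one about grade levels. The enrollment counts are hypothetical, made up purely to illustrate the running total.

```python
from itertools import accumulate

# Hypothetical enrollment counts for grades 1 through 6 (illustrative only).
grades = [1, 2, 3, 4, 5, 6]
frequencies = [50, 55, 48, 52, 60, 45]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

for grade, freq, cum in zip(grades, frequencies, cumulative):
    print(f"Grade {grade}: frequency = {freq}, cumulative = {cum}")

# Students in grade 4 or lower: the cumulative value on the grade 4 row.
grade_4_or_lower = cumulative[grades.index(4)]
print(f"Students in grade 4 or lower: {grade_4_or_lower}")  # 205
```

The last cumulative value always equals the total number of observations, which is a handy check on the table.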

Chi-Square Goodness of Fit Test: Uses & Examples

By Jim Frost 6 Comments

What is the Chi Square Goodness of Fit Test?

The chi-square goodness of fit test evaluates whether proportions of categorical or discrete outcomes in a sample follow a population distribution with hypothesized proportions. In other words, when you draw a random sample, do the observed proportions follow the values that theory suggests? [Read more…] about Chi-Square Goodness of Fit Test: Uses & Examples

Sampling Error: Definition, Sources & Minimizing

By Jim Frost 7 Comments

What is Sampling Error?

Sampling error is the difference between a sample statistic and the population parameter it estimates. It is a crucial consideration in inferential statistics where you use a sample to estimate the properties of an entire population. [Read more…] about Sampling Error: Definition, Sources & Minimizing

Cohort Study: Definition, Benefits & Examples

By Jim Frost Leave a Comment

What is a Cohort Study?

A cohort study is a longitudinal observational design that follows a group of participants who share a defining characteristic. For example, a cohort study can select subjects who have exposure to a risk factor, are in the same profession, population or generation, or experience a particular event, such as a medical procedure. This design determines whether exposure to a risk factor affects an outcome. Cohort studies are a type of longitudinal study because they track the same set of subjects over time. [Read more…] about Cohort Study: Definition, Benefits & Examples