A linear regression equation describes the relationship between the independent variables (IVs) and the dependent variable (DV). It can also predict new values of the DV for the IV values you specify. [Read more…] about Linear Regression Equation Explained
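As a minimal sketch, the Python below fits a simple linear regression equation (one IV) by ordinary least squares and then plugs new IV values into it to predict the DV. The data and the function names `fit_line` and `predict` are hypothetical and used only for illustration.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the least squares line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x_new):
    """Plug IV values into the fitted equation to predict the DV."""
    return [b0 + b1 * xi for xi in x_new]


# Hypothetical data: hours studied (IV) and exam score (DV)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

b0, b1 = fit_line(hours, score)
print(f"score = {b0:.1f} + {b1:.1f} * hours")  # score = 47.7 + 4.1 * hours
print(predict(b0, b1, [6]))  # predicted score for 6 hours, about 72.3
```

Each extra hour adds about 4.1 points to the predicted score in this made-up example. With more than one IV, the same least squares idea applies, but you would normally use a statistics package rather than hand-written formulas.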
What is Relative Risk?
Relative risk is the probability of an adverse outcome in an exposed group divided by its probability in an unexposed group. This statistic indicates whether exposure corresponds to increases, decreases, or no change in the probability of the adverse outcome. Use relative risk to measure the strength of the association between exposure and the outcome. Analysts also refer to this statistic as the risk ratio. [Read more…] about Relative Risk: Definition, Formula & Interpretation
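A minimal sketch of the relative risk calculation follows. The function name and the counts are hypothetical: 20 of 100 exposed people and 10 of 100 unexposed people experience the adverse outcome.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


rr = relative_risk(20, 100, 10, 100)
# RR > 1: exposure linked to higher risk; RR < 1: lower risk; RR = 1: no association
print(rr)  # 2.0
```

Here the exposed group has twice the risk of the unexposed group.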
What is Factor Analysis?
Factor analysis uses the correlation structure amongst observed variables to model a smaller number of unobserved, latent variables known as factors. Researchers use this statistical method when subject-area knowledge suggests that latent factors cause observable variables to covary. Use factor analysis to identify the hidden variables. [Read more…] about Factor Analysis Guide with an Example
What is K Means Clustering?
The K means clustering algorithm divides a set of n observations into K clusters. Use K means clustering when you don’t have existing group labels and want to assign similar data points to the number of groups you specify (K). [Read more…] about What is K Means Clustering? With an Example
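The bare-bones sketch below implements Lloyd's algorithm, the classic procedure behind K means: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. Using the first K points as starting centroids is a simplifying assumption; practical implementations typically use random or k-means++ initialization with several restarts. The points and the `kmeans` function name are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Simplistic initialization: the first k points become the starting centroids
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return labels, centroids


# Hypothetical 2-D data with two visually obvious groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```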
What is Cronbach’s Alpha?
Cronbach’s alpha coefficient measures the internal consistency, or reliability, of a set of survey items. Use this statistic to help determine whether a collection of items consistently measures the same characteristic. Cronbach’s alpha quantifies the level of agreement on a standardized 0 to 1 scale. Higher values indicate higher agreement between items. [Read more…] about Cronbach’s Alpha: Definition, Calculations & Example
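Cronbach’s alpha is commonly computed as α = K/(K − 1) × (1 − Σ item variances / variance of total scores), where K is the number of items. The sketch below applies that formula to a small hypothetical set of responses (five respondents, three items); the function name is also hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one row per respondent, one column per survey item."""
    k = len(responses[0])  # number of items
    items = list(zip(*responses))  # transpose to one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])  # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 1-5 ratings: rows are respondents, columns are items
responses = [
    [3, 4, 3],
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # 0.904
```

When items rise and fall together, the total score varies much more than the items do individually, which pushes alpha toward 1.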
What is Statistical Inference?
Statistical inference is the process of using a sample to infer the properties of a population. Statistical procedures use sample data to estimate the characteristics of the whole population from which the sample was drawn.
Scientists typically want to learn about a population. When studying a phenomenon, such as the effects of a new medication or public opinion, understanding the results at a population level is much more valuable than understanding only the comparatively few participants in a study.
Unfortunately, populations are usually too large to measure fully. Consequently, researchers must use a manageable subset of that population to learn about it.
By using procedures that can make statistical inferences, you can estimate the properties and processes of a population. More specifically, sample statistics can estimate population parameters. Learn more about the differences between sample statistics and population parameters.
For example, imagine that you are studying a new medication. As a scientist, you’d like to understand the medicine’s effect in the entire population rather than just a small sample. After all, knowing the effect on a handful of people isn’t very helpful for the larger society!
Consequently, you are interested in making a statistical inference about the medicine’s effect in the population.
Read on to see how to do that! I’ll show you the general process for making a statistical inference and then cover an example using real data.
Related post: Descriptive vs. Inferential Statistics
How to Make Statistical Inferences
In its simplest form, the process of making a statistical inference requires you to do the following:
- Draw a sample that adequately represents the population.
- Measure your variables of interest.
- Use appropriate statistical methodology to generalize your sample results to the population while accounting for sampling error.
Of course, that’s the simple version. In real-world experiments, you might need to form treatment and control groups, administer treatments, and reduce other sources of variation. In more complex cases, you might need to create a model of a process. There are many details in the process of making a statistical inference! Learn how to incorporate statistical inference into scientific studies.
Statistical inference requires using specialized sampling methods that tend to produce representative samples. If the sample does not look like the larger population you’re studying, you can’t trust any inferences from the sample. Consequently, using an appropriate method to obtain your sample is crucial. The best sampling methods tend to produce samples that look like the target population. Learn more about Sampling Methods and Representative Samples.
After obtaining a representative sample, you’ll need to use a procedure that can make statistical inferences. While you might have a sample that looks similar to the population, it will never be identical to it. Statisticians refer to the differences between a sample and the population as sampling error. Any effect or relationship you see in your sample might actually be sampling error rather than a true finding. Inferential statistics incorporate sampling error into the results. Learn more about Sampling Error.
Common Inferential Methods
The following are four standard procedures that can make statistical inferences.
- Hypothesis Testing: Uses representative samples to assess two mutually exclusive hypotheses about a population. Statistically significant results suggest that the sample effect or relationship exists in the population after accounting for sampling error.
- Confidence Intervals: A range of values likely containing the population value. This procedure evaluates the sampling error and adds a margin around the estimate, giving an idea of how wrong it might be.
- Margin of Error: Comparable to a confidence interval but usually for survey results.
- Regression Modeling: An estimate of the process that generates the outcomes in the population.
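To make the confidence interval and margin of error entries concrete, here is a minimal sketch for a survey proportion. It assumes the common normal-approximation formula with z = 1.96 for roughly 95% confidence; the survey result (50% of 1,000 respondents) and the function name are hypothetical.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 0.50, 1000  # hypothetical survey: 50% of 1,000 respondents agree
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe  # the confidence interval is estimate ± margin
print(f"50% ± {moe:.1%} (95% CI: {low:.1%} to {high:.1%})")  # 50% ± 3.1% (95% CI: 46.9% to 53.1%)
```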
Example Statistical Inference
Let’s look at a real flu vaccine study for an example of making a statistical inference. The scientists for this study want to evaluate whether a flu vaccine effectively reduces flu cases in the general population. However, the general population is much too large to include in their study, so they must use a representative sample to make a statistical inference about the vaccine’s effectiveness.
The Monto et al. study* evaluated the 2007-2008 flu season and followed its participants from January to April. Participants were 18-49 years old. The researchers selected ~1100 participants and randomly assigned them to the vaccine and placebo groups. After tracking them through the flu season, they recorded the number of flu infections in each group, as shown below.
| Treatment | Flu count | Group size | Percent infections |
|---|---|---|---|
| Placebo | 35 | 325 | 10.8% |
| Vaccine | 28 | 813 | 3.4% |

Monto Study Findings
From the table above, 10.8% of the unvaccinated got the flu, while only 3.4% of the vaccinated caught it. The apparent effect of the vaccine is 10.8% – 3.4% = 7.4%. While that seems to show a vaccine effect, it might be a fluke due to sampling error. We’re assessing only 1,100 people out of a population of millions. We need to use a hypothesis test and confidence interval (CI) to make a proper statistical inference.
While the details go beyond this introductory post, here are two statistical inferences we can make using a 2-sample proportions test and CI.
- The p-value of the test is < 0.0005. The evidence strongly favors the hypothesis that the vaccine effectively reduces flu infections in the population after accounting for sampling error.
- Additionally, the confidence interval for the effect size is 3.7% to 10.9%. Our study found a sample effect of 7.4%, but it is unlikely to equal the population effect exactly due to sampling error. The CI identifies a range that is likely to include the population effect.
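A minimal sketch of a 2-sample proportions test and CI follows. It assumes the Monto et al. counts (35 flu cases among 325 placebo recipients and 28 among 813 vaccine recipients) and the common normal-approximation formulas: a pooled z test for the p-value and an unpooled Wald interval for the CI. Statistical packages may use other methods, so their output can differ slightly. The function name is hypothetical.

```python
import math


def two_proportion_inference(x1, n1, x2, n2, z_crit=1.96):
    """Difference in proportions with a 95% Wald CI and a pooled two-sided z test."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # CI: unpooled standard error of the difference
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Test of "no difference": pooled proportion under the null hypothesis
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pool
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, ci, z, p_value


diff, ci, z, p = two_proportion_inference(35, 325, 28, 813)  # placebo vs. vaccine
print(f"effect = {diff:.1%}, 95% CI {ci[0]:.1%} to {ci[1]:.1%}, p = {p:.1e}")
# prints an effect of 7.3% and a CI of 3.7% to 10.9%, with p far below 0.0005
```

Under these assumptions the sketch reproduces the 3.7% to 10.9% interval and a p-value below 0.0005. The unrounded effect is about 7.3 percentage points; the 7.4% above comes from subtracting the already rounded percentages.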
For more information about this and other flu vaccine studies, read my post about Flu Vaccine Effectiveness.
In conclusion, by using a representative sample and the proper methodology, we made a statistical inference about vaccine effectiveness in an entire population.
Monto AS, Ohmit SE, Petrie JG, Johnson E, Truscon R, Teich E, Rotthoff J, Boulton M, Victor JC. Comparative efficacy of inactivated and live attenuated influenza vaccines. N Engl J Med. 2009;361(13):1260-7.
The chi-square goodness of fit test evaluates whether proportions of categorical or discrete outcomes in a sample follow a population distribution with hypothesized proportions. In other words, when you draw a random sample, do the observed proportions follow the values that theory suggests? [Read more…] about Chi-Square Goodness of Fit Test: Uses & Examples
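The goodness of fit statistic sums (observed − expected)² / expected across the categories, with degrees of freedom equal to the number of categories minus one. The sketch below uses hypothetical counts for three categories with hypothesized proportions of 50%, 30%, and 20%. To avoid external libraries, it relies on the fact that a chi-square distribution with 2 degrees of freedom has upper-tail probability exp(−x/2); for other degrees of freedom, a library routine such as `scipy.stats.chisquare` is the practical choice.

```python
import math


def chi_square_gof(observed, hypothesized_props):
    """Goodness of fit statistic and degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_props]  # counts implied by the hypothesis
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df


# Hypothetical sample of 100 items across three categories
stat, df = chi_square_gof([55, 25, 20], [0.5, 0.3, 0.2])
# Valid only because df == 2 here: the upper-tail probability is exp(-stat / 2)
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")  # chi-square = 1.33, df = 2, p = 0.513
```

A large p-value like this one means the observed proportions are consistent with the hypothesized ones.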
What is Inter-Rater Reliability?
Inter-rater reliability measures the agreement between subjective ratings by multiple raters, inspectors, judges, or appraisers. It answers the question: is the rating system consistent? High inter-rater reliability indicates that multiple raters’ ratings for the same item are consistent. Conversely, low reliability means they are inconsistent. [Read more…] about Inter-Rater Reliability: Definition, Examples & Assessing
What is Linear Regression?
Linear regression models the relationships between at least one explanatory variable and an outcome variable. These variables are known as the independent and dependent variables, respectively. When there is one independent variable (IV), the procedure is known as simple linear regression. When there are more IVs, statisticians refer to it as multiple regression. [Read more…] about Linear Regression
What is the 5 Number Summary?
The 5 number summary is an exploratory data analysis tool that provides insight into the distribution of values for one variable. Collectively, this set of statistics describes where data values occur, their central tendency, variability, and the general shape of their distribution. [Read more…] about 5 Number Summary: Definition, Finding & Using
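A minimal sketch using Python’s standard `statistics.quantiles` function follows. The data and function name are hypothetical, and the choice of the "inclusive" quartile method is an assumption: software packages use several quartile rules, so Q1 and Q3 can differ slightly between tools.

```python
from statistics import quantiles


def five_number_summary(data):
    """Minimum, Q1, median, Q3, and maximum."""
    values = sorted(data)
    # "inclusive" quartiles interpolate linearly between order statistics
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]


summary = five_number_summary([9, 2, 15, 4, 7, 12, 4, 8, 5])  # hypothetical sample
print(summary)  # (2, 4.0, 7.0, 9.0, 15)
```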
What is a Paired T Test?
Use a paired t-test when each subject has a pair of measurements, such as a before and after score. A paired t-test determines whether the mean change for these pairs is significantly different from zero. This test is an inferential statistics procedure because it uses samples to draw conclusions about populations.
A paired t test is also known as a paired samples t test or a dependent samples t test. These names reflect the fact that the two samples are paired or dependent because they contain the same subjects. Conversely, an independent samples t test contains different subjects in the two samples. [Read more…] about Paired T Test: Definition & When to Use It
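Because a paired t test works on the within-subject changes, it reduces to a one-sample t test on the differences. The sketch below computes the t statistic and degrees of freedom for hypothetical before and after scores; converting t to a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_rel` provides. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and degrees of freedom for the mean change (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]  # one change score per subject
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean change
    t = mean(diffs) / se  # how many standard errors the mean change is from zero
    return t, n - 1


# Hypothetical scores for six subjects measured before and after a treatment
before = [72, 75, 68, 80, 77, 70]
after = [75, 78, 70, 84, 78, 74]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")  # t = 5.94, df = 5
```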
What is an Independent Samples T Test?
Use an independent samples t test when you want to compare the means of precisely two groups—no more and no less! Typically, you perform this test to determine whether two population means are different. This procedure is an inferential statistical hypothesis test, meaning it uses samples to draw conclusions about populations. The independent samples t test is also known as the two sample t test. [Read more…] about Independent Samples T Test: Definition, Using & Interpreting
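The sketch below computes the classic pooled-variance (Student’s) version of the independent samples t statistic for two hypothetical groups. Pooling assumes both populations have equal variances; Welch’s version drops that assumption, and `scipy.stats.ttest_ind` offers both through its `equal_var` parameter along with p-values. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Student's two-sample t statistic with pooled variance, plus its df."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the mean difference
    t = (mean(sample1) - mean(sample2)) / se
    return t, n1 + n2 - 2


group_a = [12, 15, 14, 10, 13]  # hypothetical measurements, group A
group_b = [18, 16, 17, 20, 19]  # hypothetical measurements, group B
t, df = pooled_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df}")  # t = -4.67, df = 8
```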
What is Conditional Probability?
A conditional probability is the likelihood of an event occurring given that another event has already happened. Conditional probabilities allow you to evaluate how prior information affects probabilities. For example, what is the probability of A given B has occurred? When you incorporate existing facts into the calculations, it can change the likelihood of an outcome. [Read more…] about Conditional Probability: Definition, Formula & Examples
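The definition P(A | B) = P(A and B) / P(B) can be checked by counting outcomes. This sketch uses a standard 52-card deck as a hypothetical example: knowing that a card is a face card (B) raises the probability that it is a king (A) from 1/13 to 1/3. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))


def prob(event, space):
    """Share of equally likely outcomes in which the event occurs."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def conditional(event_a, event_b, space):
    """P(A | B) = P(A and B) / P(B)."""
    both = prob(lambda o: event_a(o) and event_b(o), space)
    return both / prob(event_b, space)


def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


print(prob(is_king, deck))                  # 1/13
print(conditional(is_king, is_face, deck))  # 1/3
```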
Use scatterplots to show relationships between pairs of continuous variables. These graphs display symbols at the X, Y coordinates of the data points for the paired variables. Scatterplots are also known as scattergrams and scatter charts. [Read more…] about Scatterplots: Using, Examples, and Interpreting
Use pie charts to compare the sizes of categories to the entire dataset. To create a pie chart, you must have a categorical variable that divides your data into groups. These graphs consist of a circle (i.e., the pie) with slices representing subgroups. The size of each slice is proportional to the relative size of each category out of the whole. [Read more…] about Pie Charts: Using, Examples, and Interpreting
Use bar charts to compare categories when you have at least one categorical or discrete variable. Each bar represents a summary value for one discrete level, where longer bars indicate higher values. Types of summary values include counts, sums, means, and standard deviations. Bar charts are also known as bar graphs. [Read more…] about Bar Charts: Using, Examples, and Interpreting
Use line charts to display a series of data points that are connected by lines. Analysts use line charts to emphasize changes in a metric on the vertical Y-axis by another variable on the horizontal X-axis. Often, the X-axis reflects time, but not always. Line charts are also known as line plots. [Read more…] about Line Charts: Using, Examples, and Interpreting
Use dot plots to display the distribution of your sample data when you have continuous variables. These graphs stack dots along the horizontal X-axis to represent the frequencies of different values. More dots indicate greater frequency. Each dot represents a set number of observations. [Read more…] about Dot Plots: Using, Examples, and Interpreting
Use an empirical cumulative distribution function plot to display the data points in your sample from lowest to highest against their percentiles. These graphs require continuous variables and allow you to derive percentiles and other distribution properties. This function is also known as the empirical CDF or ECDF. [Read more…] about Empirical Cumulative Distribution Function (CDF) Plots
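The values behind an ECDF plot are easy to compute: for each distinct data value, the proportion of observations at or below it. This sketch uses a hypothetical sample and function name; plotting the pairs as a step function produces the ECDF plot, and you can read percentiles from it (here, half of the observations are 3 or less).

```python
from bisect import bisect_right


def ecdf(data):
    """Return (value, proportion of observations <= value) for each distinct value."""
    values = sorted(data)
    n = len(values)
    # bisect_right counts how many sorted observations are <= x, which handles ties
    return [(x, bisect_right(values, x) / n) for x in sorted(set(values))]


points = ecdf([3, 1, 4, 1, 5, 9, 2, 6])  # hypothetical sample
for value, proportion in points:
    print(value, proportion)
```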
Excel can calculate correlation coefficients and perform a variety of other statistical analyses. Even if you don’t use Excel regularly, this post is an excellent introduction to calculating and interpreting correlation.
In this post, I provide step-by-step instructions for having Excel calculate Pearson’s correlation coefficient, and I’ll show you how to interpret the results. Additionally, I include links to relevant statistical resources I’ve written that provide intuitive explanations. Together, we’ll analyze and interpret an example dataset! [Read more…] about Using Excel to Calculate Correlation
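For readers who want to see what Excel computes, this sketch implements Pearson’s correlation coefficient directly: the co-variation of the two variables divided by the product of their spreads. Excel’s CORREL function returns the same coefficient. The data and function name are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```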