What is the Mann Whitney U Test?
The Mann Whitney U test is a nonparametric hypothesis test that compares two independent groups. Statisticians also refer to it as the Wilcoxon rank sum test. [Read more…] about Mann Whitney U Test Explained
Covariance in statistics measures the extent to which two variables vary linearly. It reveals whether two variables move in the same or opposite directions. [Read more…] about Covariance: Definition, Formula & Example
The range rule of thumb allows you to estimate the standard deviation of a dataset quickly. This process is not as accurate as the actual calculation for the standard deviation, but it’s so simple you can do it in your head. [Read more…] about Range Rule of Thumb: Overview and Formula
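As a quick illustration, here is the common version of the rule, which divides the range by four, applied to a small hypothetical dataset:

```python
import statistics

# Hypothetical dataset for illustration.
data = [12, 15, 9, 20, 17, 11, 14, 18, 13, 16]

# Range rule of thumb: standard deviation is roughly range / 4.
rough_sd = (max(data) - min(data)) / 4

# Compare with the actual sample standard deviation.
actual_sd = statistics.stdev(data)

print(f"Rule-of-thumb estimate: {rough_sd:.2f}")  # 2.75
print(f"Actual sample SD:       {actual_sd:.2f}")  # 3.37
```

The estimate is in the right ballpark, which is all the rule promises: a quick mental check, not a substitute for the real calculation.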
Joint probability is the likelihood that two or more events will coincide. Knowing how to calculate it allows you to solve problems such as the following. What is the probability of:
[Read more…] about Joint Probability: Definition, Formula & Examples
Independent events in statistics are those in which one event does not affect the next event. More specifically, the occurrence of one event does not affect the probability of the following event happening. [Read more…] about Independent Events: Definition & Probability
A random variable is a variable whose value is determined by chance. Random variables can take on either discrete or continuous values, and understanding the properties of each type is essential in many statistical applications. They are a key concept in statistics and probability theory. [Read more…] about Random Variable: Discrete & Continuous
A least squares regression line represents the relationship between variables in a scatterplot. The procedure fits the line to the data points in a way that minimizes the sum of the squared vertical distances between the line and the points. It is also known as a line of best fit or a trend line. [Read more…] about Least Squares Regression: Definition, Formulas & Example
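A minimal sketch of the closed-form least squares calculation for a simple one-predictor line, using hypothetical data points:

```python
# Hypothetical (x, y) points for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# The least squares slope and intercept minimize the sum of
# squared vertical distances between the line and the points.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(f"y = {slope:.3f}x + {intercept:.3f}")  # y = 1.990x + 0.050
```

Statistical software does the same arithmetic; writing it out once makes it clear that the "best fit" line is a direct calculation, not a search.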
ANCOVA, or the analysis of covariance, is a powerful statistical method that analyzes the differences between three or more group means while controlling for the effects of at least one continuous covariate. [Read more…] about ANCOVA: Uses, Assumptions & Example
A cumulative distribution function (CDF) describes the probabilities of a random variable having values less than or equal to x. It is a cumulative function because it sums the total likelihood up to that point. Its output always ranges between 0 and 1. [Read more…] about Cumulative Distribution Function (CDF): Uses, Graphs & vs PDF
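As an illustration, the CDF of a normal distribution can be computed from the error function in the standard library. Note how the output always stays between 0 and 1 and increases with x:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal random variable: cumulative probability up to x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_cdf(0))     # 0.5 -- half the probability lies below the mean
print(normal_cdf(1.96))  # ~0.975 -- basis of the familiar 95% interval
```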
The slope intercept form of linear equations is an algebraic representation of straight lines: y = mx + b. [Read more…] about Slope Intercept Form of Linear Equations: A Guide
Monte Carlo simulation uses random sampling to produce simulated outcomes of a process or system. The method generates random input data and feeds them into a mathematical model that describes the system. The result is a distribution of outcomes that analysts can use to derive probabilities. [Read more…] about Monte Carlo Simulation: Make Better Decisions
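A minimal sketch of the idea, estimating a dice probability whose exact answer is known (the model and trial count here are illustrative choices):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Model: the sum of two fair six-sided dice.
# Monte Carlo: sample random inputs, run them through the model,
# and use the resulting distribution of outcomes to estimate a probability.
trials = 100_000
hits = sum(
    1 for _ in range(trials)
    if random.randint(1, 6) + random.randint(1, 6) >= 10
)

estimate = hits / trials
print(f"P(sum >= 10) is about {estimate:.4f} (exact: {6/36:.4f})")
```

For dice we could compute the answer directly; the method earns its keep when the system is too complex for exact analysis.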
Principal Component Analysis (PCA) takes a large data set with many variables per observation and reduces them to a smaller set of summary indices. These indices retain most of the information in the original set of variables. Analysts refer to these new values as principal components. [Read more…] about Principal Component Analysis Guide & Example
Fisher's exact test determines whether a statistically significant association exists between two categorical variables.
For example, does a relationship exist between gender (Male/Female) and voting Yes or No on a referendum? [Read more…] about Fishers Exact Test: Using & Interpreting
Use a Z test when you need to compare group means. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ. [Read more…] about Z Test: Uses, Formula & Examples
A linear regression equation describes the relationship between the independent variables (IVs) and the dependent variable (DV). It can also predict new values of the DV for the IV values you specify. [Read more…] about Linear Regression Equation Explained
Relative risk is the ratio of the probability of an adverse outcome in an exposure group divided by its likelihood in an unexposed group. This statistic indicates whether exposure corresponds to increases, decreases, or no change in the probability of the adverse outcome. Use relative risk to measure the strength of the association between exposure and the outcome. Analysts also refer to this statistic as the risk ratio. [Read more…] about Relative Risk: Definition, Formula & Interpretation
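A small worked example with hypothetical exposure counts:

```python
# Hypothetical 2x2 exposure/outcome counts for illustration.
exposed_cases, exposed_total = 30, 100      # 30% had the adverse outcome
unexposed_cases, unexposed_total = 10, 100  # 10% had the adverse outcome

risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

# Relative risk (risk ratio): probability of the adverse outcome in the
# exposed group divided by its probability in the unexposed group.
relative_risk = risk_exposed / risk_unexposed
print(f"Relative risk: {relative_risk:.1f}")  # 3.0 -- exposure triples the risk
```

A relative risk of 1 would mean exposure makes no difference; values above or below 1 indicate increased or decreased risk, respectively.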
Factor analysis uses the correlation structure amongst observed variables to model a smaller number of unobserved, latent variables known as factors. Researchers use this statistical method when subject-area knowledge suggests that latent factors cause observable variables to covary. Use factor analysis to identify the hidden variables. [Read more…] about Factor Analysis Guide with an Example
The K means clustering algorithm divides a set of n observations into k clusters. Use K means clustering when you don’t have existing group labels and want to assign similar data points to the number of groups you specify (K). [Read more…] about What is K Means Clustering? With an Example
Cronbach’s alpha coefficient measures the internal consistency, or reliability, of a set of survey items. Use this statistic to help determine whether a collection of items consistently measures the same characteristic. Cronbach’s alpha quantifies the level of agreement on a standardized 0 to 1 scale. Higher values indicate higher agreement between items. [Read more…] about Cronbach’s Alpha: Definition, Calculations & Example
Statistical inference is the process of using a sample to infer the properties of a population. Statistical procedures use sample data to estimate the characteristics of the whole population from which the sample was drawn.
Scientists typically want to learn about a population. When studying a phenomenon, such as the effects of a new medication or public opinion, understanding the results at a population level is much more valuable than understanding only the comparatively few participants in a study.
Unfortunately, populations are usually too large to measure fully. Consequently, researchers must use a manageable subset of that population to learn about it.
By using procedures that can make statistical inferences, you can estimate the properties and processes of a population. More specifically, sample statistics can estimate population parameters. Learn more about the differences between sample statistics and population parameters.
For example, imagine that you are studying a new medication. As a scientist, you’d like to understand the medicine’s effect in the entire population rather than just a small sample. After all, knowing the effect on a handful of people isn’t very helpful for the larger society!
Consequently, you are interested in making a statistical inference about the medicine’s effect in the population.
Read on to see how to do that! I’ll show you the general process for making a statistical inference and then cover an example using real data.
Related posts: Populations vs. Samples and Descriptive vs. Inferential Statistics
In its simplest form, the process of making a statistical inference requires you to do the following:

1. Obtain a representative sample using an appropriate sampling method.
2. Use a procedure that accounts for sampling error to draw conclusions about the population.
Of course, that’s the simple version. In real-world experiments, you might need to form treatment and control groups, administer treatments, and reduce other sources of variation. In more complex cases, you might need to create a model of a process. There are many details in the process of making a statistical inference! Learn how to incorporate statistical inference into scientific studies.
Statistical inference requires using specialized sampling methods that tend to produce representative samples. If the sample does not look like the larger population you’re studying, you can’t trust any inferences from the sample. Consequently, using an appropriate method to obtain your sample is crucial. The best sampling methods tend to produce samples that look like the target population. Learn more about Sampling Methods and Representative Samples.
After obtaining a representative sample, you’ll need to use a procedure that can make statistical inferences. While you might have a sample that looks similar to the population, it will never be identical to it. Statisticians refer to the differences between a sample and the population as sampling error. Any effect or relationship you see in your sample might actually be sampling error rather than a true finding. Inferential statistics incorporate sampling error into the results. Learn more about Sampling Error.
The following are four standard procedures that can make statistical inferences.
Let’s look at a real flu vaccine study for an example of making a statistical inference. The scientists for this study want to evaluate whether a flu vaccine effectively reduces flu cases in the general population. However, the general population is much too large to include in their study, so they must use a representative sample to make a statistical inference about the vaccine’s effectiveness.
The Monto et al. study* evaluates the 2007-2008 flu season and follows its participants from January to April. Participants were 18-49 years old. The researchers selected ~1,100 participants and randomly assigned them to the vaccine and placebo groups. After tracking them through the flu season, they recorded the number of flu infections in each group, as shown below.
| Treatment | Flu count | Group size | Percent infected |
|-----------|-----------|------------|------------------|
| Placebo   | 35        | 325        | 10.8%            |
| Vaccine   | 28        | 813        | 3.4%             |
| Effect    |           |            | 7.4%             |
From the table above, 10.8% of the unvaccinated got the flu, while only 3.4% of the vaccinated caught it. The apparent effect of the vaccine is 10.8% – 3.4% = 7.4%. While that seems to show a vaccine effect, it might be a fluke due to sampling error. We’re assessing only 1,100 people out of a population of millions. We need to use a hypothesis test and confidence interval (CI) to make a proper statistical inference.
While the details go beyond this introductory post, here are two statistical inferences we can make using a 2-sample proportions test and CI.
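As a sketch of what such a calculation involves, here is a stdlib-only two-proportion z-test and a normal-approximation confidence interval computed from the counts in the table above (the study's own analysis may have used different methods or software):

```python
import math

# Counts from the flu vaccine study table.
flu_placebo, n_placebo = 35, 325
flu_vaccine, n_vaccine = 28, 813

p1 = flu_placebo / n_placebo  # ~0.108, placebo infection rate
p2 = flu_vaccine / n_vaccine  # ~0.034, vaccine infection rate
effect = p1 - p2              # ~0.074, the apparent vaccine effect

# Two-sample proportions z-test using the pooled standard error.
pooled = (flu_placebo + flu_vaccine) / (n_placebo + n_vaccine)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_placebo + 1 / n_vaccine))
z = effect / se_pooled
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided

# 95% confidence interval for the difference (unpooled standard error).
se_unpooled = math.sqrt(p1 * (1 - p1) / n_placebo + p2 * (1 - p2) / n_vaccine)
ci_low, ci_high = effect - 1.96 * se_unpooled, effect + 1.96 * se_unpooled

print(f"Effect: {effect:.3f}, z = {z:.2f}, p = {p_value:.2g}")
print(f"95% CI for the difference: ({ci_low:.3f}, {ci_high:.3f})")
```

The tiny p-value and a confidence interval that excludes zero are exactly the kind of evidence that lets us infer a real vaccine effect in the population rather than a fluke of sampling error.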
For more information about this and other flu vaccine studies, read my post about Flu Vaccine Effectiveness.
In conclusion, by using a representative sample and the proper methodology, we made a statistical inference about vaccine effectiveness in an entire population.
Monto AS, Ohmit SE, Petrie JG, Johnson E, Truscon R, Teich E, Rotthoff J, Boulton M, Victor JC. Comparative efficacy of inactivated and live attenuated influenza vaccines. N Engl J Med. 2009;361(13):1260-7.