Having independent and identically distributed (IID) data is a common assumption for statistical procedures and hypothesis tests. But what does that mouthful of words actually mean? That’s the topic of this post! And, I’ll provide helpful tips for determining whether your data are IID. [Read more…] about Independent and Identically Distributed Data (IID)

# Basics

## Coronavirus Mortality Rates by Country

**UPDATED! April 3, 2020.** The coronavirus mortality rate varies significantly by country. In this post, I look at the mortality rates for ten countries and assess factors that affect these numbers. After discussing the trends, I provide a rough estimate for where the actual fatality rate might lie. [Read more…] about Coronavirus Mortality Rates by Country

## Coronavirus: Exponential Growth and Hospital Beds

**UPDATED March 24, 2020**: As the number of confirmed coronavirus cases continues to grow exponentially, the capacity of the hospital system to treat these cases is becoming a concern. The goal of “flattening the curve” is that testing, isolation, and social distancing will slow the increase of new cases. Hopefully, these efforts reduce the numbers of new patients who require hospitalization to a rate that hospitals can handle.

In this post, I’ll identify the top 10 states in the United States that have the greatest likelihood of experiencing hospital capacity problems if coronavirus cases continue to grow exponentially. To recognize these states, I’ll assess per capita rates for both coronavirus infections and hospital beds. I’m looking for states that have a relatively large number of coronavirus cases given the size of their population and have a relatively low number of hospital beds. [Read more…] about Coronavirus: Exponential Growth and Hospital Beds

## Coronavirus Curves and Different Outcomes

**UPDATED May 9, 2020**. The coronavirus, or COVID-19, has swept around the world. However, not all countries have had the same experiences. Outcomes have varied by the number of cases, the rate of increase, and how countries have responded.

In this post, I present coronavirus growth curves for 15 countries, graph their new cases per day and daily coronavirus deaths, and describe how each country approached controlling the virus. You can see the differences in outcomes and when coronavirus mitigation efforts started taking effect. I also include the per capita values for these countries in a table near the end.

At this time, there is plenty of good news with evidence that many of the 15 countries have slowed the growth rate of new cases. However, several other countries have reason to worry. And, we have one new cautionary tale about a country that had the virus contained but is now seeing a spike in new cases. [Read more…] about Coronavirus Curves and Different Outcomes

## Guidelines for Removing and Handling Outliers in Data

Outliers are unusual values in your dataset, and they can distort statistical analyses and violate their assumptions. Unfortunately, all analysts will confront outliers and be forced to make decisions about what to do with them. Given the problems they can cause, you might think that it’s best to remove them from your data. But, that’s not always the case. Removing outliers is legitimate only for specific reasons. [Read more…] about Guidelines for Removing and Handling Outliers in Data

## 5 Ways to Find Outliers in Your Data

Outliers are data points that are far from other data points. In other words, they’re unusual values in a dataset. Outliers are problematic for many statistical analyses because they can cause tests to either miss significant findings or distort real results.

Unfortunately, there are no strict statistical rules for definitively identifying outliers. Finding outliers depends on subject-area knowledge and an understanding of the data collection process. While there is no solid mathematical definition, there are guidelines and statistical tests you can use to find outlier candidates. [Read more…] about 5 Ways to Find Outliers in Your Data
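To make the idea of an "outlier candidate" concrete, here is a minimal sketch of one common guideline, the interquartile range (IQR) fence. The excerpt above does not say which five methods the post covers, so treat this as an illustration only. The function name, the made-up data, and the conventional multiplier of 1.5 are all assumptions.

```python
import statistics


def iqr_outlier_candidates(values, k=1.5):
    """Return values lying more than k * IQR beyond the first or third quartile.

    k=1.5 is a widely used convention, not a strict statistical rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up measurements for illustration only.
data = [12, 13, 14, 14, 15, 15, 16, 17, 18, 45]
print(iqr_outlier_candidates(data))
```

Values flagged this way are only candidates. Whether they are truly outliers still depends on subject-area knowledge and how the data were collected.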

## New eBook Release! Introduction to Statistics: An Intuitive Guide

I’m thrilled to release my new book! *Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries*. [Read more…] about New eBook Release! Introduction to Statistics: An Intuitive Guide

## Causation in Statistics: Hill’s Criteria

Causation indicates that an event affects an outcome. Do fatty diets cause heart problems? If you study for a test, does it cause you to get a higher score?

In statistics, causation is a bit tricky. As you’ve no doubt heard, correlation doesn’t necessarily imply causation. An association or correlation between variables simply indicates that the values vary together. It does not necessarily suggest that changes in one variable cause changes in the other variable. Proving causality can be difficult.

If correlation does not prove causation, what statistical test do you use to assess causality? That’s a trick question because no statistical analysis can make that determination. In this post, learn about why you want to determine causation and how to do that. [Read more…] about Causation in Statistics: Hill’s Criteria

## What is an Observational Study: Definition & Examples

## What is an Observational Study?

An observational study uses sample data to find correlations in situations where the researchers do not control the treatment, or independent variable, that relates to the primary research question. The definition of an observational study hinges on the notion that the researchers only observe subjects and do not assign them to the control and treatment groups. That’s the key difference between an observational study vs experiment. These studies are also known as quasi-experiments and correlational studies.

True experiments assign subjects to the experimental groups where the researchers can manipulate the conditions. Unfortunately, random assignment is not always possible. For these cases, you can conduct an observational study.

In this post, learn about the types of observational studies, why they are susceptible to confounding variables, and how they compare to experiments. I’ll close this post by reviewing a published observational study about vitamin supplement usage. [Read more…] about What is an Observational Study: Definition & Examples

## Random Assignment in Experiments

Random assignment uses chance to assign subjects to the control and treatment groups in an experiment. This process helps ensure that the groups are equivalent at the beginning of the study, which makes it safer to assume the treatments caused any differences between groups that the experimenters observe at the end of the study. [Read more…] about Random Assignment in Experiments
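As a rough sketch of what "using chance to assign subjects" can look like in practice, the snippet below shuffles a list of subjects and deals them into groups in turn. The function name, group labels, and subject IDs are hypothetical, and real studies often use more elaborate schemes such as blocking.

```python
import random


def randomly_assign(subjects, groups=("control", "treatment"), seed=None):
    """Shuffle the subjects, then deal them into the groups round-robin."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment


# Hypothetical subject IDs.
subjects = [f"subject_{i}" for i in range(1, 21)]
result = randomly_assign(subjects, seed=42)
for group, members in result.items():
    print(group, len(members))
```

Because chance alone decides who lands in which group, characteristics of the subjects tend to balance out across groups, which is the property the paragraph above relies on.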

## 5 Steps for Conducting Scientific Studies with Statistical Analyses

The scientific method is a proven procedure for expanding knowledge through experimentation and analysis. It is a process that uses careful planning, rigorous methodology, and thorough assessment. Statistical analysis plays an essential role in this process.

In an experiment that includes statistical analysis, the analysis is at the end of a long series of events. To obtain valid results, it’s crucial that you carefully plan and conduct a scientific study for all steps up to and including the analysis. In this blog post, I map out five steps for scientific studies that include statistical analyses. [Read more…] about 5 Steps for Conducting Scientific Studies with Statistical Analyses

## Percentiles: Interpretations and Calculations

Percentiles indicate the percentage of scores that fall below a particular value. They tell you where a score stands relative to other scores. For example, a person with an IQ of 120 is at the 91st percentile, which indicates that their IQ is higher than 91 percent of other scores.
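The IQ example can be checked with a short sketch. It assumes IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15, which is a common scaling but is not stated in the text above. The helper `percent_below` and the sample scores are hypothetical.

```python
from statistics import NormalDist


def percent_below(scores, value):
    """Percentage of observed scores that fall strictly below `value`."""
    return 100 * sum(score < value for score in scores) / len(scores)


# Assumed population model for IQ: normal, mean 100, standard deviation 15.
iq_model = NormalDist(mu=100, sigma=15)
model_percentile = iq_model.cdf(120) * 100
print(f"IQ 120 sits near the {model_percentile:.0f}th percentile")

# The same idea applied to a small made-up sample of scores.
sample_scores = [85, 92, 98, 100, 103, 107, 110, 115, 121, 130]
print(percent_below(sample_scores, 120))
```

The first calculation uses a theoretical distribution, while the second simply counts values in a sample. The two approaches answer the same question and can give different numbers for small samples.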

Percentiles are a great tool to use when you need to know the relative standing of a value. Where does a value fall within a distribution of values? While the concept behind percentiles is straightforward, there are different mathematical methods for calculating them. In this post, learn about percentiles, special percentiles and their surprisingly flexible uses, and the various procedures for calculating them. [Read more…] about Percentiles: Interpretations and Calculations

## Using Histograms to Understand Your Data

Histograms are graphs that display the distribution of your continuous data. They are fantastic exploratory tools because they reveal properties about your sample data in ways that summary statistics cannot. For instance, while the mean and standard deviation can numerically summarize your data, histograms bring your sample data to life.

In this blog post, I’ll show you how histograms reveal the shape of the distribution, its central tendency, and the spread of values in your sample data. You’ll also learn how to identify outliers, how histograms relate to probability distribution functions, and why you might need to use hypothesis tests with them.

[Read more…] about Using Histograms to Understand Your Data

## Central Limit Theorem Explained

The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable’s distribution in the population.

Unpacking the meaning from that complex definition can be difficult. That’s the topic for this post! I’ll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is vital in statistics. [Read more…] about Central Limit Theorem Explained
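A small simulation can make the definition less abstract. This is only a sketch under stated assumptions: the population is an exponential distribution with a mean of 1 (strongly skewed, so clearly not normal), and the sample size of 50 and the 2,000 repetitions are arbitrary choices. The helper name `sample_means` is hypothetical.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples):
    """Draw `num_samples` samples of `sample_size` and return each sample's mean."""
    return [
        statistics.fmean(draw() for _ in range(sample_size))
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible
means = sample_means(lambda: rng.expovariate(1.0), sample_size=50, num_samples=2000)

center = statistics.fmean(means)
spread = statistics.stdev(means)
print(f"mean of sample means: {center:.3f}")  # expected to be close to 1
print(f"std. dev. of sample means: {spread:.3f}")  # expected near 1 / sqrt(50)

# Share of sample means within one standard deviation of the center;
# for a normal distribution this share is roughly 68%.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
print(f"share within one standard deviation: {within_one_sd:.2f}")
```

Even though individual exponential values are heavily right-skewed, the means of the samples cluster symmetrically around the population mean, which is the behavior the theorem describes.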

## Assessing Normality: Histograms vs. Normal Probability Plots

Because histograms display the shape and spread of distributions, you might think they’re the best type of graph for determining whether your data are normally distributed. However, I’ll show you how histograms can trick you! Normal probability plots are a better choice for this task and they are easy to use. Normal probability plots are also known as quantile-quantile plots, or Q-Q Plots for short!

[Read more…] about Assessing Normality: Histograms vs. Normal Probability Plots

## Sample Statistics Are Always Wrong (to Some Extent)!

Here’s some shocking information for you—sample statistics are *always* wrong! When you use samples to estimate the properties of populations, you never obtain the correct values exactly. Don’t worry. I’ll help you navigate this issue using a simple statistical tool! [Read more…] about Sample Statistics Are Always Wrong (to Some Extent)!

## Populations, Parameters, and Samples in Inferential Statistics

Inferential statistics lets you draw conclusions about populations by using small samples. Consequently, inferential statistics provide enormous benefits because typically you can’t measure an entire population.

However, to gain these benefits, you must understand the relationship between populations, subpopulations, population parameters, samples, and sample statistics.

In this blog post, learn the differences between population vs. sample, parameter vs. statistic, and how to obtain representative samples using random sampling.

**Related post**: Difference between Descriptive and Inferential Statistics

[Read more…] about Populations, Parameters, and Samples in Inferential Statistics

## Normal Distribution in Statistics

The normal distribution, also known as the Gaussian distribution, is the most important probability distribution in statistics for independent, random variables. Most people recognize its familiar bell-shaped curve in statistical reports.

The normal distribution is a continuous probability distribution that is symmetrical around its mean. Most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely. While the normal distribution is symmetrical, not all symmetrical distributions are normal. For example, the Student’s t, Cauchy, and logistic distributions are symmetric.

As with any probability distribution, the normal distribution describes how the values of a variable are distributed. It is the most important probability distribution in statistics because it closely describes the distribution of values for many natural phenomena. Characteristics that are the sum of many independent processes frequently follow normal distributions. For example, heights, blood pressure, measurement error, and IQ scores approximately follow the normal distribution.

In this blog post, learn how to use the normal distribution, about its parameters, the Empirical Rule, and how to calculate Z-scores to standardize your data and find probabilities. [Read more…] about Normal Distribution in Statistics
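As a quick illustration of two ideas named above, the sketch below computes a Z-score and checks the Empirical Rule against the standard normal distribution. The function name `z_score` and the example numbers are hypothetical; `NormalDist` comes from Python’s standard library.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Hypothetical example: a value of 130 in a distribution with mean 100 and SD 15.
z = z_score(130, mean=100, sd=15)
print(z)

# Empirical Rule: share of a normal distribution within 1, 2, and 3
# standard deviations of the mean.
standard_normal = NormalDist()  # mean 0, standard deviation 1
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")

# Probability of observing a value below the Z-score computed above.
prob_below = standard_normal.cdf(z)
print(f"P(Z < {z}) = {prob_below:.3f}")
```

Standardizing with Z-scores places any normal variable on the same scale, which is why a single standard normal distribution is enough to look up these probabilities.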

## Probability Distribution: Definition & Calculations

## What is a Probability Distribution?

A probability distribution is a statistical function that describes the likelihood of obtaining all possible values that a random variable can take. In other words, the values of the variable vary based on the underlying probability distribution. Typically, analysts display probability distributions in graphs and tables. There are equations to calculate probability distributions.

Suppose you draw a random sample and measure the heights of the subjects. As you measure heights, you create a distribution of heights. This type of distribution is useful when you need to know which outcomes are most likely, the spread of potential values, and the likelihood of different results.

In this blog post, you’ll learn about probability distributions for both discrete and continuous variables. I’ll show you how they work and examples of how to use them. [Read more…] about Probability Distribution: Definition & Calculations

## Interpreting Correlation Coefficients

## What are Correlation Coefficients?

Correlation coefficients measure the strength of the relationship between two variables. A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. Understanding that relationship is useful because we can use the value of one variable to predict the value of the other variable. For example, height and weight are correlated—as height increases, weight also tends to increase. Consequently, if we observe an individual who is unusually tall, we can predict that his weight is also above the average. [Read more…] about Interpreting Correlation Coefficients
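For a concrete sense of how such a coefficient is computed, here is a minimal sketch of Pearson’s correlation coefficient, one widely used measure of linear association. The excerpt above does not name a specific coefficient, so this choice is an assumption, and the height and weight values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from each mean (unnormalized covariance).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Made-up heights (cm) and weights (kg).
heights = [152, 160, 165, 170, 175, 180, 185, 191]
weights = [51, 58, 60, 68, 70, 79, 82, 90]
r = pearson_r(heights, weights)
print(f"r = {r:.2f}")
```

The result always falls between -1 and +1. Values near +1 indicate that the two variables tend to increase together, as in the height and weight example, while values near -1 indicate that one tends to decrease as the other increases.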