UPDATED! April 3, 2020. The coronavirus mortality rate varies significantly by country. In this post, I look at the mortality rates for ten countries and assess factors that affect these numbers. After discussing the trends, I provide a rough estimate for where the actual fatality rate might lie. [Read more…] about Coronavirus Mortality Rates by Country
Coronavirus Curves and Different Outcomes
UPDATED May 9, 2020. The coronavirus, or COVID19, has swept around the world. However, not all countries have had the same experiences. Outcomes have varied by the number of cases, the rate of increase, and how countries have responded.
In this post, I present coronavirus growth curves for 15 countries and their per capita values, graph their new cases per day, daily coronavirus deaths, and describe how each country approached controlling the virus. You can see the differences in outcomes and when the effects of coronavirus mitigation efforts started taking effect. I also include the per capita values for these countries in a table near the end.
At this time, there is plenty of good news with evidence that many of the 15 countries have slowed the growth rate of new cases. However, several other countries have reason to worry. And, we have one new cautionary tale about a country that had the virus contained but is now seeing a spike in new cases. [Read more…] about Coronavirus Curves and Different Outcomes
5 Ways to Find Outliers in Your Data
Outliers are data points that are far from other data points. In other words, they’re unusual values in a dataset. Outliers are problematic for many statistical analyses because they can cause tests to either miss significant findings or distort real results.
Unfortunately, there are no strict statistical rules for definitively identifying outliers. Finding outliers depends on subject-area knowledge and an understanding of the data collection process. While there is no solid mathematical definition, there are guidelines and statistical tests you can use to find outlier candidates. [Read more…] about 5 Ways to Find Outliers in Your Data
Low Power Tests Exaggerate Effect Sizes
If your study has low statistical power, it will exaggerate the effect size. What?!
Statistical power is the ability of a hypothesis test to detect an effect that exists in the population. Clearly, a high-powered study is a good thing just for being able to identify these effects. Low power reduces your chances of discovering real findings. However, many analysts don’t realize that low power also inflates the effect size. Learn more about Statistical Power.
In this post, I show how this unexpected relationship between power and exaggerated effect sizes exists. I’ll also tie it to other issues, such as the bias of effects published in journals and other matters about statistical power. I think this post will be eye-opening and thought provoking! As always, I’ll use many graphs rather than equations. [Read more…] about Low Power Tests Exaggerate Effect Sizes
Using Confidence Intervals to Compare Means
To determine whether the difference between two means is statistically significant, analysts often compare the confidence intervals for those groups. If those intervals overlap, they conclude that the difference between groups is not statistically significant. If there is no overlap, the difference is significant.
While this visual method of assessing the overlap is easy to perform, regrettably it comes at the cost of reducing your ability to detect differences. Fortunately, there is a simple solution to this problem that allows you to perform a simple visual assessment and yet not diminish the power of your analysis.
In this post, I’ll start by showing you the problem in action and explain why it happens. Then, we’ll proceed to an easy alternative method that avoids this problem. [Read more…] about Using Confidence Intervals to Compare Means
Can High P-values Be Meaningful?
Can high p-values be helpful? What do high p-values mean?
Typically, when you perform a hypothesis test, you want to obtain low p-values that are statistically significant. Low p-values are sexy. They represent exciting findings and can help you get articles published.
However, you might be surprised to learn that higher p-values, the ones that are not statistically significant, are also valuable. In this post, I’ll show you the potential value of a p-value that is greater than 0.05, or whatever significance level you’re using. [Read more…] about Can High P-values Be Meaningful?
Using Histograms to Understand Your Data
Histograms are graphs that display the distribution of your continuous data. They are fantastic exploratory tools because they reveal properties about your sample data in ways that summary statistics cannot. For instance, while the mean and standard deviation can numerically summarize your data, histograms bring your sample data to life.
In this blog post, I’ll show you how histograms reveal the shape of the distribution, its central tendency, and the spread of values in your sample data. You’ll also learn how to identify outliers, how histograms relate to probability distribution functions, and why you might need to use hypothesis tests with them.
[Read more…] about Using Histograms to Understand Your Data
Boxplots vs. Individual Value Plots: Comparing Groups
Use boxplots and individual value plots when you have a categorical grouping variable and a continuous outcome variable. The levels of the categorical variables form the groups in your data, and the researchers measure the continuous variable. Both types of charts help you compare distributions of measurements between the groups. Boxplots are also known as box and whisker plots. [Read more…] about Boxplots vs. Individual Value Plots: Comparing Groups
Using Post Hoc Tests with ANOVA
Post hoc tests are an integral part of ANOVA. When you use ANOVA to test the equality of at least three group means, statistically significant results indicate that not all of the group means are equal. However, ANOVA results do not identify which particular differences between pairs of means are significant. Use post hoc tests to explore differences between multiple group means while controlling the experiment-wise error rate.
In this post, I’ll show you what post hoc analyses are, the critical benefits they provide, and help you choose the correct one for your study. Additionally, I’ll show why failure to control the experiment-wise error rate will cause you to have severe doubts about your results. [Read more…] about Using Post Hoc Tests with ANOVA
Central Limit Theorem Explained
The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable’s distribution in the population.
Unpacking the meaning from that complex definition can be difficult. That’s the topic for this post! I’ll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is vital in statistics. [Read more…] about Central Limit Theorem Explained
Introduction to Bootstrapping in Statistics with an Example
Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics. Bootstrap methods are alternative approaches to traditional hypothesis testing and are notable for being easier to understand and valid for more conditions.
In this blog post, I explain bootstrapping basics, compare bootstrapping to conventional statistical methods, and explain when it can be the better method. Additionally, I’ll work through an example using real data to create bootstrapped confidence intervals. [Read more…] about Introduction to Bootstrapping in Statistics with an Example
Assessing Normality: Histograms vs. Normal Probability Plots
Because histograms display the shape and spread of distributions, you might think they’re the best type of graph for determining whether your data are normally distributed. However, I’ll show you how histograms can trick you! Normal probability plots are a better choice for this task and they are easy to use. Normal probability plots are also known as quantile-quantile plots, or Q-Q Plots for short!
[Read more…] about Assessing Normality: Histograms vs. Normal Probability Plots
Normal Distribution in Statistics
The normal distribution, also known as the Gaussian distribution, is the most important probability distribution in statistics for independent, random variables. Most people recognize its familiar bell-shaped curve in statistical reports.
The normal distribution is a continuous probability distribution that is symmetrical around its mean, most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely. While the normal distribution is symmetrical, not all symmetrical distributions are normal. For example, the Student’s t, Cauchy, and logistic distributions are symmetric.
As with any probability distribution, the normal distribution describes how the values of a variable are distributed. It is the most important probability distribution in statistics because it accurately describes the distribution of values for many natural phenomena. Characteristics that are the sum of many independent processes frequently follow normal distributions. For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution.
In this blog post, learn how to use the normal distribution, about its parameters, the Empirical Rule, and how to calculate Z-scores to standardize your data and find probabilities. [Read more…] about Normal Distribution in Statistics
Probability Distribution: Definition & Calculations
What is a Probability Distribution?
A probability distribution is a statistical function that describes the likelihood of obtaining all possible values that a random variable can take. In other words, the values of the variable vary based on the underlying probability distribution. Typically, analysts display probability distributions in graphs and tables. There are equations to calculate probability distributions.
Suppose you draw a random sample and measure the heights of the subjects. As you measure heights, you create a distribution of heights. This type of distribution is useful when you need to know which outcomes are most likely, the spread of potential values, and the likelihood of different results.
In this blog post, you’ll learn about probability distributions for both discrete and continuous variables. I’ll show you how they work and examples of how to use them. [Read more…] about Probability Distribution: Definition & Calculations
Interpreting Correlation Coefficients
What are Correlation Coefficients?
Correlation coefficients measure the strength of the relationship between two variables. A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. Understanding that relationship is useful because we can use the value of one variable to predict the value of the other variable. For example, height and weight are correlated—as height increases, weight also tends to increase. Consequently, if we observe an individual who is unusually tall, we can predict that his weight is also above the average. [Read more…] about Interpreting Correlation Coefficients
How to Calculate Sample Size Needed for Power
Determining a good sample size for a study is always an important issue. After all, using the wrong sample size can doom your study from the start. Fortunately, power analysis can find the answer for you. Power analysis combines statistical analysis, subject-area knowledge, and your requirements to help you derive the optimal sample size for your study.
Statistical power in a hypothesis test is the probability that the test will detect an effect that actually exists. As you’ll see in this post, both under-powered and over-powered studies are problematic. Let’s learn how to find a good sample size for your study! Learn more about Statistical Power. [Read more…] about How to Calculate Sample Size Needed for Power
Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation
A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall further away.
In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. Just as there are multiple measures of central tendency, there are several measures of variability. In this blog post, you’ll learn why understanding the variability of your data is critical. Then, I explore the most common measures of variability—the range, interquartile range, variance, and standard deviation. I’ll help you determine which one is best for your data. [Read more…] about Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation
Mean, Median, and Mode: Measures of Central Tendency
What is Central Tendency?
Measures of central tendency are summary statistics that represent the center point or typical value of a dataset. Examples of these measures include the mean, median, and mode. These statistics indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of central tendency as the propensity for data points to cluster around a middle value.
In statistics, the mean, median, and mode are the three most common measures of central tendency. Each one calculates the central point using a different method. Choosing the best measure of central tendency depends on the type of data you have. In this post, I explore the mean, median, and mode as measures of central tendency, show you how to calculate them, and how to determine which one is best for your data.
[Read more…] about Mean, Median, and Mode: Measures of Central Tendency
Guide to Data Types and How to Graph Them in Statistics
In the field of statistics, data are vital. Data are the information that you collect to learn, draw conclusions, and test hypotheses. After all, statistics is the science of learning from data. However, there are different types of variables, and they record various kinds of information. Crucially, the type of information determines what you can learn from it, and, importantly, what you cannot learn from it. Consequently, it’s essential that you understand the different types of data. [Read more…] about Guide to Data Types and How to Graph Them in Statistics
Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions
Binary data occur when you can place an observation into only two categories. It tells you that an event occurred or that an item has a particular characteristic. For instance, an inspection process produces binary pass/fail results. Or, when a customer enters a store, there are two possible outcomes—sale or no sale. In this post, I show you how to use the binomial, geometric, negative binomial, and the hypergeometric probability distributions to glean more information from your binary data. [Read more…] about Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions