Chebyshev’s Theorem estimates the minimum proportion of observations that fall within a specified number of standard deviations from the mean. This theorem applies to a broad range of probability distributions. Chebyshev’s Theorem is also known as Chebyshev’s Inequality. [Read more…] about Chebyshev’s Theorem in Statistics
Spearman’s correlation in statistics is a nonparametric alternative to Pearson’s correlation. Use Spearman’s correlation for data that follow curvilinear, monotonic relationships and for ordinal data. Statisticians also refer to Spearman’s rank order correlation coefficient as Spearman’s ρ (rho).
In this post, I’ll cover what all that means so you know when and why you should use Spearman’s correlation instead of the more common Pearson’s correlation. [Read more…] about Spearman’s Correlation Explained
Descriptive statistics summarize your dataset, painting a picture of its properties. These properties include various central tendency and variability measures, distribution properties, outlier detection, and other information. Unlike inferential statistics, descriptive statistics only describe your dataset’s characteristics and do not attempt to generalize from a sample to a population. [Read more…] about Descriptive Statistics in Excel
My background includes working on scientific projects as the data guy. In these positions, I was responsible for establishing valid data collection procedures, collecting usable data, and statistically analyzing and presenting the results. In this post, I describe the excitement of being a statistician helping expand the limits of human knowledge, what I learned about applied statistics and data analysis during the first big project in my career, and the challenges along the way! [Read more…] about Using Applied Statistics to Expand Human Knowledge
The coefficient of variation (CV) is a relative measure of variability that indicates the size of a standard deviation in relation to its mean. It is a standardized, unitless measure that allows you to compare variability between disparate groups and characteristics. It is also known as the relative standard deviation (RSD).
In this post, you will learn about the coefficient of variation, how to calculate it, know when it is particularly useful, and when to avoid it. [Read more…] about Coefficient of Variation in Statistics
When comparing groups in your data, you can have either independent or dependent samples. The type of samples in your design impacts sample size requirements, statistical power, the proper analysis, and even your study’s costs. Understanding the implications of each type of sample can help you design a better study. [Read more…] about Independent and Dependent Samples in Statistics
Having independent and identically distributed (IID) data is a common assumption for statistical procedures and hypothesis tests. But what does that mouthful of words actually mean? That’s the topic of this post! And, I’ll provide helpful tips for determining whether your data are IID. [Read more…] about Independent and Identically Distributed Data (IID)
UPDATED! April 3, 2020. The coronavirus mortality rate varies significantly by country. In this post, I look at the mortality rates for ten countries and assess factors that affect these numbers. After discussing the trends, I provide a rough estimate for where the actual fatality rate might lie. [Read more…] about Coronavirus Mortality Rates by Country
UPDATED March 24, 2020: As the number of confirmed coronavirus cases continues to grow exponentially, the capacity of the hospital system to treat these cases is becoming a concern. The goal of “flattening the curve” is that testing, isolation, and social distancing will slow the increase of new cases. Hopefully, these efforts reduce the numbers of new patients who require hospitalization to a rate that hospitals can handle.
In this post, I’ll identify the top 10 states in the United States that have the greatest likelihood of experiencing hospital capacity problems if coronavirus cases continue to grow exponentially. To recognize these states, I’ll assess per capita rates for both coronavirus infections and hospital beds. I’m looking for states that have a relatively large number of coronavirus cases given the size of their population and have a relatively low number of hospital beds. [Read more…] about Coronavirus: Exponential Growth and Hospital Beds
UPDATED May 9, 2020. The coronavirus, or COVID19, has swept around the world. However, not all countries have had the same experiences. Outcomes have varied by the number of cases, the rate of increase, and how countries have responded.
In this post, I present coronavirus growth curves for 15 countries and their per capita values, graph their new cases per day, daily coronavirus deaths, and describe how each country approached controlling the virus. You can see the differences in outcomes and when the effects of coronavirus mitigation efforts started taking effect. I also include the per capita values for these countries in a table near the end.
At this time, there is plenty of good news with evidence that many of the 15 countries have slowed the growth rate of new cases. However, several other countries have reason to worry. And, we have one new cautionary tale about a country that had the virus contained but is now seeing a spike in new cases. [Read more…] about Coronavirus Curves and Different Outcomes
Outliers are unusual values in your dataset, and they can distort statistical analyses and violate their assumptions. Unfortunately, all analysts will confront outliers and be forced to make decisions about what to do with them. Given the problems they can cause, you might think that it’s best to remove them from your data. But, that’s not always the case. Removing outliers is legitimate only for specific reasons. [Read more…] about Guidelines for Removing and Handling Outliers in Data
Outliers are data points that are far from other data points. In other words, they’re unusual values in a dataset. Outliers are problematic for many statistical analyses because they can cause tests to either miss significant findings or distort real results.
Unfortunately, there are no strict statistical rules for definitively identifying outliers. Finding outliers depends on subject-area knowledge and an understanding of the data collection process. While there is no solid mathematical definition, there are guidelines and statistical tests you can use to find outlier candidates. [Read more…] about 5 Ways to Find Outliers in Your Data
I’m thrilled to release my new ebook! Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries. [Read more…] about New eBook Release! Introduction to Statistics: An Intuitive Guide
Causation indicates that an event affects an outcome. Do fatty diets cause heart problems? If you study for a test, does it cause you to get a higher score?
In statistics, causation is a bit tricky. As you’ve no doubt heard, correlation doesn’t necessarily imply causation. An association or correlation between variables simply indicates that the values vary together. It does not necessarily suggest that changes in one variable cause changes in the other variable. Proving causality can be difficult.
If correlation does not prove causation, what statistical test do you use to assess causality? That’s a trick question because no statistical analysis can make that determination. In this post, learn about why you want to determine causation and how to do that. [Read more…] about Causation in Statistics: Hill’s Criteria
Observational studies use samples to draw conclusions about a population when the researchers do not control the treatment, or independent variable, that relates to the primary research question.
In my previous post, I show how random assignment reduces systematic differences between experimental groups at the beginning of the study, which increases your confidence that the treatments caused any differences between groups you observe at the end of the study.
Unfortunately, using random assignment is not always possible. For these cases, you can conduct an observational study. In this post, learn about observational studies, why these studies must account for confounding variables, and how to do so. I’ll close this post by reviewing a published observational study about vitamin supplement usage. [Read more…] about Observational Studies Explained
Random assignment uses chance to assign subjects to the control and treatment groups in an experiment. This process helps ensure that the groups are equivalent at the beginning of the study, which makes it safer to assume the treatments caused any differences between groups that the experimenters observe at the end of the study. [Read more…] about Random Assignment in Experiments
The scientific method is a proven procedure for expanding knowledge through experimentation and analysis. It is a process that uses careful planning, rigorous methodology, and thorough assessment. Statistical analysis plays an essential role in this process.
In an experiment that includes statistical analysis, the analysis is at the end of a long series of events. To obtain valid results, it’s crucial that you carefully plan and conduct a scientific study for all steps up to and including the analysis. In this blog post, I map out five steps for scientific studies that include statistical analyses. [Read more…] about 5 Steps for Conducting Scientific Studies with Statistical Analyses
Percentiles indicate the percentage of scores that fall below a particular value. They tell you where a score stands relative to other scores. For example, a person with an IQ of 120 is at the 91st percentile, which indicates that their IQ is higher than 91 percent of other scores.
Percentiles are a great tool to use when you need to know the relative standing of a value. Where does a value fall within a distribution of values? While the concept behind percentiles is straight forward, there are different mathematical methods for calculating them. In this post, learn about percentiles, special percentiles and their surprisingly flexible uses, and the various procedures for calculating them. [Read more…] about Percentiles: Interpretations and Calculations
Histograms are graphs that display the distribution of your continuous data. They are fantastic exploratory tools because they reveal properties about your sample data in ways that summary statistics cannot. For instance, while the mean and standard deviation can numerically summarize your data, histograms bring your sample data to life.
In this blog post, I’ll show you how histograms reveal the shape of the distribution, its central tendency, and the spread of values in your sample data. You’ll also learn how to identify outliers, how histograms relate to probability distribution functions, and why you might need to use hypothesis tests with them.
[Read more…] about Using Histograms to Understand Your Data