I’m very much an empirical, data, statistics, and science type of guy. So, it might be a surprise to learn that I’ve gone ghost hunting a number of times. Now, I’m not a paranormal enthusiast. I’m definitely a skeptic. However, in my view, being skeptical about something does not preclude collecting data about it. I also have friends I trust completely who are sure they’ve experienced paranormal activity. Plus, I don’t need much of an excuse to try something new and unusual! [Read more…] about Ghost Hunting with a Statistics Mindset

# Blog

## Paired T Test

Use a paired t-test when each subject has a pair of measurements, such as a before and after score. A paired t-test determines whether the mean change for these pairs is significantly different from zero. This test is an inferential statistics procedure because it uses samples to draw conclusions about populations.

Paired t tests are also known as dependent samples t tests. The two samples are dependent because they contain the same subjects. Conversely, an independent samples t test contains different subjects in the two samples. [Read more…] about Paired T Test
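To make the "mean change differs from zero" idea concrete, here is a minimal sketch of the paired t statistic using only Python's standard library. The function name and the before/after scores are made up for illustration, and the sketch stops at the t-value and degrees of freedom rather than computing a p-value.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("each subject needs exactly two measurements")
    # The test works on the per-subject differences, not the raw scores.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical before/after scores for four subjects.
before = [10, 12, 9, 11]
after = [12, 15, 10, 13]
t_stat, df = paired_t_statistic(before, after)
print(round(t_stat, 3), df)
```

A t-value far from zero, judged against a t-distribution with `df` degrees of freedom, suggests the mean change is not zero.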

## Independent Samples T Test

Use an independent samples t test when you want to compare the means of precisely two groups—no more and no less! Typically, you perform this test to determine whether two population means are different. This procedure is an inferential statistical hypothesis test, meaning it uses samples to draw conclusions about populations. The independent samples t test is also known as the two sample t test. [Read more…] about Independent Samples T Test
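As a rough illustration of comparing two group means, the sketch below computes a two-sample t statistic. I'm assuming the Welch form here, which does not require equal variances; the pooled-variance form is another common choice. The function name and the data are hypothetical.

```python
import math
import statistics

def welch_t_statistic(group_a, group_b):
    """Return the Welch t statistic for two independent samples."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance
    var_b = statistics.variance(group_b)
    # Standard error of the difference between the two means.
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se

# Hypothetical scores from two groups that contain different subjects.
group_a = [5, 7, 9]
group_b = [1, 3, 5]
t_value = welch_t_statistic(group_a, group_b)
print(round(t_value, 3))
```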

## Frequency Table

Frequency is the number of times a specific data value occurs in your dataset. A frequency table lists a set of values and how often each one appears. These tables help you understand which data values are common and which are rare. They also organize your data and are an effective way to present the results to others. Frequency tables are also known as frequency distributions because they allow you to understand the distribution of values in your dataset. [Read more…] about Frequency Table
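Here is a small sketch of building a frequency table with `collections.Counter` from Python's standard library. The data values are invented for the example.

```python
from collections import Counter

# Hypothetical dataset of observed values.
values = [3, 1, 2, 3, 3, 2]

# Counter maps each distinct value to the number of times it occurs.
frequency_table = Counter(values)

for value, count in sorted(frequency_table.items()):
    print(value, count)
```

In this made-up dataset, 3 is the most common value and 1 is the rarest.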

## Mean Absolute Deviation

The mean absolute deviation (MAD) is a measure of variability that indicates the average distance between observations and their mean. MAD uses the original units of the data, which simplifies interpretation. Larger values signify that the data points spread out further from the average. Conversely, lower values correspond to data points bunching closer to it. The mean absolute deviation is also known as the mean deviation and average absolute deviation. [Read more…] about Mean Absolute Deviation
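The description above translates almost directly into code. This is a minimal sketch with a hypothetical function name and dataset.

```python
def mean_absolute_deviation(data):
    """Average absolute distance between each observation and the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

# Hypothetical observations; the mean is 5.
data = [2, 4, 6, 8]
mad = mean_absolute_deviation(data)
print(mad)  # 2.0, in the same units as the data
```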

## Stem and Leaf Plot

Stem and leaf plots display the shape and spread of a continuous data distribution. These graphs are similar to histograms, but instead of using bars, they show digits. They are particularly valuable tools during exploratory data analysis because they can help you identify the central tendency, variability, and skewness of your distribution. Additionally, they can help you find outliers. Stem and leaf plots are also known as stemplots. [Read more…] about Stem and Leaf Plot
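To show how digits replace bars, here is a simple sketch that builds a stem and leaf plot. It assumes non-negative integers, with the tens as the stem and the ones digit as the leaf. Other stem widths are possible. The function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) with leaves (ones digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data.
values = [12, 15, 21, 27, 27, 34]
plot = stem_and_leaf(values)

for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Longer rows of leaves play the same role as taller bars in a histogram.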

## Conditional Probability

A conditional probability is the likelihood of an event occurring given that another event has already happened. Conditional probabilities allow you to evaluate how prior information affects probabilities. When you incorporate existing facts into the calculations, it can change the probability of an outcome. [Read more…] about Conditional Probability
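Here is a brief sketch of how prior information changes a probability. It assumes a fair six-sided die, so every outcome is equally likely, and the function name is my own. In that setting, the conditional probability is the count of outcomes in both events divided by the count of outcomes in the given event.

```python
from fractions import Fraction

def conditional_probability(event_a, event_b, outcomes):
    """P(A | B) for equally likely outcomes."""
    given = [o for o in outcomes if event_b(o)]
    if not given:
        raise ValueError("the conditioning event never occurs")
    both = [o for o in given if event_a(o)]
    return Fraction(len(both), len(given))

die = range(1, 7)

def greater_than_three(roll):
    return roll > 3

def is_even(roll):
    return roll % 2 == 0

# Without extra information, P(roll > 3) is 1/2.
p_unconditional = Fraction(sum(greater_than_three(r) for r in die), len(die))
# Knowing the roll is even raises it to 2/3.
p_conditional = conditional_probability(greater_than_three, is_even, die)
print(p_unconditional, p_conditional)
```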

## Cluster Sampling

Cluster sampling is a method of obtaining a representative sample from a population that researchers have divided into groups. An individual cluster is a subgroup that mirrors the diversity of the whole population, while the clusters are similar to each other. Typically, researchers use this approach when studying large, geographically dispersed populations because it is a cost-controlling measure. [Read more…] about Cluster Sampling
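As a loose sketch of the idea, the code below randomly selects whole clusters and keeps every member of each chosen cluster. This is one-stage cluster sampling, which is only one variant. The cluster names, members, and function name are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters and keep all of their members."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population already divided into four regional clusters.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3"],
    "west": ["w1", "w2"],
}
sampled = cluster_sample(clusters, 2)
print(sorted(sampled))
```

Researchers only need to visit the selected clusters, which is where the cost savings come from.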

## Stratified Sampling

Stratified sampling is a method of obtaining a representative sample from a population that researchers have divided into relatively similar subpopulations (strata). Researchers use stratified sampling to ensure specific subgroups are present in their sample. It also helps them obtain precise estimates of each group’s characteristics. Many surveys use this method to understand differences between subpopulations better. Stratified sampling is also known as stratified random sampling. [Read more…] about Stratified Sampling
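The sketch below illustrates one way to draw a stratified sample: split the population into strata, then randomly sample the same fraction from each one. Proportional allocation is my assumption, and other allocation schemes exist. The function name and population are made up.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Randomly sample the same fraction from every stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        # Keep at least one member so no stratum goes missing.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 members of stratum "A" and 20 of stratum "B".
population = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
sample = stratified_sample(population, lambda item: item[0], 0.1)
print(len(sample))
```

Because each stratum is sampled separately, the smaller subgroup is guaranteed to appear in the sample.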

## Skewed Distribution

A skewed distribution occurs when one tail is longer than the other. Skewness defines the asymmetry of a distribution. Unlike the familiar normal distribution with its bell-shaped curve, these distributions are asymmetric. The two halves of the distribution are not mirror images because the data are not distributed equally on both sides of the distribution’s peak. [Read more…] about Skewed Distribution
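One common way to put a number on this asymmetry is the moment-based skewness coefficient, the third central moment divided by the second central moment raised to the power 1.5. The sketch below uses the uncorrected version, and statistical software often applies a small-sample adjustment. The function name and datasets are hypothetical.

```python
def sample_skewness(data):
    """Uncorrected moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical data: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 10]
skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
print(skew_symmetric, round(skew_right, 3))
```

A value near zero indicates symmetry, a positive value indicates a longer right tail, and a negative value indicates a longer left tail.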

## Heterogeneity

Heterogeneity is defined as a dissimilarity between the elements that make up a whole. When heterogeneity is present, there is diversity in the characteristic under study. The parts of the whole are different, not the same. It is an essential concept in science and statistics. Heterogeneous is the opposite of homogeneous. [Read more…] about Heterogeneity

## Control Variables

Control variables are properties that researchers hold constant for all observations in an experiment. While these variables are not the primary focus of the research, keeping their values consistent helps the study establish the true relationships between the independent and dependent variables. Control variables are different from control groups. [Read more…] about Control Variables

## Pareto Charts

A Pareto chart is a specialized bar chart that displays categories in descending order and a line chart representing the cumulative amount. The chart effectively communicates the categories that contribute the most to the total. Frequently, quality analysts use Pareto charts to identify the most common types of defects or other problems.

Learn how to use and interpret these charts and understand the Pareto principle and the 80/20 rule behind them. I’ll also show you how to create them using Excel. [Read more…] about Pareto Charts
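The numbers behind a Pareto chart are easy to compute by hand or in code. This sketch sorts categories in descending order and adds the cumulative percentage that the line chart would plot. The defect counts and the function name are invented for illustration.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows, largest first."""
    total = sum(counts.values())
    running = 0
    rows = []
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

# Hypothetical defect counts from a quality inspection.
defects = {"dent": 30, "other": 5, "scratch": 50, "misprint": 15}
rows = pareto_table(defects)

for category, count, cumulative in rows:
    print(f"{category:10} {count:3} {cumulative:6.1f}%")
```

In this made-up example, the top two categories account for 80% of all defects.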

## Orthogonality

Orthogonality is a mathematical property that is beneficial for statistical models. It’s particularly helpful when performing factorial analysis of designed experiments. [Read more…] about Orthogonality

## Percent Error

Percent error compares an estimate to a correct value and expresses the difference between them as a percentage. This statistic allows analysts to understand the size of the error relative to the true value. It is also known as percentage error and % error. [Read more…] about Percent Error
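Here is a minimal sketch of the calculation. I'm assuming the absolute-value form, which reports only the size of the error. Some fields keep the sign to show the direction. The function name and the numbers are hypothetical.

```python
def percent_error(estimate, true_value):
    """Absolute difference as a percentage of the true value."""
    if true_value == 0:
        raise ValueError("percent error is undefined when the true value is 0")
    return 100 * abs(estimate - true_value) / abs(true_value)

# Hypothetical measurement: an estimate of 9.5 against a true value of 10.
error = percent_error(9.5, 10)
print(error)  # 5.0
```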

## Accuracy vs Precision

Accuracy and precision are crucial properties of your measurements when you’re relying on data to draw conclusions. Both concepts apply to a series of measurements from a measurement system.

Measurement systems facilitate the quantification of characteristics for data collection. They include a collection of instruments, software, and personnel necessary to assess the property of interest. For example, a research project studying bone density will devise a measurement system to produce accurate and precise measurements of bone density. [Read more…] about Accuracy vs Precision

## Control Group in an Experiment

A control group in an experiment does not receive the treatment. Instead, it serves as a comparison group for the treatments. Researchers compare the results of a treatment group to the control group to determine the effect size, also known as the treatment effect. [Read more…] about Control Group in an Experiment

## Range of a Data Set

The range of a data set is the difference between the maximum and the minimum values. It measures variability using the same units as the data. Larger values represent greater variability.

The range is the easiest measure of dispersion to calculate and interpret in statistics, but it has some limitations. In this post, I’ll show you how to find the range mathematically and graphically, interpret it, explain its limitations, and clarify when to use it. [Read more…] about Range of a Data Set
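In code, the calculation is a single subtraction. The function name and data in this small sketch are made up.

```python
def data_range(data):
    """Difference between the maximum and minimum values."""
    return max(data) - min(data)

# Hypothetical data set.
data = [4, 9, 1, 7]
spread = data_range(data)
print(spread)  # 8
```

Because only the two extreme values enter the calculation, a single outlier can change the result dramatically.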

## Z-score: Definition, Formula, and Uses

A z-score measures the distance between a data point and the mean using standard deviations. Z-scores can be positive or negative. The sign tells you whether the observation is above or below the mean. For example, a z-score of +2 indicates that the data point falls two standard deviations above the mean, while a -2 signifies it is two standard deviations below the mean. A z-score of zero indicates that the data point equals the mean. Statisticians also refer to z-scores as standard scores, and I’ll use those terms interchangeably. [Read more…] about Z-score: Definition, Formula, and Uses
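The definition above amounts to subtracting the mean and dividing by the standard deviation. Here is a quick sketch with a hypothetical function name and an illustrative mean of 100 and standard deviation of 15.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / standard_deviation

# Illustrative values: mean of 100 and standard deviation of 15.
z_above = z_score(130, 100, 15)
z_below = z_score(70, 100, 15)
z_at_mean = z_score(100, 100, 15)
print(z_above, z_below, z_at_mean)  # 2.0 -2.0 0.0
```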

## Pascal’s Triangle

Pascal’s triangle is a number pattern that fits in a triangle. It is named after Blaise Pascal, a French mathematician, and it has many useful mathematical and statistical properties, including finding the number of combinations and expanding binomials. [Read more…] about Pascal’s Triangle
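Here is a short sketch that builds the first few rows, where each interior number is the sum of the two numbers above it. The function name is my own. Entry k of row n (both counted from zero) equals the number of combinations of n items taken k at a time, which `math.comb` confirms.

```python
import math

def pascal_rows(n):
    """Return the first n rows of Pascal's triangle."""
    rows = []
    row = [1]
    for _ in range(n):
        rows.append(row)
        # Each interior entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return rows

rows = pascal_rows(5)
for row in rows:
    print(row)

# Row 4 holds the coefficients for expanding (a + b)**4: 1, 4, 6, 4, 1.
combinations = [math.comb(4, k) for k in range(5)]
```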