Reliability and validity are criteria by which researchers assess measurement quality. Measuring a person or item involves assigning scores to represent an attribute. This process creates the data that we analyze. However, to provide meaningful research results, that data must be good. And not all data are good!

# Blog

## Nominal, Ordinal, Interval, and Ratio Scales

The nominal, ordinal, interval, and ratio scales are levels of measurement in statistics. These scales are broad classifications describing the type of information recorded within the values of your variables. Variables take on different values in your data set. For example, you can measure height, gender, and class ranking. Each of these variables uses a distinct level of measurement.

## Odds Ratio

An odds ratio (OR) quantifies the relationship between a variable and the likelihood of an event occurring. A common use for odds ratios is identifying risk factors by assessing the relationship between exposure to a risk factor and a medical outcome. For example, is there an association between exposure to a chemical and a disease?
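As a rough illustration, here is a minimal sketch of how an odds ratio falls out of a 2x2 table of exposure by outcome. The function name `odds_ratio` and the counts are hypothetical and made up for this example; they do not come from any real study.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the event with exposure divided by the odds without exposure."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical counts, invented for illustration only.
or_value = odds_ratio(
    exposed_cases=30, exposed_noncases=70,
    unexposed_cases=10, unexposed_noncases=90,
)
print(round(or_value, 2))
```

A value above 1 suggests the event has higher odds in the exposed group, a value below 1 suggests lower odds, and a value near 1 suggests no association.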

## Case-Control Study

A case-control study is a retrospective, observational study that compares two existing groups. Researchers form these groups based on the existence of a condition in the case group and the lack of that condition in the control group. They evaluate the differences in the histories between these two groups looking for factors that might cause a disease.

## Five-Number Summary

The five-number summary is an exploratory data analysis tool that provides insight into the distribution of values for one variable. Collectively, this set of statistics describes where data values occur, their central tendency, variability, and the general shape of their distribution.
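The five statistics are the minimum, first quartile, median, third quartile, and maximum. Here is a small sketch using Python's standard library. The helper name `five_number_summary` and the data are my own, and I'm assuming the "inclusive" quartile method; other quartile conventions can give slightly different Q1 and Q3 values.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical data, invented for illustration only.
data = [1, 3, 5, 7, 9, 11, 13]
summary = five_number_summary(data)
print(summary)
```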

## Simple Random Sampling

Simple random sampling (SRS) is a probability sampling method where researchers randomly choose participants from a population. All population members have an equal probability of being selected. This method tends to produce representative, unbiased samples.
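To make the idea concrete, here is a minimal sketch of drawing a simple random sample without replacement. The population of ID numbers, the sample size, and the seed are all assumptions chosen for illustration.

```python
import random

# Hypothetical population: ID numbers for 1,000 members.
population = list(range(1, 1001))

# A fixed seed is used only so the example is repeatable.
rng = random.Random(2024)

# Every member has the same chance of selection; no one is chosen twice.
sample = rng.sample(population, k=50)
print(sorted(sample)[:5])
```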

## Convenience Sampling

Convenience sampling is a non-probability sampling method where researchers use subjects who are easy to contact and obtain their participation. Researchers find participants in the most accessible places, and they impose no inclusion requirements. Convenience sampling is also known as opportunity or availability sampling.

## Systematic Sampling

Systematic sampling is a probability sampling method for obtaining a representative sample from a population. To use this method, researchers start at a random point and then select subjects at regular intervals of every nth member of the population. Like other probability sampling methods, the researchers must identify their population of interest before sampling from it.
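The procedure can be sketched in a few lines. This is one simple variant, assuming the sampling interval is the population size divided by the desired sample size (rounded down) and the random start falls within the first interval. The function name `systematic_sample` and the data are hypothetical.

```python
import random


def systematic_sample(population, sample_size, rng):
    """Pick a random start, then take every k-th member of the population."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    interval = len(population) // sample_size  # the sampling interval k
    start = rng.randrange(interval)            # random start within the first interval
    return population[start::interval][:sample_size]


# Hypothetical population of 100 ID numbers.
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed only so the example is repeatable
sample = systematic_sample(population, sample_size=10, rng=rng)
print(sample)
```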

## Lognormal Distribution

The lognormal distribution is a continuous probability distribution that models right-skewed data. The shape of the lognormal distribution is comparable to the Weibull and loglogistic distributions.

## A Statistical Thanksgiving: Global Income Distributions

In the United States, our Thanksgiving holiday is fast approaching. On this day, we give thanks for the good things in our lives.

For this post, I wanted to quantify how thankful we should be. Ideally, I’d quantify something truly meaningful, like happiness. Unfortunately, most countries are not like Bhutan, which measures gross national happiness and incorporates it into its five-year development plans.

Instead, I’ll focus on something that is more concrete and regularly measured around the world: income. By examining income distributions, I’ll show that you have much to be thankful for, and so does most of the world!

## Variance

Variance is a measure of variability in statistics. It assesses the average squared difference between data values and the mean. Unlike some other statistical measures of variability, it incorporates all data points in its calculations by contrasting each value to the mean.
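Here is a short sketch of that calculation on a made-up dataset. It computes the population variance by hand (dividing by n) and compares it with the standard library, then shows the sample variance, which divides by n - 1 instead.

```python
import statistics

# Hypothetical data, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)
squared_diffs = [(x - mean) ** 2 for x in values]

# Population variance: the average squared difference from the mean.
population_variance = sum(squared_diffs) / len(values)

# Sample variance divides by n - 1 rather than n.
sample_variance = statistics.variance(values)

print(population_variance, statistics.pvariance(values), sample_variance)
```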

## Mean Squared Error (MSE)

Mean squared error (MSE) measures the amount of error in statistical models. It assesses the average squared difference between the observed and predicted values. When a model has no error, the MSE equals zero. As model error increases, its value increases. The mean squared error is also known as the mean squared deviation (MSD).
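A minimal sketch of the calculation follows. The function name `mean_squared_error` and the numbers are hypothetical, and I'm assuming the simple version that divides by the number of observations; some regression contexts divide by the error degrees of freedom instead.

```python
def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical values, invented for illustration only.
observed = [3, 5, 7, 9]
predicted = [2.5, 5, 8, 9]

mse = mean_squared_error(observed, predicted)
print(mse)

# A model that predicts every value exactly has an MSE of zero.
perfect_mse = mean_squared_error(observed, observed)
```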

## Natural Numbers

Natural numbers are the numbers you use for counting: all the positive integers from 1 to infinity. They are numbers that occur in nature and are the fundamental origins of the number system.

## Validity

Validity in research, statistics, psychology, and testing evaluates how well test scores reflect what they’re supposed to measure. Does the instrument measure what it claims to measure? Do the measurements reflect the underlying reality? Or, do they quantify something else?

## Internal and External Validity

Internal and external validity relate to the findings of studies and experiments.

## Uniform Distribution

The uniform distribution is a symmetric probability distribution where all outcomes have an equal likelihood of occurring. All values in the distribution have a constant probability. This distribution is also known as the rectangular distribution because of its shape in probability distribution plots, as I’ll show you below.

## Discrete vs. Continuous Data

Discrete and continuous are two broad categories of numerical data. Numeric variables represent characteristics that you can express as numbers rather than descriptive language.

When you have a numeric variable, you need to determine whether it is discrete or continuous.

In broad strokes, the critical factor is the following:

- You count discrete data.
- You measure continuous data.

Let’s dig a little deeper into the differences!

## Discrete Data

Discrete data can only assume specific values that you cannot subdivide. Typically, you count discrete values, and the results are integers. For example, if you work at an animal shelter, you’ll count the number of cats.

For example, you might count 20 cats at the animal shelter. Discrete variables cannot have fractional or decimal values. You can have 20 or 21 cats, but not 20.5! Natural numbers are discrete values.

Other examples of discrete data include:

- The number of books you check out from the library.
- The number of heads in a sequence of coin tosses.
- The result of rolling a die.
- The number of patients in a hospital.
- The population of a country.

While discrete variables have no decimal places, the average of these values can be fractional. For example, families can have only a discrete number of children: 1, 2, 3, etc. However, the average number of children per family can be 2.2.
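Here is a quick sketch of that point. The family sizes are made up for illustration; each count is a whole number, yet their mean is not.

```python
import statistics

# Hypothetical number of children in five families (discrete counts).
children_per_family = [1, 2, 3, 2, 3]

# Every individual value is an integer...
all_whole_numbers = all(isinstance(count, int) for count in children_per_family)

# ...but the average of those counts can be fractional.
average_children = statistics.mean(children_per_family)
print(all_whole_numbers, average_children)
```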

Frequently, you’ll use bar charts to graph discrete data because the separate bars emphasize the distinct nature of each value. However, it’s appropriate to use other graphs as well.

When discrete values are qualitative in nature (i.e., attributes rather than numbers), they’re called categorical or nominal data.

## Continuous Data

Continuous data can assume any numeric value and can be meaningfully split into smaller parts. Consequently, they have valid fractional and decimal values. In fact, continuous variables have an infinite number of potential values between any two points. Generally, you measure them using a scale.

When you see decimal places for individual data points, you’re looking at a continuous variable.

For example, you have continuous data when you measure weight, height, length, time, and temperature.

Frequently, you’ll use histograms and scatterplots to graph continuous variables. These graphs are designed to handle values that fall on a continuous spectrum and have decimal places.

## Discrete vs. Continuous Variables Summary

| Discrete | Continuous |
| --- | --- |
| Specific values that you cannot divide. | Infinite number of fractional values between any two values. |
| Counting | Measuring |

Both types are essential in statistics. At the animal shelter, after counting the cats, you’ll weigh them. The counts are discrete values while their weights are continuous. Chances are you’ll need to analyze both types of data.

It’s vital to recognize which types of variables you have because there are different ways to graph and analyze them.

## Geometric Mean

The geometric mean is a measure of central tendency that equals the nth root of the product of n numbers.

Like the arithmetic mean, the geometric mean finds the center of a dataset. While the arithmetic mean finds the center by summing the values and dividing by the number of observations, the geometric mean finds the center by multiplying and then taking a root of the product.
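To see the two calculations side by side, here is a small sketch with made-up values. It computes the geometric mean by hand (product, then nth root) and checks it against `statistics.geometric_mean`, which is available in Python 3.8 and later.

```python
import math
import statistics

# Hypothetical positive values, invented for illustration only.
values = [2, 8]

# Arithmetic mean: sum the values, then divide by the number of observations.
arithmetic_mean = sum(values) / len(values)

# Geometric mean: multiply the values, then take the nth root of the product.
manual_geometric_mean = math.prod(values) ** (1 / len(values))

# The standard library gives the same result.
library_geometric_mean = statistics.geometric_mean(values)

print(arithmetic_mean, manual_geometric_mean, library_geometric_mean)
```

For these values the arithmetic mean is 5, while the geometric mean is the square root of 16, which is 4.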

## Ghost Hunting with a Statistics Mindset

I’m very much an empirical, data, statistics, and science type of guy. So, it might be a surprise to learn that I’ve gone ghost hunting a number of times. Now, I’m not a paranormal enthusiast. I’m definitely a skeptic. However, in my view, being skeptical about something does not preclude collecting data about it. I also have friends I trust completely who are sure they’ve experienced paranormal activity. Plus, I don’t need much of an excuse to try something new and unusual!

## Paired T Test

Use a paired t-test when each subject has a pair of measurements, such as a before and after score. A paired t-test determines whether the mean change for these pairs is significantly different from zero. This test is an inferential statistics procedure because it uses samples to draw conclusions about populations.

Paired t-tests are also known as dependent samples t-tests. The two samples are dependent because they contain the same subjects. Conversely, an independent samples t-test contains different subjects in the two samples.
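As a rough sketch of the mechanics, the test statistic is the mean of the paired differences divided by its standard error. The before and after scores below are hypothetical. The code stops at the t-value and degrees of freedom; obtaining a p-value requires the t-distribution, which statistical software provides.

```python
import math
import statistics

# Hypothetical before and after scores for the same five subjects.
before = [72, 65, 80, 77, 68]
after = [75, 70, 82, 80, 70]

# Work with one difference per subject because the samples are dependent.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)  # sample standard deviation
standard_error = sd_difference / math.sqrt(n)

# t-value for the null hypothesis that the mean change equals zero.
t_value = mean_difference / standard_error
degrees_of_freedom = n - 1

print(mean_difference, round(t_value, 3), degrees_of_freedom)
```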