The five-number summary is an exploratory data analysis tool that provides insight into the distribution of values for one variable. Collectively, this set of statistics describes where data values occur, their central tendency, variability, and the general shape of their distribution.
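As a rough sketch, the five values (minimum, first quartile, median, third quartile, and maximum) can be computed with Python's standard library. The function name `five_number_summary` and the sample data are illustrative assumptions. Quartile conventions differ between textbooks and software, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" interpolates between the observed minimum and maximum.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Illustrative data only.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```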

# Basics

## Simple Random Sampling

Simple random sampling (SRS) is a probability sampling method where researchers randomly choose participants from a population. All population members have an equal probability of being selected. This method tends to produce representative, unbiased samples.
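Here is a minimal sketch of drawing a simple random sample without replacement. The helper name `simple_random_sample`, the population of ID numbers, and the seed are assumptions made for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size members without replacement, each equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, sample_size)

# Hypothetical population: 100 participant ID numbers.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```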

## Convenience Sampling

Convenience sampling is a non-probability sampling method where researchers use subjects who are easy to contact and recruit. Researchers find participants in the most accessible places, and they impose no inclusion requirements. Convenience sampling is also known as opportunity or availability sampling.

## Systematic Sampling

Systematic sampling is a probability sampling method for obtaining a representative sample from a population. To use this method, researchers start at a random point and then select every nth member of the population. Like other probability sampling methods, the researchers must identify their population of interest before sampling from it.
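The procedure can be sketched in a few lines. This version assumes the random starting point falls within the first interval, which is one common convention. The name `systematic_sample` and the example population are hypothetical.

```python
import random

def systematic_sample(population, interval, seed=None):
    """Pick a random start in the first interval, then take every nth member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random index from 0 to interval - 1
    return population[start::interval]

# Hypothetical ordered population of 100 ID numbers, sampling every 10th.
population = list(range(1, 101))
sample = systematic_sample(population, interval=10, seed=1)
print(sample)
```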

## Variance

Variance is a measure of variability in statistics. It assesses the average squared difference between data values and the mean. Unlike some other statistical measures of variability, it incorporates all data points in its calculations by comparing each value to the mean.
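To make the calculation concrete, here is a small sketch that averages the squared differences from the mean. It computes the population variance (dividing by the number of values). The sample variance divides by one less than that, which `statistics.variance` does. The function name and data are illustrative.

```python
import statistics

def population_variance(values):
    """Average squared difference between each value and the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with a mean of 5
var_pop = population_variance(data)      # divides by n
var_sample = statistics.variance(data)   # divides by n - 1
print(var_pop, var_sample)
```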

## Natural Numbers

Natural numbers are the numbers you use for counting: all the positive integers from 1 to infinity. They are numbers that occur in nature and are the fundamental origins of the number system.

## Validity

Validity in research, statistics, psychology, and testing evaluates how well test scores reflect what they’re supposed to measure. Does the instrument measure what it claims to measure? Do the measurements reflect the underlying reality? Or, do they quantify something else?

## Internal and External Validity

Internal and external validity relate to the findings of studies and experiments.

## Discrete vs. Continuous Data

Discrete and continuous are two broad categories of numerical data. Numeric variables represent characteristics that you can express as numbers rather than descriptive language.

When you have a numeric variable, you need to determine whether it is discrete or continuous.

In broad strokes, the critical factor is the following:

- You count discrete data.
- You measure continuous data.

Let’s dig a little deeper into the differences!

## Discrete Data

Discrete data can only assume specific values that you cannot subdivide. Typically, you count discrete values, and the results are integers. For example, if you work at an animal shelter, you’ll count the number of cats.

You might count 20 cats at the animal shelter. Discrete variables cannot have fractional or decimal values: you can have 20 or 21 cats, but not 20.5! Natural numbers are discrete values.

Other examples of discrete data include:

- The number of books you check out from the library.
- The number of heads in a sequence of coin tosses.
- The result of rolling a die.
- The number of patients in a hospital.
- The population of a country.

While discrete variables have no decimal places, the average of these values can be fractional. For example, families can have only a discrete number of children: 1, 2, 3, etc. However, the average number of children per family can be 2.2.
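A quick sketch shows how whole-number counts can produce a fractional average. The family counts below are made up for illustration.

```python
import statistics

# Hypothetical number of children in five families (discrete counts).
children_per_family = [1, 2, 2, 3, 3]

# Every individual value is an integer, but the mean is 11 / 5.
average_children = statistics.mean(children_per_family)
print(average_children)
```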

Frequently, you’ll use bar charts to graph discrete data because the separate bars emphasize the distinct nature of each value. However, it’s appropriate to use other graphs as well.

Discrete values of a qualitative nature (i.e., attributes rather than numbers) are called categorical or nominal data.

## Continuous Data

Continuous data can assume any numeric value and can be meaningfully split into smaller parts. Consequently, they have valid fractional and decimal values. In fact, continuous variables have an infinite number of potential values between any two points. Generally, you measure them using a scale.

When you see decimal places for individual data points, you’re typically looking at a continuous variable.

For example, you have continuous data when you measure weight, height, length, time, and temperature.

Frequently, you’ll use histograms and scatterplots to graph continuous variables. These graphs are designed to handle values that fall on a continuous spectrum and have decimal places.

## Discrete vs. Continuous Variables Summary

| Discrete | Continuous |
|---|---|
| Specific values that you cannot divide. | Infinite number of fractional values between any two values. |
| Counting | Measuring |

Both types are essential in statistics. At the animal shelter, after counting the cats, you’ll weigh them. The counts are discrete values while their weights are continuous. Chances are you’ll need to analyze both types of data.

It’s vital to recognize which types of variables you have because there are different ways to graph and analyze them. To learn more about how to assess different data types, read the following posts:

- Data Types and How to Graph Them
- Comparing Hypothesis Tests by Types of Variables
- Choosing Regression Analysis Based on Data Types
- Probability Distributions for Discrete and Continuous Variables

## Geometric Mean

The geometric mean is a measure of central tendency that equals the nth root of the product of n numbers.

Like the arithmetic mean, the geometric mean finds the center of a dataset. While the arithmetic mean finds the center by summing the values and dividing by the number of observations, the geometric mean finds the center by multiplying and then taking a root of the product.
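The two calculations can be compared side by side in a short sketch. The function name and data are illustrative, and this version assumes all values are positive. Python's built-in `statistics.geometric_mean` is shown for comparison.

```python
import math
import statistics

def geometric_mean(values):
    """Multiply the n values together, then take the nth root of the product."""
    return math.prod(values) ** (1 / len(values))

data = [4, 9]  # illustrative positive values
gm = geometric_mean(data)     # square root of 4 * 9
am = statistics.mean(data)    # (4 + 9) / 2
builtin_gm = statistics.geometric_mean(data)
print(gm, am, builtin_gm)
```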

## Frequency Table

Frequency is the number of times a specific data value occurs in your dataset. A frequency table lists a set of values and how often each one appears. These tables help you understand which data values are common and which are rare. They also organize your data and are an effective way to present the results to others. Frequency tables are also known as frequency distributions because they allow you to understand the distribution of values in your dataset.
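Here is one way to sketch a frequency table, using `collections.Counter` from Python's standard library. The helper name `frequency_table` and the data are assumptions for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    counts = Counter(values)  # maps each distinct value to its frequency
    return sorted(counts.items())

# Illustrative data only.
table = frequency_table([3, 1, 2, 3, 3, 2])
for value, count in table:
    print(f"{value}\t{count}")
```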

## Mean Absolute Deviation

The mean absolute deviation (MAD) is a measure of variability that indicates the average distance between observations and their mean. MAD uses the original units of the data, which simplifies interpretation. Larger values signify that the data points spread out further from the average. Conversely, lower values correspond to data points bunching closer to it. The mean absolute deviation is also known as the mean deviation and average absolute deviation.
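The definition translates directly into code. This is a minimal sketch with an illustrative function name and made-up data.

```python
def mean_absolute_deviation(values):
    """Average absolute distance between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 4, 6, 8]  # illustrative values with a mean of 5
mad = mean_absolute_deviation(data)
print(mad)
```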

## Cluster Sampling

Cluster sampling is a method of obtaining a representative sample from a population that researchers have divided into groups. An individual cluster is a subgroup that mirrors the diversity of the whole population, while the clusters themselves are similar to each other. Typically, researchers use this approach when studying large, geographically dispersed populations because it is a cost-controlling measure.

## Stratified Sampling

Stratified sampling is a method of obtaining a representative sample from a population that researchers have divided into relatively similar subpopulations (strata). Researchers use stratified sampling to ensure specific subgroups are present in their sample. It also helps them obtain precise estimates of each group’s characteristics. Many surveys use this method to better understand differences between subpopulations. Stratified sampling is also known as stratified random sampling.
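As a simplified sketch, the code below draws an equal-sized random sample from every stratum so that each subgroup appears in the final sample. Equal allocation is an assumption here, since studies often allocate in proportion to stratum size instead. The function name, strata, and member labels are all hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, per_stratum)
        for name, members in strata.items()
    }

# Hypothetical strata with made-up member labels.
strata = {
    "undergraduate": [f"U{i}" for i in range(50)],
    "graduate": [f"G{i}" for i in range(20)],
}
sample = stratified_sample(strata, per_stratum=5, seed=0)
print(sample)
```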

## Skewed Distribution

A skewed distribution occurs when one tail is longer than the other. Skewness describes the asymmetry of a distribution. Unlike the familiar normal distribution with its bell-shaped curve, these distributions are asymmetric. The two halves of the distribution are not mirror images because the data are not distributed equally on both sides of the distribution’s peak.

## Heterogeneity

Heterogeneity is defined as a dissimilarity between elements that make up a whole. When heterogeneity is present, there is diversity in the characteristic under study. The parts of the whole are different, not the same. It is an essential concept in science and statistics. Heterogeneous is the opposite of homogeneous.

## Control Variables

Control variables are properties that researchers hold constant for all observations in an experiment. While these variables are not the primary focus of the research, keeping their values consistent helps the study establish the true relationships between the independent and dependent variables. Control variables are different from control groups.

## Percent Error

Percent error compares an estimate to a correct value and expresses the difference between them as a percentage. This statistic allows analysts to understand the size of the error relative to the true value. It is also known as percentage error and % error.
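A minimal sketch of the calculation follows. It assumes the common convention of taking the absolute difference, so the result is never negative (some fields keep the sign instead). The function name and the example values are illustrative.

```python
def percent_error(estimate, true_value):
    """Absolute difference from the true value, as a percentage of that value."""
    if true_value == 0:
        raise ValueError("percent error is undefined when the true value is 0")
    return abs(estimate - true_value) / abs(true_value) * 100

# Illustrative values: an estimate of 9.5 against a true value of 10.
err = percent_error(estimate=9.5, true_value=10.0)
print(f"{err:.1f}%")
```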

## Accuracy vs Precision

Accuracy and precision are crucial properties of your measurements when you’re relying on data to draw conclusions. Both concepts apply to a series of measurements from a measurement system.

Measurement systems facilitate the quantification of characteristics for data collection. They include a collection of instruments, software, and personnel necessary to assess the property of interest. For example, a research project studying bone density will devise a measurement system to produce accurate and precise measurements of bone density.

## Control Group in an Experiment

A control group in an experiment does not receive the treatment. Instead, it serves as a comparison group for the treatments. Researchers compare the results of a treatment group to the control group to determine the effect size, also known as the treatment effect.
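One simple way to express that comparison is the difference between group means. The sketch below uses that unstandardized measure as an assumption, and the outcome values are made up for illustration. Real analyses typically add a hypothesis test or confidence interval.

```python
import statistics

def mean_difference(treatment, control):
    """Unstandardized effect: treatment group mean minus control group mean."""
    return statistics.mean(treatment) - statistics.mean(control)

# Hypothetical outcome measurements for each group.
treatment_group = [12, 15, 14, 13]
control_group = [10, 11, 9, 10]
effect = mean_difference(treatment_group, control_group)
print(effect)
```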