The beta distribution is a continuous probability distribution that models random variables with values falling inside a finite interval. Use it to model subject areas with both an upper and lower bound for possible values. Analysts commonly use it to model the time to complete a task, the distribution of order statistics, and the prior distribution for binomial proportions in Bayesian analysis. [Read more…] about Beta Distribution: Uses, Parameters & Examples
What is a Geometric Distribution?
The geometric distribution is a discrete probability distribution that calculates the probability of the first success occurring during a specific trial. In other words, during a series of attempts, what is the probability of success first occurring during each attempt? Use this distribution when you need to understand how many attempts are necessary to produce the first successful outcome. [Read more…] about Geometric Distribution: Uses, Calculator & Formula
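As a rough sketch of that question, the probability that the first success lands on trial k is (1 − p)^(k − 1) · p, assuming independent trials with a constant success probability p. The function name and the die-rolling example are illustrative assumptions, not part of the original post.

```python
def geometric_pmf(k: int, p: float) -> float:
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    # k - 1 failures in a row, followed by one success
    return (1 - p) ** (k - 1) * p


# Hypothetical example: chance that the first six appears on rolls 1 through 3
for trial in range(1, 4):
    print(trial, round(geometric_pmf(trial, 1 / 6), 4))
```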
What is a Conditional Distribution?
A conditional distribution is a distribution of values for one variable that exists when you specify the values of other variables. This type of distribution allows you to assess the dispersal of your variable of interest under specific conditions, hence the name. [Read more…] about Conditional Distribution: Definition & Finding
What is a Marginal Distribution?
A marginal distribution is a distribution of values for one variable that ignores a more extensive set of related variables in a dataset.
That definition sounds a bit convoluted, but the concept is simple. The idea is that when you have a larger set of related variables that you collected for a study, you might want to focus on one of them to answer a specific question. [Read more…] about Marginal Distribution: Definition & Finding
What is a Contingency Table?
A contingency table displays frequencies for combinations of two categorical variables. Analysts also refer to contingency tables as crosstabulations and two-way tables. [Read more…] about Contingency Table: Definition, Examples & Interpreting
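One way to see how contingency tables, marginal distributions, and conditional distributions relate is to build all three from the same small dataset. The sketch below uses only the standard library. The observations and variable names are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical observations: (gender, preferred flavor)
observations = [
    ("F", "chocolate"), ("F", "vanilla"), ("F", "chocolate"),
    ("M", "vanilla"), ("M", "vanilla"), ("M", "chocolate"),
    ("F", "chocolate"), ("M", "vanilla"),
]

# Contingency table: frequency of each (gender, flavor) combination
table = Counter(observations)

# Marginal distribution of flavor: count flavors while ignoring gender
flavor_marginal = Counter(flavor for _, flavor in observations)

# Conditional distribution of flavor, given that gender is "F"
f_counts = {flavor: n for (gender, flavor), n in table.items() if gender == "F"}
f_total = sum(f_counts.values())
flavor_given_f = {flavor: n / f_total for flavor, n in f_counts.items()}

print(dict(table))
print(dict(flavor_marginal))
print(flavor_given_f)
```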
What is Cumulative Frequency?
Cumulative frequency is the running total of frequencies in a table. Use cumulative frequencies to answer questions about how often a characteristic occurs above or below a particular value. It is also known as a cumulative frequency distribution.
For example, how many students are in the 4th grade or lower at a school?
Cumulative frequency builds on the concepts of frequency and frequency distribution.
- Frequency: The number of times a value occurs in a dataset. For example, there are 12 4th graders in the school.
- Frequency distribution: A table that lists all values in the dataset and how many times each one occurs. Learn more about Frequency Tables.
In this post, learn how to find and construct cumulative frequency distributions for both discrete and continuous data. I’ll also show you how to create less than and greater than versions of these tables.
How to Find Cumulative Frequency
Finding a cumulative frequency distribution makes the most sense when your data have a natural order. The natural ordering allows the cumulative running total to be meaningful. With a minor change, the process works with both discrete and continuous data. Learn more about the differences between Discrete vs. Continuous Data.
For example, the grades in a school, months of a year, or age in years are discrete values with a logical order. Alternatively, when you have continuous data, you can create ranges of values known as classes. In this case, frequencies are counts of how often continuous data fall within each class.
Calculate cumulative frequency by starting at the top of a frequency table and working your way down. Take each row’s frequency and add the frequencies of all preceding rows. By summing the current and previous rows, you calculate the running total.
Let’s use this method to find cumulative frequency for discrete and continuous data.
Construct the Cumulative Frequency Distribution for Discrete Data
The example below shows you how to construct a cumulative frequency distribution for a discrete dataset of school grades (1 – 6). Notice how each row takes the previous cumulative frequency and then adds the frequency for that row to calculate the running total.
For example, if we look at the 3rd grade row of the table, we’ll see that the cumulative frequency is 58. This result tells us that 58 students are in 3rd grade or lower.
In this table, the cumulative frequency for the highest value equals the total number of observations in the dataset because all values are less than or equal to it. 6th grade is the highest value, and 88 students are less than or equal to it. Hence, we know there are 88 students in this dataset.
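The running-total calculation for the grade example can be sketched in Python. The individual frequencies are hypothetical. The original table is not reproduced here, so the values were chosen only to agree with the figures quoted in the text (12 fourth graders, 58 students in 3rd grade or lower, 88 students in total). The function name is also an assumption.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Return the running total of a list of frequencies, top row first."""
    return list(accumulate(frequencies))


# Hypothetical frequencies for grades 1-6, chosen to match the quoted totals
grades = [1, 2, 3, 4, 5, 6]
frequencies = [23, 20, 15, 12, 10, 8]

cumulative = cumulative_frequencies(frequencies)
for grade, freq, cum in zip(grades, frequencies, cumulative):
    print(f"Grade {grade}: frequency = {freq:>2}, cumulative = {cum:>2}")

# The last cumulative value equals the total number of observations
assert cumulative[-1] == sum(frequencies)
```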
Construct the Cumulative Frequency Distribution for Continuous Data
When you have continuous data, you might not have any repeating values.
For example, no values repeat in the portion of the height data below. Consequently, you’d have a series of values, each having a frequency of one. These are actual data from a study I conducted involving preteen girls. The full dataset has 88 values. You can download the Excel file with the data and table: HeightFrequencyTable.
However, you can obtain meaningful information by grouping the values into ranges and finding the frequency for each class, as shown below.
Then, to create the cumulative frequency table, sum each row with all preceding rows just as we did for the discrete data example.
For example, by looking at the row for 1.46 – 1.51m, we know that 49 preteen girls (just over half the sample of 88) have heights that are less than or equal to 1.51m.
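The grouping step for continuous data can be sketched the same way. The heights below are hypothetical and are not the study data. The class limits are assumptions modeled on the 1.46 – 1.51m row, with a value equal to an upper limit counted in that class.

```python
from bisect import bisect_left
from itertools import accumulate


def class_frequencies(values, upper_limits):
    """Count how many values fall into each class, given sorted upper limits."""
    counts = [0] * len(upper_limits)
    for value in values:
        if value > upper_limits[-1]:
            raise ValueError(f"{value} is above the highest class limit")
        # bisect_left finds the first class whose upper limit is >= value,
        # so a value equal to an upper limit is counted in that class
        counts[bisect_left(upper_limits, value)] += 1
    return counts


# Hypothetical heights in meters (not the study data)
heights = [1.33, 1.38, 1.40, 1.42, 1.44, 1.45, 1.47, 1.49, 1.51, 1.53, 1.56, 1.60]

# Assumed class limits: each class runs from the previous upper limit to its own
lowest_limit = 1.31
upper_limits = [1.36, 1.41, 1.46, 1.51, 1.56, 1.61]
lower_limits = [lowest_limit] + upper_limits[:-1]

frequencies = class_frequencies(heights, upper_limits)
cumulative = list(accumulate(frequencies))

for lower, upper, freq, cum in zip(lower_limits, upper_limits, frequencies, cumulative):
    print(f"{lower:.2f} - {upper:.2f} m: frequency = {freq}, cumulative = {cum}")
```

This sketch does not check for values below the lowest class limit. Add that guard if your data might contain them.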
Less Than vs. Greater Than Forms of the Table
Both the preceding examples use the “less than” form of the table. When you look at those cumulative frequency tables, the value indicates the total number of observations that are less than or equal to a specific value. For example, 70 students are in 4th grade or lower.
However, what should you do when you need to understand frequencies that are greater than or equal to a particular value? Simply switch the order of values in the table to list them from highest to lowest. This process constructs a greater than cumulative frequency distribution.
In the example below, I’ll recreate the grade level table, but instead of listing the grades 1 → 6, I’ll switch it to 6 → 1. From that point, I’ll use the same method of summing the current row with all previous rows.
In this greater than distribution, the cumulative frequencies indicate the number of observations greater than or equal to a particular value. For example, 30 students are in 4th grade or higher.
In this table, the cumulative frequency for the lowest value equals the total number of observations because all observations are greater than or equal to it. 1st grade is the lowest value with 88 students greater than or equal to it.
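A minimal sketch of the greater than form reverses the row order before taking the running total. The frequencies are hypothetical, chosen only to agree with the totals quoted in the text (30 students in 4th grade or higher, 88 students overall).

```python
from itertools import accumulate

# Hypothetical frequencies for grades 1-6, chosen to match the quoted totals
grades = [1, 2, 3, 4, 5, 6]
frequencies = [23, 20, 15, 12, 10, 8]

# List the grades from highest to lowest, then sum each row with all previous rows
greater_than = list(zip(reversed(grades), accumulate(reversed(frequencies))))

for grade, cum in greater_than:
    print(f"Grade {grade} or higher: {cum}")
```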
The decision to use a less than or greater than cumulative frequency table depends on which form is most helpful for your subject area.
You can also show cumulative frequency on graphs. In the bar chart below, I added the orange cumulative line. Displaying the running total in a chart makes it easy to find where most observations occur. Learn more about Bar Charts: Using, Examples and Interpreting.
In the graph, first and second graders comprise nearly half the school. As the grade levels progress from low to high, the orange line rises to the total number of students, 88.
Relative frequencies are a related concept.
The chi-square goodness of fit test evaluates whether proportions of categorical or discrete outcomes in a sample follow a population distribution with hypothesized proportions. In other words, when you draw a random sample, do the observed proportions follow the values that theory suggests? [Read more…] about Chi-Square Goodness of Fit Test: Uses & Examples
A bimodal distribution has two peaks. In the context of a continuous probability distribution, modes are peaks in the distribution. The graph below shows a bimodal distribution. [Read more…] about Bimodal Distribution: Definition, Examples & Analysis
What are Quartiles?
Quartiles are three values that split your dataset into quarters. [Read more…] about Quartile: Definition, Finding, and Using
What is Kurtosis?
Kurtosis is a statistic that measures the extent to which a distribution contains outliers. It assesses the propensity of a distribution to have extreme values within its tails. There are three kinds of kurtosis: leptokurtic, platykurtic, and mesokurtic. Statisticians define these types relative to the normal distribution. Higher kurtosis values indicate that the distribution has more outliers falling relatively far from the mean. Distributions with smaller values have a lower tendency for producing extreme values. When you’re assessing a sample, outliers have the greatest impact on this statistic. [Read more…] about Kurtosis: Definition, Leptokurtic & Platykurtic
What is a Binomial Distribution?
The binomial distribution is a discrete probability distribution that calculates the probability an event will occur a specific number of times in a set number of opportunities. Use the binomial distribution when your outcome is binary. Binary outcomes have only two possible values that are mutually exclusive. [Read more…] about Binomial Distribution: Uses, Calculator & Formula
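As a small sketch, the probability of exactly k events in n independent opportunities, each with event probability p, is C(n, k) · p^k · (1 − p)^(n − k). The function name and the coin-toss example are illustrative assumptions, not part of the original post.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n independent trials with probability p."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # Number of ways to place k events among n trials, times the probability of each way
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: probability of exactly 2 heads in 4 fair coin tosses
print(binomial_pmf(2, 4, 0.5))
```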
These F-tables provide the critical values for right-tail F-tests. Your F-test results are statistically significant when the test statistic is greater than the critical value. [Read more…] about F-table
What is a Sampling Distribution?
A sampling distribution of a statistic is a type of probability distribution created by drawing many random samples of a given size from the same population. These distributions help you understand how a sample statistic varies from sample to sample. [Read more…] about Sampling Distribution: Definition, Formula & Examples
What is a Critical Value?
A critical value defines regions in the sampling distribution of a test statistic. These values play a role in both hypothesis tests and confidence intervals. In hypothesis tests, critical values determine whether the results are statistically significant. For confidence intervals, they help calculate the upper and lower limits. [Read more…] about Critical Value: Definition, Finding & Calculator
This chi-square table provides the critical values for chi-square (χ²) hypothesis tests. The column and row intersections are the right-tail critical values for a given probability and degrees of freedom. [Read more…] about Chi-Square Table
A z-table, also known as the standard normal table, provides the area under the curve to the left of a z-score. This area represents the probability that z-values will fall within a region of the standard normal distribution. Use a z-table to find probabilities corresponding to ranges of z-scores and to find p-values for z-tests. [Read more…] about Z-table
This t-distribution table provides the critical t-values for both one-tailed and two-tailed t-tests, and confidence intervals. Learn how to use this t-table with the information, examples, and illustrations below the table. [Read more…] about T-Distribution Table of Critical Values
What is the 5 Number Summary?
The 5 number summary is an exploratory data analysis tool that provides insight into the distribution of values for one variable. Collectively, this set of statistics describes where data values occur, their central tendency, variability, and the general shape of their distribution. [Read more…] about 5 Number Summary: Definition, Finding & Using
What is the Lognormal Distribution?
The lognormal distribution is a continuous probability distribution that models right-skewed data. The shape of the lognormal distribution is comparable to the Weibull and loglogistic distributions. [Read more…] about Lognormal Distribution: Uses, Parameters & Examples
In the United States, our Thanksgiving holiday is fast approaching. On this day, we give thanks for the good things in our lives.
For this post, I wanted to quantify how thankful we should be. Ideally, I’d quantify something truly meaningful, like happiness. Unfortunately, most countries are not like Bhutan, which measures the gross national happiness and incorporates it into their five-year development plans.
Instead, I’ll focus on something that is more concrete and regularly measured around the world—income. By examining income distributions, I’ll show that you have much to be thankful for, and so does most of the world! [Read more…] about A Statistical Thanksgiving: Global Income Distributions