Discrete and continuous data are two broad categories of numeric variables. Numeric variables represent characteristics that you can express as numbers rather than descriptive language.
When you have a numeric variable, you need to determine whether it is discrete or continuous.
In broad strokes, the critical factor is the following:
- You count discrete data.
- You measure continuous data.
Let’s dig a little deeper into the differences! I’ll explain the differences and provide examples of discrete vs continuous data.
Related post: What is a Variable?
What is Discrete Data?
Discrete variables can only assume specific values that you cannot subdivide. Typically, you count them, and the results are integers. For example, if you work at an animal shelter, you’ll count the number of cats.
For example, you might count 20 cats at the animal shelter. These variables cannot have fractional or decimal values. You can have 20 or 21 cats, but not 20.5! Natural numbers have discrete values.
Other examples of discrete variables include the following:
- The number of books you check out from the library.
- The number of heads in a sequence of coin tosses.
- The result of rolling a die.
- The number of patients in a hospital.
- The population of a country.
While discrete data have no decimal places, the average of these values can be fractional. For example, families can have only a discrete number of children: 1, 2, 3, etc. However, the average number of children per family can be 2.2.
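Here is a minimal Python sketch of that point. The family counts are made-up illustration data, not real survey figures, and the variable names are only for this example.

```python
from statistics import mean

# Hypothetical counts of children in five families (made-up illustration data).
children_per_family = [1, 2, 2, 3, 3]

# Every individual observation is a whole number...
all_whole = all(isinstance(n, int) for n in children_per_family)

# ...but the mean of those counts does not have to be.
average_children = mean(children_per_family)

print(all_whole)         # True
print(average_children)  # 2.2
```

No single family has 2.2 children, yet 2.2 is a perfectly valid summary of the group.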
Frequently, you’ll use bar charts to graph discrete data because the separate bars emphasize the distinct nature of each value. However, it’s appropriate to use other graphs as well.
When you have discrete values of a qualitative nature (i.e., attributes rather than numbers), it’s called categorical or nominal data.
What is Continuous Data?
Continuous variables can assume any numeric value and can be meaningfully split into smaller parts. Consequently, they have valid fractional and decimal values. In fact, continuous data have an infinite number of potential values between any two points. Generally, you measure them using a scale.
When you see decimal places for individual values, you’re usually looking at a continuous variable.
Examples of continuous data include weight, height, length, time, and temperature.
Frequently, you’ll use histograms and scatterplots to graph continuous data. These graphs are designed to handle values that fall on a continuous spectrum and have decimal places.
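To see why the two data types tend to get different graphs, here is a rough sketch of the tallying that sits behind each one. The die rolls, cat weights, and 1 kg bin width are all assumptions made up for this illustration.

```python
from collections import Counter

# Hypothetical die rolls (discrete): each distinct value gets its own bar.
die_rolls = [1, 3, 3, 6, 2, 3, 6, 1]
bar_heights = Counter(die_rolls)

# Hypothetical cat weights in kg (continuous): exact values rarely repeat,
# so a histogram groups them into intervals (bins) before counting.
cat_weights_kg = [3.2, 4.1, 4.8, 3.9, 5.6, 4.4, 3.7, 5.1]
bin_width = 1.0  # assumed bin width, chosen only for this example
histogram = Counter(int(w // bin_width) for w in cat_weights_kg)

for value in sorted(bar_heights):
    print(f"rolled {value}: {bar_heights[value]}")
for b in sorted(histogram):
    print(f"{b * bin_width:.0f}-{(b + 1) * bin_width:.0f} kg: {histogram[b]}")
```

For the discrete variable, you count how often each exact value occurs. For the continuous variable, you first have to choose intervals, and a different bin width would give a different picture of the same data.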
Discrete vs. Continuous Data Summary
Discrete Data | Continuous Data |
Specific values that you cannot divide. | Infinite number of fractional values between any two values. |
Counting | Measuring |
Both types of variables are essential in statistics. At the animal shelter, after counting the cats, you’ll weigh them. The counts are discrete values while their weights are continuous. Chances are you’ll need to analyze both types of variables.
It’s vital to recognize discrete vs continuous data because there are different ways to graph and analyze them. To learn more about how to assess different types of variables, read the following posts:
- Levels of Measurement: Nominal, Ordinal, Interval, and Ratio Scales
- Variable Types and How to Graph Them
- Comparing Hypothesis Tests by Types of Variables
- Choosing Regression Analysis Based on Data Types
- Probability Distributions for Discrete and Continuous Variables
John Drinkwater says
I have a general question about discrete/continuous data. Currently, professional golf is analyzed by a metric known as “strokes gained.” The stat compares golfers’ performance against a benchmark, and aggregates the result. For example, let’s say on a given golf hole, the field of professionals averages 4.37 strokes. If a player plays the hole in 5 strokes, he would accrue -0.63 “strokes gained.”
The same principle can be used to analyze each individual shot on the given hole, meaning the tee shot might have a negative value for strokes gained, but the approach shot or the putts might have positive values. Let’s say a player is on the green with a 30-foot putt. The field average for 30-foot putts is 1.7 strokes. If the player makes the putt, he is awarded +0.7 strokes gained for his putting. Numbers are typically converted to a per-round number, so that one can say something like “Player X gained 2.3 strokes per round over the field from his tee shots,” and so forth.
My question is, are there any limitations to this type of analysis, given that strokes in golf are discrete data, and the statistic yields fractional/decimal results? For example, consider player A, with an average of +1.7 strokes gained putting, and player B, with +1.2 strokes gained putting. Is A really one-half putt better than B? Is half a stroke a meaningful thing?
This statistic is used extensively in broadcasting golf and in discussions about golfers, but I have an uneasy feeling that the validity or power of the statistic is limited in some way by the fact that the data used for the benchmarks are discrete and not continuous. Thank you.
Jim Frost says
Hi John,
For starters, these are count data, and it’s certainly correct to calculate averages with decimal places. Also, those averages are based on a large sample size. So, those are perfectly valid. It’s kind of like how in the U.S. a family used to have an average of 2.2 children! (I’m not sure if that’s still the correct number.) You know the typical family has between two and three children, but usually closer to two. Same idea with the average strokes.
I’m not a golfer and don’t follow golf coverage, so I’m not completely sure how they’re used. But it does sound like an effective way to compare golfers in a specific game to the average professional golfer. You get a sense of how they’re doing at any given point compared to a professional golfing standard.
I don’t know enough to say whether a half a stroke difference is meaningful or not. It’s statistically valid but not necessarily meaningful in the real world. It might be, but I just don’t know. You’d have to study golf and see cases where players are half a stroke different and then decide if it represents a meaningful difference in the quality of play.
We often wrestle with that distinction in statistics where something can be statistically significant but not practically meaningful. There is a line there, but you need to use subject-area knowledge to make the determination. But even if a half a stroke doesn’t represent a meaningful difference, the metric as a whole sounds promising to me. You can still determine where players fall relative to the average, and some difference value (maybe a whole stroke) will be meaningful even if half a stroke isn’t.
Overall, I think it’s a valid and good approach. It provides a benchmark for how well players are doing in a specific game compared to the average professional player. Despite the decimal place in the metric, it provides an intuitive value for how much better or worse a player is doing compared to the benchmark: worse, a little better, a lot better, etc.
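To make the arithmetic from the question concrete, here is a small Python sketch. It assumes strokes gained is simply the benchmark average minus the strokes actually taken, which is how the question describes it; the function name is hypothetical, and the real methodology may involve more than this.

```python
def strokes_gained(benchmark_average: float, strokes_taken: int) -> float:
    """Benchmark average minus the strokes the player actually took."""
    return benchmark_average - strokes_taken

# Figures quoted in the question.
hole = strokes_gained(4.37, 5)  # field averages 4.37, player takes 5
putt = strokes_gained(1.7, 1)   # field averages 1.7 from 30 feet, player holes it

print(round(hole, 2))  # -0.63
print(round(putt, 2))  # 0.7
```

The strokes taken are always whole numbers. The decimals come entirely from the benchmark, which is an average of counts.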
I hope that helps!
Richard says
Why does one have to add 1 before dividing by 2 to estimate the median position for discrete data, but not for continuous? Surely the middle of N samples is the (N+1)/2th sample, irrespective of whether the actual data samples themselves are discrete or continuous.
E.g. if I have 10 shoe sizes, the median would be the 5.5th value. But if I had 10 temperatures, the median value would be the 5th. How can the middle move? Surely the middle is the middle, irrespective of what it’s the actual middle of?
Any enlightenment, gratefully received!
Jim Frost says
Hi Richard,
I don’t know where you heard that rule for discrete vs. continuous data. It certainly wasn’t on my blog! There is no such rule for discrete vs. continuous. However, the calculation does vary depending on whether you have an even vs. odd number of data points. The reason? A dataset with an even number of data points has no single middle value.
As I point out in this post, when you have an even number of data points (continuous or discrete), you take the average of the two innermost points. So, if you have 10 data points, it’s the average of the 5th and 6th values because there is no value at the 5.5th position! Data rank is itself a discrete variable, which means you cannot have a 5.5th ranked data point.
When you have an odd number, it’s the middle point. So, if you have 11 data points, the median is the 6th data point. It’s in this case where you can use the method of (N + 1) / 2. (11 + 1) / 2 = 6.
But that doesn’t work for an even number whether you add one or not. There is no middle point, so you need to take the average.
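Here is a short Python sketch of that even/odd rule. The shoe sizes and temperatures are hypothetical values made up for illustration, and the function is a hand-rolled version for clarity (Python’s built-in statistics.median handles this case the same way).

```python
def median(values):
    """Median via the even/odd rule: middle value, or mean of the two innermost."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: position (n + 1) / 2 counting from 1, index n // 2 from 0.
        return ordered[middle]
    # Even count: average the two innermost values.
    return (ordered[middle - 1] + ordered[middle]) / 2

# Ten hypothetical discrete values and ten hypothetical continuous values.
shoe_sizes = [7, 8, 8, 9, 9, 10, 10, 11, 11, 12]
temps = [18.2, 19.0, 19.4, 20.1, 20.5, 20.9, 21.3, 21.8, 22.4, 23.0]

print(median(shoe_sizes))           # 9.5
print(round(median(temps), 1))      # 20.7
print(median(list(range(1, 12))))   # 6
```

Both ten-value datasets go through exactly the same branch. The only thing the function checks is whether the count is even or odd, never whether the values are discrete or continuous.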
Reread this blog post carefully and see how you calculate the median for even and odd number of data points. It varies based on even/odd but not discrete vs. continuous.
I hope that helps!
John Holmes says
In a car park, for example, if all the slots from 3 to 7 are filled, the number of cars occupying those slots is 5, which is (n2 - n1) + 1.
Is this what you are thinking of?
John Nikola says
Thank you so much, Jim! I appreciate your help. Rest assured, I’ll share this information with my peers.
Jim Frost says
You’re very welcome! And thanks for sharing. You were quite right in being concerned by that practice!
John Nikola says
Hello Jim! Would it be considered “statistically valid” to transform a categorical variable into discrete values (for instance, from 7 categories to a numerical range of 1-7), incorporate it as an axis in a scatter plot, and then add a trendline and assume a potential relationship with the variable it’s plotted against? I’ve observed these types of visualizations among my peers in my data science capstone course.
Jim Frost says
Hi John,
No, that’s not valid. Categorical variables are divided into mutually exclusive categories that statisticians call levels. These levels do not have a natural order, nor do they provide any quantitative information. They are category names and all you can do is name the group to which each observation belongs. In short, you can’t place them on an X-axis with a meaningful order or meaningful distances between the groups. (You can show them spread along the X-axis but the distances between levels are meaningless and the groups have no natural order along it.)
Think of categorical variables like college major, profession, and literature genre. There’s no natural order to list them. There’s no distance between them. They are just different types.
Now, you can measure another variable, say income, and calculate averages based on a categorical variable. For example, average income by profession or college major. You can sort the categories, say college major, by income. However, that’s just for convenience and not based on a natural order for college major. It’s inappropriate to use a trendline in that example because there is no distance along the X-axis for the categorical variable. A trendline has a slope, which is the rise/run. However, with a categorical independent variable, the “run” is meaningless/incalculable.
In these cases, you need to use an analysis like ANOVA, which determines whether the group means are unequal. Using a trendline is inappropriate in cases where you have a categorical independent variable and a continuous dependent variable.
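As a rough illustration of what ANOVA looks at, here is a hand-computed one-way ANOVA F statistic in plain Python. The incomes and majors are made-up data, and the function name is hypothetical. In practice you would probably use a statistics package (for example, scipy.stats.f_oneway) to get the p-value as well.

```python
from statistics import mean

# Hypothetical incomes (in $1,000s) grouped by college major; made-up data.
income_by_major = {
    "History": [42, 45, 47, 44],
    "Biology": [50, 53, 49, 52],
    "Engineering": [68, 71, 66, 69],
}

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA across the given groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups (levels)
    n_total = len(all_values)  # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

groups = list(income_by_major.values())
f_stat = one_way_anova_f(groups)
f_reversed = one_way_anova_f(list(reversed(groups)))

print(round(f_stat, 2))      # 154.08
print(round(f_reversed, 2))  # 154.08
```

Notice that listing the groups in a different order gives the same F statistic. The calculation never uses a position on an X-axis or a distance between categories, which is exactly why it suits categorical variables and a trendline does not.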
I hope that helps!