In the field of statistics, data are vital. Data are the information that you collect to learn, draw conclusions, and test hypotheses. After all, statistics is the science of learning from data. However, there are different types of variables, and they record various kinds of information. Crucially, the type of information determines what you can learn from it, and, importantly, what you cannot learn from it. Consequently, it’s essential that you understand the different types of data.
The term “data” carries strong preconceived notions with it. It almost becomes something that is separate from reality. Throughout this post, I want you to think about data as information in a study area that you are gathering to answer a question. For example:
- Do flu shots prevent the flu?
- Does exercise improve your health?
- Does a gasoline additive improve gas mileage?
When you assess any of these questions, there’s a wide array of characteristics that you can record. For example, in a study that uses human subjects, you can log numerical measurements such as height and weight. However, you can also designate properties such as gender, marital status, and health concerns. For some characteristics, you can record them in multiple ways. For instance, you can measure a subject’s body fat percentage, or you can indicate whether they are medically obese or not.
In this blog post, you’ll learn about the different types of variables, what you can learn from them, and how to graph the values using intuitive examples. I also include links to more in-depth posts where I show you how to pick the correct statistical analyses based on the types of variables that you have.
Quantitative versus Qualitative Data
The distinction between quantitative and qualitative data is the most fundamental way to divide types of data. Is the characteristic something you can objectively measure with numbers or not?
Quantitative: The information is recorded as numbers and represents an objective measurement or a count. Temperature, weight, and a count of transactions are all quantitative data. Analysts also refer to this type as numerical data.
Qualitative: The information represents characteristics that you do not measure with numbers. Instead, the observations fall within a countable number of groups. In fact, this type of variable can capture information that isn’t easily measured and can be subjective. Taste, eye color, architectural style, and marital status are all types of qualitative variables.
Within these two broad divisions, there are various subtypes.
Types of Quantitative Data: Continuous and Discrete
When you can represent the information you’re gathering with numbers, you are collecting quantitative data. This class encompasses two categories.
Continuous variables can take on any numeric value, and they can be meaningfully divided into smaller increments, including fractional and decimal values. There are an infinite number of possible values between any two values. Typically, you measure continuous variables on a scale. For example, when you measure height, weight, and temperature, you have continuous data.
With continuous variables, you can assess measures of central tendency and variability, such as the mean, median, range, and standard deviation, and you can examine the shape of the distribution. For example, the mean height in the U.S. is 5 feet 9 inches for men and 5 feet 4 inches for women.
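Here's a minimal sketch of those summary measures using Python's standard library. The height values are hypothetical numbers I made up for illustration, not real survey data.

```python
import statistics

# Hypothetical heights in inches (illustrative values only).
heights = [64.5, 66.0, 67.2, 68.1, 69.0, 70.3, 71.4, 72.5]

mean_height = statistics.mean(heights)      # central tendency
median_height = statistics.median(heights)  # central tendency, robust to outliers
value_range = max(heights) - min(heights)   # variability: spread between extremes
std_dev = statistics.stdev(heights)         # variability: sample standard deviation

print(f"mean={mean_height:.2f}, median={median_height:.2f}, "
      f"range={value_range:.2f}, sd={std_dev:.2f}")
```

Because the values are continuous, every one of these summaries is meaningful, including fractional results.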
How to graph continuous data
Histograms are a standard way to graph continuous variables because they show the distribution of the values. The histogram below helps you determine whether the distribution of body fat percentage values for adolescent girls is symmetric or skewed, understand the range of values, and identify where the most common values fall.
Dot plots provide the same types of information as histograms. For more information, read my Guide to Dot Plots.
Related post: Using Histograms to Understand Your Data
When you have two continuous variables, you can graph them using a scatterplot. The scatterplot shows how the body fat percentage tends to rise as BMI increases. Use correlation to assess the strength of this relationship or regression analysis to derive the equation for the line that provides the best fit for these data. For more information, read my Guide to Scatterplots.
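To make the correlation and regression ideas concrete, here's a small sketch that computes Pearson's correlation coefficient and the least-squares line by hand. The BMI and body fat pairs are hypothetical values chosen for illustration; they are not the data behind the scatterplot.

```python
import math

# Hypothetical paired measurements (illustrative only).
bmi = [18.0, 20.0, 22.0, 24.0, 26.0]
body_fat = [20.0, 24.0, 27.0, 31.0, 33.0]


def sums_of_squares(x, y):
    """Return (sxx, syy, sxy, mean_x, mean_y) for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy, mean_x, mean_y


sxx, syy, sxy, mean_x, mean_y = sums_of_squares(bmi, body_fat)

# Pearson's r measures the strength of the linear relationship (-1 to 1).
r = sxy / math.sqrt(sxx * syy)

# Ordinary least squares gives the best-fit line: y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"r={r:.3f}, body_fat = {intercept:.2f} + {slope:.2f} * bmi")
```

Correlation answers "how strong is the relationship?" while the regression line answers "how much does one variable change when the other changes?"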
When you have continuous variables that are divided into groups, you can use a boxplot to display the central tendency and spread of each group. Fertilizer Type C is associated with the highest plant growth while Type B produces the greatest variability.
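A boxplot is built from each group's median and quartiles, so you can sketch the same comparison numerically. The growth values below are hypothetical, chosen only so that the pattern matches the description above (Type C highest, Type B most variable). This assumes Python 3.8 or later for `statistics.quantiles`.

```python
import statistics

# Hypothetical plant growth measurements by fertilizer type (illustrative only).
growth = {
    "A": [10, 11, 12, 13, 14],
    "B": [5, 9, 13, 17, 21],
    "C": [18, 19, 20, 21, 22],
}

summary = {}
for fertilizer, values in growth.items():
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    summary[fertilizer] = {
        "median": statistics.median(values),  # center line of the box
        "iqr": q3 - q1,                       # height of the box (spread)
    }

highest_center = max(summary, key=lambda k: summary[k]["median"])
most_variable = max(summary, key=lambda k: summary[k]["iqr"])

print(f"highest median: {highest_center}, largest IQR: {most_variable}")
```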
Please notice how with continuous variables you can assess the wide variety of properties that I illustrate above. You’ll see a contrast when we get to qualitative variables.
Discrete quantitative data are a count of the presence of a characteristic, result, item, or activity. These measures cannot be meaningfully divided into smaller increments. For example, a single household can have 1 or 2 cars, but it cannot have 1.6. The possible values that you can record for an observation are distinct and countable, typically non-negative whole numbers, with no valid values in between.
With discrete variables, you can calculate and assess a rate of occurrence or a summary of the count, such as the mean, sum, and standard deviation. For example, U.S. households had an average of 2.11 vehicles in 2014.
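Notice that each individual observation is a whole number, yet a summary such as the mean can be fractional. Here's a quick sketch with hypothetical vehicle counts (not the 2014 data):

```python
import statistics

# Hypothetical number of vehicles per household (illustrative only).
vehicles = [0, 1, 1, 2, 2, 2, 3, 1, 2, 4]

total_vehicles = sum(vehicles)              # total count across households
mean_vehicles = statistics.mean(vehicles)   # average per household
sd_vehicles = statistics.stdev(vehicles)    # sample standard deviation

print(f"total={total_vehicles}, mean={mean_vehicles:.2f}, sd={sd_vehicles:.2f}")
```

No household in this sample owns 1.8 vehicles, but 1.8 is still a useful description of the group.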
Bar charts are a standard way to graph discrete variables. Each bar represents a distinct value, and the height represents its proportion in the entire sample.
See how I used a line plot to graph the count of coronavirus cases by country.
Qualitative Data: Categorical, Binary, and Ordinal
When you record information that categorizes your observations, you are collecting qualitative data. There are three types of qualitative variables—categorical, binary, and ordinal. With these data types, you’re often interested in the proportions of each category. Consequently, bar charts and pie charts are conventional methods for graphing qualitative variables because they are useful for displaying the relative percentage of each group out of the entire sample.
As I mentioned in the section about continuous variables, notice how we learn much less from qualitative data. I highlight this aspect in the section about binary variables. In cases where you have a choice about recording a characteristic as a continuous or qualitative variable, the best practice is to record the continuous data because you can learn so much more.
Categorical data have values that you can put into a countable number of distinct groups based on a characteristic. For a categorical variable, you can assign categories, but the categories have no natural order. Analysts also refer to categorical data as both attribute and nominal variables. For example, college major is a categorical variable that can have values such as psychology, political science, engineering, biology, etc.
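With categorical data, the natural summaries are counts, proportions, and the most common category. A mean or median would not make sense because the groups have no order. Here's a small sketch using hypothetical college majors:

```python
from collections import Counter

# Hypothetical college majors for eight students (illustrative only).
majors = [
    "psychology", "biology", "engineering", "biology",
    "political science", "biology", "engineering", "psychology",
]

counts = Counter(majors)
sample_size = len(majors)

# Proportion of the sample that falls in each category.
proportions = {major: n / sample_size for major, n in counts.items()}

# The mode is the category with the highest count.
most_common_major, most_common_count = counts.most_common(1)[0]

for major, share in proportions.items():
    print(f"{major}: {share:.1%}")
```

These proportions are the values that a bar chart or pie chart displays.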
The categorical data in the pie chart are the results of a PPG Industries study of new car colors in 2012.
Related post: Guide to Pie Charts
Binary data can have only two values. If you can place an observation into only two categories, you have a binary variable. Statisticians also refer to binary data as both dichotomous and indicator variables. For example, pass/fail, male/female, and the presence/absence of a characteristic are all binary data.
Binary variables are helpful for calculating proportions or percentages, such as the proportion of defective products in a sample. You just take the number of faulty products and divide by the sample size.
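As a quick sketch of that calculation, assume a hypothetical inspection of ten products where `True` marks a defective item:

```python
# Hypothetical inspection results: True means the product is defective.
is_defective = [False, False, True, False, False,
                False, True, False, False, False]

defective_count = sum(is_defective)   # True counts as 1, False as 0
sample_size = len(is_defective)
defective_proportion = defective_count / sample_size

print(f"{defective_count} of {sample_size} defective "
      f"({defective_proportion:.0%})")
```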
The binary yes/no data for the pie chart are based on the continuous body fat percentage data in the histogram above. Compare how much we learn from the continuous data, which the histogram displays as a full distribution, with the simple proportion that the binary version of the data provides in the pie chart below.
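To see the information loss in numbers, here's a sketch that converts two hypothetical samples of body fat percentages to binary values. The cutoff of 30 is an assumption for illustration only, not a medical threshold and not the cutoff used for the charts in this post.

```python
import statistics

# Hypothetical cutoff for the yes/no classification (illustrative only).
CUTOFF = 30.0

# Two hypothetical samples with very different spreads.
sample_a = [22.0, 24.0, 26.0, 28.0, 31.0, 33.0]
sample_b = [10.0, 15.0, 20.0, 29.0, 36.0, 45.0]


def proportion_above(values, cutoff):
    """Convert continuous values to binary and return the share above cutoff."""
    flags = [value > cutoff for value in values]
    return sum(flags) / len(flags)


prop_a = proportion_above(sample_a, CUTOFF)
prop_b = proportion_above(sample_b, CUTOFF)
sd_a = statistics.stdev(sample_a)
sd_b = statistics.stdev(sample_b)

print(f"binary view:     A={prop_a:.2f}, B={prop_b:.2f}")
print(f"continuous view: sd A={sd_a:.2f}, sd B={sd_b:.2f}")
```

The binary versions of the two samples are indistinguishable, even though the continuous values show that the second sample is far more spread out.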
Related post: Maximizing the Value of Your Binary Data
Ordinal data have at least three categories, and the categories have a natural order. Examples of ordinal variables include overall status (poor to excellent), agreement (strongly disagree to strongly agree), and rank (such as sporting teams).
Analysts often consider ordinal variables to have a combination of qualitative and quantitative properties. They frequently represent ordinal variables using numbers, such as a 1-5 Likert scale that measures satisfaction. In number form, you can calculate average scores as with quantitative variables. However, the numbers have limited usefulness because the differences between ranks might not be constant.
For example, first, second, and third in a race are ordinal data. The difference in time between first and second place might not be the same as the difference between second and third place.
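Here's a small sketch of the race example. The runner names and finish times are hypothetical. The ranks are evenly spaced by definition, but the underlying times are not.

```python
# Hypothetical finish times in seconds (illustrative only).
finish_times = {"Runner A": 10.0, "Runner B": 10.1, "Runner C": 11.5}

# Sort by time and assign ranks 1, 2, 3 (the ordinal version of the data).
ordered = sorted(finish_times.items(), key=lambda item: item[1])
ranks = {name: position for position, (name, _) in enumerate(ordered, start=1)}

# Gaps between consecutive finishers, measured in ranks and in seconds.
rank_values = [ranks[name] for name, _ in ordered]
times = [time for _, time in ordered]
rank_gaps = [b - a for a, b in zip(rank_values, rank_values[1:])]
time_gaps = [round(b - a, 2) for a, b in zip(times, times[1:])]

print(f"rank gaps: {rank_gaps}")
print(f"time gaps: {time_gaps}")
```

Averaging the ranks treats both gaps as equal, which is why averages of ordinal codes call for caution.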
The bar chart below displays the proportion of each service rating category, with the categories in their natural order.
How to Choose Statistical Analyses Based on Data Types
So, you understand the different types of data, what you can learn from them, and how to graph them—how else can you use this knowledge? In statistics, the type of variable greatly determines which kinds of analyses you can perform. Read the following posts to learn how to choose a statistical analysis based on the types of variables that you have.
Choosing Hypothesis Tests for Continuous, Binary, and Count Data: Hypothesis tests use sample data to evaluate claims about an entire population. The correct test depends on your variables.
Chi-squared test of independence when you have two or more categorical variables: This hypothesis test determines whether there is a statistically significant relationship between categorical variables.
Choosing the Correct Type of Regression Analysis Based on Data Type: Regression analysis describes the relationship between a set of independent variables and a dependent variable. The choice depends on the type of data you have for the dependent variable.
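To give a feel for how the variable type drives the analysis, here's a minimal sketch of the chi-squared test of independence for two binary variables, computed by hand. The 2x2 table of flu shot status versus flu outcome is hypothetical. The sketch compares the statistic with 3.841, the standard critical value for 1 degree of freedom at the 0.05 significance level, rather than computing an exact p-value.

```python
# Hypothetical counts (illustrative only).
# Rows: vaccinated, unvaccinated. Columns: got the flu, did not get the flu.
observed = [
    [10, 90],
    [30, 70],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells, where the expected
# count assumes the two variables are independent.
chi_squared = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value for df = 1 at alpha = 0.05.
CRITICAL_VALUE = 3.841
is_significant = chi_squared > CRITICAL_VALUE

print(f"chi-squared={chi_squared:.2f}, df={degrees_of_freedom}, "
      f"significant at 0.05: {is_significant}")
```

In practice you would use statistical software for this test, but the hand calculation shows why it suits categorical data: it works entirely from counts in categories.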