In the field of statistics, data are vital. Data are the information that you collect to learn, draw conclusions, and test hypotheses. After all, statistics is the science of learning from data. However, there are different types of variables, and they record various kinds of information. Crucially, the type of information determines what you can learn from it, and, importantly, what you cannot learn from it. Consequently, it’s essential that you understand the different types of data.

The term “data” carries strong preconceived notions with it. It almost becomes something that is separate from reality. Throughout this post, I want you to think about data as information in a study area that you are gathering to answer a question. For example:

- Do flu shots prevent the flu?
- Does exercise improve your health?
- Does a gasoline additive improve gas mileage?

When you assess any of these questions, there’s a wide array of characteristics that you can record. For example, in a study that uses human subjects, you can log numerical measurements such as height and weight. However, you can also designate properties such as gender, marital status, and health concerns. For some characteristics, you can record them in multiple ways. For instance, you can measure a subject’s body fat percentage, or you can indicate whether they are medically obese or not.

In this blog post, you’ll learn about the different types of variables, what you can learn from them, and how to graph the values using intuitive examples. I also include links to more in-depth posts where I show you how to pick the correct statistical analyses based on the types of variables that you have.

## Quantitative versus Qualitative Data

The distinction between quantitative and qualitative data is the most fundamental way to divide types of data. Is the characteristic something you can objectively measure with numbers or not?

**Quantitative**: The information is recorded as numbers and represents an objective measurement or a count. Temperature, weight, and a count of transactions are all quantitative data. Analysts also refer to this type as numerical data.

**Qualitative**: The information represents characteristics that you do not measure with numbers. Instead, the observations fall within a countable number of groups. In fact, this type of variable can capture information that isn’t easily measured and can be subjective. Taste, eye color, architectural style, and marital status are all types of qualitative variables.

Within these two broad divisions, there are various subtypes.

## Types of Quantitative Data: Continuous and Discrete

When you can represent the information you’re gathering with numbers, you are collecting quantitative data. This class encompasses two categories.

### Continuous data

Continuous variables can take on any numeric value, and they can be meaningfully divided into smaller increments, including fractional and decimal values. There are an infinite number of possible values between any two values. Typically, you measure continuous variables on a scale. For example, when you measure height, weight, and temperature, you have continuous data.

With continuous variables, you can assess properties such as the mean, median, distribution, range, and standard deviation. For example, the mean height in the U.S. is 5 feet 9 inches for men and 5 feet 4 inches for women.
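To make these properties concrete, here is a minimal sketch that computes them with Python's standard `statistics` module. The heights are made-up values for illustration only, not real survey data.

```python
import statistics

# Hypothetical heights in inches; illustrative values only.
heights = [64.5, 66.0, 67.2, 68.1, 69.0, 69.4, 70.3, 71.8, 73.5]

mean_height = statistics.mean(heights)
median_height = statistics.median(heights)
height_range = max(heights) - min(heights)
stdev_height = statistics.stdev(heights)  # sample standard deviation

print(f"mean={mean_height:.2f}, median={median_height:.2f}, "
      f"range={height_range:.2f}, stdev={stdev_height:.2f}")
```

Because the values are continuous, every one of these summaries is meaningful, including fractional results.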

### How to graph continuous data

Histograms are a standard way to graph continuous variables because they show the distribution of the values. The histogram below helps you determine whether the distribution of body fat percentage values for adolescent girls is symmetric or skewed, understand the range of values, and identify where the most common values fall.

**Related post**: Using Histograms to Understand Your Data
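A histogram is just a count of how many values fall into each bin. The sketch below shows that idea with a text-based histogram. The body fat values and the 5-point bin width are assumptions chosen for illustration; they are not the data behind the graph in this post.

```python
from collections import Counter

# Hypothetical body fat percentages; illustrative values only.
body_fat = [18.2, 21.5, 22.9, 24.1, 25.0, 25.7, 26.3, 27.8, 28.4, 29.9,
            30.6, 31.2, 33.5, 36.8, 41.0]

bin_width = 5  # assumed bin width in percentage points


def bin_start(value, width):
    """Return the lower edge of the bin that contains value."""
    return int(value // width) * width


# Count how many observations land in each bin.
counts = Counter(bin_start(v, bin_width) for v in body_fat)

# Print one row per bin, including empty bins between the extremes.
for start in range(min(counts), max(counts) + bin_width, bin_width):
    print(f"{start:>2}-{start + bin_width:<2} | {'#' * counts[start]}")
```

With these made-up values, the tallest row is the 25-30 bin and the rows thin out toward higher values, which is the kind of shape a histogram lets you spot at a glance.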

When you have two continuous variables, you can graph them using a scatterplot. The scatterplot shows how the body fat percentage tends to rise as BMI increases. Use correlation to assess the strength of this relationship or regression analysis to derive the equation for the line that provides the best fit for these data.
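As a rough sketch of what correlation and simple regression calculate, the code below computes Pearson's correlation coefficient and the least-squares line by hand. The BMI and body fat pairs are hypothetical values picked for easy arithmetic, not the data in the scatterplot.

```python
import math

# Hypothetical (BMI, body fat %) pairs; illustrative values only.
bmi = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
body_fat = [20.0, 23.0, 25.0, 29.0, 30.0, 35.0]

n = len(bmi)
mean_x = sum(bmi) / n
mean_y = sum(body_fat) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bmi, body_fat))
sxx = sum((x - mean_x) ** 2 for x in bmi)
syy = sum((y - mean_y) ** 2 for y in body_fat)

r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = mean_y - slope * mean_x  # least-squares intercept

print(f"r={r:.3f}, body_fat = {intercept:.2f} + {slope:.2f} * BMI")
```

The correlation summarizes the strength of the linear relationship, while the slope and intercept give the equation of the best-fit line.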

When you have continuous variables that are divided into groups, you can use a boxplot to display the central tendency and spread of each group. Fertilizer Type C is associated with the highest plant growth while Type B produces the greatest variability.
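A boxplot draws each group's quartiles, so you can sketch the same comparison numerically. The growth values below are invented to mimic the pattern described above (Type C highest, Type B most variable); they are not the data from the graph. Note that `statistics.quantiles` uses its default "exclusive" method here, and other software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical plant growth (cm) by fertilizer type; illustrative only.
growth = {
    "A": [10.1, 11.4, 12.0, 12.6, 13.3],
    "B": [8.0, 11.0, 14.0, 17.0, 20.0],
    "C": [16.2, 17.0, 17.5, 18.1, 18.9],
}

summary = {}
for fertilizer, values in growth.items():
    # n=4 returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    summary[fertilizer] = {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range: the height of the box
    }

for fertilizer, stats in summary.items():
    print(fertilizer, {name: round(value, 2) for name, value in stats.items()})
```

The median locates the center of each box, and the interquartile range shows how spread out each group is.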

Please notice how with continuous variables you can assess the wide variety of properties that I illustrate above. You’ll see a contrast when we get to qualitative variables.

**Related posts**: Graphing Continuous Data by Groups: Boxplots vs. Individual Value Plots and Time Series Plots

### Discrete data

Discrete quantitative data are a count of the presence of a characteristic, result, item, or activity. These measures cannot be meaningfully divided into smaller increments. For example, a single household can have 1 or 2 cars, but it cannot have 1.6. The possible values for an observation are distinct, countable numbers rather than every point along a scale.

With discrete variables, you can calculate and assess a rate of occurrence or a summary of the count, such as the mean, sum, and standard deviation. For example, U.S. households had an average of 2.11 vehicles in 2014.

Bar charts are a standard way to graph discrete variables. Each bar represents a distinct value, and the height represents its proportion in the entire sample.
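Here is a small sketch of the numbers behind such a bar chart: the proportion of the sample at each distinct count, plus the mean. The household vehicle counts are made up for illustration.

```python
from collections import Counter
import statistics

# Hypothetical vehicles per household; illustrative values only.
vehicles = [0, 1, 1, 2, 2, 2, 2, 3, 3, 5]

# Each observation is a whole number, but the mean can be fractional.
mean_vehicles = statistics.mean(vehicles)

# Proportion of the sample at each distinct value (the bar heights).
counts = Counter(vehicles)
proportions = {value: counts[value] / len(vehicles) for value in sorted(counts)}

print(f"mean vehicles per household: {mean_vehicles}")
for value, share in proportions.items():
    print(f"{value} vehicles: {share:.0%}")
```

Notice that no household has 2.1 vehicles, yet 2.1 is a perfectly sensible average for the sample.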

See how I used a line plot to graph the count of coronavirus cases by country.

## Qualitative Data: Categorical, Binary, and Ordinal

When you record information that categorizes your observations, you are collecting qualitative data. There are three types of qualitative variables—categorical, binary, and ordinal. With these data types, you’re often interested in the proportions of each category. Consequently, bar charts and pie charts are conventional methods for graphing qualitative variables because they are useful for displaying the relative percentage of each group out of the entire sample.

As I mentioned in the section about continuous variables, notice how we learn much less from qualitative data. I highlight this aspect in the section about binary variables. In cases where you have a choice about recording a characteristic as a continuous or qualitative variable, the best practice is to record the continuous data because you can learn so much more.

### Categorical data

Categorical data have values that you can put into a countable number of distinct groups based on a characteristic. For a categorical variable, you can assign categories, but the categories have no natural order. Analysts also refer to categorical data as both attribute and nominal variables. For example, college major is a categorical variable that can have values such as psychology, political science, engineering, biology, etc.

The categorical data in the pie chart are the results of a PPG Industries study of new car colors in 2012.

### Binary data

Binary data can have only two values. If you can place an observation into only two categories, you have a binary variable. Statisticians also refer to binary data as both dichotomous and indicator variables. For example, pass/fail, male/female, and the presence/absence of a characteristic are all binary data.

Binary variables are helpful for calculating proportions or percentages, such as the proportion of defective products in a sample. You just take the number of faulty products and divide by the sample size.
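That calculation looks like this in a minimal sketch, using made-up inspection results where `True` marks a defective product:

```python
# Hypothetical inspection results: True means the product is defective.
inspections = [False, False, True, False, False,
               False, True, False, False, False]

defective_count = sum(inspections)  # True counts as 1, False as 0
proportion_defective = defective_count / len(inspections)

print(f"{defective_count} of {len(inspections)} defective "
      f"({proportion_defective:.0%})")
```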

The binary yes/no data for the pie chart are based on the continuous body fat percentage data in the histogram above. Compare how much we learn from the continuous data that the histogram displays as a distribution compared to the simple proportion that the binary version of the data provides in the pie chart below.

**Related post**: Maximizing the Value of Your Binary Data
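To see how much information the binary version discards, consider this sketch. Both samples and the 30% cutoff are hypothetical; the cutoff is an arbitrary threshold for illustration, not a medical definition. The two samples have very different distributions, yet they produce the same yes/no proportion.

```python
import statistics

THRESHOLD = 30.0  # hypothetical cutoff, for illustration only

# Two hypothetical samples of body fat percentages.
sample_a = [22.0, 24.0, 26.0, 28.0, 31.0, 32.0]
sample_b = [10.0, 15.0, 29.0, 29.5, 40.0, 48.0]


def proportion_above(values, threshold):
    """Convert continuous values to yes/no and return the share of 'yes'."""
    return sum(v > threshold for v in values) / len(values)


for name, sample in (("A", sample_a), ("B", sample_b)):
    print(f"sample {name}: proportion above = "
          f"{proportion_above(sample, THRESHOLD):.2f}, "
          f"stdev = {statistics.stdev(sample):.2f}, "
          f"range = {max(sample) - min(sample):.1f}")
```

The binary summary is identical for both samples, while the spread of the continuous values is clearly not.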

### Ordinal data

Ordinal data have at least three categories, and the categories have a natural order. Examples of ordinal variables include overall status (poor to excellent), agreement (strongly disagree to strongly agree), and rank (such as sporting teams).

Analysts often consider ordinal variables to have a combination of qualitative and quantitative properties. They frequently represent ordinal variables using numbers, such as a 1-5 Likert scale that measures satisfaction. In number form, you can calculate average scores as you would with quantitative variables. However, the numbers have limited usefulness because the differences between ranks might not be constant.

For example, first, second, and third in a race are ordinal data. The difference in time between first and second place might not be the same as the difference between second and third place.
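The race example can be sketched in a few lines. The runner names and finishing times are invented for illustration.

```python
# Hypothetical finishing times in seconds; illustrative only.
finish_times = {"Ana": 58.2, "Ben": 58.5, "Cy": 63.9}

# Ordinal view: order the runners and assign places 1, 2, 3.
ranked = sorted(finish_times, key=finish_times.get)
ranks = {runner: place for place, runner in enumerate(ranked, start=1)}

# Continuous view: time gap between each pair of consecutive finishers.
gaps = [finish_times[later] - finish_times[earlier]
        for earlier, later in zip(ranked, ranked[1:])]

print(ranks)
print([round(gap, 1) for gap in gaps])
```

Each place differs from the next by exactly one rank, but the underlying gaps are roughly 0.3 seconds and 5.4 seconds. The ranks preserve the order and nothing else.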

The bar chart below displays the proportion of each service rating category in their natural order.

## How to Choose Statistical Analyses Based on Data Types

So, you understand the different types of data, what you can learn from them, and how to graph them—how else can you use this knowledge? In statistics, the type of variable greatly determines which kinds of analyses you can perform. Read the following posts to learn how to choose a statistical analysis based on the types of variables that you have.

Choosing Hypothesis Tests for Continuous, Binary, and Count Data: Hypothesis tests use sample data to evaluate claims about an entire population. The correct test depends on your variables.

Chi-squared test of independence when you have two or more categorical variables: This hypothesis test determines whether there is a statistically significant relationship between categorical variables.

Choosing the Correct Type of Regression Analysis Based on Data Type: Regression analysis describes the relationship between a set of independent variables and a dependent variable. The choice depends on the type of data you have for the dependent variable.
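To give a flavor of how the data type drives the analysis, here is a minimal sketch of the chi-squared test of independence mentioned above, worked by hand for two binary variables. The counts in the 2x2 table are hypothetical. The p-value shortcut with `math.erfc` is valid only for one degree of freedom, and the sketch applies no continuity correction, so statistical software may report slightly different results.

```python
import math

# Hypothetical 2x2 table of counts; rows are two groups, columns are
# the two outcomes of a binary variable. Illustrative values only.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / N.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-squared = {chi2:.3f}, df = {df}, p = {p_value:.6f}")
```

The test compares the observed counts with the counts you would expect if the two categorical variables were unrelated; it works on counts per category, which is exactly what qualitative data give you.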

Aman Pratap Singh says

i become your fan sir, this is awesome its really cleared all my doubt.

WEN says

Hi, Jim

As a newbie, I found it easier to learn statistics from your excellent writing.

I’m doing research using data of all banks in one country (population, not sample) from year 2000-2015.

Loan ratio is the dependent variable.

I might have issue with this variable due to different measurement. For year 2000-2005, loan ratio includes the lending for productive and consumptive activities, while since 2006 it only covers productive activities. Therefore, the figure of loan ratio drops significantly since 2006.

To handle this issue, can I add dummy variable in regression model that takes value 0 for year<= 2005, and value 1 for years after 2005?

Or should I just use data from 2006, which means fewer observation?

Fyi, there is no different measurement for all independent variables.

I look forward to your help.

Many thanks

WEN

Vaishali says

Thanks a lot Jim. Your notes are so simple to understand statistics. In this lockdown period and after long hours of searching online I finally found your articles which are just wow.

Sincerely wish to thank you .

I really appreciate your efforts.

Ibin Abdo says

Topics were addressed in a brilliant manner. Thanks a lot.

Funmi says

Wow! Statistics made simple. I have never understood this much until now. Thank you for this write-up. Much appreciated!

Jim Frost says

Hi Funmi, you’re very welcome. I strive to present statistics in a simple manner. Consequently, comments like yours absolutely make my day! Thanks for writing!

Dileep Kumar Maurya says

after searching lot of blogs post i found this blogs to get best conceptual start. on every post everyone was teaching math only here i can understand concept behind that method

Werner says

Jim, I really enjoy your blog; the parts about regression especially have been extremely valuable for me. That being said, I have to dissent here, mainly about pie charts.

I would generally advise against pie charts for various reasons:

- They hog screen real estate. Face it: a circle is the most uneconomic way of displaying something on a rectangular screen.
- They give you a hard time distinguishing actual proportions, especially when the pie sections do not differ that much.
- They tend to get messy with legends, descriptions and whatever.
- They are plain hell for people (mostly men) with color vision impairment.

I agree that they have some limited use but I wouldn’t use them with more than three data points.

Bar charts/column charts will generally give you a much better view on the data, proportions and all that. Look at the “New cars color” pie chart: a bar chart would allow for a much more intuitive view on the data.

You’ll find this all over the place in the internet, e.g. here: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6?IR=T

I also would advise for distinguishing between bar charts (horizontal) and column charts (vertical). For categorical and ordinal data I always would use the former, as they give you more freedom (and more real estate) for descriptions on the category axis by retaining the general advantages of column charts.

And last but not least: time series data are IMO best depicted on a line chart.

Jim Frost says

Hi Werner,

Thanks for your thoughtful comment. Choosing the best graph to present information clearly can sometimes be as much art as science. The analyst’s preferences will also play a role in that choice.

Personally, I think pie charts are fine in certain cases. In particular, they are the best chart for conveying at a glance the fact that you’re looking at proportions of a whole. On a bar chart, you have to look at the axis carefully to understand this facet. Also, pie charts don’t necessarily have to take up more room than a bar chart. Although, I agree that when you have too many categories, the legend and labels can be too cluttered. Bar charts are better in those cases. I have to admit, I didn’t think of the color blindness issue. You have to weigh all of these factors!

There are defenders of pie charts as well.

I definitely agree about time series charts. At some point I’ll add it to this post!

Thanks again for the insightful comment!

Jim

Sami econ says

Dear jim,

Your written information about types of data is very beneficial and valuable. I am proud of you for helping us by posting such lectures. Well done, sir.

Sir, as I commented on one of your earlier posts about binary data analysis, I once again request you to please share more detailed information on categorical regression in your own words or

Sir if you have no time then please suggest me a relevant book name and its author name please. I want to learn more detail about categorical data analysis. Thanks

Regards

Sami ullah,

Ph.D student of economics,

Pakistan.

Jim Frost says

Hi Sami, this is definitely on my list of topics to write about, but you’ll need a little patience! It’ll probably be a couple of months before I can get to it. If you need information earlier, most regression textbooks should talk about logistic regression analysis.

Chuck says

Hi Jim,

I really appreciate this guide! But I have a question. Elsewhere I’ve seen data types differentiated by the acronym NOIR: Nominal, Ordinal, Interval, and Ratio. In these situations “Qualitative” is replaced by “Categorical” (making two major groups Quantitative and Categorical instead of Quantitative and Qualitative), followed by two subgroups in each: Nominal and Ordinal as subgroups of Categorical; and Interval and Ratio as subgroups of Quantitative.

These differences can drive confusion on how to properly identify data types. It would be helpful to know how to appropriately combine all these terms into one cohesive Data type model. Could you offer any clarity on this?

Kind regards,

Chuck Wynn

Jim Frost says

Hi Chuck, I’m glad you found this guide to be useful. As you’ve noticed, there are different ways to classify data types. I’ve tried to include several alternative names for some of the data types. I did think of possibly including categorical as an AKA under qualitative. However, I already have a categorical group and I thought that would be confusing to have that twice! I did list “nominal” along with “attribute” as AKAs for categorical data.

The Nominal, Ordinal, Interval, and Ratio classification system was created by a psychologist, and I wonder if this system is used more frequently in the field of psychology?

The difference between interval and ratio is that ratio has an absolute zero point while interval does not. While that is crucial for calculating ratios, it’s often not crucial when you’re graphing and statistically analyzing data. But it can be an important point in terms of other types of interpretation. For instance, 20 degrees Celsius is not twice as warm as 10 degrees Celsius.

I’ve tried to combine the two systems below. Parentheses indicate the NOIR classification terminology.

Quantitative:

- Continuous data (Ratio and Interval)
- Discrete data (Ratio but not Interval. Counts do have an absolute zero.)

Qualitative (Categorical):

- Categorical (Nominal)
- Binary (Nominal)
- Ordinal

I hope this helps and thanks for the interesting question!

Jim

yeshambel chekol says

It is really informative in statistics.

Jim Frost says

Thank you Yeshambel!

Khursheed Ahmad Ganaie says

Thnks a lot

I am honestly saying that I get a lot of concepts. ……by u

Keep on

God bless u. …

Jim Frost says

Thank you Khursheed! I’m very happy that you found it to be helpful!