Use bar charts to compare categories when you have at least one categorical or discrete variable. Each bar represents a summary value for one discrete level, where longer bars indicate higher values. Types of summary values include counts, sums, means, and standard deviations. Bar charts are also known as bar graphs.
Bar charts highlight differences between categories or other discrete data. Look for differences between categories as a screening method for identifying possible relationships. If your dataset includes multiple categorical variables, bar charts can help you understand the relationship between them.
Use bar charts to do the following:
- Compare counts by categories.
- Display a variable function (sum, average, standard deviation) by categories.
- Understand relationships between categorical variables.
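To make the idea of a summary value per category concrete, here is a minimal sketch in Python using only the standard library. The data and the helper name `summarize_by_category` are hypothetical and invented for illustration. Each number it computes is a candidate bar height.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (category, value) observations, invented for illustration.
observations = [
    ("A", 4.0), ("A", 6.0), ("A", 8.0),
    ("B", 10.0), ("B", 12.0),
    ("C", 3.0), ("C", 5.0), ("C", 7.0), ("C", 9.0),
]


def summarize_by_category(rows):
    """Group values by category and compute common bar-height summaries."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)

    summary = {}
    for category, values in groups.items():
        summary[category] = {
            "count": len(values),
            "sum": sum(values),
            "mean": mean(values),
            # The sample standard deviation needs at least two values.
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


summary = summarize_by_category(observations)
for category, stats in summary.items():
    print(category, stats)
```

Whichever summary you choose to plot, the bar chart shows one bar per category with a length equal to that number.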
Unlike the bars in histograms, the bars in bar charts have spaces between them to emphasize that each bar represents a discrete value. Histograms, by contrast, are for continuous data. For more information about the difference between bar charts and histograms, please read my Guide to Histograms.
At a minimum, bar charts require one categorical variable but frequently use two of them. To learn about other graphs, read my Guide to Data Types and How to Graph Them. If you’re mainly interested in comparing and contrasting qualitative properties of different groups, consider using a Venn diagram.
A Pareto chart is a special type of bar chart that identifies categories that contribute the most to all outcomes. Please read my post about Pareto charts.
Example Bar Chart
A delivery service promises that deliveries will occur within a specified time. The service wants to determine how well it is meeting this promise during peak hours and non-peak hours.
The dataset for this graph uses two categorical variables, each having two values, which produces four possible combinations that observations can fall within:
- Delivery time: peak or non-peak
- Delivery status: on time or late
Bar charts typically contain the following elements:
- Y-axis representing counts, a variable function (average, sum, standard deviation), or another summary value.
- Categories or discrete values on the x-axis.
- Vertical bars representing the value for each category.
- Optionally, the bars can be clustered in groups and/or stacked to facilitate comparisons.
For the delivery data, the bars indicate the counts of observations having each of the four possible combinations of categorical values. The graph shows that more deliveries occur during peak hours than off-peak hours. Late deliveries are rare during off-peak hours. However, the number of late deliveries increases markedly during peak hours. The service should focus on improving delivery times during peak hours.
Because this chart has two categorical variables, it helps us understand the relationship between them.
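If you want to see how the four bar heights come about, the following is a rough sketch that counts each combination of delivery time and delivery status and prints a simple text bar chart. The records are hypothetical numbers I made up for illustration. They are not the data behind the graph, and `text_bar_chart` is an invented helper name.

```python
from collections import Counter

# Hypothetical delivery records as (delivery_time, delivery_status) pairs.
# These counts are invented; they are not the data behind the article's graph.
records = (
    [("Peak", "On time")] * 12
    + [("Peak", "Late")] * 8
    + [("Non-peak", "On time")] * 9
    + [("Non-peak", "Late")] * 1
)

# Each distinct pair becomes one bar; its count is the bar's length.
counts = Counter(records)


def text_bar_chart(pair_counts):
    """Return one text line per combination, with one '#' per observation."""
    lines = []
    for (time, status), n in sorted(pair_counts.items()):
        label = f"{time} / {status}"
        lines.append(f"{label:<20} {'#' * n} {n}")
    return lines


for line in text_bar_chart(counts):
    print(line)
```

Sorting the pairs keeps the two statuses for each delivery time next to each other, which mimics a clustered layout.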
Interpreting Bar Charts and Comparing Categories
Bar charts often compare categories, but that’s not always the case. You just need a discrete variable for the horizontal X-axis. For instance, the bar chart below uses a five-point Likert scale for satisfaction. Likert scale data are ordinal and have discrete values.
Assess the differences between bars to evaluate how the metric changes between discrete values. Identify the groups that have the highest and lowest values. The service provider must be pleased with the results!
Using clustering and stacking, you can compare groups within groups. To understand relationships between categorical variables, assess how the proportions of subgroups change between groups. In the plot of ice cream flavor preferences, females prefer chocolate, males prefer vanilla, and they equally enjoy strawberry.
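Comparing proportions of subgroups within each group amounts to dividing each count by its group total. Here is a small sketch of that calculation. The counts are hypothetical values chosen only to mirror the pattern described above, and `within_group_proportions` is an invented helper name.

```python
# Hypothetical counts of flavor preference by group, invented for illustration.
preferences = {
    "Female": {"Chocolate": 30, "Vanilla": 10, "Strawberry": 10},
    "Male": {"Chocolate": 10, "Vanilla": 30, "Strawberry": 10},
}


def within_group_proportions(table):
    """Convert each group's counts to proportions that sum to 1."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {level: n / total for level, n in counts.items()}
    return result


proportions = within_group_proportions(preferences)
for group, shares in proportions.items():
    print(group, {flavor: round(p, 2) for flavor, p in shares.items()})
```

When the proportions differ between groups, as they do for chocolate and vanilla in this made-up table, the pattern suggests a possible relationship between the two categorical variables.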
Keep in mind that the length of the bars can represent different characteristics, such as counts, totals, averages, and so on. Be sure to notice which metric the graph displays while interpreting it.
Bar charts are also a fantastic way to display relative frequency distributions.
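A relative frequency is simply a count divided by the total number of observations. As a minimal sketch, the code below computes the relative frequencies you would plot as bar heights. The responses are hypothetical, and the 1 to 5 coding of the Likert scale is my assumption.

```python
from collections import Counter

# Hypothetical responses on a five-point Likert scale.
# Assumed coding: 1 = very dissatisfied ... 5 = very satisfied.
responses = [5, 4, 5, 3, 4, 5, 2, 4, 5, 5]


def relative_frequencies(values):
    """Return each observed level's share of all observations."""
    counts = Counter(values)
    total = len(values)
    return {level: counts[level] / total for level in sorted(counts)}


rel_freq = relative_frequencies(responses)
for level, share in rel_freq.items():
    print(level, f"{share:.0%}")
```

Because the shares sum to 1, relative frequencies let you compare distributions from samples of different sizes.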
Use Bar Charts with the Appropriate Hypothesis Tests
You can use bar charts to compare summary values between categories or understand the relationships between categorical variables. However, if you want to use your sample to infer the properties of a larger population, be sure to perform the appropriate hypothesis tests to determine statistical significance.
Graphs are somewhat subjective because statistical software allows you to edit their properties, such as the graph’s scaling. Changing these settings can alter the appearance of bar charts and the conclusions you draw from them. Conversely, hypothesis tests provide an objective assessment of statistical significance. These tests also account for the possibility of random error explaining the observed patterns.
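To illustrate how scaling can change a chart's appearance, consider where the value axis starts. The sketch below compares the drawn lengths of two bars under two axis settings. The values 95 and 100 and the function name `drawn_length_ratio` are hypothetical.

```python
def drawn_length_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn length of bar B to bar A when the axis begins at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)


# Two hypothetical summary values that differ by about 5%.
full_axis = drawn_length_ratio(95, 100)                 # axis starts at 0
truncated = drawn_length_ratio(95, 100, axis_start=90)  # axis starts at 90

print(f"Axis from 0:  bar B is {full_axis:.2f} times as long as bar A")
print(f"Axis from 90: bar B is {truncated:.2f} times as long as bar A")
```

With the axis starting at zero, the bars look nearly equal. With the axis starting at 90, the second bar is drawn twice as long even though the underlying values have not changed.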
The hypothesis tests that you can use with bar charts depend on whether you are comparing summary statistics between groups or exploring the relationship between categorical variables.
When you are comparing summary statistics, consider the following hypothesis tests:
- Tests that compare group means
- Variance tests that assess variability between groups
When you’re assessing the relationship between categorical variables, consider using the chi-square test of independence.
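For readers who want to see the arithmetic, here is a minimal sketch of the Pearson chi-square test of independence written with the Python standard library. The 2x2 table is hypothetical and is not taken from any graph in this post. The sketch applies no continuity correction, and the p-value shortcut using `math.erfc` is valid only for one degree of freedom. In practice, you would likely rely on your statistical software instead.

```python
import math

# Hypothetical 2x2 table of counts, invented for illustration.
# Rows: Peak, Non-peak. Columns: On time, Late.
observed = [[120, 80], [90, 10]]


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


statistic, dof = chi_square_independence(observed)

# For exactly 1 degree of freedom, the upper-tail p-value is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2)) if dof == 1 else None

print(f"chi-square = {statistic:.3f}, df = {dof}, p = {p_value}")
```

A small p-value indicates that the differences between the bars are unlikely to be due to random error alone, which is the objective check that the graph by itself cannot provide.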
Click the link to see how I use the chi-square test to assess the data in the graph below! I determine whether there is a relationship between uniform color and survival status in the original Star Trek TV series.