Use bar charts to compare categories when you have at least one categorical or discrete variable. Each bar represents a summary value for one discrete level, where longer bars indicate higher values. Types of summary values include counts, sums, means, and standard deviations. Bar charts are also known as bar graphs.
Bar charts highlight differences between categories or other discrete data. Look for differences between categories as a screening method for identifying possible relationships. If your dataset includes multiple categorical variables, bar charts can help you understand the relationship between them.
Use bar charts to do the following:
- Compare counts by categories.
- Display a variable function (sum, average, standard deviation) by categories.
- Understand relationships between categorical variables.
Unlike histograms, bar charts have spaces between the bars to emphasize that each bar represents a discrete value, whereas histograms display continuous data. For more information about the difference between bar charts and histograms, please read my Guide to Histograms.
At a minimum, bar charts require one categorical variable, but they frequently use two. To learn about other graphs, read my Guide to Data Types and How to Graph Them. If you’re mainly interested in comparing and contrasting qualitative properties of different groups, consider using a Venn diagram.
A Pareto chart is a special type of bar chart that identifies categories that contribute the most to all outcomes. Please read my post about Pareto charts.
Example Bar Chart
A delivery service promises that deliveries will occur within a specified time. The service wants to determine how well it is meeting this promise during peak and off-peak hours.
The dataset for this graph uses two categorical variables, each having two values, which produces four possible combinations that observations can fall within:
- Delivery time
  - Peak
  - Off-peak
- Delivery status
  - On time
  - Late
Bar charts typically contain the following elements:
- Y-axis representing counts, variable function (average, sum, standard deviation), or other summary value.
- Categories or discrete values on the x-axis.
- Vertical bars representing the value for each category.
- Optionally, the bars can be clustered in groups and/or stacked to facilitate comparisons.
For the delivery data, the bars indicate the counts of observations having each of the four possible combinations of categorical values. The graph shows that more deliveries occur during peak hours than off-peak hours. Late deliveries are rare during off-peak hours. However, the number of late deliveries increases markedly during peak hours. The service should focus on improving delivery times during peak hours.
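To make the counting step concrete, here's a minimal Python sketch. It assumes the raw data is a list of (delivery time, delivery status) pairs; the records are made up for illustration and are not the data behind the chart above. The helper names `count_combinations` and `text_bar_chart` are hypothetical, and the text bars only stand in for what graphing software would draw.

```python
from collections import Counter

# Hypothetical records: (delivery time, delivery status).
# These are NOT the data behind the chart above; they only illustrate counting.
deliveries = [
    ("Peak", "On time"), ("Peak", "Late"), ("Peak", "On time"),
    ("Peak", "Late"), ("Peak", "On time"), ("Off-peak", "On time"),
    ("Off-peak", "On time"), ("Off-peak", "Late"),
]

def count_combinations(records):
    """Count how many observations fall into each (time, status) combination."""
    return Counter(records)

def text_bar_chart(counts, order, symbol="#"):
    """Render one text bar per combination; the bar length equals the count."""
    lines = []
    for key in order:
        label = " / ".join(key)
        lines.append(f"{label:<20} {symbol * counts[key]} {counts[key]}")
    return "\n".join(lines)

# Cluster the bars by delivery time, then by status within each time.
order = [(t, s) for t in ("Peak", "Off-peak") for s in ("On time", "Late")]
counts = count_combinations(deliveries)
print(text_bar_chart(counts, order))
```

Each of the four printed bars corresponds to one combination of the two categorical variables, which is exactly what the vertical bars in the delivery chart summarize.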
Because this chart has two categorical variables, it helps us understand the relationship between them.
Learn more about the X and Y Axis.
Interpreting Bar Charts and Comparing Categories
Bar charts often compare categories, but that’s not always the case. You just need a discrete variable for the horizontal X-axis. For instance, the bar chart below uses a five-point Likert scale for satisfaction. Likert scale data are ordinal and have discrete values. Learn more about Likert Scale: Survey Use & Examples and Ordinal Data: Definition, Examples & Analysis.
Assess the differences between bars to evaluate how the metric changes between discrete values. Identify the groups that have the highest and lowest values. The service provider must be pleased with the results!
Using clustering and stacking, you can compare groups within groups. To understand relationships between categorical variables, assess how the proportions of subgroups change between groups. In the plot of ice cream flavor preferences, females prefer chocolate, males prefer vanilla, and they equally enjoy strawberry.
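Assessing proportions rather than raw counts matters when groups differ in size. The following Python sketch uses an invented contingency table (not the ice cream survey data) where the raw strawberry counts differ between groups but the proportions are identical. The function name `row_proportions` is hypothetical.

```python
# Hypothetical contingency table of counts: rows are groups, columns are subgroups.
# The numbers are invented for illustration only.
table = {
    "Female": {"Chocolate": 24, "Vanilla": 8, "Strawberry": 8},
    "Male":   {"Chocolate": 18, "Vanilla": 30, "Strawberry": 12},
}

def row_proportions(table):
    """Convert each group's counts into proportions of that group's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {flavor: n / total for flavor, n in counts.items()}
    return result

for group, props in row_proportions(table).items():
    shares = ", ".join(f"{flavor} {p:.0%}" for flavor, p in props.items())
    print(f"{group}: {shares}")
```

Here the male group has more strawberry fans in absolute terms (12 versus 8), yet strawberry makes up 20% of each group, which is the kind of pattern a stacked bar chart scaled to 100% makes easy to see.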
Keep in mind that the length of the bars can represent different characteristics, such as counts, totals, averages, and so on. Be sure to note which metric the graph displays while interpreting it.
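As a quick illustration of why the metric matters, this Python sketch computes several summary values from the same hypothetical delivery times (invented numbers, in minutes). The function name `summarize` is hypothetical; it relies only on the standard library `statistics` module.

```python
import statistics

# Hypothetical delivery times in minutes for each category (illustrative only).
times = {
    "Peak": [42, 55, 38, 61, 47],
    "Off-peak": [25, 31, 28, 22, 34, 30],
}

def summarize(groups):
    """Return the count, sum, mean, and sample standard deviation per category."""
    return {
        name: {
            "count": len(values),
            "sum": sum(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # sample SD; needs at least 2 values
        }
        for name, values in groups.items()
    }

for name, stats in summarize(times).items():
    print(name, stats)
```

With these numbers, the off-peak bar would be taller on a chart of counts, while the peak bar would be taller on a chart of sums or means, so the same data can rank the categories differently depending on the metric.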
Bar charts are also a fantastic way to display cumulative frequencies and relative frequency distributions, and they can really make contingency tables pop! In fact, the preceding graph is based on a contingency table in my post, Contingency Table: Definition, Examples & Interpreting.
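Here's a minimal Python sketch of the values behind relative and cumulative frequency bar charts, assuming hypothetical response counts on a five-point satisfaction scale. Cumulative frequencies only make sense when the categories have a natural order, such as Likert levels. The function name `frequency_table` is hypothetical.

```python
from itertools import accumulate

# Hypothetical counts of responses on a five-point satisfaction scale,
# listed in the scale's natural order (illustrative numbers only).
levels = ["Very dissatisfied", "Dissatisfied", "Neutral", "Satisfied", "Very satisfied"]
counts = [4, 6, 10, 30, 50]

def frequency_table(levels, counts):
    """Return (level, count, relative frequency, cumulative relative frequency) rows."""
    total = sum(counts)
    relative = [c / total for c in counts]
    cumulative = list(accumulate(relative))  # running total of the relative frequencies
    return list(zip(levels, counts, relative, cumulative))

for level, count, rel, cum in frequency_table(levels, counts):
    print(f"{level:<18} {count:>3} {rel:>6.1%} {cum:>7.1%}")
```

Plotting the relative column gives a relative frequency bar chart, and plotting the cumulative column gives a cumulative frequency bar chart whose last bar always reaches 100%.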
Use Bar Charts with the Appropriate Hypothesis Tests
You can use bar charts to compare summary values between categories or understand the relationships between categorical variables. However, if you want to use your sample to infer the properties of a larger population, be sure to perform the appropriate hypothesis tests to determine statistical significance.
Related post: Descriptive versus Inferential Statistics
Graphs are somewhat subjective because statistical software allows you to edit their properties, such as the graph’s scaling. Changing these settings can alter the appearance of bar charts and the conclusions you draw from them. Conversely, hypothesis tests provide an objective assessment of statistical significance. These tests also account for the possibility of random error explaining the observed patterns.
The hypothesis tests that you can use with bar charts depend on whether you are comparing summary statistics between groups or exploring the relationship between categorical variables.
When you are comparing summary statistics, consider the following hypothesis tests:
- Tests that compare group means
- Variance tests that assess variability between groups
When you’re assessing the relationship between categorical variables, consider using the chi-square test of independence.
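To show what the chi-square test of independence computes, here's a minimal standard-library Python sketch. It assumes a hypothetical 2 x 2 table of counts (invented, not the delivery data) and uses the hypothetical helper names `chi_square_independence` and `p_value_df1`. The p-value shortcut only applies with 1 degree of freedom; for larger tables you'd need the chi-square distribution itself, for example via SciPy's `scipy.stats.chi2_contingency`, which by default applies Yates' continuity correction to 2 x 2 tables (this sketch does not).

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns the test statistic, the degrees of freedom, and the expected counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected

def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # With 1 df, chi-square is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical 2 x 2 table: rows = peak / off-peak, columns = on time / late.
observed = [[80, 20], [45, 5]]
stat, dof, expected = chi_square_independence(observed)
print(f"chi-square = {stat:.3f}, df = {dof}, p = {p_value_df1(stat):.4f}")
```

A common rule of thumb is to check that the expected counts are reasonably large (often at least 5 per cell) before relying on the chi-square approximation.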
Click the link to see how I use the chi-square test to assess the data in the graph below! I determine whether there is a relationship between uniform color and survival status in the original Star Trek TV series.
Sadiya says
What theory supports using bar charts for hypothesis testing if you don’t want to use the chi-square test to test your hypothesis?
Jim Frost says
Hi Sadiya,
When you just want to describe a sample and you’re not using your sample to infer properties of a population, then using only bar charts is fine.
However, if you want to draw conclusions about a population, you’ll need to use a hypothesis test. In this scenario, bar charts can help illustrate your results, but they can’t draw conclusions on their own.
It basically comes down to whether you’re performing descriptive or inferential statistics. You don’t need hypothesis tests for descriptive statistics, but you do need them for inferential statistics.
RABIA NOUSHEEN says
Hi Jim
I have a dataset, and I am struggling to find the right way to present it. I exposed larvae to, say, 500 particles until they developed into their final larval form. During this period, I checked mortalities each day and removed dead nauplii to investigate whether they had ingested particles. For mortality, I summed the dead larvae each day to get cumulative mortality. I want to relate mortality to the number of particles ingested by the dead nauplii. I have the number of particles ingested per dead larva for each day, for example, 2 particles per dead larva on day 1, 0 particles per dead larva on day 2, and so on. My question is: should I take the average or the sum of particles per dead larva over the days and then relate it to cumulative mortality? Note that my unit is the number of particles per dead larva.
Thanks in advance for being so helpful.
Regards
Jim Frost says
Hi Rabia,
I wonder if you should use a line chart with a line for each batch, with the number of dead larvae on the y-axis and days along the x-axis. You would record the number of deaths per day for each batch and then compare the lines on the chart. Is one line steeper than the others? That approach sounds promising if your main goal is to compare batches, and you’d also see whether the death rate changes over time.