What is a Dendrogram?
A dendrogram is a tree-like diagram that shows how items or groups are clustered together based on their similarity. It is the usual way to display the results of hierarchical clustering, an analysis that groups data into nested clusters.
In a dendrogram:
- Each leaf (or end node) represents a single item or observation.
- Branches connect similar items or clusters.
- The height at which two branches join reflects the distance or dissimilarity between those groups. The lower the connection point, the more similar the items are.
As you move down the diagram, larger clusters split into smaller, more specific groups of similar items. The top of the dendrogram shows all observations combined into a single cluster, while the bottom displays each observation as its own individual cluster. In between, the diagram reveals groupings of varying sizes based on similarity.
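To make the link between join heights and similarity concrete, here is a minimal sketch of how the joins behind a dendrogram can be computed. It assumes single linkage (the distance between two clusters is the distance between their closest members) and one-dimensional data; the function name `single_linkage_merges` and the sample `points` are illustrative only, and a real analysis would normally rely on a statistics library rather than this hand-rolled loop.

```python
from itertools import combinations


def single_linkage_merges(points):
    """Return the join history as (cluster_a, cluster_b, height) tuples.

    Assumption: single linkage on 1-D values. Other linkage rules
    (complete, average, Ward) would give different heights.
    """
    # Every observation starts as its own cluster (a leaf of the dendrogram).
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(abs(points[i] - points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        # The join height is the dissimilarity at which the two groups merge.
        merges.append((a, b, d))
    return merges


# Hypothetical sample data: two tight pairs and one distant observation.
points = [1.0, 1.5, 5.0, 5.25, 9.0]
merges = single_linkage_merges(points)
for a, b, height in merges:
    print(f"join {a} and {b} at height {height}")
```

The most similar pair (observations 2 and 3) joins first at the lowest height, and the final join, which combines everything into a single cluster, sits at the top.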
Analysts can explore the structure of the data at different levels of detail by drawing a horizontal line across the dendrogram at a chosen height. This “cut” determines how many clusters to keep: each branch the line crosses becomes one cluster. The lower the cut, the more finely divided the groups, with each one a tighter, more specific cluster of similar observations.
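The effect of a cut can be sketched in a few lines. The example below assumes the hierarchy is already available as a list of joins, each recorded as two leaf indices (one from each group being joined) and a height; that representation, the function name `cut_tree`, and the sample values are all hypothetical. Only the joins at or below the cut line are applied, and whatever remains separate forms the clusters.

```python
def cut_tree(n_leaves, merges, cut_height):
    """Return a cluster label for each leaf after cutting at cut_height.

    merges: list of (leaf_a, leaf_b, height); leaf_a and leaf_b each identify
    one of the two groups being joined (an illustrative format).
    """
    parent = list(range(n_leaves))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Apply only the joins that lie at or below the cut line.
    for a, b, height in merges:
        if height <= cut_height:
            parent[find(a)] = find(b)

    # Number the surviving groups 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n_leaves)]


# Hypothetical hierarchy over five observations.
merges = [(2, 3, 0.25), (0, 1, 0.5), (0, 2, 3.5), (0, 4, 3.75)]

for cut in (0.3, 1.0, 4.0):
    labels = cut_tree(5, merges, cut)
    print(f"cut at {cut}: {len(set(labels))} clusters, labels {labels}")
```

Lowering the cut from 4.0 to 1.0 to 0.3 takes this sample from one cluster to three to four, which mirrors the way a lower line crosses more branches.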
Using a Dendrogram
Dendrograms are especially useful for:
- Visualizing relationships among groups.
- Deciding how many clusters to keep in a hierarchical clustering analysis.
- Understanding structure in datasets where natural groupings might exist.
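In addition to visual inspection, analysts often use statistical measures to help choose the most meaningful number of clusters when interpreting a dendrogram. Distance thresholds and the inconsistency coefficient, which compares the height of each join with the heights of the joins beneath it, can highlight where large jumps in dissimilarity occur in the hierarchy. The gap statistic takes a different route: it compares the within-cluster variation for each candidate number of clusters with what would be expected from data that has no cluster structure. These metrics guide analysts in deciding where to “cut” the dendrogram to form distinct clusters that balance detail with simplicity.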
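As a rough illustration of the "large jump" idea, the sketch below looks for the widest gap between consecutive join heights and places the cut in the middle of it. This is a simple heuristic chosen for illustration, not the inconsistency coefficient or the gap statistic, and the function name `largest_gap_cut` and the sample heights are hypothetical.

```python
def largest_gap_cut(heights):
    """Suggest a cut height in the widest gap between consecutive join heights.

    Returns (cut_height, n_clusters). A simple heuristic for illustration,
    not a substitute for a formal statistical criterion.
    """
    if len(heights) < 2:
        raise ValueError("need at least two join heights to compare")
    ordered = sorted(heights)
    # Pair each join height with the next one up and measure the jump.
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:])]
    _, lo, hi = max(gaps)
    cut = (lo + hi) / 2
    # Each join above the cut is left undone, so it adds one more cluster.
    n_clusters = sum(1 for h in ordered if h > cut) + 1
    return cut, n_clusters


# Hypothetical join heights from a five-observation hierarchy.
cut, k = largest_gap_cut([0.25, 0.5, 3.5, 3.75])
print(f"suggested cut at {cut}, giving {k} clusters")
```

With these sample heights, the widest jump lies between 0.5 and 3.5, so the suggested cut falls at 2.0 and leaves three clusters. On real data a heuristic like this is best treated as a starting point to check against the dendrogram itself.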
For example, a biologist might use a dendrogram to group species based on genetic similarity, or a market researcher might use it to segment customers based on purchasing behavior. The visual nature of a dendrogram makes it a powerful tool for interpreting complex clustering results.
