Statistics By Jim

Making statistics intuitive

Hierarchical Clustering

By Jim Frost

What is Hierarchical Clustering?

Hierarchical clustering is a method of cluster analysis that groups observations by similarity into a tree-like hierarchy, usually drawn as a diagram called a dendrogram. The method can be agglomerative (starting with individual items and merging clusters) or divisive (starting with all items in one cluster and splitting it). A key benefit of hierarchical clustering is that the tree shows how clusters form at every similarity level, so researchers can explore relationships at multiple scales rather than ending with a single final set of clusters.
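
To make the agglomerative (bottom-up) procedure concrete, here is a minimal sketch in Python. It assumes one-dimensional data, absolute difference as the distance measure, and a merge rule based on the shortest distance between members of two clusters. The function name `agglomerate` and the sample values are made up for illustration, and the code favors readability over speed.

```python
from itertools import combinations


def agglomerate(points):
    """Naive agglomerative clustering of 1-D values.

    Assumption: clusters are compared by the shortest distance between
    any two of their members. Returns the merge history as a list of
    (cluster_a, cluster_b, distance) tuples, in merge order.
    """
    # Start with every observation in its own cluster.
    clusters = [(p,) for p in points]
    history = []

    def gap(pair):
        # Shortest distance between members of the two clusters.
        return min(abs(x - y) for x in pair[0] for y in pair[1])

    while len(clusters) > 1:
        # Pick the closest pair of clusters and record the merge.
        a, b = min(combinations(clusters, 2), key=gap)
        history.append((a, b, gap((a, b))))
        # Replace the two clusters with their union.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(tuple(sorted(a + b)))
    return history


# Illustrative values only.
history = agglomerate([1.0, 2.0, 6.0, 7.0, 15.0])
for a, b, distance in history:
    print(a, "+", b, "merged at distance", distance)
```

Each recorded merge corresponds to one join in the dendrogram, and the merge distance is the height at which that join is drawn.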

Hierarchical clustering helps analysts manage the tradeoff between simplifying the data into a manageable number of clusters and preserving the underlying structure and meaningful variation within the dataset.

Unlike K-means clustering, which requires analysts to specify the number of clusters in advance, hierarchical clustering builds a nested structure of groupings first, allowing analysts to decide how many clusters to keep after examining the results.

Using Hierarchical Clustering

Agglomerative hierarchical clustering comes in several variants that differ in their linkage criterion, the rule for measuring the distance between two clusters. Single linkage uses the shortest distance between any two points in the two clusters; complete linkage uses the largest such distance; and average linkage uses the average distance over all pairs of points, one from each cluster. At each step, the algorithm merges the two clusters with the smallest linkage distance. The choice of linkage affects the shape of the clusters. For example, single linkage is often used in genetics to detect long, chain-like clusters, while complete linkage tends to create more compact, evenly sized groups. Average linkage offers a compromise between the two. Hierarchical clustering is widely applied in biology (e.g., classifying species or genes), marketing (e.g., customer segmentation), and text analysis.
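
The three linkage rules can be compared side by side on a small example. The sketch below assumes Euclidean distance and two hand-picked clusters of 2-D points; the function names are illustrative rather than taken from a particular library.

```python
from itertools import product
from math import dist
from statistics import mean


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distances for every pair with one point from each cluster."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Shortest distance between the two clusters.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Largest distance between the two clusters.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Mean distance over all cross-cluster pairs.
    return mean(pairwise_distances(cluster_a, cluster_b))


# Illustrative clusters: the corners of a 4-by-3 rectangle.
cluster_a = [(0.0, 0.0), (0.0, 3.0)]
cluster_b = [(4.0, 0.0), (4.0, 3.0)]

print("single:  ", single_linkage(cluster_a, cluster_b))    # 4.0
print("complete:", complete_linkage(cluster_a, cluster_b))  # 5.0
print("average: ", average_linkage(cluster_a, cluster_b))   # 4.5
```

The same pair of clusters is 4.0, 5.0, or 4.5 units apart depending on the rule, which is why different linkage choices can produce different merge orders on the same data.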

Beyond just building a cluster tree, analysts often apply statistical criteria to decide how many clusters to keep. Tools like the inconsistency coefficient, gap statistic, and elbow method help evaluate where meaningful separations exist in the hierarchy. These metrics aim to identify a point in the clustering process where combining clusters would start to group dissimilar items, signaling a natural stopping point for defining distinct groups.
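
One simple way to look for such a stopping point is to find the largest jump between consecutive merge heights. The sketch below is only an elbow-style heuristic under that assumption; it is not an implementation of the inconsistency coefficient or the gap statistic. The function name and the merge heights are made up for illustration.

```python
def suggest_cluster_count(merge_heights):
    """Suggest a cluster count from the largest jump in merge heights.

    merge_heights: the distance at which each merge happened, one value
    per merge (n items produce n - 1 merges).
    """
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("need at least two merges to compare heights")
    n_items = len(heights) + 1
    # Jump from each merge height to the next one.
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    # Keep every merge that happens before the largest jump.
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    merges_kept = largest + 1
    return n_items - merges_kept


# Hypothetical merge heights for 7 items: four cheap merges, then a big jump.
heights = [0.5, 0.7, 0.9, 1.1, 6.0, 7.5]
print(suggest_cluster_count(heights))  # 3
```

Here the first four merges join similar items, and the fifth would require a much larger distance, so the heuristic stops at three clusters. In practice, a suggestion like this is a starting point to check against the dendrogram and subject-area knowledge.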

For example, a researcher might use hierarchical clustering to group customers based on their purchase histories, revealing distinct customer segments for targeted marketing. By examining the full dendrogram, the researcher can explore how customer groups combine or split at different similarity levels — for instance, identifying broad categories like high-, medium-, and low-value customers or drilling down into more detailed subgroups within each category.

[Figure: Hierarchical clustering example of customers for marketing strategy.]

Adjusting the cutoff height on the dendrogram lets the researcher move between broad categories and more detailed subgroups, and choose the clustering level that best fits the marketing goals.
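
Cutting the tree at a chosen height can be sketched as replaying only the merges that occur at or below the cutoff. The example below assumes a merge history listed in order of nondecreasing height, where each merge creates a new cluster id one higher than the last. The customer labels, merge pairs, and heights are hypothetical.

```python
def cut_at_height(labels, merges, cutoff):
    """Return the clusters that exist when the tree is cut at `cutoff`.

    labels: names of the original items (ids 0 .. n-1).
    merges: (left_id, right_id, height) tuples in nondecreasing height
            order; the i-th merge creates cluster id n + i.
    """
    clusters = {i: [name] for i, name in enumerate(labels)}
    next_id = len(labels)
    for left, right, height in merges:
        if height > cutoff:
            break  # every later merge is at least this high
        clusters[next_id] = clusters.pop(left) + clusters.pop(right)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical customers and merge history.
customers = ["A", "B", "C", "D", "E", "F"]
merges = [
    (0, 1, 1.0),  # A + B   -> cluster 6
    (2, 3, 1.5),  # C + D   -> cluster 7
    (4, 5, 2.0),  # E + F   -> cluster 8
    (6, 7, 5.0),  # AB + CD -> cluster 9
    (8, 9, 9.0),  # EF + ABCD -> cluster 10
]

print(cut_at_height(customers, merges, 3.0))  # detailed: three subgroups
print(cut_at_height(customers, merges, 6.0))  # broad: two categories
```

A low cutoff keeps the detailed subgroups, while a higher cutoff collapses them into broader categories, all without re-running the clustering.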

Copyright © 2026 · Jim Frost