Hierarchical Clustering

What is Hierarchical Clustering?

Hierarchical Clustering is an unsupervised learning algorithm that builds a hierarchy of clusters. It operates in two main ways: agglomerative (a 'bottom-up' approach where each data point starts in its own cluster and the closest pairs of clusters are merged step by step) and divisive (a 'top-down' approach where all data points start in one cluster and clusters are recursively split). The result is a tree-like structure called a dendrogram, which visualizes the full merge (or split) history and lets the analyst choose the number of clusters after the analysis is complete by cutting the tree at a chosen height.
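The agglomerative procedure described above can be sketched with SciPy. This is a minimal illustration on made-up toy data (the two point groups and the `ward` linkage choice are assumptions for the example, not part of the original text):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two tight, well-separated groups of 2-D points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.1, size=(5, 2)),  # group near (0, 0)
    rng.normal(loc=(5, 5), scale=0.1, size=(5, 2)),  # group near (5, 5)
])

# Agglomerative ('bottom-up') clustering: each point starts as its own
# cluster, and the closest pair of clusters is merged at every step.
# Z encodes the resulting merge tree, i.e. the dendrogram.
Z = linkage(points, method="ward")

# Cut the tree to obtain a flat labelling with 2 clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would plot the tree itself; `fcluster` is the programmatic equivalent of cutting that plot at a given height.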

Where did the term "Hierarchical Clustering" come from?

The foundational methods for hierarchical clustering were developed in the 1960s and saw early application in fields like biology for creating taxonomies (e.g., classifying plants and animals) and in social sciences for analyzing complex data structures.

How is "Hierarchical Clustering" used today?

Hierarchical clustering is widely used across fields such as bioinformatics (gene expression analysis), market research (customer segmentation), and social network analysis. Unlike partitional methods such as k-means, it does not require the number of clusters to be specified in advance, which makes it particularly useful for exploratory data analysis when the underlying structure of the data is unknown.
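The practical upshot of not fixing the cluster count in advance is that a single hierarchy can yield several flat clusterings. A small sketch, again on assumed toy data with an assumed `average` linkage, showing two different cuts of the same tree:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: three tight, well-separated groups of 2-D points.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal((0, 0), 0.1, (4, 2)),
    rng.normal((5, 0), 0.1, (4, 2)),
    rng.normal((0, 5), 0.1, (4, 2)),
])

# Build the hierarchy once...
Z = linkage(points, method="average")

# ...then cut it at different levels to get different flat clusterings,
# without re-running the algorithm (unlike k-means, where changing k
# means refitting from scratch).
labels2 = fcluster(Z, t=2, criterion="maxclust")
labels3 = fcluster(Z, t=3, criterion="maxclust")
print(np.unique(labels2).size, np.unique(labels3).size)
```

Inspecting the dendrogram's merge heights (the third column of `Z`) is a common way to decide which cut best matches the data's structure.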

Related Terms