Agglomerative Clustering

Brian T. Luke, Ph.D.


In Agglomerative Clustering, each object is initially placed into its own group. Therefore, if we have N objects to cluster, we start with N groups. Each of these groups contains only a single object, and is known as a singleton.

Before we start the clustering, we need to decide on a threshold distance. Once this is done, the procedure is as follows:

  1. Compare all pairs of groups and mark the pair that is closest.
  2. The distance between this closest pair of groups is compared to the threshold value.
    • If the distance between this closest pair is less than the threshold distance, these groups become linked and are merged into a single group. Return to Step 1 to continue the clustering.
    • If the distance between the closest pair is greater than or equal to the threshold, the clustering stops. Because this was the closest pair, no other pair of groups can be merged either.
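The procedure above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. It assumes the objects are numeric points, and it uses Euclidean distance between objects and single linkage between groups, both of which the text leaves open. The names `agglomerate` and `single_linkage` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def single_linkage(g1: List[int], g2: List[int], data: List[Point]) -> float:
    # Assumed group distance: the smallest Euclidean distance between
    # any object in g1 and any object in g2.
    return min(math.dist(data[i], data[j]) for i in g1 for j in g2)


def agglomerate(data: List[Point], threshold: float, linkage=single_linkage) -> List[List[int]]:
    # Every object starts as its own group (a singleton); groups hold indices.
    groups = [[i] for i in range(len(data))]
    while len(groups) > 1:
        # Step 1: compare all pairs of groups and mark the closest pair.
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = linkage(groups[a], groups[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Step 2: stop if even the closest pair is not below the threshold.
        if d >= threshold:
            break
        # Otherwise merge the pair (b > a, so deleting b leaves index a valid).
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerate(points, 2.0))  # [[0, 1], [2, 3], [4]]
```

Any other group-distance function with the same signature can be passed as `linkage`. This brute-force version recomputes every pairwise group distance on each pass, which is fine for small examples but slow for large N.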

If the threshold value is too small, many groups will remain at the end, and many of them will be singletons. Conversely, if the threshold is too large, objects that are not very similar may end up in the same cluster; in the extreme, a threshold larger than every distance between groups merges all N objects into a single group.
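The effect of the threshold can be seen on a small one-dimensional example. This sketch assumes single linkage and one-dimensional values. In that special case the step-by-step merging ends with exactly the runs of sorted values whose consecutive gaps are below the threshold, so the sketch uses that shortcut instead of the full pairwise loop. The function name `single_linkage_1d` and the sample values are hypothetical.

```python
from typing import List


def single_linkage_1d(values: List[float], threshold: float) -> List[List[float]]:
    # Shortcut valid only for single linkage on 1-D data: two neighbouring
    # sorted values end up in the same group exactly when their gap is
    # below the threshold (matching the "less than" rule for merging).
    xs = sorted(values)
    if not xs:
        return []
    groups = [[xs[0]]]
    for prev, x in zip(xs, xs[1:]):
        if x - prev < threshold:
            groups[-1].append(x)   # gap small enough: same group
        else:
            groups.append([x])     # gap too large: start a new group
    return groups


values = [1.0, 1.2, 1.5, 4.0, 4.3, 9.0]
for t in (0.1, 1.0, 10.0):
    groups = single_linkage_1d(values, t)
    singletons = sum(1 for g in groups if len(g) == 1)
    print(f"threshold={t}: {len(groups)} groups, {singletons} singletons")
```

With the smallest threshold every value stays a singleton, while the largest threshold collapses all six values into one group.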

To run an agglomerative clustering, you need to decide upon a method of measuring the distance between two objects. In addition, you need a linkage criterion, which uses these object-to-object distances to define the distance between two groups and therefore determines which pair of groups is closest and should be linked. Some options are single linkage, average linkage, complete linkage, and Ward's method.
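The linkage options listed above can be written as functions of two groups of points. This sketch assumes numeric points with Euclidean distance between objects and uses the standard textbook definitions: single linkage takes the closest pair of members, complete linkage the farthest pair, and average linkage the mean over all pairs. For Ward's method it returns the increase in the total within-group sum of squared distances to the group centroids that merging would cause; some libraries report a rescaled version of this quantity. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
Group = List[Point]


def single_linkage(A: Group, B: Group) -> float:
    # Distance between the two closest members.
    return min(math.dist(a, b) for a in A for b in B)


def complete_linkage(A: Group, B: Group) -> float:
    # Distance between the two farthest members.
    return max(math.dist(a, b) for a in A for b in B)


def average_linkage(A: Group, B: Group) -> float:
    # Mean of all len(A) * len(B) member-to-member distances.
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))


def centroid(G: Group) -> List[float]:
    return [sum(coords) / len(G) for coords in zip(*G)]


def ward_linkage(A: Group, B: Group) -> float:
    # Increase in within-group sum of squares caused by merging A and B:
    # |A||B| / (|A| + |B|) * (squared distance between the centroids).
    nA, nB = len(A), len(B)
    sq = sum((x - y) ** 2 for x, y in zip(centroid(A), centroid(B)))
    return nA * nB / (nA + nB) * sq


A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
for f in (single_linkage, complete_linkage, average_linkage, ward_linkage):
    print(f.__name__, round(f(A, B), 3))
```

The same two groups give different distances under each criterion, which is why the choice of linkage changes which pair is merged first and how the final clusters look.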

Example of Agglomerative Clustering