Data Mining Discussion 6 c

1. What is the essence of hierarchical methods?


In hierarchical clustering, the data is not partitioned into clusters in a single step. Instead, the method produces a nested series of partitions, which may run from a single cluster containing all objects to n clusters that each contain a single object. Hierarchical clustering is subdivided into agglomerative methods and divisive methods; agglomerative techniques are more commonly used. Hierarchical clustering may be represented by a two-dimensional diagram known as a dendrogram, which illustrates the fusions or divisions made at each successive stage of the analysis.
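To make the "series of partitions" concrete, here is a minimal sketch of agglomerative clustering. It assumes 1-D numeric points and single linkage (cluster distance = distance between the closest members); the function name `single_linkage` and the toy data are illustrative, not taken from any library. The returned merge history is exactly the information a dendrogram draws.

```python
def single_linkage(points):
    """Naive agglomerative clustering with single linkage on 1-D points.

    Returns the merge history: one (cluster_a, cluster_b, distance) tuple per
    step, from n singleton clusters down to a single cluster of all points.
    """
    # Start with n clusters, each containing one object.
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        best = None
        # Single linkage: the distance between two clusters is the distance
        # between their closest members.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((sorted(clusters[i]), sorted(clusters[j]), d))
        # Replace cluster i by the union of the two closest clusters, drop j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return history


for a, b, d in single_linkage([1, 2, 6, 7, 15]):
    print(f"merge {a} + {b} at distance {d}")
```

For the toy data this prints four merges: [1] + [2] at distance 1, [6] + [7] at distance 1, [1, 2] + [6, 7] at distance 4, and [1, 2, 6, 7] + [15] at distance 8, i.e. the fusions a dendrogram would show from the leaves up to the root.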


Advantages:
- Easy to implement
- Hierarchical clustering outputs a hierarchy, i.e., a structure that is more informative than the unstructured set of flat clusters returned by k-means. It also does not require the number of clusters to be fixed in advance: one can decide on the number of clusters afterwards by looking at the dendrogram and cutting it at a suitable level.
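The advantage that one run yields a clustering for every possible number of clusters can be sketched as follows. This assumes 1-D data and single linkage; in that special case the two closest clusters are always neighbours in sorted order, which keeps the sketch short. The name `hierarchy` is hypothetical.

```python
def hierarchy(points):
    """Return every level of a single-linkage hierarchy on 1-D points.

    levels[k] is the partition into k clusters, so a single run of the
    algorithm provides a clustering for every k from n down to 1.
    """
    clusters = [[p] for p in sorted(points)]
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # In 1-D, the closest pair of clusters is always adjacent, and their
        # single-linkage distance is the gap between them.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))  # the first smallest gap wins ties
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels


levels = hierarchy([1, 2, 6, 7, 15])
for k in (3, 2):
    print(k, levels[k])
```

Here `levels[3]` is [[1, 2], [6, 7], [15]] and `levels[2]` is [[1, 2, 6, 7], [15]]; choosing k after the fact corresponds to cutting the dendrogram, where a large jump in merge distance is a common hint for where to cut.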


Disadvantages:
- Very sensitive to outliers.
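- The order of the data can affect the final results (for example, through how ties between equally close clusters are broken).
- Time and memory complexity: a straightforward agglomerative implementation needs O(n^2) memory for the distance matrix and O(n^3) time, so it is not suitable for large datasets.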
- It is not possible to undo the previous step: once the instances have been assigned to a cluster, they can no longer be moved around.
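A small sketch can illustrate the order-dependence and "no undo" points above. It assumes 1-D points, single linkage, and a tie-breaking rule that keeps the first closest pair scanned; the function name `cluster_to_k` is hypothetical. With equally spaced points, the two closest pairs tie, so the input order decides which merge happens first, and that merge is never revisited.

```python
def cluster_to_k(points, k):
    """Naive single-linkage agglomeration that stops at k clusters.

    Ties between equally close pairs go to the first pair scanned, so the
    order of the input can decide which pair is merged.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                # Strict '<' keeps the first of several equally close pairs.
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # The merge is final: later steps never split these points apart.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Normalise the output so the two runs can be compared directly.
    return sorted(sorted(c) for c in clusters)


print(cluster_to_k([0, 1, 2], 2))  # [[0, 1], [2]]
print(cluster_to_k([2, 1, 0], 2))  # [[0], [1, 2]]
```

The same three points give two different 2-cluster results depending on their order. The nested loops also hint at the cost: each step examines all pairs of points across clusters, and there are n - 1 steps, which is where the cubic running time of the naive approach comes from.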

2. Contrast/compare agglomerative hierarchical clustering methods vs. divisive hierarchical clustering methods.


Agglomerative hierarchical clustering works bottom-up: it starts from the individual objects (the leaves of the dendrogram) and moves up toward the parent clusters, whereas divisive hierarchical clustering works top-down: it starts from the parent cluster that contains every object and moves down to its children.
In the agglomerative method, each object starts as its own cluster, and at each step the two closest clusters are merged into a larger one. The merging continues until all the single clusters have been combined into one big cluster containing every object. In the divisive method, the process runs in reverse: the parent cluster is divided into smaller clusters, and the division continues until each cluster contains a single object.
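For contrast with the bottom-up merging, here is a minimal top-down sketch. It loosely follows the splinter-group idea of DIANA, a well-known divisive method: seed a new group with the point that is, on average, farthest from the rest, then move over any point that is closer on average to the new group. It assumes 1-D points and always splits the cluster with the largest diameter; the function names are hypothetical and this is not a reference implementation.

```python
def mean_dist(x, group):
    # Average distance from x to the points in group (group must be non-empty).
    return sum(abs(x - y) for y in group) / len(group)


def diana_split(cluster):
    """Split one cluster into (splinter, rest) using a splinter group."""
    rest = list(cluster)

    def others(k):  # the remaining points without position k
        return rest[:k] + rest[k + 1:]

    # Seed the splinter group with the point farthest, on average, from the rest.
    k = max(range(len(rest)), key=lambda k: mean_dist(rest[k], others(k)))
    splinter = [rest.pop(k)]
    # Move points that are closer, on average, to the splinter group.
    while len(rest) > 1:
        gains = [mean_dist(rest[k], others(k)) - mean_dist(rest[k], splinter)
                 for k in range(len(rest))]
        k = max(range(len(rest)), key=gains.__getitem__)
        if gains[k] <= 0:
            break
        splinter.append(rest.pop(k))
    return sorted(splinter), sorted(rest)


def divisive(points):
    """Top-down: start with one cluster and split until all are singletons."""
    clusters = [sorted(points)]
    steps = [sorted(clusters)]
    while any(len(c) > 1 for c in clusters):
        # Split the non-singleton cluster with the largest diameter.
        target = max((c for c in clusters if len(c) > 1),
                     key=lambda c: c[-1] - c[0])
        clusters.remove(target)
        clusters.extend(diana_split(target))
        steps.append(sorted(clusters))
    return steps


for level in divisive([1, 2, 6, 7, 15]):
    print(level)
```

On the toy data the levels go from [[1, 2, 6, 7, 15]] through [[1, 2, 6, 7], [15]] and [[1, 2], [6, 7], [15]] down to five singletons: the same hierarchy an agglomerative run builds, but visited from the root down instead of from the leaves up.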