Data Mining Discussion 6 b

What is the essence of the method of partitioning?

Most partitioning methods are distance-based. The clusters are formed in an optimized way such that the objects within a cluster are “close”, meaning that they are related to each other, while objects in different clusters are “far apart”, they are very different.

What is the k-medoids method?

It is a modification of the k-means algorithm that diminished sensitivity to outliers. Instead of using the mean value of the objects in a cluster as a reference point, it picks actual objects to represent the clusters. Each remaining object gets placed in the cluster with the representative object that is most similar to it. The result is that it groups n objects into k clusters by minimizing the absolute error.

Which method is more robust (k-means or k-medoids) and why?

If there is noise and outliers then k-medoids is more robust than k-means because a medoid is less influenced by them. However, each iteration in the k-medoids algorithm is of complexity O(k(n-k)^2), which becomes very costly for large values of n and k. In such situation, it’s better to use the k-means method.