Data Mining Discussion 6 b

Dec 11, 2018
  • What is the essence of the method of partitioning?

Most partitioning methods are distance-based. The clusters are formed to optimize a partitioning criterion, so that objects within a cluster are “close” (similar or related to each other), while objects in different clusters are “far apart” (very different from each other).
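
To make the distance-based idea concrete, here is a minimal sketch in Python. It assumes 2-D points and Euclidean distance; the function name `assign_to_nearest` and the sample data are made up for illustration. Each object is placed in the cluster whose representative is closest to it.

```python
import math

def assign_to_nearest(points, centers):
    """Return, for each point, the index of the closest center (Euclidean distance)."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]

# Hypothetical data: two visibly separate groups and one representative per group.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_to_nearest(points, centers)
print(labels)  # [0, 0, 1, 1]
```

The first two points end up together because they are close to the first representative, and the last two because they are close to the second one.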

  • What is the k-medoids method?

It is a modification of the k-means algorithm that diminishes sensitivity to outliers. Instead of using the mean value of the objects in a cluster as a reference point, it picks actual objects (medoids) to represent the clusters. Each remaining object is placed in the cluster whose representative object is most similar to it. The result is a grouping of n objects into k clusters that minimizes the absolute error, that is, the sum of the dissimilarities between each object and the representative object of its cluster.
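
The procedure can be sketched roughly as follows. This is a simplified swap-based version in the spirit of PAM, not a reference implementation: it assumes Euclidean distance, starts naively from the first k objects, and the names `absolute_error` and `k_medoids` are illustrative.

```python
import math

def absolute_error(points, medoids):
    """Sum of distances from every object to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def k_medoids(points, k, max_iter=100):
    medoids = list(range(k))  # assumption: naive start from the first k objects
    cost = absolute_error(points, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                # Try replacing medoid i with non-medoid object h.
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                candidate_cost = absolute_error(points, candidate)
                if candidate_cost < cost:  # keep the swap only if the error drops
                    medoids, cost, improved = candidate, candidate_cost, True
        if not improved:
            break
    # Place each object in the cluster of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels, cost

# Hypothetical data: two small groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, labels, cost = k_medoids(points, k=2)
print(sorted(medoids), cost)  # [0, 3] 4.0
```

Note that the medoids are indices of actual objects in `points`, which is the key difference from k-means, where the reference point is a computed mean that need not coincide with any object.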

  • Which method is more robust (k-means or k-medoids) and why?

When noise and outliers are present, k-medoids is more robust than k-means, because a medoid is less influenced by extreme values than a mean is. However, each iteration of the k-medoids algorithm has complexity O(k(n-k)^2), which becomes very costly for large values of n and k. In such situations, it’s better to use the k-means method, whose iterations cost only O(nk) distance computations.
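
A small 1-D example shows why the medoid is less affected by outliers. The helper names `mean` and `medoid` and the sample values are made up for illustration; the medoid here is the value that minimizes the sum of absolute differences to all the other values.

```python
def mean(values):
    return sum(values) / len(values)

def medoid(values):
    """The actual value that minimizes the total absolute distance to all values."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

clean = [1, 2, 3, 4, 5]
noisy = [1, 2, 3, 4, 100]  # same data with one outlier

print(mean(clean), medoid(clean))  # 3.0 3
print(mean(noisy), medoid(noisy))  # 22.0 3
```

A single outlier drags the mean from 3 to 22, a value far from every real object, while the medoid stays at 3.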
