Data Mining Discussion 6 d


What is the essence of density-based methods?

Density-based clustering methods are data-driven: they partition the set of data objects and adapt to the distribution of those objects in the embedding space. To find clusters of arbitrary shape, they model clusters as dense regions of the data space, separated by sparse regions. The density at an object can be measured by the number of objects close to it, for example within a fixed radius.
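
As a rough illustration of "density measured by the number of nearby objects", the sketch below counts, for each point, how many other points lie within a radius. The function name neighbor_counts, the radius parameter eps, and the sample points are assumptions made for this example; they are not part of the original answer.

```python
import math

def neighbor_counts(points, eps):
    """For each point, count the other points within distance eps of it."""
    counts = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            # Skip the point itself; count neighbors inside the eps radius.
            if i != j and math.dist(p, q) <= eps:
                count += 1
        counts.append(count)
    return counts

# Hypothetical data: three points close together and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(neighbor_counts(points, eps=0.5))  # [2, 2, 2, 0]
```

The three nearby points each have two neighbors and so sit in a dense region, while the isolated point has none and would fall in a sparse region.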

What is the essence of grid-based methods?

Grid-based clustering methods take a space-driven approach: they partition the embedding space into cells that are independent of the distribution of the input objects. The object space is divided into a finite number of cells that form a grid structure, and all clustering operations are performed on this grid. Processing is fast because the cost typically depends on the number of cells rather than on the number of objects.
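
A minimal sketch of the space-driven idea, assuming a uniform grid with a fixed cell width: each object is mapped to the cell that contains it, and later steps work on per-cell counts only. The names grid_cell_counts and cell_size, the threshold of 2, and the sample points are illustrative assumptions.

```python
import math
from collections import Counter

def grid_cell_counts(points, cell_size):
    """Map each point to its grid cell index and count the points per cell."""
    counts = Counter()
    for p in points:
        # The cell index depends only on the coordinates and the fixed
        # cell size, not on how the other points are distributed.
        cell = tuple(math.floor(x / cell_size) for x in p)
        counts[cell] += 1
    return counts

# Hypothetical data on a grid of unit cells.
points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.2), (3.9, 3.7)]
counts = grid_cell_counts(points, cell_size=1.0)

# Cells holding at least 2 points are treated as dense (assumed threshold).
dense_cells = [cell for cell, c in counts.items() if c >= 2]
print(dense_cells)  # [(0, 0)]
```

Once the counts are built, the individual points are no longer needed to decide which cells are dense, which is where the speed of grid-based methods comes from.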

What is clustering tendency assessment?

Applying a clustering method to a data set will return clusters, but those clusters may be meaningless if the data are essentially random. Clustering tendency assessment determines whether a data set has a non-random structure, which may lead to meaningful clusters.
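
The answer above does not name a specific test. One commonly used option is the Hopkins statistic, sketched here as an assumption rather than as the method the course prescribes. It compares nearest-neighbor distances from uniformly random locations with nearest-neighbor distances between actual data points. In the convention used below, H = sum(u) / (sum(u) + sum(w)); values near 0.5 suggest random data and values near 1 suggest clustered data. Some texts define the complementary ratio instead. The function name, the sample size m, and the test data are hypothetical.

```python
import math
import random

def hopkins_statistic(data, m, rng):
    """Hopkins statistic, H = sum(u) / (sum(u) + sum(w)).

    u: distance from a uniformly random location (inside the bounding box
       of the data) to its nearest data point.
    w: distance from a sampled data point to its nearest other data point.
    Requires m <= len(data) and at least two distinct data points.
    """
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    u_sum = 0.0
    for _ in range(m):
        probe = tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        u_sum += min(math.dist(probe, p) for p in data)

    w_sum = 0.0
    for i in rng.sample(range(len(data)), m):
        w_sum += min(math.dist(data[i], p) for j, p in enumerate(data) if j != i)

    return u_sum / (u_sum + w_sum)

# Hypothetical data: two tight 5x5 blobs far apart from each other.
clustered = [(cx + 0.01 * i, cy + 0.01 * j)
             for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
             for i in range(5) for j in range(5)]
h = hopkins_statistic(clustered, m=10, rng=random.Random(0))
print(h)  # expected to be well above 0.5 for data this strongly clustered
```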

How can the number of clusters be determined?

Determining the number of clusters is not easy: it depends on the shape and scale of the distribution in the data set, as well as on the clustering resolution that the user requires. A simple rule of thumb is to set the number of clusters to about sqrt(n/2) for a data set of n points, so that each cluster has roughly sqrt(2n) points.
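
The sqrt(n/2) estimate is easy to compute directly. The helper name rule_of_thumb_k is an assumption for this sketch, as is the choice to round to the nearest integer and to return at least 1.

```python
import math

def rule_of_thumb_k(n):
    """Estimate the number of clusters as sqrt(n / 2), rounded, at least 1."""
    return max(1, round(math.sqrt(n / 2)))

print(rule_of_thumb_k(200))  # 10, since sqrt(200 / 2) = 10
print(rule_of_thumb_k(50))   # 5, since sqrt(50 / 2) = 5
```

With n = 200 this gives 10 clusters of about 20 points each, which matches sqrt(2n) = 20. The estimate is only a starting point, since it ignores the shape of the data entirely.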