What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no predefined classes
• Typical applications
– As a stand-alone tool to get insight into data distribution
– As a pre-processing step for other algorithms
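To make the definition above concrete, here is a minimal sketch of one common clustering method, k-means, in plain Python. The notes do not name an algorithm, so k-means, the sample points, and the hand-picked starting centroids are all assumptions made for illustration.

```python
from math import dist

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # New centroid = mean of the members; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

# Hypothetical 2-D data: two visibly separate groups, no class labels given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, centroids=[points[0], points[3]])
```

No labels are supplied anywhere: the grouping comes only from the distances between points, which is what "no predefined classes" means in practice.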
What are some of the typical requirements of clustering in data mining?
The following points describe the typical requirements of clustering in data mining −
- Scalability − We need highly scalable clustering algorithms to deal with large databases.
- Ability to deal with different kinds of attributes − Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
- Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be limited to distance measures that tend to find only small, spherical clusters.
- High dimensionality − The clustering algorithm should be able to handle high-dimensional data as well as low-dimensional data.
- Ability to deal with noisy data − Databases often contain noisy, missing, or erroneous data. Algorithms that are sensitive to such data may produce poor-quality clusters.
- Interpretability − The clustering results should be interpretable, comprehensible, and usable.
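As an illustration of the "different kinds of attributes" requirement, the sketch below computes a Gower-style dissimilarity that mixes numerical, categorical, and binary attributes. The function name, the record fields, and the value range for `age` are hypothetical; the notes do not prescribe a particular measure.

```python
def mixed_dissimilarity(a, b, numeric_ranges):
    """Gower-style dissimilarity in [0, 1] between two records (dicts).

    numeric_ranges maps each numeric attribute to its (min, max) over the data.
    Every other attribute is treated as categorical or binary.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            low, high = numeric_ranges[key]
            span = high - low
            # Numeric: absolute difference scaled by the attribute's range.
            total += abs(a[key] - b[key]) / span if span else 0.0
        else:
            # Categorical or binary: simple match (0) or mismatch (1).
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)

# Hypothetical records with one numeric, one categorical, and one binary attribute.
alice = {"age": 30, "city": "Oslo", "member": True}
bob = {"age": 50, "city": "Oslo", "member": False}
carol = {"age": 32, "city": "Oslo", "member": True}
ranges = {"age": (20, 60)}

d_alice_bob = mixed_dissimilarity(alice, bob, ranges)      # (0.5 + 0 + 1) / 3
d_alice_carol = mixed_dissimilarity(alice, carol, ranges)  # (0.05 + 0 + 0) / 3
```

Scaling each attribute's contribution to the range 0 to 1 keeps a numeric attribute with a wide range from dominating the categorical and binary ones.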
How is clustering used in applications? (discuss one application in your post)
Clustering can be used in a shopping application to group customers by their shopping habits. Once the clusters are formed, we can see which items each group buys most often and offer that group coupons and special offers for those items. This can increase sales of those items and improve customer retention.
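The step that follows clustering could look roughly like this. The sketch assumes cluster labels have already been produced by some clustering algorithm; the customer IDs, labels, and purchase lists are made-up sample data, and `top_items_per_cluster` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Hypothetical output of a clustering step: customer ID -> cluster label.
cluster_of = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}

# Hypothetical purchase history: customer ID -> items bought.
purchases = {
    "c1": ["coffee", "coffee", "bread"],
    "c2": ["coffee", "milk"],
    "c3": ["diapers", "wipes", "diapers"],
    "c4": ["diapers", "milk"],
}

def top_items_per_cluster(cluster_of, purchases, n=1):
    """Return the n most frequently bought items for each cluster."""
    counts = defaultdict(Counter)
    for customer, items in purchases.items():
        counts[cluster_of[customer]].update(items)
    return {
        cluster: [item for item, _ in counter.most_common(n)]
        for cluster, counter in counts.items()
    }

# Items to target with coupons, per customer group.
coupons = top_items_per_cluster(cluster_of, purchases)
```

Here the first group would be offered coffee coupons and the second diaper coupons, since those are the items each group buys most often.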