- Regarding data mining methodology and user interaction, explain one challenge to data mining.
One challenge is mining different kinds of knowledge in databases. Different users are interested in different kinds of knowledge, so a data mining system must support a wide range of data analysis and knowledge discovery tasks, such as data characterization, discrimination, association, classification, clustering, trend and deviation analysis, and similarity analysis. These tasks may use the same database in different ways and require different data mining techniques.
- Explain one challenge of mining a huge amount of data in comparison with mining a small amount of data.
One challenge with mining large data sets is performance. Data mining algorithms must be efficient and scalable so that they can extract information from large databases within predictable and acceptable running times.
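Another challenge is the need for parallel, distributed, and incremental mining algorithms. The sheer size of many databases, the wide distribution of data, and the computational complexity of some methods motivate algorithms that partition the data, process the partitions in parallel, and then merge the results. Because some data mining processes are costly, incremental algorithms incorporate database updates without having to mine the entire data set again from scratch.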
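A minimal sketch of the partition-and-merge and incremental ideas above, using itemset support counting (as in frequent-pattern mining) as the example task. The function names, the representation of a transaction as a set of item strings, and the use of a thread pool to stand in for distributed workers are assumptions made for illustration; this is not a specific published algorithm.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_itemsets(partition, size=1):
    """Count every itemset of the given size occurring in one data partition.

    A transaction is assumed (for illustration) to be a set of item strings.
    """
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(transaction), size):
            counts[itemset] += 1
    return counts


def mine_partitioned(partitions, size=1):
    """Count each partition independently, then merge the partial counts.

    The thread pool only stands in for separate workers or machines; because
    partitions share nothing, each could be counted anywhere.
    """
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(lambda p: count_itemsets(p, size), partitions):
            total.update(partial)  # Counter.update adds counts together
    return total


def update_incrementally(total, new_transactions, size=1):
    """Fold a batch of new transactions into existing counts.

    Only the new batch is scanned; the previously mined data is not revisited.
    """
    total.update(count_itemsets(new_transactions, size))
    return total
```

For example, with two partitions `[{"milk", "bread"}, {"milk", "eggs"}]` and `[{"bread", "eggs"}, {"milk", "bread", "eggs"}]`, `mine_partitioned(..., size=2)` gives the pair `("bread", "milk")` a count of 2, the same as counting all four transactions at once; passing one more `{"milk", "bread"}` transaction to `update_incrementally` raises it to 3 without rescanning the original four.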
- What is an outlier? Does an outlier always need to be discarded?
An outlier is a data object that does not follow the general behavior or trend of the data: a "surprise" or an "exception." No, outliers do not always need to be discarded. They are often discarded as noise when the goal is to identify general trends, but the anomaly itself can be the pattern of interest, as in fraud detection. Studying the characteristics of outliers can sometimes reveal why they deviate from the rest of the data.
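A minimal sketch of flagging outliers instead of discarding them, assuming one-dimensional numeric data and the common rule that treats values more than 1.5 times the interquartile range (IQR) beyond the first or third quartile as outliers. This is only one of many possible outlier tests; the function name `flag_outliers` and the sample values are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using the fences Q1 - k*IQR and Q3 + k*IQR.

    Outliers are returned separately rather than dropped, so they can be studied.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default "exclusive" method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers
```

With hypothetical daily purchase amounts `[20, 22, 19, 25, 21, 23, 950]`, `flag_outliers` returns `([20, 22, 19, 25, 21, 23], [950])`. A trend analysis could work with the inliers, while the single 950 charge is kept aside as a candidate for investigation, for example as possible fraud.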