- Please discuss data generalization and some of the concepts associated with it. Data generalization summarizes data by replacing relatively low-level values, such as a numeric value for age, with higher-level concepts such as young, middle-aged, and senior. It can also reduce the number of dimensions being summarized, for example by removing birth_date when summarizing how a group of students behaves. Characterization gives a concise summary of a given data collection, while discrimination provides descriptions that compare two or more data collections.
- What is attribute-oriented induction? It is an online data analysis technique that is query-oriented and generalization-based. The overall idea is to collect the task-relevant data with a database query and then perform either attribute removal or attribute generalization on each attribute. Identical generalized tuples are then merged, so the generalized data set is smaller and easier to present to the user.
- What is understood by the “curse of dimensionality”? A data cube is a lattice of cuboids (each corresponding to a group-by in SQL terms). Precomputing these cuboids gives fast response times and avoids some redundant computation. However, as the number of dimensions, their cardinalities, or the levels in their concept hierarchies increase, the storage required for the precomputed cuboids greatly exceeds the size of the input relation. This explosive growth is what the curse of dimensionality refers to.
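
To make the growth in the last answer concrete, here is a minimal Python sketch that counts cuboids. It assumes the standard counting rule: a cube over n flat dimensions has 2^n cuboids, and if dimension i has L_i hierarchy levels (not counting the virtual top level "all"), the total is the product of (L_i + 1). The function names and the sample dimension names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import prod


def count_cuboids(levels_per_dimension):
    """Count cuboids when dimension i has L_i hierarchy levels (excluding 'all').

    Each dimension contributes (L_i + 1) choices: one of its levels, or 'all'.
    """
    return prod(levels + 1 for levels in levels_per_dimension)


def enumerate_group_bys(dimensions):
    """List every group-by over flat dimensions, from the apex () to the base cuboid."""
    return [
        combo
        for k in range(len(dimensions) + 1)
        for combo in combinations(dimensions, k)
    ]


# Hypothetical three-dimensional cube with no concept hierarchies.
flat = enumerate_group_bys(["city", "item", "year"])
print(len(flat))                  # 8 group-bys, i.e. 2**3

# Ten dimensions: flat, then with four hierarchy levels each.
print(count_cuboids([1] * 10))    # 1024
print(count_cuboids([4] * 10))    # 9765625
```

The sketch only counts cuboids. The storage needed for each one also depends on the cardinality of the dimensions it groups by, so the real cost grows faster still.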