- What do we understand by data reduction and what is its importance?
Data reduction is the practice of producing a much smaller representation of a large data set that still yields the same, or nearly the same, analytical results. It matters because mining a huge data set is time-consuming and often impractical; working on the reduced data improves efficiency without materially changing the outcome. Strategies for data reduction include dimensionality reduction, numerosity reduction, and data compression.
- Discuss one of the data reduction strategies.
One of the data reduction strategies is data compression. The idea is to apply transformations that produce a reduced, or "compressed", representation of the original data. If the original data can be reconstructed from the compressed data without any loss of information, the reduction is called lossless. If only an approximation of the original data can be reconstructed, the reduction is called lossy.
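The difference between lossless and lossy reduction can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module for the lossless case and uses simple rounding as a stand-in for a lossy scheme; the function names and sample values are hypothetical and chosen only for illustration.

```python
import zlib


def lossless_round_trip(data):
    """Compress bytes with zlib and restore them exactly."""
    compressed = zlib.compress(data)
    restored = zlib.decompress(compressed)
    return compressed, restored


def lossy_reduce(values, digits=1):
    """Illustrative lossy step: rounding discards precision for good."""
    return [round(v, digits) for v in values]


# Lossless: highly repetitive data shrinks and is recovered bit for bit.
original = b"abcabcabc" * 100
compressed, restored = lossless_round_trip(original)

# Lossy: only an approximation of the readings survives.
readings = [20.14, 20.16, 21.93]
approx = lossy_reduce(readings)
```

After the lossless round trip, `restored` equals `original`, whereas the rounded `approx` list cannot be turned back into `readings`.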
- Discuss one of the data transformation strategies.
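One of the data transformation strategies is discretization. This strategy replaces the raw values of a numeric attribute (for example, age) with interval labels (for example, 0-10, 11-20) or conceptual labels (for example, youth, adult, senior). The labels can in turn be organized into higher-level concepts, forming a concept hierarchy that makes the data easier to mine at multiple levels of abstraction.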
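As a rough sketch of discretization, the snippet below uses equal-width binning, which is only one of several possible binning schemes and is assumed here for simplicity. The function name, the sample ages, and the labels are hypothetical.

```python
def equal_width_labels(values, labels):
    """Map each numeric value to one of len(labels) equal-width bins."""
    lo, hi = min(values), max(values)
    k = len(labels)
    width = (hi - lo) / k
    result = []
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((v - lo) / width), k - 1)
        result.append(labels[idx])
    return result


ages = [5, 17, 30, 42, 65]
age_labels = equal_width_labels(ages, ["young", "middle", "senior"])
```

With these sample ages the range 5 to 65 is split into three bins of width 20, so each age is replaced by the label of the bin it falls into.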
- What do we understand by data normalization? What are some of its methods?
Normalization is scaling attribute values so that they fall within a smaller, common range, such as [-1, 1] or [0, 1]. Normalizing data gives all attributes equal weight, which is particularly useful for classification algorithms involving neural networks and for distance-based methods such as nearest-neighbor classification and clustering. Methods include min-max normalization, z-score normalization, and normalization by decimal scaling.
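The three methods can be sketched in plain Python as follows. This is a minimal illustration, assuming the population standard deviation for the z-score and the usual rule for decimal scaling (divide by the smallest power of ten that brings every absolute value below 1). The function names and sample values are hypothetical.

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # no spread to rescale
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max |v| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [200, 400, 1000]
scaled = min_max(incomes)
standardized = z_score(incomes)
shifted = decimal_scaling([-986, 917])
```

Min-max normalization preserves the relative spacing of the values but is sensitive to outliers, while z-score normalization is often preferred when the true minimum and maximum are unknown.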