Data Mining Discussion 3 c
Discuss the steps associated with the design of a data warehouse.
The first step is to choose a business process to model. If the business process is organizational and involves multiple complex object collections, a data warehouse model should be followed. If it is departmental and focuses on the analysis of only one kind of business process, a data mart model should be chosen instead.
The next step is to choose the grain of the business process, which is the fundamental, atomic level of data to be represented in the fact table.
Then it’s time to choose the dimensions that will apply to each fact table record.
Lastly, you choose the measures that will populate each fact table record. Typically these are numeric, additive quantities.
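The four steps above can be sketched as a small star schema. This is only a minimal illustration using Python's built-in sqlite3 module: the retail sales process, the table names (dim_date, dim_item, fact_sales), the columns, and the sample rows are all hypothetical assumptions chosen for the example, not something taken from the textbook.

```python
import sqlite3

# Step 1 (assumed): the business process being modeled is retail sales.
# Every table name, column name, and row below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 3: dimensions that apply to each fact table record
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Step 2: the grain is one row per item per day
    -- Step 4: the measures are numeric and additive
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        units_sold INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (date_key, item_key)
    );
""")
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(1, "2024-01-01", "2024-01"), (2, "2024-01-02", "2024-01")],
)
conn.executemany(
    "INSERT INTO dim_item VALUES (?, ?, ?)",
    [(1, "laptop", "computers"), (2, "mouse", "accessories")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 2000.0), (1, 2, 5, 100.0), (2, 1, 1, 1000.0)],
)

# Because the measures are additive, they can be summed along any dimension.
total_by_category = dict(conn.execute("""
    SELECT i.category, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_item AS i ON f.item_key = i.item_key
    GROUP BY i.category
""").fetchall())
```

Choosing a finer grain (for example, one row per individual transaction) would change only the fact table's key, while the dimensions and measures could stay the same.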
Compare the waterfall and the spiral methods as methodologies to develop a data warehouse.
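The waterfall method performs a structured and systematic analysis at each step before proceeding to the next, like a waterfall falling from one step to the next, which is where it gets its name. In comparison, the spiral method involves the rapid generation of increasingly functional systems, with short intervals between successive releases. The short turnaround time means modifications can be made quickly, which makes the spiral method a good fit for data warehouse development, especially for data marts.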
Compare/contrast the three main types of data warehouse usage: information processing, analytical processing, and data mining.
All three main types of data warehouse usage analyze data in some way, but they differ in how deep the analysis goes. Information processing supports querying, basic statistical analysis, and reporting using tables, charts, or graphs. Analytical processing supports OLAP operations such as roll-up, drill-down, slice-and-dice, and pivoting, and it generally operates on historic data. Its main advantage over information processing is the multidimensional analysis of data warehouse data. Lastly, data mining differs from the other two in that it supports knowledge discovery: it finds hidden patterns and associations and constructs analytical models.
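To make the contrast concrete, here is a rough sketch of the three kinds of usage on the same toy data. The records, field layout, and the threshold of 2 are hypothetical, and the co-occurrence count is only a simplified stand-in for a real association mining algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: (transaction_id, city, quarter, item, dollars)
sales = [
    (1, "Chicago", "Q1", "laptop", 1000.0),
    (1, "Chicago", "Q1", "mouse", 20.0),
    (2, "Chicago", "Q2", "laptop", 1000.0),
    (2, "Chicago", "Q2", "mouse", 20.0),
    (3, "Toronto", "Q1", "laptop", 1000.0),
    (4, "Toronto", "Q2", "keyboard", 50.0),
]

# Information processing: a direct query that answers a known question.
chicago_total = sum(dollars for _, city, _, _, dollars in sales if city == "Chicago")

# Analytical processing: aggregate over the (city, quarter) dimensions,
# then roll up by dropping the quarter dimension.
by_city_quarter = defaultdict(float)
for _, city, quarter, _, dollars in sales:
    by_city_quarter[(city, quarter)] += dollars
by_city = defaultdict(float)
for (city, _), dollars in by_city_quarter.items():
    by_city[city] += dollars

# Data mining: look for patterns nobody asked about explicitly, here item
# pairs that appear together in at least 2 transactions (assumed threshold).
baskets = defaultdict(set)
for tid, _, _, item, _ in sales:
    baskets[tid].add(item)
pair_counts = Counter(
    pair
    for items in baskets.values()
    for pair in combinations(sorted(items), 2)
)
frequent_pairs = [pair for pair, count in pair_counts.items() if count >= 2]
```

The first computation needs the question up front, the second summarizes along chosen dimensions, and the third surfaces an association (laptop with mouse) that was not specified in advance.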
Please discuss the following statement given on page 155 of our textbook: “among the many different paradigms and architectures of data mining systems, multidimensional data mining is particularly important”.
The statement introduces the textbook's explanation of why multidimensional data mining, which integrates OLAP with data mining, is so important. First, the data stored in data warehouses is high quality because it has already gone through preprocessing steps to ensure quality. Second, data analysis infrastructures have been (or will be) systematically constructed around data warehouses, so mining can build on them. Third, multidimensional data mining provides online selection of data mining functions. Because users may not always know the specific kinds of knowledge they want to mine, integrating OLAP with various mining functions gives them the flexibility to select the desired mining functions and swap data mining tasks dynamically.