Data Mining Discussion 1 a
- What is data mining?
Data mining is the process of extracting valuable, potentially useful, and understandable knowledge from large amounts of data.
- What are the steps involved in data mining when viewed as a process of knowledge discovery?
Knowledge Discovery in Databases (KDD) is generally used to refer to the overall process of
discovering useful knowledge from data, where data mining is a particular step in this process.
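The steps in the KDD process (commonly listed as data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation) ensure that useful knowledge is derived from the data.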
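To make the idea of a multi-step process concrete, here is a minimal sketch in Python. The stage names (cleaning, transformation, mining, evaluation) follow the usual textbook description of KDD, and the records, function names, and threshold are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical raw records: (customer, item); None marks a missing value.
raw = [
    ("ann", "milk"), ("ann", "bread"), ("bob", "milk"),
    ("bob", None), ("cat", "bread"), ("cat", "milk"), ("ann", "milk"),
]

def clean(records):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen, out = set(), []
    for rec in records:
        if None in rec or rec in seen:
            continue
        seen.add(rec)
        out.append(rec)
    return out

def transform(records):
    """Data transformation: group items into one basket (set) per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Data mining: count in how many baskets each item appears."""
    return Counter(item for items in baskets.values() for item in items)

def evaluate(counts, min_count):
    """Pattern evaluation: keep only items that meet the threshold."""
    return {item: n for item, n in counts.items() if n >= min_count}

# Chain the stages; data mining is only one step in the middle.
knowledge = evaluate(mine(transform(clean(raw))), min_count=3)
print(knowledge)
```

The point of the sketch is only that the mining step sits between preparation steps and an evaluation step; a real KDD project would use far richer versions of each stage.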
- Describe at least two of the data mining functionalities and provide examples of each.
First, note that data mining functions fall into two categories:
- Descriptive
  - Class/Concept Description
  - Mining of Frequent Patterns
  - Mining of Associations
  - Mining of Correlations
  - Mining of Clusters
- Classification and Prediction
  - Classification (IF-THEN) Rules
  - Decision Trees
  - Mathematical Formulae
  - Neural Networks
I'll explain the two that appeal to me most:
- Mining of Frequent Patterns
Frequent patterns are patterns that occur often in transactional data.
- Frequent Item Set − a set of items that frequently appear together, for example, milk and bread.
- Frequent Subsequence − a sequence of events that occurs frequently, such as purchasing a camera followed by purchasing a memory card.
- Frequent Substructure − a structural form, such as a graph, tree, or lattice, which may be combined with item sets or subsequences.
The examples are given above. For instance, for a frequent subsequence, a buyer who purchases a camera often goes on to buy a memory card.
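As a rough illustration of frequent item sets, the sketch below counts how often each pair of items appears together and keeps the pairs whose support (the fraction of transactions containing the pair) reaches a threshold. The transactions, the `frequent_pairs` helper, and the 0.4 threshold are made up for this example; it is a brute-force count, not an optimized algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"camera", "memory card"},
    {"bread", "butter"},
]

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs meeting min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

pairs = frequent_pairs(transactions, min_support=0.4)
for pair, support in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(pair, support)
```

With this toy data, milk and bread come out as the most frequent pair, which matches the item set example above. The camera and memory card pair appears only once, so it falls below the threshold here.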
- Mining of Clusters
A cluster is a group of similar objects, so this kind of mining forms groups whose objects are very similar to each other but highly different from objects in other clusters.
An example that comes to mind: among the things people buy, different shoe models are similar to one another and would land in the same cluster, while a pen is totally different from a shoe and would land in another.
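The shoes-versus-pens idea can be sketched with a tiny k-means clustering, one common clustering technique among many. The products, their features (weight and price), and the starting centroids are all assumptions made up for this example, and `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical products described by (weight in grams, price in dollars).
products = {
    "running shoe": (300.0, 80.0),
    "sneaker": (350.0, 70.0),
    "boot": (600.0, 120.0),
    "ballpoint pen": (10.0, 2.0),
    "gel pen": (12.0, 3.0),
}

def kmeans(points, centroids, iterations=10):
    """Plain k-means: returns the cluster index assigned to each point."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return labels

names = list(products)
points = list(products.values())
# Assumed starting centroids: the first and the last product.
labels = kmeans(points, [points[0], points[-1]])

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

With this toy data the shoes end up together in one cluster and the pens in the other. Note that the features are not scaled, so weight dominates the distance; real data would normally be normalized first.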