Data Mining Discussion 4 a



What do we understand by “frequent patterns”? How are they used in data mining? Please provide examples.

Frequent patterns are patterns (for example, sets of items) that appear repeatedly in a given data set. They are used in data mining to discover associations and correlations among items in a data set. Market basket analysis is a typical example of frequent pattern mining: if a customer buys milk, how often do they also buy cereal, and if they do, which kind of cereal do they buy? Discovering such a pattern can help increase sales, for example through special promotions on products that are bought together. Likewise, if customers often buy ice cream and cookies together, placing a stand of cookies near the ice cream may increase sales of both products.
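To make the idea concrete, here is a minimal Python sketch that counts how often each pair of items appears in the same transaction. The transactions are made-up toy data, and `count_pairs` is a hypothetical helper for illustration, not part of any library. Pairs with high counts, such as milk with cereal or ice cream with cookies here, are candidates for frequent patterns.

```python
from collections import Counter
from itertools import combinations

# Made-up toy transactions, for illustration only.
transactions = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"ice cream", "cookies"},
    {"milk", "bread"},
    {"ice cream", "cookies", "milk"},
]


def count_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical order, so ("a", "b") and ("b", "a") share a key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(transactions)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Counting every pair in every basket is only practical for small data; real frequent pattern miners prune the search using a minimum support threshold.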

What is an association rule? What do the concepts of support and confidence associated with association rules mean? Please provide examples.

An association rule represents a relationship between items that frequently occur (or are purchased) together. Each rule carries two measures of interestingness: support and confidence. For example: bananas => ice cream [support = 10%, confidence = 20%]. A support of 10% means that bananas and ice cream are bought together in 10% of all transactions. A confidence of 20% means that 20% of the customers who buy bananas also buy ice cream.
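The two measures can be computed directly from a list of transactions. The sketch below assumes transactions are Python sets; the data is made up and chosen only so that the numbers reproduce the bananas => ice cream example (10 transactions, 5 containing bananas, 1 containing both). The helper names `support_count`, `support`, and `confidence` are hypothetical.

```python
# Made-up toy data chosen so the numbers match the bananas => ice cream example:
# 10 transactions, 5 contain bananas, 1 contains both bananas and ice cream.
transactions = [
    {"bananas", "ice cream"},
    {"bananas", "milk"},
    {"bananas", "bread"},
    {"bananas"},
    {"bananas", "cereal"},
    {"ice cream", "cookies"},
    {"milk"},
    {"bread", "milk"},
    {"cookies"},
    {"cereal", "milk"},
]


def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain both sides of the rule."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


print(support(transactions, {"bananas"}, {"ice cream"}))     # 0.1
print(confidence(transactions, {"bananas"}, {"ice cream"}))  # 0.2
```

Note that support is measured against all transactions, while confidence is measured only against the transactions that contain the left-hand side of the rule.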

What is P(A|B), the probability of A given B?

It is equivalent to confidence(B => A): the percentage of transactions containing B that also contain A.
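Written out with support counts (using the standard convention that A ∪ B denotes the itemset containing both A and B), the relationship is as follows. The second line plugs in the figures from the bananas => ice cream example; together they imply that bananas appear in 50% of all transactions.

```latex
\begin{aligned}
\text{confidence}(B \Rightarrow A) = P(A \mid B)
  &= \frac{\text{support}(A \cup B)}{\text{support}(B)}
   = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(B)} \\[4pt]
P(\text{ice cream} \mid \text{bananas}) = 0.20
  &= \frac{0.10}{\text{support}(\text{bananas})}
  \;\Rightarrow\; \text{support}(\text{bananas}) = 0.50
\end{aligned}
```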

Explain using examples the definitions of closed itemset, closed frequent itemset, and maximal frequent itemset.

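Closed itemset: An itemset X is closed in a data set D if there is no proper super-itemset Y of X (that is, Y contains every item of X plus at least one more) with the same support count as X in D. For example, if {milk, bread} and {milk, bread, butter} both occur in exactly the same three transactions, then {milk, bread} is not closed. (A related but different property: if an itemset is frequent, each of its subsets is also frequent.)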

Closed frequent itemset: An itemset X is a closed frequent itemset in data set D if it’s both closed and frequent in D.

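Maximal frequent itemset: An itemset X is a maximal frequent itemset in data set D if X is frequent and there is no itemset Y such that X is a proper subset of Y and Y is frequent in D.

For example, suppose D contains only two transactions, {a1, a2, …, a100} and {a1, a2, …, a50}, and min_sup = 1. The closed frequent itemsets are {a1, …, a50} (support count 2) and {a1, …, a100} (support count 1); every other frequent itemset, such as {a1, a2}, has a superset with the same support count. The only maximal frequent itemset is {a1, …, a100}, because every other frequent itemset is contained in it.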
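The definitions of closed and maximal frequent itemsets can be checked mechanically. The following is a minimal brute-force sketch, assuming transactions are Python sets and min_sup is an absolute support count; it enumerates every possible itemset, so it is only suitable for tiny examples. The function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Brute force: map every itemset with support count >= min_sup to its count."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_sup:
                result[itemset] = count
    return result


def closed_and_maximal(freq):
    """Split frequent itemsets into closed ones and maximal ones."""
    closed, maximal = set(), set()
    for x, count in freq.items():
        # Proper supersets of x that are also frequent ("<" is proper subset on sets).
        supersets = [y for y in freq if x < y]
        # Closed: no proper superset has the same support count. Checking only frequent
        # supersets is enough, since a superset with x's count would itself be frequent.
        if all(freq[y] != count for y in supersets):
            closed.add(x)
        # Maximal: no proper superset is frequent at all.
        if not supersets:
            maximal.add(x)
    return closed, maximal


# Two transactions, one contained in the other.
example_transactions = [{"a", "b", "c"}, {"a", "b"}]
freq = frequent_itemsets(example_transactions, min_sup=1)
closed, maximal = closed_and_maximal(freq)
print(sorted(sorted(s) for s in closed))   # closed frequent itemsets
print(sorted(sorted(s) for s in maximal))  # maximal frequent itemsets
```

Here {a, b} is closed (support count 2, while its only proper superset {a, b, c} has count 1) but not maximal, because {a, b, c} is also frequent. Every maximal frequent itemset is closed, but not the other way around.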

What are the steps of association rule mining?

  1. Find all frequent itemsets: by definition, each of these itemsets occurs at least as frequently as a predetermined minimum support count (min_sup).
  2. Generate strong association rules from the frequent itemsets: these rules must satisfy both the minimum support and the minimum confidence thresholds.
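The two steps can be sketched in Python. For step 1 this sketch uses a level-wise search in the style of the Apriori algorithm (a technique chosen here for illustration, not one named above), relying on the property that every subset of a frequent itemset is frequent. For step 2 it splits each frequent itemset into an antecedent and consequent and keeps rules that meet a minimum confidence. The data, thresholds, and function names are hypothetical.

```python
from itertools import combinations


def find_frequent_itemsets(transactions, min_sup_count):
    """Step 1: level-wise (Apriori-style) search for itemsets with count >= min_sup_count."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    level = {frozenset([i]) for i in set().union(*transactions)}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the minimum support count.
        current = {}
        for c in level:
            n = count(c)
            if n >= min_sup_count:
                current[c] = n
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_conf):
    """Step 2: rules X => Y where X ∪ Y is frequent and confidence >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # The antecedent is a subset of a frequent itemset, so it is in `frequent`.
                conf = sup / frequent[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
frequent = find_frequent_itemsets(baskets, min_sup_count=3)
rules = generate_rules(frequent, min_conf=0.7)
for ante, cons, sup, conf in rules:
    print(sorted(ante), "=>", sorted(cons), f"count={sup}, conf={conf:.2f}")
```

Because every rule is built from an itemset that already passed step 1, the rules automatically satisfy minimum support; step 2 only needs to check confidence.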