Data Mining Discussion 4 b
-What is the Apriori property and how is it employed to improve the efficiency of the algorithm?
The Apriori property states that "all nonempty subsets of a frequent itemset must also be frequent". To elaborate, if an itemset I does not satisfy the minimum support threshold, then I is not frequent. If an item A is added to I, the resulting itemset I ∪ A cannot occur more frequently than I, so I ∪ A is not frequent either. This property is antimonotone: if a set fails the test, then all of its supersets fail it as well. The Apriori algorithm employs the property in a two-step process of join and prune actions. In the join step, candidate k-itemsets are generated by joining the frequent (k-1)-itemsets with themselves. In the prune step, any candidate that has a (k-1)-subset that is not frequent is discarded before the database is scanned to count support, which shrinks the candidate set and reduces the counting work.
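The join and prune steps can be sketched roughly as follows. This is a minimal illustration and not a full Apriori implementation: the function name apriori_gen, the sample itemsets, and the simplified join (any two frequent (k-1)-itemsets whose union has exactly k items, rather than the ordered prefix join usually described) are assumptions made for the example.

```python
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = set(frequent_prev)

    # Join step (simplified): union every pair of frequent (k-1)-itemsets
    # and keep the unions that contain exactly k items.
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k:
                candidates.add(union)

    # Prune step (Apriori property): a candidate survives only if every
    # one of its (k-1)-subsets is itself frequent.
    return {
        c for c in candidates
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


# Hypothetical frequent 2-itemsets.
L2 = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]}
C3 = apriori_gen(L2, 3)
# Only {A, B, C} survives: {A, B, D} is pruned because {A, D} is not
# frequent, and {B, C, D} is pruned because {C, D} is not frequent.
```

The pruned candidates never have their support counted, which is where the saving comes from.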
-What are some of the techniques that are used to improve the efficiency of the Apriori algorithm?
Some techniques used to improve efficiency include the hash-based technique and transaction reduction. In the hash-based technique, while each transaction in the database is scanned to generate the frequent 1-itemsets, all of the 2-itemsets of that transaction are also generated and hashed into the buckets of a hash table, and the corresponding bucket counts are increased. A 2-itemset whose bucket count is below the support threshold cannot be frequent and is therefore removed from the candidate set.
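As a rough sketch of the hash-based technique, the code below counts single items and hashes every 2-itemset into a bucket during the same pass. The hash function (x * 10 + y) mod 7, the integer item labels, the bucket count of 7, and the sample transactions are all assumptions chosen for the illustration.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    """Assumed hash function for a sorted pair of integer items."""
    x, y = pair
    return (x * 10 + y) % num_buckets


def candidate_pairs_with_hashing(transactions, min_support, num_buckets=7):
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    # Single scan: count 1-itemsets and hash each transaction's 2-itemsets.
    for t in transactions:
        item_counts.update(t)
        for pair in combinations(sorted(t), 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Keep a candidate pair only if its bucket count reaches the threshold.
    return [
        p for p in combinations(frequent_items, 2)
        if bucket_counts[bucket_of(p, num_buckets)] >= min_support
    ]


transactions = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3, 4}, {1, 2, 4}]
pairs = candidate_pairs_with_hashing(transactions, min_support=3)
# pairs == [(1, 2), (1, 3)]
```

In this data (2, 3) is dropped because its bucket holds only 2 entries. The pair (1, 3) stays even though it occurs only twice, because it shares a bucket with (3, 4). A bucket count is an upper bound on support, so the surviving candidates still need an exact count.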
Transaction reduction is the removal of transactions that do not contain any frequent k-itemsets, because such transactions cannot contain any frequent (k+1)-itemsets either. These transactions can be marked or removed so that later scans of the database, which look for larger itemsets, skip them.
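Transaction reduction can be illustrated with a short filter. The function name reduce_transactions and the sample data are hypothetical, and the sketch assumes the frequent k-itemsets have already been found.

```python
def reduce_transactions(transactions, frequent_k_itemsets):
    """Keep only transactions that contain at least one frequent k-itemset."""
    return [
        t for t in transactions
        if any(itemset <= t for itemset in frequent_k_itemsets)
    ]


transactions = [{"A", "B", "C"}, {"B", "D"}, {"A", "C"}, {"C", "D"}]
frequent_2 = [frozenset({"A", "B"}), frozenset({"A", "C"})]

kept = reduce_transactions(transactions, frequent_2)
# {B, D} and {C, D} contain no frequent 2-itemset, so they cannot contain
# a frequent 3-itemset and are skipped in later scans.
```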