Loading presentation...

Present Remotely

Send the link below via email or IM


Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.


Max-Miner Algorithm

No description

Abhi Sharma

on 26 March 2013

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of Max-Miner Algorithm

Apriori Algorithm Max-Miner Algorithm Max-Miner Algorithm Efficiently mining long patterns
from databases
- Roberto J. Bayardo Jr. NOTE The pseudo code being presented is max-miner at its top level and the most basic implementation out of many others available.

Other versions of max-miner may implement features such as:

*Support lower-bounding *Max-miner LO
*Additional constraints Example: The Blueprint... Evaluation (A,B,C,D,E,F) A (B,C,D,E) AC (D) B (C,D,E) AD () C (D,E) D (E) E () TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F (A,B,C,D,E,F) A (B,C,D,E) AC (D) B (C,D,E) AD () C (D,E) D (E) E () TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F Maximal Item-sets: B,C,D,E A,C,D Note: * process runs node by node from one tier to the next Apriori follows a candidate 'generation-and-test' approach and generates length (k+1) candidate itemsets from length k frequent itemsets that satisfy the support value. Max-Miner has a similar implementation to the apriori with the difference being that max-miner only concerns itself with finding the frequent maximal itemsets and not all frequent itemsets Motivation: Apriori employs bottom-up search which enumerates every single frequent item-set (i.e.: size 1,2,3,4,5... etc.) while also generating candidate sets for each. This exponential complexity restricts apriori (and similar) mining algortihms to discovering relatively short patterns. This is where max-miner comes in to help efficiently extract only the maximal frequent itemsets. Since, any frequent itemset is a subset of a maximal frequent itemset, max-miner's output indirectily represents all frequent itemsets. Another apriori issue that max-miner attempts to reslove is that of a low support value. When using large databases, a low support value may not generate enough candidate itemsets for analysis of frequent patterns. Max-Miner provides reliable results even with a low support by attempting to "look ahead" in order to quickly identify long frequent patterns. By doing this in the early stages, max-miner is able to prune all subsets of the long frequent pattern from further considertaion. Support: 2 Note: * process runs node by node from one tier to the next (A,B,C,D,E,F) A (B,C,D,E) AC (D) B (C,D,E) AD () C (D,E) D (E) E () TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F Support: 2 TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F - A data-set is inserted into the program - User specifies a minimum support value - A set-enumeration tree is formed from the data-set, that holds every frequent itemset encountered as long as it is potentially maximal (shown in following example) - The while loop is used to search through the set-enumeration tree. - This function performs an initial scan over the input data-set to identify all unique items (to get the domain)

- It also seeds the first and second level of the set-enumeration tree, Initial set-enumeration tree (A,B,C,D,E,F) A (B,C,D,E,F) B (C,D,E,F) C (D,E,F) D (E,F) E (F) F () Support: 2 - To evaluate the performance of max-miner, data-sets from several domains were selected

- The table provides the width (avg. length) and height (no. of records) of the selected data-sets Retail: data-set provided by a retailer containing records of items purchased by customers over a period of time. (Not available publically)

Pumsb: Census data-set

Pumsb*: Census data-set with all items 80% or greater support discarded

Splice: DNA data-set

Mushroom: data-set of characteristics of various mushroom species

Connect-4 & Chess: compiled from game state information Conclusion It applies techniques such as superset-frequency based pruning for reducing the number of itemsets considered. The result is orders of magnitude in performance improvements over apriori-like algorithms, especially when frequent itemsets are long. Max-Miner is a higly customizable mining algorithm that implicitly represents all frequent itemsets by extracting only the maximal frequent itemsets.
Full transcript