Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
Do you really want to delete this prezi?
Neither you, nor the coeditors you shared it with will be able to recover it again.
Make your likes visible on Facebook?
Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.
Transcript of Max-Miner Algorithm
- Roberto J. Bayardo Jr. NOTE The pseudo code being presented is max-miner at its top level and the most basic implementation out of many others available.
Other versions of max-miner may implement features such as:
*Support lower-bounding *Max-miner LO
*Additional constraints Example: The Blueprint... Evaluation (A,B,C,D,E,F) A (B,C,D,E) AC (D) B (C,D,E) AD () C (D,E) D (E) E () TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F (A,B,C,D,E,F) A (B,C,D,E) AC (D) B (C,D,E) AD () C (D,E) D (E) E () TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F Maximal Item-sets: B,C,D,E A,C,D Note: * process runs node by node from one tier to the next Apriori follows a candidate 'generation-and-test' approach and generates length (k+1) candidate itemsets from length k frequent itemsets that satisfy the support value. Max-Miner has a similar implementation to the apriori with the difference being that max-miner only concerns itself with finding the frequent maximal itemsets and not all frequent itemsets Motivation: Apriori employs bottom-up search which enumerates every single frequent item-set (i.e.: size 1,2,3,4,5... etc.) while also generating candidate sets for each. This exponential complexity restricts apriori (and similar) mining algortihms to discovering relatively short patterns. This is where max-miner comes in to help efficiently extract only the maximal frequent itemsets. Since, any frequent itemset is a subset of a maximal frequent itemset, max-miner's output indirectily represents all frequent itemsets. Another apriori issue that max-miner attempts to reslove is that of a low support value. When using large databases, a low support value may not generate enough candidate itemsets for analysis of frequent patterns. Max-Miner provides reliable results even with a low support by attempting to "look ahead" in order to quickly identify long frequent patterns. By doing this in the early stages, max-miner is able to prune all subsets of the long frequent pattern from further considertaion. Support: 2 Note: * process runs node by node from one tier to the next (A,B,C,D,E,F) A (B,C,D,E) AC (D) B (C,D,E) AD () C (D,E) D (E) E () TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F Support: 2 TID ITEMS 10 20 30 A , B , C , D , E B , C , D , E A , C , D , F - A data-set is inserted into the program - User specifies a minimum support value - A set-enumeration tree is formed from the data-set, that holds every frequent itemset encountered as long as it is potentially maximal (shown in following example) - The while loop is used to search through the set-enumeration tree. - This function performs an initial scan over the input data-set to identify all unique items (to get the domain)
- It also seeds the first and second level of the set-enumeration tree, Initial set-enumeration tree (A,B,C,D,E,F) A (B,C,D,E,F) B (C,D,E,F) C (D,E,F) D (E,F) E (F) F () Support: 2 - To evaluate the performance of max-miner, data-sets from several domains were selected
- The table provides the width (avg. length) and height (no. of records) of the selected data-sets Retail: data-set provided by a retailer containing records of items purchased by customers over a period of time. (Not available publically)
Pumsb: Census data-set
Pumsb*: Census data-set with all items 80% or greater support discarded
Splice: DNA data-set
Mushroom: data-set of characteristics of various mushroom species
Connect-4 & Chess: compiled from game state information Conclusion It applies techniques such as superset-frequency based pruning for reducing the number of itemsets considered. The result is orders of magnitude in performance improvements over apriori-like algorithms, especially when frequent itemsets are long. Max-Miner is a higly customizable mining algorithm that implicitly represents all frequent itemsets by extracting only the maximal frequent itemsets.