Closed Frequent Itemset Mining Based on MapReduce
CHEN Guang-Peng, YANG Yu-Bin, GAO Yang, SHANG Lin
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Abstract Closed frequent itemset mining is a useful way to discover association rules in data. Cloud computing infrastructure based on MapReduce offers a promising platform for this mining task. This paper presents a parallel algorithm for mining closed frequent itemsets on the Hadoop cloud computing platform. The method consists of four steps: parallel counting, construction of the global F-List, parallel mining of local closed frequent itemsets, and parallel filtering of global closed frequent itemsets. Experimental results validate the method and show that it is effective and achieves a satisfactory speedup.
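The four steps named in the abstract can be illustrated with a small single-process sketch in Python. It is not the authors' Hadoop implementation: the function names are hypothetical, each frequent item is assumed to form its own group, and brute-force subset enumeration stands in for whatever tree-based miner the paper actually uses. Each list in `shards` plays the role of one input split.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_items(shards):
    """Step 1: count items per shard (map), then sum the counts (reduce)."""
    total = Counter()
    for shard in shards:
        local = Counter(item for tx in shard for item in set(tx))
        total.update(local)
    return total


def build_f_list(counts, min_support):
    """Step 2: frequent items in descending support order (ties by name)."""
    frequent = [i for i, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda i: (-counts[i], i))


def mine_local_closed(shards, f_list, min_support):
    """Step 3: project transactions per item, then mine each group alone.

    Assumption: one group per frequent item. The group for an item receives,
    from every transaction containing it, the F-List prefix ending at it.
    """
    rank = {item: r for r, item in enumerate(f_list)}
    groups = defaultdict(list)
    for shard in shards:
        for tx in shard:
            ordered = sorted((i for i in set(tx) if i in rank), key=rank.get)
            for pos, item in enumerate(ordered):
                groups[item].append(tuple(ordered[: pos + 1]))

    local_closed = []
    for item, projected in groups.items():
        # Brute force: count every itemset in this group that contains `item`.
        support = Counter()
        for tx in projected:
            prefix = tx[:-1]
            for k in range(len(prefix) + 1):
                for combo in combinations(prefix, k):
                    support[frozenset(combo + (item,))] += 1
        frequent = {s: c for s, c in support.items() if c >= min_support}
        # Keep itemsets with no proper superset of equal support in the group.
        for s, c in frequent.items():
            if not any(s < t and c == d for t, d in frequent.items()):
                local_closed.append((s, c))
    return local_closed


def filter_global_closed(candidates):
    """Step 4: drop candidates that have a proper superset of equal support."""
    by_support = defaultdict(list)
    for s, c in candidates:
        by_support[c].append(s)
    result = {}
    for c, sets in by_support.items():
        for s in sets:
            if not any(s < t for t in sets):
                result[s] = c
    return result


# Toy data: two shards, five transactions, minimum support of 2.
shards = [[("a", "b", "c"), ("a", "b", "c"), ("a",)], [("a",), ("b",)]]
counts = count_items(shards)
f_list = build_f_list(counts, 2)
local = mine_local_closed(shards, f_list, 2)
closed = filter_global_closed(local)
```

On this toy input, `{a, b}` with support 2 is closed within its own group but is removed in the last step, because `{a, b, c}` from another group has the same support. That is the case the global filtering step exists to handle.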
Received: 14 February 2011