A MapReduce-Based Algorithm for Mining Closed Frequent Itemsets


Abstract: Closed frequent itemset mining is a useful way to discover association rules in data. Cloud computing infrastructure based on MapReduce offers a promising way to scale this task. A parallel algorithm for mining closed frequent itemsets on the Hadoop cloud computing platform is presented. The method consists of four steps: parallel counting, construction of the global F-List, parallel mining of local closed frequent itemsets, and parallel filtering of global closed frequent itemsets. Experimental results validate the method and show that it achieves a satisfactory speedup.
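The four steps named in the abstract can be illustrated with a small in-process sketch. This is not the authors' Hadoop implementation, which is not shown here. It assumes a PFP-style projection with one partition per frequent item, and it swaps in a brute-force local miner (closing the transactions under intersection) in place of any tree-based miner. All function names are hypothetical, and each "shard" is a plain Python list standing in for one input split.

```python
from collections import Counter, defaultdict
from itertools import chain


def count_items(shard):
    """Step 1 (map): count item occurrences in one shard of transactions."""
    return Counter(chain.from_iterable(set(t) for t in shard))


def build_f_list(partial_counts, min_support):
    """Step 2: merge shard counts and rank frequent items by descending support."""
    total = sum(partial_counts, Counter())
    frequent = [item for item in total if total[item] >= min_support]
    frequent.sort(key=lambda item: (-total[item], item))
    return {item: rank for rank, item in enumerate(frequent)}


def project(shard, rank):
    """Step 3 (map): for each frequent item of a transaction, emit the item
    together with the F-List-ordered prefix that ends at that item."""
    for t in shard:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        for pos, item in enumerate(ordered):
            yield item, frozenset(ordered[: pos + 1])


def mine_local_closed(conditional_db, min_support):
    """Step 3 (reduce): closed frequent itemsets of one conditional database.

    Assumption: brute force. Every closed itemset is an intersection of
    transactions, so close the transactions under intersection, then count.
    """
    closed = set()
    for t in conditional_db:
        closed |= {t} | {t & c for c in closed}
    result = {}
    for itemset in closed:
        support = sum(1 for t in conditional_db if itemset <= t)
        if support >= min_support:
            result[itemset] = support
    return result


def filter_global_closed(candidates):
    """Step 4: drop candidates that have a proper superset with equal support."""
    by_support = defaultdict(list)
    for itemset, support in candidates.items():
        by_support[support].append(itemset)
    return {
        itemset: support
        for support, group in by_support.items()
        for itemset in group
        if not any(itemset < other for other in group)
    }


def mine_closed_itemsets(shards, min_support):
    """Run the four steps sequentially over a list of shards."""
    rank = build_f_list([count_items(s) for s in shards], min_support)
    conditional = defaultdict(list)
    for shard in shards:
        for item, prefix in project(shard, rank):
            conditional[item].append(prefix)
    candidates = {}
    for db in conditional.values():
        candidates.update(mine_local_closed(db, min_support))
    return filter_global_closed(candidates)
```

For example, calling `mine_closed_itemsets` on the transactions `abc, abc, a, a, b` with a minimum support of 2 yields `{a}` (support 4), `{b}` (support 3), and `{a, b, c}` (support 2). The itemset `{a, b}` is closed within the partition for `b` but is removed in step 4 because `{a, b, c}` has the same support, which is why a global filtering pass is needed after local mining.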

Key words: Cloud Computing; Parallel Algorithm; Data Mining; Closed Frequent Itemset; MapReduce

Received: 14 February 2011

CLC number (ZTFLH): TP311

Authors: CHEN Guang-Peng, YANG Yu-Bin, GAO Yang, SHANG Lin

Cite this article:

CHEN Guang-Peng, YANG Yu-Bin, GAO Yang, et al. Closed Frequent Itemset Mining Based on MapReduce[J]. , 2012, 25(2): 220-224.

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/ OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2012/V25/I2/220
