Mining Algorithms of N-Most Frequent Itemsets*

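Abstract: The computational complexity of frequent itemset mining algorithms and the number of frequent itemsets they generate grow exponentially with the number of items in the transaction set, which makes the minimum support threshold the key to controlling this growth. In practice, however, a support threshold alone rarely keeps the number of frequent itemsets under control. This paper therefore defines the problem of mining the N most frequent itemsets and proposes two algorithms based on dynamically adjusting the support threshold: NApriori, a breadth-first search algorithm, and IntvMatrix, a depth-first search algorithm. Experiments show that both methods are more than twice as efficient as the naive method, and the advantage is most pronounced when N is small.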

Keywords: data mining, N-most frequent itemsets, support threshold, inverted matrix

Abstract: The computational complexity of frequent itemset mining algorithms and the number of frequent itemsets increase exponentially with the number of items in a transaction set, so the minimum support threshold becomes the key to controlling this increase. In practical applications, however, it is difficult to control the number of frequent itemsets with a support threshold alone. The problem of mining the N-most frequent itemsets is introduced, and the breadth-first search algorithm NApriori and the depth-first search algorithm IntvMatrix, both based on a dynamically adjusted minimum support threshold, are presented to solve it. Experimental results show that the proposed algorithms are faster than the naive method, and that the speedup is remarkable when N is low.

Key words: Data Mining, N-Most Frequent Itemsets, Support Threshold, Inverted Matrix
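
The idea of raising the support threshold while the search runs can be illustrated with a small level-wise (breadth-first) sketch. This is not the paper's NApriori or IntvMatrix, whose details are not given on this page. It is a minimal illustration under stated assumptions: itemsets of every size compete in a single ranking, ties at the N-th position are broken arbitrarily, and the function name `top_n_frequent_itemsets` is hypothetical.

```python
import heapq
from itertools import combinations


def top_n_frequent_itemsets(transactions, n):
    """Return up to n (support, itemset) pairs with the highest support counts.

    Assumptions of this sketch: items are mutually comparable (e.g. strings),
    itemsets of every size compete in one ranking, and ties at the N-th
    position are broken arbitrarily.
    """
    if n <= 0:
        return []
    transactions = [frozenset(t) for t in transactions]
    best = []  # min-heap holding the n best (support, itemset) pairs so far

    def offer(support, itemset):
        entry = (support, tuple(sorted(itemset)))
        if len(best) < n:
            heapq.heappush(best, entry)
        elif support > best[0][0]:
            heapq.heapreplace(best, entry)

    # Level 1: count single items.
    level = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            level[key] = level.get(key, 0) + 1

    k = 1
    while level:
        for itemset, support in level.items():
            offer(support, itemset)
        # Dynamic threshold: once n itemsets are known, the smallest support
        # among them is a bound. Itemsets below it, and by anti-monotonicity
        # all of their supersets, cannot enter the result.
        min_sup = best[0][0] if len(best) == n else 1
        survivors = {s for s, c in level.items() if c >= min_sup}

        # Join surviving k-itemsets into (k+1)-candidates, keeping only those
        # whose k-subsets all survived (the usual Apriori pruning).
        k += 1
        candidates = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in survivors for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count candidate supports with one scan each (simple, not efficient).
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_sup:
                level[cand] = support

    return sorted(best, key=lambda e: (-e[0], e[1]))
```

For example, `top_n_frequent_itemsets([{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}], 3)` starts with a threshold of 1, raises it to 2 after the single items are counted, and raises it to 4 once the pair `("a", "b")` displaces `("c",)`, at which point no further candidates can be generated. The caller supplies only N; the threshold is derived from the results found so far.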

Received: 2005-11-22

Chinese Library Classification (CLC) number: TP311

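Funding: Supported by the National Natural Science Foundation of China (No. 60473070) and the Natural Science Foundation of Fujian Province (No. S0650013)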

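About the authors: CHEN Xiaoyun, female, born in 1970, associate professor. Her main research interests include data mining, machine learning, and information retrieval. E-mail: c_xiaoyun@21cn.com. HU Yunfa, male, born in 1940, professor and doctoral supervisor. His main research interest is data and knowledge engineering.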

Cite this article:

CHEN Xiaoyun, HU Yunfa. Mining Algorithms of N-Most Frequent Itemsets [J]. Pattern Recognition and Artificial Intelligence, 2007, 20(4): 512-518.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2007/V20/I4/512
