1.School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002 2.Department of Computer and Information Technology, Fudan University, Shanghai 200433
Abstract:The computing complexity of the frequent itemsets mining algorithm and the number of frequent itemsets are increased exponentially with the number of items in a transaction set. The minimum support threshold becomes a key to control such an increase. However, in practical application it will be difficult to control frequent itemsets scale, if only support threshold is used. The problem of Nmost frequent itemsets is introduced, and the breadthfirstsearch algorithm NApriori and the depthfirstsearch algorithm IntvMatrix based on the dynamic minimum support threshold are presented to solve the problem. Experimental result shows the proposed algorithms are faster than nave method, and the improvement of the speed is remarkable when N is low.
[1] Agrawal R, Imielinski T, Swami A. Mining Association Rules between Sets of Items in Large Databases // Proc of the ACM SIGMOD Conference on Management of Data. Washington, USA, 1993: 207216 [2]Agrawal R, Srikant R. Fast Algorithms for Mining Association Rules // Proc of the International Conference on Very Large Databases. Santiago, USA, 1994: 487499 [3]Han Jiawei, Pei Jian, Yin Yiwen. Mining Frequent Patterns without Candidate Generation: A FrequentPattern Tree Approach. Data Mining and Knowledge Discovery, 2004, 8(1): 5387 [4]Hipp J, Guntzer U, Nakhaeizadeh G. Algorithms for Association Rule Mining-A General Survey and Comparison. SIGKDD Explorations, 2000, 2(2): 5864 [5]Pei Jian, Han Jiawei, AslMortazavi B, et al. PrefixSpan: Mining Sequential Patterns Efficiently by PrefixProjected Pattern Growth // Proc of the 17th International Conference on Data Engineering. Heidelberg, Germany, 2001: 215224 [6]Chen Xiaoyun, Chen Yi, Wang Lei, et al. Text Categorization Based on Classification Rules Tree by Frequent Patterns. Journal of Software, 2006, 17(5): 10171025 (in Chinese) (陈晓云,陈 袆,王 雷,等.基于分类规则树的频繁模式文本分类.软件学报. 2006, 17(5): 10171025) [7]Beil F, Ester M, Xu X. Frequent TermBased Text Clustering // Proc of the 8th International Conference on Knowledge Discovery and Data Mining. New York, USA, 2002: 436442 [8]Fu A W C, Kwong R W W, Tang Jian. Mining NMost Interesting Itemsets // Proc of the International Symposium on Methodologies for Intelligent Systems. Lyon, France, 2000:5967 [9]ElHajj M, Zaiane O R. Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining // Proc of the International Conference on Data Mining and Knowledge Discovery. Washington, USA, 2003: 109118 [10]Richrdo B Y, Berthier R N. Modern Information Retrieval. Milan, Italy: AddisonWesley, 1999 [11]Borgelt C,Kruse R. Induction of Association Rules: Apriori Implementation // Proc of the 15th Conference on Computational Statistics. Berlin, Germany, 2001: 395400