Mining Embedded Frequent Subtrees Using a Pattern Growth Method<sup>*</sup>

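Abstract: A pattern growth method is proposed for mining embedded frequent subtrees in a forest of labeled ordered trees. The algorithm uses rightmost-path extension to construct the complete pattern growth space, then determines the growth points of each pattern to be grown from its topological structure and builds a projected database for each growth point. Mining frequent subtrees is thereby reduced to finding frequent nodes in each projected database, which greatly lowers the complexity of the algorithm. Experiments show that the method is efficient in both time and space.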
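Under the usual definition of an embedded subtree (assumed here), a pattern occurs in a tree if its nodes can be mapped one-to-one onto tree nodes so that labels agree, every pattern edge becomes an ancestor-descendant pair, and left-to-right (preorder) order is preserved; the support of a pattern is the number of trees in the forest that contain it. The minimal Python sketch below, which is not the authors' implementation, checks this by backtracking over a preorder/scope encoding of each tree. The nested-tuple tree format and the names `flatten`, `embeds`, and `support` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding: a node is (label, [children]), children in left-to-right order.
Node = Tuple[str, list]


def flatten(tree: Node) -> Tuple[List[str], List[int]]:
    """Preorder labels plus, for each node, the preorder index of its last descendant."""
    labels: List[str] = []
    scope: List[int] = []

    def visit(node: Node) -> None:
        idx = len(labels)
        labels.append(node[0])
        scope.append(idx)  # provisional; fixed once the whole subtree is visited
        for child in node[1]:
            visit(child)
        scope[idx] = len(labels) - 1

    visit(tree)
    return labels, scope


def embeds(pattern: Node, tree: Node) -> bool:
    """True if `pattern` occurs in `tree` as an embedded subtree:
    labels match, pattern edges map to ancestor-descendant pairs,
    and left-to-right (preorder) order is preserved."""
    labels, scope = flatten(tree)

    def match(p: Node, u: int) -> bool:
        # Map pattern node p to tree node u; p's children must embed below u.
        return labels[u] == p[0] and match_siblings(p[1], u + 1, scope[u])

    def match_siblings(children: list, start: int, end: int) -> bool:
        # Embed sibling patterns left to right inside preorder range [start, end];
        # the next sibling must lie after the whole subtree of the previous image.
        if not children:
            return True
        for u in range(start, end + 1):
            if match(children[0], u) and match_siblings(children[1:], scope[u] + 1, end):
                return True
        return False

    return any(match(pattern, u) for u in range(len(labels)))


def support(pattern: Node, forest: List[Node]) -> int:
    """Number of trees in the forest containing at least one embedding of `pattern`."""
    return sum(embeds(pattern, t) for t in forest)


# Toy forest (illustrative data, not from the paper).
forest = [
    ("A", [("B", [("C", [])]), ("D", [])]),  # A(B(C), D)
    ("A", [("C", [])]),                       # A(C)
]
print(support(("A", [("C", [])]), forest))             # 2: C is a descendant of A in both trees
print(support(("A", [("C", []), ("D", [])]), forest))  # 1: only the first tree
print(support(("A", [("D", []), ("C", [])]), forest))  # 0: sibling order is violated
```

Backtracking like this is exponential in the worst case; it only pins down what "frequent embedded subtree" means, not how a pattern-growth miner avoids testing candidate patterns one at a time.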

Keywords: data mining, frequent pattern, pattern growth, frequent subtree

Abstract: In this paper, an efficient pattern growth algorithm is presented for mining frequent embedded subtrees in a forest of rooted, labeled, ordered trees. It uses a rightmost-path expansion scheme to construct the complete pattern growth space and creates a projected database for every growth point of a tree pattern. The problem of mining frequent subtrees is thereby transformed into finding frequent nodes in the projected databases, which considerably reduces the complexity of the algorithm. Experimental results show that the algorithm is efficient in both time and space.
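The growth step described in the abstract can be illustrated with a small Python sketch. It assumes a generic rightmost-path extension: each pattern is stored as preorder (label, depth) pairs, the nodes on its rightmost path are taken as the growth points, and for every occurrence the tree nodes that could become a new rightmost child of a growth point are collected into a per-pattern projection keyed by (depth, label). Counting the distinct trees behind each key then turns subtree mining into finding frequent nodes. The encoding, the names `flatten` and `mine_embedded`, and the exact region rules are assumptions for illustration; the paper's own growth-point and projection construction may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, list]                # hypothetical: (label, [children]), ordered children
Pattern = Tuple[Tuple[str, int], ...]  # hypothetical: pattern as preorder (label, depth) pairs
Occ = Tuple[int, List[int]]            # (tree id, preorder images of the pattern's nodes)


def flatten(tree: Node) -> Tuple[List[str], List[int]]:
    """Preorder labels plus, for each node, the preorder index of its last descendant."""
    labels: List[str] = []
    scope: List[int] = []

    def visit(node: Node) -> None:
        idx = len(labels)
        labels.append(node[0])
        scope.append(idx)
        for child in node[1]:
            visit(child)
        scope[idx] = len(labels) - 1

    visit(tree)
    return labels, scope


def mine_embedded(forest: List[Node], minsup: int) -> Dict[Pattern, int]:
    flat = [flatten(t) for t in forest]
    result: Dict[Pattern, int] = {}

    def tree_count(occs: List[Occ]) -> int:
        return len({tid for tid, _ in occs})

    def grow(pattern: Pattern, occs: List[Occ]) -> None:
        result[pattern] = tree_count(occs)
        last_depth = pattern[-1][1]
        # Rightmost path = last pattern node at each depth 0..last_depth;
        # these nodes serve as the growth points in this sketch.
        rmp = [max(i for i, (_, d) in enumerate(pattern) if d == k)
               for k in range(last_depth + 1)]
        # Projection: every tree node that could become a new rightmost child
        # of a growth point, keyed by (depth of the new node, label).
        proj: Dict[Tuple[int, str], List[Occ]] = defaultdict(list)
        for tid, emb in occs:
            labels, scope = flat[tid]
            for k in range(last_depth + 1):
                x = emb[rmp[k]]
                # Below x, and to the right of x's child on the rightmost path (if any).
                lo = x + 1 if k == last_depth else scope[emb[rmp[k + 1]]] + 1
                for v in range(lo, scope[x] + 1):
                    proj[(k + 1, labels[v])].append((tid, emb + [v]))
        # Frequent nodes in the projection become one-node extensions of the pattern.
        for (depth, label), new_occs in sorted(proj.items()):
            if tree_count(new_occs) >= minsup:
                grow(pattern + ((label, depth),), new_occs)

    seeds: Dict[str, List[Occ]] = defaultdict(list)
    for tid, (labels, _) in enumerate(flat):
        for u, lab in enumerate(labels):
            seeds[lab].append((tid, [u]))
    for lab, occs in sorted(seeds.items()):
        if tree_count(occs) >= minsup:
            grow(((lab, 0),), occs)
    return result


# Toy forest (illustrative data, not from the paper).
forest = [
    ("A", [("B", [("C", [])]), ("D", [])]),  # A(B(C), D)
    ("A", [("C", [])]),                       # A(C)
]
for pat, sup in mine_embedded(forest, minsup=2).items():
    print(pat, sup)
```

On this toy forest with `minsup=2`, the sketch reports three patterns, each with support 2: the single nodes A and C, and A with C as a descendant. Storing full occurrence lists keeps the sketch short but can grow quickly on real data, so a practical miner would need a more compact projection.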

Key words: Data Mining; Frequent Pattern; Pattern Growth; Frequent Subtree

Received: 2004-09-13

CLC number: TP311

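Funding: National Natural Science Foundation of China (No. 60473070); National High Technology Research and Development Program of China (863 Program)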

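About the authors: MA HaiBing, male, born in 1971, postdoctoral researcher; research interests include data mining and the informatization of political work. E-mail: martin0721@163.com. LI RongLu, male, born in 1976, Ph.D. candidate; research interest is text databases. HU YunFa, male, born in 1940, professor and doctoral supervisor; research interest is data and knowledge engineering.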

Cite this article:

MA HaiBing, LI RongLu, HU YunFa. Pattern Growth Method for Mining Embedded Frequent Trees[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2006, 19(2): 208-214.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2006/V19/I2/208
