Pattern Growth Method for Mining Embedded Frequent Trees
MA HaiBing1,2, LI RongLu1, HU YunFa1
1.Department of Computer and Information Technology, Fudan University, Shanghai 200433 2.Shanghai Branch of PLA Nanjing Political College, Shanghai 200433
Abstract:In this paper, an efficient pattern growth algorithm for mining frequent embedded subtrees in rooted, labeled, and ordered trees is presented. It uses rightmost path expansion schema to construct complete pattern growth space, and creats a projection database for every grow point of the treepattern. So the problem is transformed from mining frequent trees to finding frequent nodes in the projected database. Thus the complexity of the algorithm is considerably reduced. Experimental results show that it is efficient for both time and space.
[1] Agrawal R, Imielinski T, Swami A. Mining Association Rules between Sets of Items in Large Databases. In: Proc of the ACM SIGMOD International Conference on Management of Data. Washington, USA, 1993, 207-216 [2] Agrawal R, Srikan R. Fast Algorithms for Mining Association Rules. In: Proc of the 20th International Conference on Very Large Data Bases. Santiago, Chile, 1994, 487-499 [3] Wang K, Liu H Q. Schema Discovery for Semistructured Data. In: Proc of the 3rd International Conference on Knowledge Discovery and Data Mining. Newport Beach, USA, 1997, 271-274 [4] Chi Y, Yang Y, Muntz R R. Index and Mining Free Trees. In: Proc of the 3rd International Conference on Data Mining. Melbourne, USA, 2003, 509-512 [5] Han J W, Pei J, Yin Y W. Mining Frequent Patterns without Candidate Generation. In: Proc of the ACM SIGMOD International Conference on Management of Data. Dallas, USA, 2000, 1-2 [6] Zaki M J. Efficiently Mining Frequent Trees in a Forest. In: Proc of the International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada, 2002, 71-80 [7] Asia T, et al. Efficient Substructure Discovery from Large Semi-Structured Data. In: Proc of the 2nd SIMA International Conference on Data Mining. Arlington, USA, 2002, 158-174 [8] Pei J, et al. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases. In: Proc of the 1st International Conference on Data Mining. San Jose, USA, 2001, 441-448 [9] Pei J, et al. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In: Proc of the International Conference on Data Engineering. Heidelberg, Germany, 2001, 215-224 [10] Ma H B, Zhang J, Ying J F, Yun F H. Mining Frequent Patterns Based on IS+-Tree. In: Proc of the International Conference on Machine Learning and Cybernetics. Shanghai, China, 2004, 497-503 [11] Dehaspe L, et al. Finding Frequent Substructures in Chemical Compounds. In: Proc of the 4th International Conference on Knowledge Discovery and Data Mining. New York, USA, 1998, 30-36 [12] Miyahara T, et al. Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents. In: Proc of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Hong Kong, China, 2001, 47-52 [13] Yan X, Han J. gSpan: Graph-Based Substructure Pattern Mining. In: Proc of the 2nd International Conference on Data Mining. Maebashi City, Japan, 2002, 721-724 [14] Zaki M J, Aggarwal C C. XRules: An Effective Structural Classifier for XML Data. In: Proc of the 9th International Conference on Knowledge Discovery and Data Mining. Washington, USA, 2003, 316-325 [15] Yan X F, Yu S P, Han J W. Graph Indexing: A Frequent Structure-Based Approach. In: Proc of the ACM SIGMOD International Conference on Management of Data. Paris, Fance, 2004, 335-346