Evolutionary Algorithm Based Pattern Discovery in Graphical Databases
CHANG XinGong1,2, LI MingQiang1, KOU JiSong1
1.School of Management, Tianjin University, Tianjin 3000722. Faculty of Information and Management, Shanxi University of Finance and Economics, Taiyuan 030006
Abstract:The greedy search is often used in some existing prevalent graphical data mining systems which often ends up with suboptimal solutions. To overcome its limits, an evolutionary algorithms based system is developed to perform data mining on databases represented as graphs. New operators of mutation and crossover on graphical databases are defined, and the way of collecting instances of a certain substructure is improved. In addition, a variant of hillclimbing is integrated into the design of mutation operator to improve the capability of local search of evolutionary algorithm. Experimental results show that these measures successfully improve the searching capability of the algorithm and the qualities of solutions.
[1] Inokuchi A, Washio T, Motoda H. An AprioriBased Algorithm for Mining Frequent Substructures from Graph Data // Proc of the 4th European Conference on Principles and Practices of Knowledge Discovery in Databases. Lyon,France, 2000: 1323 [2] Kuramochi M, Karypis G. An Efficient Algorithm for Discovering Frequent Subgraphs. IEEE Trans on Knowledge and Data Engineering, 2004, 16(9): 10381051 [3] Yan Xifeng, Han Jiawei. Gspan: GraphBased Substructure Pattern Mining // Proc of the IEEE International Conference on Data Mining. Maebashi City, Japan, 2002: 721724 [4] Dehaspe L, Toivonen H. Discovery of Frequent Datalog Patterns. Data Mining and Knowledge Discovery, 1999, 3(1): 736 [5] de Raedt L, Kramer S. The Levelwise Version Space Algorithm and Its Application to Molecular Fragment Finding // Proc of the 17th International Joint Conferences on Artificial Intelligence. Seattle, USA, 2001: 853862 [6] Cook D J, Holder L B. Substructure Discovery Using Minimum Description Length and Background Knowledge. Journal of Artificial Intelligence Research, 1994, 1: 231255 [7] Jonyer I, Cook D J, Holder L B. GraphBased Hierarchical Conceptual Clustering. International Journal of Artificial Intelligence Tools, 2001, 10(1/2): 107135 [8] Bandyopadhyay S, Maulik U, Cook D J, et al. Enhancing Structure Discovery for Data Mining in Graphical Databases Using Evolutionary Programming // Proc of the 15th International Florida Artificial Intelligence Research Society Conference. Pensacola Beach, USA, 2002: 232236 [9] Grunwald P. A Tutorial Introduction to the Minimum Description Length Principle [EB/OL]. [20060801]. http://homepages.cwi.nl/pdg/ftp/mdlintro.pdf [10]Li Minqiang, Kou Jisong, Lin Dan, et al. Genetic Algorithms'Theory and Applications. Beijing, China: Science Press, 2002 (in Chinese) (李敏强,寇纪淞,林 丹,等.遗传算法的基本理论与应用.北京:科学出版社, 2002)