Abstract:Aiming at the problems of low efficiency and accuracy of data mining, a graph mining model using Markov clustering based on annular network motifs is proposed. Firstly, the Erdo″s-Rényi model is employed to generate random graphs according to the vertices set of the input graph. Annular sub-graphs are judged by the additive property of vectors in the process of sub-graph mining from input and random graphs. In the next step, the motif statistical characteristics are calculated and used to label the annular motifs. Then, the correlation matrix of absolute contribution of edges is solved in the graph, and the threshold is obtained by dynamic threshold method for the binarization of matrix. Finally, inflation and expansion processes are carried out on the sparsified graph data to achieve the state of convergence. Experimental results show that the proposed model can effectively reduce the running time and improve the mining efficiency of the graph with the guaranteed clustering quality.
任永功,索全明,刘洋. 基于环型网络模体应用马尔科夫聚类的图挖掘模型*[J]. 模式识别与人工智能, 2017, 30(9): 803-814.
REN Yonggong, SUO Quanming, LIU Yang. Graph Mining Model Using Markov Clustering Based on Annular Network Motifs. , 2017, 30(9): 803-814.
[1] BOYD D M, ELLISON N B. Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication, 2007, 13(1): 210-230. [2] CHAN E Y, SAQIB N U. Online Social Networking Increases Financial Risk-Taking. Computers in Human Behavior, 2015, 51(A): 224-231. [3] WU Y B, ZHU X F, LI L, et al. Mining Dual Networks: Models, Algorithms, and Applications. ACM Transactions on Knowledge Discovery from Data, 2016, 10(4). DOI: 10.1145/2785970. [4] DJIDJEV H N, ONUS M. Scalable and Accurate Graph Clustering and Community Structure Detection. IEEE Transactions on Parallel and Distributed Systems, 2013, 24(5): 1022-1029. [5] 袁 超,柴 毅.复杂网络的局部社团结构挖掘算法.自动化学报, 2014, 40(5): 921-934. (YUAN C, CHAI Y. Method for Local Community Mining in the Complex Networks. Acta Automatica Sinica, 2014, 40(5): 921-934.) [6] LIU J L, WANG C, DANILEVSKY M, et al. Large-Scale Spectral Clustering on Graphs // Proc of the 23rd International Joint Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2013: 1486-1492. [7] REN J, WANG J X, LI M, et al. Identifying Protein Complexes Based on Density and Modularity in Protein-Protein Interaction Network. BMC Systems Biology, 2013, 7(S4). DOI: 10.1186/1752-0509-7-S4-S12. [8] DHILLON I S, GUAN Y Q, KULIS B. Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(11): 1944-1957. [9] CUI W Y, XIAO Y H, WANG H X, et al. Local Search of Communities in Large Graphs // Proc of the ACM SIGMOD International Conference on Management of Data. New York, USA: ACM, 2014: 991-1002. [10] SCHAEFFER S E. Survey: Graph Clustering. Computer Science Review, 2007, 1(1): 27-64. [11] ZHAO P X. gSparsify: Graph Motif Based Sparsification for Graph Clustering // Proc of the 24th ACM International Conference on Information and Knowledge Management. New York, USA: ACM, 2015: 373-382. [12] 柴变芳,赵晓鹏,贾彩燕,等.大规模网络的三角形模体社区发现模型.南京大学学报(自然科学), 2014, 50(4): 466-473. (CHAI B F, ZHAO X P, JIA C Y, et al. A Model for Community Detection Based on Triangular Motifs in Large-Scale Networks. Journal of Nanjing University(Natural Science), 2014, 50(4): 466-473.) [13] ORECCHIA L, ZHU Z A. Flow-Based Algorithms for Local Graph Clustering // Proc of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, USA: Society for Industrial and Applied Mathematics, 2013: 1267-1286. [14] MILO R, SHEN-ORR S S, ITZKOVITZ S, et al. Network Motifs: Simple Building Blocks of Complex Networks. Science, 2002, 298(5594): 824-827. [15] SZE S H, GELFAND M S, PEVZNER P A. Finding Weak Motifs in DNA Sequences // Proc of the Pacific Symposium on Biocomputing. London, UK: World Scientific, 2002: 235-246. [16] LIANG S, SAMANTA M P, BIEGEL B A. cWINNOWER Algorithm for Finding Fuzzy DNA Motifs. Journal of Bioinformatics and Computational Biology, 2004, 2(1): 47-60. [17] FRATKIN E, NAUGHTON B T, BRUTLAG D L, et al. MotifCut: Regulatory Motifs Finding with Maximum Density Subgraphs. Bioinformatics, 2006, 22(14): e150-e157. [18] NEUWALD A F, LIU J S, LAWRENCE C E. Gibbs Motif Sampling: Detection of Bacterial Outer Membrane Protein Repeats. Protein Science, 1995, 4(8): 1618-1632. [19] ROTH F P, HUGHES J D, ESTEP P W, et al. Finding DNA Re-gulatory Motifs within Unaligned Noncoding Sequences Clustered by Whole-Genome mRNA Quantitation. Nature Biotechnology, 1998, 16(10): 939-945. [20] BADER D A, MEYERHENKE H, SANDERS P, et al. Benchmarking for Graph Clustering and Partitioning // ALHAJJ R, ROKNE J, eds. Encyclopedia of Social Network Analysis and Mi-ning. New York, USA: Springer-Verlag, 2014: 73-82. [21] AXELBERG P G V, GU I Y H, BOLLEN M H J. Support Vector Machine for Classification of Voltage Disturbances. IEEE Transactions on Power Delivery, 2007, 22(3): 1297-1303. [22] ALBERT R, BARABSI A L. Statistical Mechanics of Complex Networks[J/OL]. [2016-12-20]. https://arxiv.org/pdf/cond-mat/0106096.pdf. [23] 张淮声,张佑生,方贤勇.基于矢量积的二维封闭图形轮廓信息提取方法.计算机工程与应用, 2002, 38(8): 93-94. (ZHANG H S, ZHANG Y S, FANG X Y. The Method of Pick-Upping Outline Information in 2D Closed Graph by Vector Multiplication. Computer Engineering and Applications, 2002, 38(8): 93-94.) [24] LAHAV G, ROSENFELD N, SIGAL A, et al. Dynamics of the p53-Mdm2 Feedback Loop in Individual Cells. Nature Genetics, 2004, 36(2): 147-150. [25] MILO R, ITZKOVITZ S, KASHTAN N, et al. Superfamilies of Evolved and Designed Networks. Science, 2004, 303(5663): 1538-1542. [26] OTSU N. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9(1): 62-66. [27] ZACHARY W W. An Information Flow Model for Conflict and Fi-ssion in Small Groups. Journal of Anthropological Research, 1977, 33(4): 452-473. [28] BERNARDES J S, VIEIRA F R, COSTA L M, et al. Evaluation and Improvements of Clustering Algorithms for Detecting Remote Homologous Protein Families. BMC Bioinformatics, 2015, 16(1). DOI: 10.1186/s12859-014-0445-4. [29] 殷新春.数据结构学习与解题指南.武汉:华中科技大学出版社, 2001.
(YIN X C. Guide to Data Structure Learning and Problem Solving. Wuhan, China: Huazhong University of Science and Technology Press, 2001.) [30] XENARIOS I, SALWINSKI , DUAN X J, et al. DIP, the Database of Interacting Proteins: A Research Tool for Studying Cellular Networks of Protein Interactions. Nucleic Acids Research, 2002, 30(1): 303-305. [31] 刘珂男,童 薇,冯 丹,等.一种灵活高效的虚拟CPU调度算法.软件学报, 2017, 28(2): 398-410. (LIU K N, TONG W, FENG D, et al. A Flexible and Efficient VCPU Scheduling Algorithm. Journal of Software, 2017, 28(2): 398-410.)