1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006; 2. School of Mathematics and Big Data, Foshan University, Foshan 528000
Abstract: Information diffusion modeling is the basis of research on community mining and community influence. Building on a user-interest-related information diffusion model, this paper proposes a microscopic pattern mining method that detects information diffusion features through frequent subtree mining. First, the microblog social network is formulated as a series of graphs in which each user carries multiple labels, which converts the mining of microscopic information diffusion patterns into frequent subtree mining. Then, because a single node in a microblog social network carries multiple labels, an efficient algorithm for mining frequent subtrees in multi-labeled trees, the multiple labels tree miner (MLTreeMiner), is proposed. Finally, MLTreeMiner is combined with a topic information extraction method to mine information diffusion patterns. Experiments on synthetic data demonstrate that MLTreeMiner mines frequent subtrees in multi-labeled trees efficiently, and experiments on real data from Sina Weibo verify its validity.
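To make the reduction to frequent subtree mining concrete, the following Python sketch shows the support test that mining frequent subtrees in multi-labeled trees relies on. It is not MLTreeMiner: all names (`TreeNode`, `PatternNode`, `matches_at`, `support`, `is_frequent`) are hypothetical, and the matching semantics are assumptions chosen for illustration. Each diffusion cascade is assumed to be a rooted tree whose nodes (users) carry a set of labels; a single-labeled pattern node matches a tree node when its label is one of that node's labels; parent-child edges must be preserved, sibling order is ignored, and distinct pattern children must map to distinct tree children; support counts the trees that contain the pattern at least once.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class TreeNode:
    """A user in a diffusion tree; one node may carry several labels (assumed: interest/topic tags)."""
    labels: frozenset[str]
    children: list[TreeNode] = field(default_factory=list)


@dataclass
class PatternNode:
    """A node of a candidate pattern subtree; each pattern node has a single label."""
    label: str
    children: list[PatternNode] = field(default_factory=list)


def matches_at(p: PatternNode, t: TreeNode) -> bool:
    """True if the pattern rooted at p embeds at tree node t.

    Assumed semantics: p's label must be one of t's labels, parent-child edges are
    preserved, sibling order is ignored, and distinct pattern children map to
    distinct tree children.
    """
    if p.label not in t.labels:
        return False
    return _assign(p.children, t.children, set())


def _assign(pchildren: list[PatternNode], tchildren: list[TreeNode], used: set[int]) -> bool:
    """Backtracking search for an injective mapping of pattern children onto tree children."""
    if not pchildren:
        return True
    first, rest = pchildren[0], pchildren[1:]
    for i, tc in enumerate(tchildren):
        if i not in used and matches_at(first, tc):
            used.add(i)
            if _assign(rest, tchildren, used):
                return True
            used.discard(i)  # undo this choice and try the next tree child
    return False


def iter_nodes(root: TreeNode) -> Iterator[TreeNode]:
    """Yield every node of a tree (iterative depth-first traversal)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def support(pattern: PatternNode, forest: list[TreeNode]) -> int:
    """Number of trees in the forest that contain the pattern at least once."""
    return sum(1 for root in forest if any(matches_at(pattern, n) for n in iter_nodes(root)))


def is_frequent(pattern: PatternNode, forest: list[TreeNode], minsup: int) -> bool:
    """A pattern is frequent if its support reaches the minimum support threshold."""
    return support(pattern, forest) >= minsup


# Toy forest of three hypothetical diffusion trees (labels are made up for illustration).
forest = [
    TreeNode(frozenset({"sports", "music"}), [
        TreeNode(frozenset({"sports"})),
        TreeNode(frozenset({"music", "tech"})),
    ]),
    TreeNode(frozenset({"tech"}), [TreeNode(frozenset({"sports"}))]),
    TreeNode(frozenset({"music"}), [TreeNode(frozenset({"music"}))]),
]

music_to_music = PatternNode("music", [PatternNode("music")])
sports_fanout = PatternNode("sports", [PatternNode("sports"), PatternNode("music")])
music_twins = PatternNode("music", [PatternNode("music"), PatternNode("music")])

print(support(music_to_music, forest))  # 2
print(support(sports_fanout, forest))   # 1
print(support(music_twins, forest))     # 0: no parent has two distinct music children
```

The first tree shows why multiple labels matter: its root matches both a `sports` pattern node and a `music` pattern node. A practical miner would not test patterns one by one like this; it would generate candidates level by level and prune them using the fact that support under this definition is anti-monotone (every subtree of a contained pattern is also contained), and the backtracking child assignment above is exponential in the worst case.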