Convolutional Neural Network and User Information Based Model for Microblog Topic Tracking
FU Peng, LIN Zheng, YUAN Fengcheng, LIN Hailun, WANG Weiping, MENG Dan
National Engineering Laboratory for Information Security Technologies, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
Abstract: To address feature sparseness and the difficulty of feature extraction in microblog text, a topic tracking model for Chinese microblogs based on a convolutional neural network (CNN-TTM) is proposed. User profiles and attributes are then incorporated into CNN-TTM to construct a second model, CNN-UserTTM, which uses the user information attached to each microblog to improve topic tracking accuracy. Experimental results show that both CNN-TTM and CNN-UserTTM achieve high accuracy on a Sina microblog dataset.
FU Peng, LIN Zheng, YUAN Fengcheng, LIN Hailun, WANG Weiping, MENG Dan. Convolutional Neural Network and User Information Based Model for Microblog Topic Tracking[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(1): 73-80.