Convolutional Neural Network and User Information Based Model for Microblog Topic Tracking
FU Peng, LIN Zheng, YUAN Fengcheng, LIN Hailun, WANG Weiping, MENG Dan
National Engineering Laboratory for Information Security Technologies, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
Abstract: To address the feature sparseness of microblog text and the difficulty of extracting features from it, a topic tracking model for Chinese microblogs based on a convolutional neural network (CNN-TTM) is proposed. User profiles and attributes are then incorporated into CNN-TTM to construct a second model, CNN-UserTTM, which uses microblog user information to improve topic tracking accuracy. Experimental results on a Sina microblog dataset show that both CNN-TTM and CNN-UserTTM achieve high accuracy.
Received: 19 September 2016
About the authors: FU Peng, born in 1988, Ph.D. candidate. His research interests include natural language processing and machine learning. LIN Zheng (corresponding author), born in 1984, Ph.D., assistant professor. Her research interests include natural language processing and data mining. YUAN Fengcheng, born in 1992, Ph.D. candidate. His research interests include natural language processing and deep learning. LIN Hailun, born in 1987, Ph.D., assistant professor. Her research interests include open knowledge computing and information extraction. WANG Weiping, born in 1975, Ph.D., professor. His research interests include big data storage and management, and data analysis. MENG Dan, born in 1965, Ph.D., professor. His research interests include big data storage and management, and parallel computing.