Abstract: Timely detection of hot topics is of great significance for commercial innovation and business marketing. Most existing methods must either process unstructured data or repeatedly traverse the sample set, which leads to high complexity. In this paper, a nonparametric method based on structured data is proposed to detect hot topics in time, with an emphasis on the statistical properties of topics. Firstly, diffusion degree and focus degree are introduced to build heat curves that characterize the topics. Then, the resulting heat curves are classified to determine the common behavior patterns of topics. Finally, a weighted-vote scheme is employed to predict whether a topic will become a trend. Experimental results on Sina microblog show that the proposed method uses a simple data structure, is easy to operate, and performs well with low time complexity.
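The final prediction step can be illustrated with a minimal sketch of a weighted vote over labeled reference curves. Several assumptions are made here because the abstract does not give the paper's formulas. Each topic's heat curve is taken to be an already computed, equal-length list of numbers; how diffusion degree and focus degree produce it is not shown. The reference curves stand in for hypothetical labeled examples of trending and non-trending topics, and the curve-classification step is not reproduced. The vote weight `exp(-gamma * d)` with Euclidean distance `d`, and the ratio test against `threshold`, follow the form used in nonparametric latent-source time-series classification; they are not confirmed to be the authors' exact scheme. The names `distance`, `weighted_vote`, `gamma` and `threshold` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length heat curves (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("heat curves must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_vote(
    observed: Sequence[float],
    trend_refs: List[Sequence[float]],
    non_trend_refs: List[Sequence[float]],
    gamma: float = 1.0,
    threshold: float = 1.0,
) -> Tuple[bool, float]:
    """Predict whether `observed` is a trend by a distance-weighted vote.

    Each reference curve votes for its own label with weight exp(-gamma * d).
    Returns (is_trend, ratio), where ratio = trend votes / non-trend votes.
    """
    d_trend = [distance(observed, r) for r in trend_refs]
    d_non = [distance(observed, r) for r in non_trend_refs]
    if not d_trend and not d_non:
        raise ValueError("at least one reference curve is required")
    # Shift every distance by the smallest one before exponentiating. Both vote
    # totals are scaled by the same factor, so the ratio is unchanged, but at
    # least one weight equals 1 and the totals cannot both underflow to zero.
    d_min = min(d_trend + d_non)
    trend_score = sum(math.exp(-gamma * (d - d_min)) for d in d_trend)
    non_score = sum(math.exp(-gamma * (d - d_min)) for d in d_non)
    ratio = trend_score / non_score if non_score > 0 else math.inf
    return ratio > threshold, ratio


if __name__ == "__main__":
    # Hypothetical toy curves: rising curves labeled trend, flat or falling ones not.
    trend_refs = [[1, 2, 4, 8], [1, 3, 6, 9]]
    non_trend_refs = [[5, 4, 3, 2], [3, 3, 3, 3]]
    print(weighted_vote([1, 2, 5, 8], trend_refs, non_trend_refs))
    print(weighted_vote([4, 4, 3, 3], trend_refs, non_trend_refs))
```

In this toy run the rising curve `[1, 2, 5, 8]` lies closest to the trend references and is predicted to be a trend, while `[4, 4, 3, 3]` lies closest to the non-trend references and is not. Raising `gamma` makes the vote depend more on the nearest references. Raising `threshold` requires stronger evidence before a topic is flagged as a trend.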
LIU Ye-Zheng, DU Ya-Nan, JIANG Yuan-Chun, DU Fei. Trend Prediction for Microblog Based on Classification Modeling of Heat Curves. Pattern Recognition and Artificial Intelligence, 2015, 28(1): 27-34 (刘业政, 杜亚楠, 姜元春, 杜非. 基于热度曲线分类建模的微博热门话题预测. 模式识别与人工智能, 2015, 28(1): 27-34)