Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events
LIN Junjie1,2, WANG Lei1, MAO Wenji1,2
1. State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049
Abstract: Existing methods for standpoint analysis mainly train standpoint classification models in a supervised or unsupervised manner. Supervised models usually require a large amount of labeled data for training, whereas unsupervised models fall well short of supervised models in performance. To reduce the demand for labeled data in model training while maintaining model performance, this paper proposes a semi-supervised self-training method for multiple standpoint analysis based on social media texts related to social events. In self-training methods, selecting high-quality data and adding it to the training dataset plays a key role in improving the performance of classification models during the iterative training process. The proposed method first measures the classification confidence of texts based on user-level standpoint consistency. It then leverages topic information to select high-quality texts to expand the training dataset, so that the performance of the model improves from one iteration to the next. Experimental results show that the proposed method achieves better standpoint classification performance than both the representative methods in related work and other semi-supervised model training methods. In addition, both the user-level standpoint consistency and the topic information used in the method contribute to improving the performance of standpoint classification.
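The self-training loop outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract gives no formulas. The word-overlap classifier, the consistency measure (the share of a user's texts that receive the same predicted label), the product used as confidence, the per-topic top-k selection, and all names (`train`, `predict_proba`, `self_train`, `per_topic`) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Fit a toy model: label -> word counts (a stand-in for a real classifier)."""
    model = defaultdict(Counter)
    for words, label in labeled:
        model[label].update(words)
    return model


def predict_proba(model, words):
    """Return {label: score}, normalized to sum to 1; assumes `words` is non-empty."""
    scores = {}
    for label, counts in model.items():
        total = sum(counts.values())
        scores[label] = sum((counts[w] + 1) / (total + 1) for w in words)
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


def self_train(labeled, unlabeled, rounds=3, per_topic=1):
    """Iteratively move confident texts from the unlabeled pool to the training set.

    unlabeled: list of dicts with keys 'words', 'user', 'topic'.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)

        # Predict a label and a model probability for every pooled text.
        preds = []
        for item in pool:
            proba = predict_proba(model, item["words"])
            label = max(proba, key=proba.get)
            preds.append((item, label, proba[label]))

        # Assumed user-level consistency: share of the user's texts with this label.
        by_user = defaultdict(Counter)
        for item, label, _ in preds:
            by_user[item["user"]][label] += 1

        # Assumed topic-aware selection: keep the top `per_topic` texts per topic.
        by_topic = defaultdict(list)
        for item, label, p in preds:
            votes = by_user[item["user"]]
            confidence = p * votes[label] / sum(votes.values())
            by_topic[item["topic"]].append((confidence, item, label))

        chosen_ids = set()
        for entries in by_topic.values():
            entries.sort(key=lambda e: e[0], reverse=True)
            for _, item, label in entries[:per_topic]:
                labeled.append((item["words"], label))
                chosen_ids.add(id(item))
        pool = [item for item in pool if id(item) not in chosen_ids]
    return train(labeled), labeled
```

Selecting per topic rather than globally is one simple way to keep the added texts from concentrating on a single topic; a real system would replace the toy classifier with a trained model and tune both the confidence measure and the number of texts added per round.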
LIN Junjie, WANG Lei, MAO Wenji. Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2018, 31(12): 1074-1084.