Social Media Text Classification Method Based on Character-Word Feature Self-attention Learning
WANG Xiaoli1, YE Dongyi1
1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108
2. Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education, Fuzhou University, Fuzhou 350108
Abstract: The long-tail effect and the abundance of out-of-vocabulary (OOV) words in social media texts cause severe feature sparsity and reduce classification accuracy. To address this problem, a social media text classification method based on character-word feature self-attention learning is proposed. Global features are constructed at the character level to learn the attention weight distribution, and the existing multi-head attention mechanism is improved to reduce its parameter scale and computational complexity. To further analyze character-word feature fusion, an OOV sensitivity index is proposed to measure the impact of OOV words on different types of features. Experiments on several social media text classification tasks show that fusing word-level and character-level features with the proposed method yields a clear improvement in classification accuracy. Moreover, the quantitative results of the OOV sensitivity index verify the feasibility and effectiveness of the proposed method.
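The abstract states that the multi-head attention mechanism is modified to reduce parameter scale, but does not spell out the modification. As one illustrative way such a reduction can be achieved, the sketch below shares a single key/value projection across all heads instead of giving each head its own, so only the query projection keeps full width. This is a generic parameter-saving variant, not necessarily the paper's mechanism; all names and dimensions are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_kv_attention(X, Wq, Wk, Wv, n_heads):
    """Multi-head self-attention with one K/V projection shared by all
    heads. Wq: (d, d); Wk, Wv: (d, d_head) -- versus 3 full (d, d)
    matrices in standard multi-head attention, cutting parameters."""
    T, d = X.shape
    dh = d // n_heads
    Q = X @ Wq            # (T, d), sliced into per-head queries below
    K = X @ Wk            # (T, dh), shared across heads
    V = X @ Wv            # (T, dh), shared across heads
    outs = []
    for h in range(n_heads):
        q = Q[:, h * dh:(h + 1) * dh]             # (T, dh)
        A = softmax(q @ K.T / np.sqrt(dh))        # (T, T) attention weights
        outs.append(A @ V)                        # (T, dh)
    return np.concatenate(outs, axis=1)           # (T, d)

# Toy usage: 5 character positions, embedding width 8, 2 heads.
rng = np.random.default_rng(0)
T, d, H = 5, 8, 2
X = rng.standard_normal((T, d))
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d // H))
Wv = rng.standard_normal((d, d // H))
Y = shared_kv_attention(X, Wq, Wk, Wv, H)
```

Here the projection parameters total d*d + 2*d*(d/H) instead of 3*d*d, and the saving grows with the number of heads.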
WANG Xiaoli, YE Dongyi. Social Media Text Classification Method Based on Character-Word Feature Self-attention Learning. Pattern Recognition and Artificial Intelligence, 2020, 33(4): 287-294.