Pattern Recognition and Artificial Intelligence
2020, Vol. 33, Issue (4): 287-294. DOI: 10.16451/j.cnki.issn1003-6059.202004001
Papers and Reports
Social Media Text Classification Method Based on Character-Word Feature Self-attention Learning
WANG Xiaoli¹, YE Dongyi¹
1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108
2. Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education, Fuzhou University, Fuzhou 350108

Abstract: Social media texts show a long-tail effect and contain many out-of-vocabulary (OOV) words. Together these cause severe feature sparsity, which lowers classification accuracy. To address this problem, the paper proposes a social media text classification method based on character-word feature self-attention learning. Global features are built at the character level and used to learn the attention weight distribution. The existing multi-head attention mechanism is also modified to reduce the number of parameters and the computational complexity. To analyze character-word feature fusion further, an OOV sensitivity index is proposed that measures how strongly OOV words affect different types of features. Experiments on several social media text classification tasks show that the proposed method fuses word features and character features effectively and clearly improves classification accuracy. The quantitative results of the OOV sensitivity index further confirm the feasibility and effectiveness of the method.
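The abstract does not give the model's equations. The general idea of letting a character-level global feature drive attention over fused character and word features can be illustrated with the minimal sketch below. It is hypothetical and makes three assumptions. First, the global feature is the element-wise mean of the character vectors. Second, attention scores are scaled dot products between that global query and every character or word vector. Third, "fusion" simply means attending over both feature sets together. The helper names (`mean_vector`, `global_query_attention`) and the toy vectors are illustrative only. This is not the authors' architecture, and it does not model their reduced-parameter multi-head attention.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean; used here as a HYPOTHETICAL character-level global feature."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def global_query_attention(
    char_vecs: List[Vector], word_vecs: List[Vector]
) -> Tuple[List[float], Vector]:
    """Attend over character and word features using a character-level global query.

    Assumptions (not from the paper): query = mean of character vectors,
    scores = scaled dot products, output = attention-weighted sum of all features.
    Returns (attention weights, pooled vector).
    """
    d = len(char_vecs[0])
    query = mean_vector(char_vecs)
    features = char_vecs + word_vecs  # naive fusion: one pool of char + word features
    scores = [dot(query, h) / math.sqrt(d) for h in features]
    weights = softmax(scores)
    pooled = [sum(w * h[j] for w, h in zip(weights, features)) for j in range(d)]
    return weights, pooled


if __name__ == "__main__":
    # Toy 2-d embeddings: three characters and one word (illustrative values only).
    chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    words = [[0.5, 0.5]]
    weights, pooled = global_query_attention(chars, words)
    print("weights:", [round(w, 3) for w in weights])
    print("pooled:", [round(x, 3) for x in pooled])
```

In this toy run the character vector most aligned with the global query, `[1.0, 1.0]`, gets the largest weight, and the weights sum to 1. A trained model would learn projections for the query, keys, and values instead of using raw embeddings.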
Key words: Social Media Text Classification; Self-attention Learning; Character-Word Feature Fusion; Out-of-Vocabulary Sensitivity
Received: 02 January 2020     
CLC number (Chinese Library Classification): TP 391
Fund: Supported by National Natural Science Foundation of China (No. 61672158) and Industry-University Cooperation Foundation of Fujian Province (No. 2018H6010)
Corresponding author: YE Dongyi, Ph.D., professor. His research interests include computational intelligence, data mining and natural language processing.
About author: WANG Xiaoli, Ph.D. candidate. Her research interests include computational intelligence and natural language processing.
Cite this article:
WANG Xiaoli, YE Dongyi. Social Media Text Classification Method Based on Character-Word Feature Self-attention Learning[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(4): 287-294.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202004001 or http://manu46.magtech.com.cn/Jweb_prai/EN/Y2020/V33/I4/287
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence