Pattern Recognition and Artificial Intelligence
  2015, Vol. 28 Issue (11): 992-1001    DOI: 10.16451/j.cnki.issn1003-6059.201511005
Papers and Reports
n-grams Features Weighting Algorithm Based on Relevance and Semantic
QIU Yun-Fei¹, LIU Shi-Xing¹, LIN Ming-Ming¹, SHAO Liang-Shan²
1. School of Software, Liaoning Technical University, Huludao 125105
2. System Engineering Institute, Liaoning Technical University, Huludao 125105

Abstract  When n-grams are used as text classification features, classification accuracy decreases. The redundancy and relevance between words are ignored when the n-grams are weighted. Thus, an n-gram feature weighting algorithm based on relevance and semantics is proposed. To reduce internal redundancy, feature reduction is applied to the n-grams during text preprocessing. The n-grams are then weighted according to the relevance between the words in each n-gram and the classes, and the semantic similarity between the n-grams and the testing dataset. Experimental results on the Sogou Chinese news corpus and the NetEase text corpus show that the proposed algorithm selects n-gram features of high relevance and low redundancy, and reduces data sparsity when quantifying the testing dataset.
Key words: Maximum Relevance Minimum Redundancy (mRMR), Semantic Similarity, n-grams, Feature Weighting
Received: 30 April 2014     
CLC number (ZTFLH): TP 391.1
Cite this article:   
QIU Yun-Fei, LIU Shi-Xing, LIN Ming-Ming, et al. n-grams Features Weighting Algorithm Based on Relevance and Semantic[J]. Pattern Recognition and Artificial Intelligence, 2015, 28(11): 992-1001.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201511005      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2015/V28/I11/992