Pattern Recognition and Artificial Intelligence
2006, Vol. 19, Issue 4: 531-537
Researches and Applications
Document Feature Selection Based on the Minimum Term Frequency Threshold
CHEN XiaoYun1,2, LI RongLu1, HU YunFa1
1.Department of Computer and Information Technology, Fudan University, Shanghai 200433
2.School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002

Abstract: This paper presents a new feature evaluation function based on document frequency with a minimum term frequency threshold (DFn). To reduce the influence of irrelevant features on text categorization, the characteristics of irrelevant features are analyzed, showing that their term frequency is generally low. Applying a minimum term frequency threshold to filter out low-frequency features therefore markedly reduces the number of irrelevant features. Experimental results show that the proposed method greatly reduces the number of irrelevant features and effectively improves text categorization accuracy. The improvement is especially pronounced for Mutual Information (MI): with DFn, the macro-averaged F1 value is 40% higher than with Term Frequency (TF) and 15-30% higher than with Document Frequency (DF).
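The abstract does not give the exact DFn formula. One plausible reading, assumed here rather than taken from the paper, is that DFn(t, c) counts the documents of class c in which term t occurs at least n times, so that n = 1 recovers ordinary document frequency (DF), and that these counts replace DF inside feature evaluation functions such as MI. The sketch below follows that reading. The MI estimate log(A·N / ((A+C)(A+B))) is a common textbook form, combining per-class scores by their maximum is one common choice, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def dfn_counts(docs, labels, n):
    """Per-class DFn: documents of each class in which a term occurs at least n times.

    docs:   list of token lists (hypothetical pre-tokenized input)
    labels: class label of each document
    n:      minimum term frequency threshold; n = 1 gives plain document frequency (DF)
    """
    counts = {}  # term -> Counter(class -> number of qualifying documents)
    for tokens, label in zip(docs, labels):
        for term, tf in Counter(tokens).items():
            if tf >= n:  # the document counts only if the term is frequent enough in it
                counts.setdefault(term, Counter())[label] += 1
    return counts


def mutual_information(counts, labels):
    """Score each term by max over classes of log(A*N / ((A+C)(A+B))).

    A:   documents of class c that count for the term (from dfn_counts)
    A+B: documents of any class that count for the term
    A+C: documents of class c
    Classes with A = 0 would score -inf, so only classes with A > 0 are considered.
    """
    n_docs = len(labels)
    class_sizes = Counter(labels)
    scores = {}
    for term, per_class in counts.items():
        term_total = sum(per_class.values())  # A + B
        scores[term] = max(
            math.log(a * n_docs / (class_sizes[c] * term_total))
            for c, a in per_class.items()
        )
    return scores


# Hypothetical toy corpus: two "sport" and two "finance" documents.
docs = [
    ["ball", "goal", "goal", "team"],
    ["goal", "match", "team", "team"],
    ["stock", "market", "market", "goal"],
    ["stock", "bank", "bank", "team"],
]
labels = ["sport", "sport", "finance", "finance"]

for n in (1, 2):
    mi = mutual_information(dfn_counts(docs, labels, n), labels)
    print(f"n={n}", sorted(mi.items(), key=lambda kv: -kv[1]))
```

On this toy corpus, terms that occur once in a single document (such as "ball" and "match") receive the highest MI score under plain DF, reflecting MI's known bias toward rare terms; with n = 2 they drop out of the feature set entirely, which is the kind of filtering the abstract describes.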
Key words: Text Classification; Feature Selection; Information Gain; Mutual Information; χ² Statistic
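The keywords name three standard feature evaluation functions. In their common textbook forms (assumed here; the abstract does not give the paper's definitions), the χ² statistic and information gain can both be computed from a 2×2 table of document counts for a term t and a category c. Under a DFn-style count, a document would be treated as containing t only if t occurs in it at least n times; the counting itself is left to the caller. Information gain is restricted here to a single category versus the rest, a simplification of the multi-class form often used in text categorization. The function names are hypothetical.

```python
import math


def chi_square(a, b, c, d):
    """χ² statistic for term t and category c from a 2x2 document contingency table.

    a: docs in c that contain t      b: docs outside c that contain t
    c: docs in c without t           d: docs outside c without t
    Returns 0.0 when a marginal total is zero (the statistic is undefined there).
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def _entropy(*counts):
    """Entropy in bits of the distribution given by non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(a, b, c, d):
    """Information gain (bits) of term presence for the binary split c vs. not-c."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    prior = _entropy(a + c, b + d)  # H(category)
    with_t = _entropy(a, b)         # H(category | t present)
    without_t = _entropy(c, d)      # H(category | t absent)
    return prior - ((a + b) / n) * with_t - ((c + d) / n) * without_t


# Hypothetical counts over N = 4 documents.
print(chi_square(2, 0, 0, 2), information_gain(2, 0, 0, 2))  # term perfectly separates c
print(chi_square(1, 1, 1, 1), information_gain(1, 1, 1, 1))  # term independent of c
```

Both measures peak for a term that splits the documents exactly along the category boundary and fall to zero when term presence is independent of the category.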
Received: 15 November 2004     
CLC number: TP311
Cite this article:
CHEN XiaoYun, LI RongLu, HU YunFa. Document Feature Selection Based on the Minimum Term Frequency Threshold[J]. Pattern Recognition and Artificial Intelligence, 2006, 19(4): 531-537.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/Y2006/V19/I4/531
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn