Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
2009, Vol. 22, Issue 6: 936-940
Research and Applications
An Improved KNN Text Categorization Algorithm by Adopting Cluster Technology
ZHANG Xiao-Fei, HUANG He-Yan
Research Center of Computer and Language Information Engineering, Chinese Academy of Sciences, Beijing 100097

Abstract  The k-Nearest Neighbor (KNN) algorithm has the advantages of high accuracy and stability. However, its time complexity is directly proportional to the sample size, so classification is slow and the algorithm is difficult to apply in large-scale information processing. An improved KNN text categorization algorithm is proposed that classifies faster than traditional KNN. First, similar sample documents are merged into a center document by means of automatic text clustering. Then, the large number of original samples is replaced with the small number of resulting cluster centers. The computational cost of KNN is thereby greatly reduced and classification is accelerated. Experimental results show that the time complexity of the proposed algorithm is reduced by one order of magnitude, while its accuracy is approximately equal to that of SVM and traditional KNN.
Key words: k-Nearest Neighbor (KNN); Text Categorization; Text Clustering; Cluster Center; Natural Language Processing (NLP)
Received: 31 October 2008     
Chinese Library Classification (ZTFLH): TP391
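
The idea summarized in the abstract (cluster similar training documents, keep only the cluster centers, then run KNN against those centers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which clustering method, similarity measure, or voting rule the paper uses, so the single-pass per-class clustering, the cosine similarity, the similarity-weighted vote, the `threshold` parameter, and the function names `cosine`, `build_centers`, and `knn_classify` are all assumptions made for illustration. The toy vectors and labels are invented.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_centers(samples, threshold=0.8):
    """Single-pass clustering within each class (an assumed scheme,
    not necessarily the one used in the paper).

    samples: list of (vector, label) pairs.
    Returns a list of (center, label), where each center is the mean
    of the vectors that were merged into that cluster.
    """
    clusters = []  # each entry: [sum_vector, count, label]
    for vec, label in samples:
        best, best_sim = None, threshold
        for c in clusters:
            if c[2] != label:
                continue  # only merge documents of the same class
            center = [s / c[1] for s in c[0]]
            sim = cosine(vec, center)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([list(vec), 1, label])  # start a new cluster
        else:
            best[0] = [s + x for s, x in zip(best[0], vec)]
            best[1] += 1
    return [([s / n for s in total], label) for total, n, label in clusters]


def knn_classify(query, centers, k=3):
    """Similarity-weighted vote among the k centers nearest to the query."""
    nearest = sorted(centers, key=lambda c: cosine(query, c[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for center, label in nearest:
        votes[label] += cosine(query, center)
    return max(votes, key=votes.get)


# Invented toy data: 3-term document vectors with hypothetical labels.
train = [
    ([1.0, 0.1, 0.0], "sports"),
    ([0.9, 0.2, 0.0], "sports"),
    ([1.0, 0.0, 0.1], "sports"),
    ([0.0, 0.1, 1.0], "finance"),
    ([0.1, 0.0, 0.9], "finance"),
    ([0.0, 0.2, 1.0], "finance"),
]
centers = build_centers(train)
print(len(train), "samples reduced to", len(centers), "centers")
print(knn_classify([0.8, 0.1, 0.1], centers, k=1))
```

In this sketch each query is compared with the cluster centers rather than with every training document, so the per-query cost scales with the number of centers. How much the sample set shrinks depends on the assumed `threshold`: a lower value merges more documents into each center.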
Cite this article:
ZHANG Xiao-Fei, HUANG He-Yan. An Improved KNN Text Categorization Algorithm by Adopting Cluster Technology[J]. Pattern Recognition and Artificial Intelligence, 2009, 22(6): 936-940.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/ or http://manu46.magtech.com.cn/Jweb_prai/EN/Y2009/V22/I6/936
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence