Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
2014, Vol. 27, Issue 2: 141-145
Papers and Reports
New Words Discovery in Microblog Content
HUO Shuai, ZHANG Min, LIU Yi-Qun, MA Shao-Ping
State Key Laboratory of Intelligent Technology and Systems, Beijing 100084
Tsinghua National Laboratory for Information Science and Technology, Beijing 100084
Department of Computer Science and Technology, Tsinghua University, Beijing 100084

Abstract  New word discovery is of great significance in natural language processing. Finding new words is more difficult in microblog text than in other corpora. This paper proposes an algorithm based on context entropy, which filters new word candidates according to their context. To improve precision, lexical features are introduced and combined with term frequency in a second algorithm. As a result, both precision and recall are greatly improved, and the F-measure reaches 89.6%.
Key words: New Word Discovery, Context Entropy, Unknown Words Extraction
Received: 13 May 2013     
ZTFLH: TP 391.1  
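
The abstract names context entropy as the filtering criterion but gives no formula. As a rough illustration only, the sketch below uses the common left/right branching-entropy formulation: a string whose neighbouring characters vary widely on both sides is more likely to be a complete word. The function names, the boundary markers, and the threshold of 1.0 bit are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (in bits) of a Counter of observed symbols."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def context_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) of the characters adjacent to `candidate`."""
    left, right = Counter(), Counter()
    for text in corpus:
        start = text.find(candidate)
        while start != -1:
            end = start + len(candidate)
            # Assumed convention: post boundaries count as a single marker symbol.
            left[text[start - 1] if start > 0 else "<s>"] += 1
            right[text[end] if end < len(text) else "</s>"] += 1
            start = text.find(candidate, start + 1)
    return _entropy(left), _entropy(right)


def is_candidate(corpus, candidate, threshold=1.0):
    """Keep a candidate only if both sides show varied context (hypothetical threshold)."""
    left, right = context_entropy(corpus, candidate)
    return min(left, right) >= threshold
```

For example, with the toy corpus `["a给力b", "c给力d", "e给力f", "g给力h"]`, the string `给力` has varied neighbours on both sides and passes the filter, while the fragment `给` is always followed by `力`, so its right entropy is zero and it is rejected.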
Cite this article:
HUO Shuai, ZHANG Min, LIU Yi-Qun, et al. New Words Discovery in Microblog Content[J]. Pattern Recognition and Artificial Intelligence, 2014, 27(2): 141-145.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2014/V27/I2/141