A Topic Model Based on CRP and Word Similarity


Abstract A topic model extracts the topics hidden in documents, which reduces the dimensionality of the documents and allows them to be classified and retrieved by topic. It is a research focus in the fields of information classification and retrieval. To address the problem that the LDA topic model cannot automatically determine the number of topics, a latent topic model is proposed that combines the similarity between words with the Chinese restaurant process (CRP). The model adaptively updates topic contents and determines a reasonable number of topics. In addition, a novel method for setting the hyperparameters while updating topics is put forward. Experimental results on a traditional Chinese medicine (TCM) clinical dataset show that the proposed model produces good analysis results that are accepted by TCM experts.
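
As a rough illustration of why a CRP prior removes the need to fix the number of topics in advance, the sketch below seats items one at a time under the standard Chinese restaurant process: an existing table (topic) is chosen with probability proportional to its current size, and a new table with probability proportional to a concentration parameter. This is the textbook CRP only, not the model proposed in the paper, which additionally uses word similarity; the function name `crp_assign` and the parameter `alpha` are assumptions made for this example.

```python
import random
from typing import List, Optional


def crp_assign(n_customers: int, alpha: float, seed: Optional[int] = None) -> List[int]:
    """Seat customers one by one under a standard CRP.

    Returns the table index chosen by each customer. The number of
    distinct tables is not fixed beforehand; it grows with the data.
    `alpha` (hypothetical name) is the concentration parameter.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    table_sizes: List[int] = []   # table_sizes[k] = customers at table k
    assignments: List[int] = []
    for _ in range(n_customers):
        # Existing table k has weight n_k; opening a new table has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table (a new topic)
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments
```

Calling `len(set(crp_assign(1000, alpha=1.0, seed=0)))` gives the number of tables opened in one run. Larger values of `alpha` tend to open more tables, which is why the choice of hyperparameters matters when topics are updated.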

Key words: Topic Model; Word Similarity; Dirichlet Distribution

Received: 27 April 2009

ZTFLH (Chinese Library Classification number): TP391

Authors: ZHANG Xiao-Ping, ZHOU Xue-Zhong, HUANG Hou-Kuan, FENG Qi, CHEN Shi-Bo

Cite this article:

ZHANG Xiao-Ping, ZHOU Xue-Zhong, HUANG Hou-Kuan, et al. A Topic Model Based on CRP and Word Similarity[J]. , 2010, 23(1): 72-76.

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/ OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2010/V23/I1/72