Efficient Chinese Speech Retrieval Based on Query Expansion


Abstract The aim of a Chinese speech retrieval system is to locate query text in audio files quickly and precisely. In a typical implementation, the speech files are recognized and the recognition results are stored in an index. The system segments each query into a word sequence and searches the index with that sequence. A mismatch between how the query is segmented and how the speech was recognized can degrade system performance. To solve this problem, multiple segmentation results and prefix-suffix expansions are used to broaden the original query, and retrieval is performed on the expanded outputs. Query expansion generates a large number of outputs, which slows retrieval. To improve the system's efficiency, a Finite State Automaton (FSA) is introduced to compress the query expansions, and a token-based search algorithm is used for fast search. Experimental results show that query expansion yields a relative improvement of about 50% to 70% in the system's equal error rate (EER). The FSA compresses the retrieval space and increases retrieval speed nearly 30-fold.
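The two ideas in the abstract, expanding a query into several segmentations and then compressing the expansions into an automaton, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon, the example query, and the function names `all_segmentations`, `build_prefix_automaton` and `accepts` are hypothetical, prefix-suffix expansion is omitted, and a prefix-sharing trie (a simple deterministic acyclic automaton, not a minimized one) stands in for the FSA.

```python
from typing import Dict, List, Set, Tuple


def all_segmentations(query: str, lexicon: Set[str]) -> List[List[str]]:
    """Enumerate every way to split `query` into words found in `lexicon`."""
    if not query:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(query) + 1):
        word = query[:end]
        if word in lexicon:
            for rest in all_segmentations(query[end:], lexicon):
                results.append([word] + rest)
    return results


def build_prefix_automaton(
    sequences: List[List[str]],
) -> Tuple[List[Dict[str, int]], Set[int]]:
    """Merge word sequences into a trie-shaped automaton.

    State 0 is the start state. Sequences that share a prefix share
    the corresponding states, so the prefix is stored only once.
    """
    transitions: List[Dict[str, int]] = [{}]
    finals: Set[int] = set()
    for seq in sequences:
        state = 0
        for word in seq:
            if word not in transitions[state]:
                transitions.append({})
                transitions[state][word] = len(transitions) - 1
            state = transitions[state][word]
        finals.add(state)
    return transitions, finals


def accepts(
    transitions: List[Dict[str, int]], finals: Set[int], seq: List[str]
) -> bool:
    """Follow one transition per word; accept only if a final state is reached."""
    state = 0
    for word in seq:
        nxt = transitions[state].get(word)
        if nxt is None:
            return False
        state = nxt
    return state in finals


# Hypothetical toy lexicon and query, chosen only for illustration.
LEXICON = {"北京", "大学", "北京大学", "生", "学生", "大学生"}
segs = all_segmentations("北京大学生", LEXICON)
transitions, finals = build_prefix_automaton(segs)

for seg in segs:
    print(" / ".join(seg))
```

In this toy case the two segmentations that begin with 北京 share a single transition, so the automaton holds six transitions for seven tokens across the expansions. The saving grows as more expansions share prefixes, which is the intuition behind compressing the retrieval space.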

Key words: Chinese Speech Retrieval; Word Segmentation; Query Expansion; Finite State Automata (FSA); Token-based Search

Received: 25 September 2010

ZTFLH: TP319.3


Cite this article:

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/ OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2011/V24/I4/561