Out-of-Vocabulary Word Recognition Based on Lattice Combination of Complement Sub-lexical Units<sup>*</sup>

doi:10.16451/j.cnki.issn1003-6059.201604007

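Abstract: When a hybrid model is used for out-of-vocabulary (OOV) word recognition, different types of sub-lexical units usually complement one another in performance. Based on this observation, this paper proposes an OOV word recognition method based on lattice combination of complementary sub-lexical units. First, two hybrid model systems with differing performance are built, one using syllables and one using graphones. Next, the recognition lattices of the two systems are obtained, and the sub-lexical units in the lattices are merged. Finally, the processed lattices are combined with two strategies, one based on lattice union and one based on lattice intersection, to obtain better OOV word recognition results. Experiments show that the proposed method outperforms both the individual systems and the ROVER method.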

Keywords: out-of-vocabulary detection, out-of-vocabulary recovery, hybrid model, lattice combination

Abstract: Different sub-lexical units used in a hybrid model often provide complementary information during out-of-vocabulary (OOV) word recognition. In this paper, a lattice combination method using complementary sub-lexical units is proposed for OOV word recognition. First, two hybrid model systems with differing performance are built using syllables and graphones, respectively. Next, recognition lattices are obtained from the two systems, and their sub-lexical units are preprocessed for combination. Finally, combination strategies based on lattice union and lattice intersection are each explored to combine the lattices and obtain better OOV word recognition results. The experimental results show that the proposed method outperforms both the individual systems and the recognizer output voting error reduction (ROVER) system in OOV word recognition.
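The union and intersection strategies named in the abstract can be illustrated at the level of path sets. The following is a minimal sketch and not the authors' implementation: it assumes small acyclic lattices whose final state has no outgoing arcs, costs that behave like negative log scores (lower is better), and sub-lexical units already mapped to a shared `<OOV>` label. The toy arcs, the costs, and the helper names `paths`, `union`, `intersection`, and `best` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[int, int, str, float]   # (source state, destination state, label, cost)
Hyps = Dict[Tuple[str, ...], float]  # label sequence -> lowest path cost


def paths(arcs: List[Arc], start: int, final: int) -> Hyps:
    """Enumerate every label sequence from start to final with its lowest cost."""
    out: Hyps = {}
    stack = [(start, (), 0.0)]
    while stack:
        state, labels, cost = stack.pop()
        if state == final:
            out[labels] = min(cost, out.get(labels, float("inf")))
            continue
        for src, dst, label, weight in arcs:
            if src == state:
                stack.append((dst, labels + (label,), cost + weight))
    return out


def union(a: Hyps, b: Hyps) -> Hyps:
    """Keep every hypothesis from either lattice; shared ones keep the lower cost."""
    merged = dict(a)
    for seq, cost in b.items():
        merged[seq] = min(cost, merged.get(seq, float("inf")))
    return merged


def intersection(a: Hyps, b: Hyps) -> Hyps:
    """Keep only hypotheses found in both lattices; their costs are added."""
    return {seq: a[seq] + b[seq] for seq in a.keys() & b.keys()}


def best(hyps: Hyps) -> Optional[Tuple[str, ...]]:
    """Return the lowest-cost hypothesis, or None if nothing survived."""
    return min(hyps, key=hyps.get) if hyps else None


# Toy lattices standing in for the syllable-based and graphone-based systems.
syllable_arcs = [(0, 1, "call", 0.5), (1, 2, "<OOV>", 1.0),
                 (1, 2, "more", 1.5), (2, 3, "now", 0.5)]
graphone_arcs = [(0, 1, "call", 0.5), (1, 2, "<OOV>", 1.5),
                 (1, 2, "moor", 1.25), (2, 3, "now", 0.5)]

syl = paths(syllable_arcs, start=0, final=3)
gra = paths(graphone_arcs, start=0, final=3)
u = union(syl, gra)
i = intersection(syl, gra)
print(best(u), best(i))
```

In this sketch the union keeps all three distinct hypotheses, while the intersection keeps only the one both systems agree on. A practical system would perform these operations on weighted finite-state lattices rather than on enumerated paths, since enumeration grows exponentially with lattice size.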

Key words: Out-of-Vocabulary Detection, Out-of-Vocabulary Recovery, Hybrid Model, Lattice Combination

Received: 2015-07-16

Chinese Library Classification (CLC) number: TP 391

Funding: Supported by the National Natural Science Foundation of China (No. 61403415, 61302107, 61175017)

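About the authors: FAN Zhengguang (corresponding author), male, born in 1990, master's student. His main research interests are speech recognition and pattern recognition. E-mail: fanzg11@163.com.
QU Dan, female, born in 1974, Ph.D., associate professor. Her main research interests are speech recognition and intelligent information processing. E-mail: qudanqudan@sina.com.
CHEN Bin, male, born in 1987, Ph.D. His main research interests are continuous speech recognition and discriminative training. E-mail: chenbin873335@163.com.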

Cite this article:

FAN Zhengguang, QU Dan, CHEN Bin. Out-of-Vocabulary Word Recognition Based on Lattice Combination of Complement Sub-lexical Units[J]. Pattern Recognition and Artificial Intelligence, 2016, 29(4): 350-358.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201604007 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2016/V29/I4/350
