Abstract: Different sub-lexical units used in hybrid models often provide complementary information during out-of-vocabulary (OOV) word recognition. In this paper, a lattice combination method based on complementary sub-lexical units is proposed for OOV word recognition. First, two hybrid-model systems with different performance are built, one using syllables and the other using graphones. Next, recognition lattices are obtained from the two systems, and the sub-lexical units are preprocessed so that the lattices can be combined. Finally, combination strategies based on lattice union and lattice intersection are each explored to combine the lattices and improve OOV word recognition. Experimental results show that the proposed method outperforms both the individual systems and the recognizer output voting error reduction (ROVER) system in OOV word recognition.
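The union and intersection strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: real recognition lattices are weighted graphs, whereas here each lattice is flattened to a small set of scored paths, and the unit-to-phoneme tables, the toy scores, and all function names are assumptions made only for illustration. The preprocessing step is modeled, again as an assumption, by rewriting syllables and graphones as phoneme sequences so that paths from the two systems become comparable.

```python
import math

# Hypothetical unit-to-phoneme tables (assumed for illustration only).
SYLLABLE_TO_PHONES = {"k_ae": ("k", "ae"), "t": ("t",), "p": ("p",)}
GRAPHONE_TO_PHONES = {"c:k": ("k",), "a:ae": ("ae",), "t:t": ("t",), "b:b": ("b",)}


def log_add(a, b):
    """Return log(exp(a) + exp(b)) in a numerically stable way."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def to_phone_paths(lattice, table):
    """Rewrite each path of sub-lexical units as a phoneme sequence."""
    out = {}
    for units, logp in lattice.items():
        phones = tuple(p for u in units for p in table[u])
        # Different unit sequences may share one phoneme sequence: sum them.
        out[phones] = log_add(out[phones], logp) if phones in out else logp
    return out


def lattice_union(lat_a, lat_b):
    """Keep every path from either lattice; sum scores of shared paths."""
    out = dict(lat_a)
    for path, logp in lat_b.items():
        out[path] = log_add(out[path], logp) if path in out else logp
    return out


def lattice_intersection(lat_a, lat_b):
    """Keep only paths present in both lattices; multiply their scores."""
    return {p: lat_a[p] + lat_b[p] for p in lat_a.keys() & lat_b.keys()}


def best_path(lattice):
    """Return the highest-scoring path."""
    return max(lattice, key=lattice.get)


# Toy lattices: path (tuple of units) -> log probability (made-up values).
syllable_lattice = {("k_ae", "t"): math.log(0.6), ("k_ae", "p"): math.log(0.4)}
graphone_lattice = {
    ("c:k", "a:ae", "t:t"): math.log(0.5),
    ("c:k", "a:ae", "b:b"): math.log(0.5),
}

syl = to_phone_paths(syllable_lattice, SYLLABLE_TO_PHONES)
gra = to_phone_paths(graphone_lattice, GRAPHONE_TO_PHONES)

combined_union = lattice_union(syl, gra)
combined_inter = lattice_intersection(syl, gra)

print(best_path(combined_union))
print(best_path(combined_inter))
```

In this toy setting the union keeps hypotheses that only one system proposed, while the intersection keeps only those both systems agree on; scores in the union are left unnormalized for simplicity.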
范正光,屈丹,陈斌. 基于互补子词单元词图融合的集外词识别*[J]. 模式识别与人工智能, 2016, 29(4): 350-358.
FAN Zhengguang, QU Dan, CHEN Bin. Out-of-Vocabulary Word Recognition Based on Lattice Combination of Complement Sub-lexical Units. Pattern Recognition and Artificial Intelligence, 2016, 29(4): 350-358.