Abstract: Languages differ in speaking rate, so state duration reflects the speaking rate of a language. A phone recognition system and an LVCSR (Large Vocabulary Continuous Speech Recognition) system are built on DDBHMM (Duration Distribution Based Hidden Markov Model), and both are used to distinguish Mandarin from English. The results show that DDBHMM models state duration accurately and improves language identification performance.
孙健,王作英. 融合段长信息的中、英文语种辨识[J]. 模式识别与人工智能, 2006, 19(5): 567-571.
SUN Jian, WANG ZuoYing. Language Identification between Mandarin and English with State Duration Information. Pattern Recognition and Artificial Intelligence, 2006, 19(5): 567-571.