Abstract: The recognition rate for confusable speech is still low in state-of-the-art HMM-based Chinese speech recognition systems. The inherent defects of the HMM are analyzed, and a two-level recognition framework combining HMM and SVM is proposed. A confidence estimation module is adopted to improve the performance and efficiency of the system. Information obtained from Viterbi decoding is used to construct new classes of features for the SVM, which solves the problem that a conventional SVM cannot directly process variable-length sequences. Related issues, including confidence estimation, classification feature extraction, and SVM recognizer construction, are addressed. Experimental results on confusable Chinese speech show that, compared with the hybrid HMM/SVM-based system, the proposed method substantially improves the recognition rate with little impact on running speed.
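The abstract does not specify the exact feature set, confidence measure, or SVM combination scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (1) a hypothetical fixed-length feature built from the Viterbi state alignment: the mean acoustic vector of each HMM state segment plus each state's share of the frames, (2) a hypothetical confidence score equal to the per-frame log-likelihood gap between the two best HMM hypotheses, and (3) a pluggable `svm_score` callable standing in for trained SVM decision functions. All function names, the threshold, and the toy syllable labels are illustrative.

```python
"""Hypothetical sketch of a confidence-gated HMM -> SVM second pass."""
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Hypothesis = Tuple[str, float]  # (label, total HMM log-likelihood)


def segment_mean_features(frames: Sequence[Vector], alignment: Sequence[int],
                          num_states: int) -> Vector:
    """Turn a variable-length utterance into a fixed-length SVM input.

    frames:    T acoustic vectors of dimension D.
    alignment: Viterbi state index (0..num_states-1) for each frame.
    Returns num_states * D per-state means plus num_states duration
    fractions, i.e. num_states * (D + 1) values whatever T is.
    (Assumed feature design, for illustration only.)
    """
    if not frames or len(frames) != len(alignment):
        raise ValueError("need a non-empty utterance with one state per frame")
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(num_states)]
    counts = [0] * num_states
    for vec, state in zip(frames, alignment):
        counts[state] += 1
        for d, x in enumerate(vec):
            sums[state][d] += x
    features: Vector = []
    for s in range(num_states):
        n = counts[s]
        # A state the Viterbi path never visited contributes zeros.
        features.extend(v / n if n else 0.0 for v in sums[s])
    features.extend(c / len(frames) for c in counts)
    return features


def confidence(nbest: Sequence[Hypothesis], num_frames: int) -> float:
    """Assumed measure: per-frame log-likelihood gap of the two best hypotheses."""
    scores = sorted((score for _, score in nbest), reverse=True)
    if len(scores) < 2:
        return float("inf")
    return (scores[0] - scores[1]) / num_frames


def two_level_decode(nbest: Sequence[Hypothesis], features: Vector,
                     num_frames: int, threshold: float,
                     svm_score: Callable[[str, Vector], float],
                     top_k: int = 2) -> str:
    """Keep the HMM answer when confident; otherwise rescore the top-k with the SVM."""
    ranked = sorted(nbest, key=lambda h: h[1], reverse=True)
    if confidence(ranked, num_frames) >= threshold:
        return ranked[0][0]
    candidates = [label for label, _ in ranked[:top_k]]
    return max(candidates, key=lambda label: svm_score(label, features))


if __name__ == "__main__":
    # Toy 2-state alignment of a 5-frame, 2-dimensional utterance.
    frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 6.0]]
    feats = segment_mean_features(frames, [0, 0, 1, 1, 1], num_states=2)
    print(feats)  # [2.0, 0.0, 0.0, 4.0, 0.4, 0.6]

    # Stand-in for trained one-vs-rest SVM decision functions (made-up weights).
    weights = {"shi4": [1, 0, 0, 0, 0, 0], "si4": [0, 0, 0, 1, 0, 0]}

    def linear_score(label: str, x: Vector) -> float:
        return sum(w * v for w, v in zip(weights[label], x))

    nbest = [("shi4", -120.0), ("si4", -121.0), ("zhi4", -130.0)]
    print(two_level_decode(nbest, feats, 5, 0.5, linear_score))  # si4
    print(two_level_decode(nbest, feats, 5, 0.1, linear_score))  # shi4
```

In the toy run the gap between the two best hypotheses is 0.2 per frame. With a threshold of 0.5 the HMM result is treated as unreliable and the SVM stand-in overturns it; with 0.1 the HMM answer is kept and the SVM is never evaluated. Skipping the second level for confident utterances is one plausible way a confidence module can save computation, which is consistent with the abstract's claim of little impact on running speed.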
WANG Huan-Liang, HAN Ji-Qing, LI Hai-Feng, ZHENG Tie-Ran. Confusable Chinese Speech Recognition Based on HMM/SVM Two-Level Architecture. Pattern Recognition and Artificial Intelligence, 2006, 19(5): 578-584.