Multilingual Acoustic Modeling Method Based on Phoneme Clustering
MENG Meng1, LIANG Jia-En1, XU Bo1,2
1. Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Abstract: A clustering method is proposed that generates a multilingual global phoneme set based on the decrease in model self-likelihood. Two linguistic constraints are imposed on the clustering procedure: phonemes from the same language are not merged, and phonemes belonging to different International Phonetic Alphabet (IPA) classes are not merged. In a telephone-speech keyword spotting system, the performance of several Chinese-English bilingual models generated by different phoneme clustering methods is compared. The experimental results show that a merged phoneme set of appropriate size yields good-quality acoustic models that perform far better than models built without merging. Moreover, the linguistic constraints added to the clustering procedure further improve performance.
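The clustering criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. As simplifying assumptions, each phoneme is represented by a single diagonal-covariance Gaussian summarized by its frame count, mean, and variance (the paper's actual acoustic models are not specified here), clusters are merged greedily, and the stopping rule is a target phoneme-set size. The names `Cluster`, `self_loglik`, `merge`, `allowed`, and `cluster_phonemes` are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cluster:
    members: frozenset  # set of (language, phoneme) pairs
    ipa_class: str      # coarse IPA class label, e.g. "vowel" (assumed labelling)
    n: float            # number of training frames
    mean: tuple         # per-dimension mean
    var: tuple          # per-dimension variance (diagonal covariance)


def self_loglik(c):
    """Log-likelihood of a maximum-likelihood diagonal Gaussian on its own data."""
    return -0.5 * c.n * sum(math.log(2.0 * math.pi * v) + 1.0 for v in c.var)


def merge(a, b):
    """Pool the sufficient statistics of two clusters into one Gaussian."""
    n = a.n + b.n
    mean = tuple((a.n * ma + b.n * mb) / n for ma, mb in zip(a.mean, b.mean))
    var = tuple(
        (a.n * (va + ma * ma) + b.n * (vb + mb * mb)) / n - m * m
        for va, ma, vb, mb, m in zip(a.var, a.mean, b.var, b.mean, mean)
    )
    return Cluster(a.members | b.members, a.ipa_class, n, mean, var)


def allowed(a, b):
    """Apply the two linguistic constraints named in the abstract."""
    langs_a = {lang for lang, _ in a.members}
    langs_b = {lang for lang, _ in b.members}
    same_class = a.ipa_class == b.ipa_class
    shares_language = bool(langs_a & langs_b)
    return same_class and not shares_language


def cluster_phonemes(clusters, target_size):
    """Greedily merge the allowed pair with the smallest self-likelihood decrease."""
    clusters = list(clusters)
    while len(clusters) > target_size:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            if not allowed(a, b):
                continue
            loss = self_loglik(a) + self_loglik(b) - self_loglik(merge(a, b))
            if best is None or loss < best[0]:
                best = (loss, i, j)
        if best is None:  # no pair satisfies the constraints
            break
        _, i, j = best
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

In this sketch the loss of a merge is always non-negative, because a pooled Gaussian cannot fit the combined data better than the two separate Gaussians. The language constraint is checked at the cluster level, so a cluster that already contains an English phoneme cannot absorb a second one.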