Language Identification between Mandarin and English with State Duration Information<sup>*</sup>

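Abstract: The duration of a state reflects the rate at which pronunciation changes in a language. Because this rate differs across languages, state duration information can serve as one cue for distinguishing them. This paper builds a phone recognition system and a large vocabulary continuous speech recognition (LVCSR) system, both based on the duration distribution based hidden Markov model (DDBHMM), and uses them to identify Mandarin and English. The results show that DDBHMM describes state duration information accurately and improves Mandarin/English language identification performance.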

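Keywords: language identification, duration distribution based hidden Markov model (DDBHMM), Gauss mixture model, continuous phone recognition, large vocabulary continuous speech recognition (LVCSR)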

Abstract: State duration reflects the pronunciation rate of a language, and different languages have different pronunciation rates, so state duration can help distinguish one language from another. A phone recognition system and a large vocabulary continuous speech recognition (LVCSR) system are built on the duration distribution based hidden Markov model (DDBHMM), and both are used to identify Mandarin and English. The results indicate that DDBHMM describes state duration accurately and improves language identification performance.
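To make the idea of duration as a language cue concrete, here is a minimal sketch in Python. It is not the DDBHMM system described in the abstract. It assumes that per-state durations (in frames) have already been obtained from some segmentation, fits a simple log-normal duration model per language, and picks the language whose model gives the durations the highest log-likelihood. The function names, the log-normal choice, and the training numbers are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def fit_log_normal(durations):
    """Fit a log-normal distribution to positive segment durations (frames)."""
    logs = [math.log(d) for d in durations]
    mu = mean(logs)
    sigma = max(pstdev(logs), 1e-3)  # floor sigma to avoid a degenerate model
    return mu, sigma


def log_likelihood(durations, model):
    """Sum of log-normal log-densities of the durations under one model."""
    mu, sigma = model
    total = 0.0
    for d in durations:
        x = math.log(d)
        total += -math.log(d * sigma * math.sqrt(2 * math.pi))
        total -= (x - mu) ** 2 / (2 * sigma ** 2)
    return total


def identify(durations, models):
    """Return the language whose duration model scores the durations highest."""
    return max(models, key=lambda lang: log_likelihood(durations, models[lang]))


# Made-up training durations (frames per state); NOT data from the paper.
train = {
    "mandarin": [8, 9, 10, 11, 12, 10, 9, 11],
    "english": [4, 5, 6, 5, 7, 6, 5, 4],
}
models = {lang: fit_log_normal(d) for lang, d in train.items()}

print(identify([9, 10, 12, 11], models))
print(identify([5, 4, 6, 5], models))
```

In a full system, a duration score of this kind would presumably be combined with acoustic and language model scores rather than used alone; how the paper combines them is not stated in the abstract.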

Key words: Language Identification, Duration Distribution Based Hidden Markov Model (DDBHMM), Gauss Mixture Model, Continuous Phone Recognition, Large Vocabulary Continuous Speech Recognition (LVCSR)

Received: 2005-05-12

CLC number (ZTFLH): TP391

Funding: Supported by the National 863 Program of China (No. 2001AA114071)

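About the authors: SUN Jian, male, born in 1976, is a Ph.D. candidate whose main research interests are language identification and speech recognition. E-mail: jsun01@mails.tsinghua.edu.cn. WANG Zuoying, male, born in 1935, is a professor and doctoral supervisor whose main research interests are speech signal processing and pattern recognition.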

Cite this article:

SUN Jian, WANG Zuoying. Language Identification between Mandarin and English with State Duration Information[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2006, 19(5): 567-571.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2006/V19/I5/567
