Speaker Verification Based on Adapted Gaussian Mixture Model Feature Mapping


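Abstract: To address the performance degradation that nonlinear channel distortion causes in telephone-speech speaker verification systems, a feature mapping method that removes channel effects is proposed. A Gaussian mixture model (GMM) is used to build the speech model, and the speech model for a given channel is obtained from it by maximum a posteriori (MAP) adaptation. The differences between corresponding Gaussian components of the two models describe that channel's effect on different speech sounds. Channel mapping rules derived from these differences are used to compensate the features, removing the mismatch between training and test speech. Experiments on the male-speaker data of the NIST 1999 and 2004 databases show that the method improves the system's equal error rate by 14.7% and 15.18%, respectively.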


Keywords: speaker verification, channel mismatch, feature mapping (FM), maximum a posteriori (MAP), channel mapping rules

Abstract: To mitigate channel effects in telephone-handset speaker verification, a feature mapping (FM) method is proposed to remove channel variability. A Gaussian mixture model (GMM) is used to build a channel-independent speech model, and channel-dependent speech models are derived from it by maximum a posteriori (MAP) adaptation. The differences between corresponding Gaussian components describe the effect of the channel on different speech sounds. The mismatch between training and test speech is compensated by applying the resulting channel mapping rules. Experimental results on the NIST 1999 and 2004 SRE databases show that the proposed approach improves the equal error rate by 14.7% and 15.18%, respectively.
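The mapping described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes diagonal covariances, MAP adaptation of the means only, a relevance factor of 16, and a top-1 Gaussian choice per frame, none of which are stated in the abstract. All function names are hypothetical.

```python
import numpy as np

def posteriors(x, weights, means, variances):
    """Component posteriors of a diagonal-covariance GMM, shape (T, M)."""
    diff = x[:, None, :] - means[None, :, :]               # (T, M, D)
    log_p = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                    + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_p += np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)              # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the channel-independent model to
    data from one channel (relevance factor is an assumed value)."""
    post = posteriors(x, weights, means, variances)
    n = post.sum(axis=0)                                    # soft counts, (M,)
    ex = post.T @ x / np.maximum(n, 1e-10)[:, None]         # data means, (M, D)
    alpha = (n / (n + relevance))[:, None]
    return alpha * ex + (1.0 - alpha) * means

def feature_map(x, weights, cd_means, ci_means, variances):
    """Map each frame to the channel-independent space by removing the
    mean shift of its best-scoring channel-dependent Gaussian."""
    top = posteriors(x, weights, cd_means, variances).argmax(axis=1)
    return x - (cd_means[top] - ci_means[top])

# Toy demo: two well-separated components, each shifted differently
# by a synthetic "channel".
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
ci_means = np.array([[-5.0, 0.0], [5.0, 0.0]])
variances = np.ones((2, 2))
shifts = np.array([[1.0, 0.5], [-0.5, 1.0]])
labels = np.repeat([0, 1], 2000)
channel_x = ci_means[labels] + shifts[labels] + rng.normal(size=(4000, 2))

cd_means = map_adapt_means(channel_x, weights, ci_means, variances)
mapped = feature_map(channel_x, weights, cd_means, ci_means, variances)
```

In this toy setting, the per-component means of `mapped` fall back close to `ci_means`, which is the sense in which the per-Gaussian differences act as a channel mapping rule. A fuller variant would also rescale by the ratio of component standard deviations if the variances were adapted.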

Key words: Speaker Verification, Channel Mismatch, Feature Mapping (FM), Maximum A Posteriori (MAP), Channel Mapping Rules

Received: 2008-03-07

CLC number: TP391

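About the authors: YANG Shi-Qing (杨世清), male, born in 1982, master's student; main research interest: speaker recognition. DAI Bei-Qian (戴蓓蒨), female, born in 1942, professor and doctoral supervisor; main research interests: speech signal and information processing, speaker recognition. E-mail: bqdai@ustc.edu.cn. XU Min-Qiang (许敏强), male, born in 1982, doctoral student; main research interests: speech signal and information processing. LIU Qing-Song (刘青松), male, born in 1984, doctoral student; main research interests: speech recognition, speaker recognition.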

Cite this article:

YANG Shi-Qing, DAI Bei-Qian, XU Min-Qiang, LIU Qing-Song. Speaker Verification Based on Adapted Gaussian Mixture Model Feature Mapping [J]. 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), 2009, 22(3): 417-421.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2009/V22/I3/417
