Abstract: To mitigate handset channel effects in speaker recognition systems, a feature mapping (FM) method is proposed to remove channel variability. A Gaussian mixture model (GMM) is used to build a channel-independent voice model, and the channel-dependent voice models are derived from this GMM using the well-known maximum a posteriori (MAP) adaptation algorithm. The differences between the clustered Gaussians describe the channel variability for different voices. The mismatch between training and test conditions is compensated by applying the channel mapping rules. Experimental results on the NIST 1999 and 2004 SRE databases show that the proposed approach improves system performance by 14.7% and 15.18%, respectively.
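The mapping step summarized in the abstract can be illustrated with a minimal sketch. It assumes the commonly used feature-mapping formulation with diagonal-covariance GMMs: the best-scoring Gaussian of the channel-dependent model is selected for a feature vector, and the vector is shifted and scaled onto the corresponding Gaussian of the channel-independent model. The function names, argument names, and the toy model below are hypothetical and are not taken from the paper.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log density of x under each diagonal-covariance Gaussian (one per row)."""
    diff = x - means
    return -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                   + np.sum(diff * diff / variances, axis=1))


def map_feature(x, weights_cd, means_cd, vars_cd, means_ci, vars_ci):
    """Map one feature vector from a channel-dependent (cd) space
    to the channel-independent (ci) space.

    Assumption: the cd model was adapted from the ci model, so component i
    of one model corresponds to component i of the other.
    """
    # Pick the cd Gaussian that best explains the observed feature.
    scores = np.log(weights_cd) + log_gauss_diag(x, means_cd, vars_cd)
    i = int(np.argmax(scores))
    # Remove the cd mean, rescale the spread, and add the ci mean.
    scale = np.sqrt(vars_ci[i] / vars_cd[i])
    return (x - means_cd[i]) * scale + means_ci[i]


# Toy example: the channel shifts every Gaussian mean by a fixed offset.
means_ci = np.array([[0.0, 0.0], [5.0, 5.0]])
vars_ci = np.ones((2, 2))
offset = np.array([1.0, -1.0])
means_cd = means_ci + offset
vars_cd = np.ones((2, 2))
weights = np.array([0.5, 0.5])

clean = np.array([5.2, 4.9])
observed = clean + offset          # feature as seen through the channel
mapped = map_feature(observed, weights, means_cd, vars_cd, means_ci, vars_ci)
```

In this toy case the channel is a pure mean shift, so the mapped vector coincides with the clean one; with real adapted models the compensation is only approximate.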
杨世清,戴蓓蒨,许敏强,刘青松. 基于自适应高斯混合模型特征映射的说话人确认[J]. 模式识别与人工智能, 2009, 22(3): 417-421.
YANG Shi-Qing, DAI Bei-Qian, XU Min-Qiang, LIU Qing-Song. Speaker Verification Based on Adapted Gaussian Mixture Model Feature Mapping. Pattern Recognition and Artificial Intelligence, 2009, 22(3): 417-421.