Speaker Verification Based on Gaussian Probability Distribution and SVM<sup>*</sup>


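Abstract: In speaker verification based on support vector machines (SVM), the probability distribution of the speech feature parameters over the Gaussian components of the universal background model (UBM) is used as the SVM input. With a linear kernel, the system achieves almost the same recognition rate as the generalized linear discriminant sequence (GLDS) kernel. The Gaussian probability distribution algorithm can also be fused at the score level with the Gaussian mixture background model and the GLDS kernel to further improve recognition performance. On the 2006 NIST SRE 1conv4w-1conv4w database, the fused system reduces the equal error rate by up to 25% relative to the baseline Gaussian mixture model.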
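The abstract does not spell out how the probability distribution over the UBM components is computed. The following is a minimal sketch of one plausible reading: for each frame, compute the posterior probability of every Gaussian component of a diagonal-covariance UBM, then average these posteriors over the utterance to obtain one fixed-length vector that a linear-kernel SVM can compare. The function names, the averaging step, and the toy UBM values are all assumptions made for illustration, not details taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each UBM component given one frame."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def pd_ubm_vector(frames, weights, means, variances):
    """Average per-frame posteriors into one fixed-length vector (assumed pooling)."""
    acc = [0.0] * len(weights)
    for x in frames:
        post = component_posteriors(x, weights, means, variances)
        for k, p in enumerate(post):
            acc[k] += p
    return [a / len(frames) for a in acc]


def linear_kernel(u, v):
    """Linear kernel: plain inner product of two utterance vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy UBM: 2 components, 2-dimensional features.
UBM_WEIGHTS = [0.5, 0.5]
UBM_MEANS = [[0.0, 0.0], [4.0, 4.0]]
UBM_VARS = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical utterance: two frames near component 0, one near component 1.
utterance = [[0.1, -0.2], [3.9, 4.2], [0.3, 0.1]]
vec = pd_ubm_vector(utterance, UBM_WEIGHTS, UBM_MEANS, UBM_VARS)
similarity = linear_kernel(vec, vec)
```

In this sketch every utterance, whatever its length, maps to a vector with one entry per UBM component, which is what allows a standard SVM with a linear kernel to be trained on it.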



Abstract: In text-independent speaker verification, the probability distribution of the speech features against the universal background model (PD-UBM) is computed, and the score of each UBM Gaussian component is used as the input feature of a support vector machine (SVM) in both training and testing. With a linear kernel, the proposed PD-UBM algorithm achieves performance equal to or better than that of the generalized linear discriminant sequence (GLDS) kernel system. Furthermore, fusing the scores of the Gaussian mixture model system (GMM-UBM), the GLDS system and the PD-UBM system yields a significant further improvement. On the NIST 2006 speaker recognition evaluation (SRE) 1conv4w-1conv4w corpus, the fused system achieves a 25% relative reduction in equal error rate (EER) over the GMM-UBM system.
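The reported gain is a relative reduction in equal error rate after score fusion. The sketch below shows how such a figure can be computed: a threshold sweep that approximates the EER, a weighted-sum fusion of per-trial scores, and the relative-reduction formula. The weighted-sum fusion rule, the function names, and all scores are assumptions for illustration; the abstract does not state which fusion method the authors used.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    Returns the mean of the false-acceptance and false-rejection rates
    at the threshold where the two are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def fuse(score_lists, weights):
    """Weighted sum of per-trial scores from several systems (assumed rule)."""
    return [
        sum(w * s for w, s in zip(weights, trial))
        for trial in zip(*score_lists)
    ]


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction of a new system over a baseline."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores, for illustration only.
targets = [2.0, 1.5, 0.2, 1.8]
impostors = [-1.0, 0.5, -0.3, 0.1]
toy_eer = equal_error_rate(targets, impostors)

# Fusing two hypothetical systems with equal weights.
fused = fuse([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])

# A drop from 8% to 6% EER is a relative reduction of about 25%.
gain = relative_reduction(0.08, 0.06)
```

In practice the fusion weights would be tuned on held-out development trials rather than fixed by hand.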

Key words: Generalized Linear Discriminant Sequence (GLDS), Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Cepstrum Coefficient (LPCC)

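Received: 2007-05-04

CLC number (ZTFLH): TN912.34

Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2006AA010104)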

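About the authors: GUO Wu, male, born in 1973, lecturer. His main research interest is speaker recognition. E-mail: guowu@mail.ustc.edu.cn. DAI Li-Rong, male, born in 1962, Ph.D., professor. His main research interests include speech recognition, speech synthesis, and content-based audio and video retrieval. WANG Ren-Hua, male, born in 1943, professor and doctoral supervisor. His main research interests include speech communication, digital signal processing and its applications, and multimedia communication.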

Cite this article:

GUO Wu, DAI Li-Rong, WANG Ren-Hua. Speaker Verification Based on Gaussian Probability Distribution and SVM [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2008, 21(6): 794-798.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2008/V21/I6/794
