TLS-NAP Algorithm for Text-Independent Speaker Recognition

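Abstract: To improve the recognition rate of text-independent speaker recognition systems, a nuisance attribute projection algorithm based on total least squares is proposed. The hidden variables estimated by total least squares account for the perturbation of the nuisance attribute projection matrix and minimize it, so that the projection matrix derived from these hidden variables characterizes the nuisance attribute space more accurately. Experimental results on the speaker recognition database released in 2008 by the US National Institute of Standards and Technology (NIST) verify the effectiveness of the method.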

Keywords: speaker recognition, nuisance attribute projection, total least squares, support vector machine, Gaussian mixture model

Abstract: To improve the recognition accuracy of a text-independent speaker recognition system, a total least squares nuisance attribute projection (TLS-NAP) algorithm is proposed. When the hidden variables are estimated by the total least squares algorithm, the perturbation of the projection matrix is taken into account and its negative effect is minimized. The nuisance attribute projection space derived from these variables yields better performance. Experimental results on the NIST SRE 08 data corpus demonstrate the effectiveness of the proposed method.

Key words: Speaker Recognition, Nuisance Attribute Projection, Total Least Squares Method, Support Vector Machine, Gaussian Mixture Models
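
For orientation, the two standard building blocks named in the abstract, nuisance attribute projection and total least squares, can be sketched in Python as below. This is only a minimal illustration under stated assumptions: it is not the authors' TLS-NAP algorithm, whose derivation is not given on this page. The function names, the `rank` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def nap_projection(supervectors, speaker_ids, rank):
    """Baseline NAP (not TLS-NAP): return P = I - U U^T, where U spans the
    top `rank` directions of within-speaker (session) variability."""
    X = np.asarray(supervectors, dtype=float)
    ids = np.asarray(speaker_ids)
    # Subtract each speaker's mean so only session/channel variation remains.
    centered = np.vstack([X[ids == s] - X[ids == s].mean(axis=0)
                          for s in np.unique(ids)])
    # Right singular vectors are the principal directions of that variation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    U = vt[:rank].T                      # (dim, rank), orthonormal columns
    return np.eye(X.shape[1]) - U @ U.T


def total_least_squares(A, b):
    """Classical TLS solution of A x ~ b, allowing errors in both A and b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]
    # The right singular vector for the smallest singular value of [A b]
    # is proportional to [x; -1].
    _, _, vt = np.linalg.svd(np.hstack([A, b]))
    v = vt[-1]
    if np.isclose(v[n], 0.0):
        raise ValueError("TLS solution does not exist for this data")
    return -v[:n] / v[n]


# Synthetic demo (all data made up): one dominant "channel" direction.
rng = np.random.default_rng(0)
dim = 6
ids = np.repeat(np.arange(4), 5)         # 4 speakers, 5 sessions each
channel = np.eye(dim)[0]                 # hypothetical nuisance direction
X = (rng.normal(size=(4, dim))[ids]
     + 3.0 * rng.normal(size=(ids.size, 1)) * channel
     + 0.01 * rng.normal(size=(ids.size, dim)))
P = nap_projection(X, ids, rank=1)
print("residual channel energy:", np.linalg.norm(P @ channel))
```

Ordinary least squares assumes that only `b` is noisy, whereas total least squares also allows errors in `A`. This is the sense in which the abstract speaks of accounting for a perturbation of the projection matrix.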

Received: 2011-11-02

CLC number: TP391.4

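Funding: Supported by the National Natural Science Foundation of China (No. 90920302, 61005019, 61105017) and the National High Technology Research and Development Program of China (863 Program) (No. 2008AA040201)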

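About the authors: HE Liang, male, born in 1981, Ph.D., assistant researcher. His main research interests are speaker recognition and language recognition. E-mail: sanphiee@gmail.com. YANG Yi, female, born in 1978, Ph.D., assistant researcher. Her main research interests are microphone arrays and speech enhancement. LIU Jia, male, born in 1955, professor and doctoral supervisor. His main research interest is speech recognition.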

Cite this article:

HE Liang, YANG Yi, LIU Jia. TLS-NAP Algorithm for Text-Independent Speaker Recognition [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2012, 25(6): 916-921.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2012/V25/I6/916
