K-L Divergence Based Model Clustering Method for Fast Speaker Identification
WANG Huan-Liang (1, 2), HAN Ji-Qing (1), ZHENG Gui-Bin (1)
1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
2. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266035
Abstract As the number of enrolled speakers and the volume of audio data to be recognized grow, conventional speaker identification methods cannot meet the real-time demands of Internet applications. A K-L divergence based speaker model clustering method is proposed to construct a hierarchical identification system, which markedly improves recognition efficiency. In addition, a confidence measure based on class-level identification information is investigated so that out-of-set speakers can be rejected as early as possible. Experimental results show that the proposed method increases identification speed by 3.2 times on average, while the closed-set identification error rate increases by only about 0.9% compared with the conventional method. Open-set identification can also be sped up by using the class-level confidence measure, which yields a relative error rate reduction of 5.1% on out-of-set speakers while leaving identification performance on in-set speakers unchanged.
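
The hierarchical idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on simplifying assumptions: each speaker is modeled by a single diagonal-covariance Gaussian (for which the K-L divergence has a closed form) rather than a mixture model, the class centroids are given rather than learned, and all names (`kl_diag_gauss`, `assign_to_classes`, `identify`, the toy speakers) are hypothetical.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total

def symmetric_kl(a, b):
    """Symmetrized divergence; a and b are (mean, variance) pairs."""
    return kl_diag_gauss(a[0], a[1], b[0], b[1]) + kl_diag_gauss(b[0], b[1], a[0], a[1])

def assign_to_classes(models, centroids):
    """Map each speaker model to its nearest class centroid (assumed given)."""
    return {
        name: min(range(len(centroids)), key=lambda c: symmetric_kl(m, centroids[c]))
        for name, m in models.items()
    }

def log_likelihood(frames, model):
    """Total log-likelihood of feature frames under a diagonal Gaussian."""
    mu, var = model
    ll = 0.0
    for x in frames:
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return ll

def identify(frames, models, centroids, labels):
    """Two-stage search: pick the best class first, then score only its members."""
    best_class = max(range(len(centroids)), key=lambda c: log_likelihood(frames, centroids[c]))
    candidates = [n for n, c in labels.items() if c == best_class]
    return max(candidates, key=lambda n: log_likelihood(frames, models[n]))

# Toy data (hypothetical): four speakers forming two well-separated classes.
models = {
    "spk_a": ([0.0, 0.0], [1.0, 1.0]),
    "spk_b": ([0.5, 0.0], [1.0, 1.0]),
    "spk_c": ([5.0, 5.0], [1.0, 1.0]),
    "spk_d": ([5.5, 5.0], [1.0, 1.0]),
}
centroids = [([0.25, 0.0], [1.0, 1.0]), ([5.25, 5.0], [1.0, 1.0])]
labels = assign_to_classes(models, centroids)
winner = identify([[5.6, 5.1], [5.4, 4.9]], models, centroids, labels)
```

In this toy setup only the two class centroids and the two members of the winning class are scored, instead of all four speaker models; that reduction in scored models is where a hierarchical system would be expected to save computation.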
Received: 09 February 2009