K-L Divergence Based Model Clustering Method for Fast Speaker Identification
WANG Huan-Liang (1, 2), HAN Ji-Qing (1), ZHENG Gui-Bin (1)
(1) School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
(2) College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266035
Abstract With the growing number of enrolled speakers and the growing volume of audio to be recognized, conventional speaker identification methods cannot meet the real-time demands of Internet applications. A speaker model clustering method based on K-L divergence is proposed to construct a hierarchical identification system, which substantially improves recognition efficiency. In addition, a confidence measure that uses class-level identification information is investigated so that out-of-set speakers can be rejected effectively and as early as possible. Experimental results show that, compared with the conventional method, the proposed method speeds up identification by a factor of 3.2 on average, while the closed-set identification error rate increases by only about 0.9%. With the class-level confidence measure, open-set identification is also accelerated, and a relative error-rate reduction of 5.1% is achieved for out-of-set speakers while the identification performance for in-set speakers is unchanged.
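The abstract describes the pipeline only at a high level. The Python/NumPy sketch below illustrates one plausible reading of it and is not the authors' implementation. Several choices are assumptions: speaker models are diagonal-covariance GMMs stored as `(w, mu, var)` tuples; the K-L divergence between two GMMs, which has no closed form, is estimated by Monte Carlo sampling (the paper's own approximation may differ); models are grouped by k-medoids on the symmetrized divergence; each class is represented by its medoid speaker model; and the class-level confidence measure is reduced to a simple threshold on the best class score. All function and parameter names (`kl_mc`, `cluster_models`, `identify`, `top_classes`, `reject_threshold`) are hypothetical.

```python
import numpy as np

# A model is a tuple (w, mu, var): weights (K,), means (K, D), diagonal variances (K, D).

def gmm_logpdf(x, w, mu, var):
    """Per-frame log-likelihood of frames x (T, D) under a diagonal-covariance GMM."""
    diff = x[:, None, :] - mu[None, :, :]                        # (T, K, D)
    log_norm = -0.5 * np.log(2 * np.pi * var).sum(axis=1)       # (K,)
    log_comp = log_norm[None, :] - 0.5 * (diff ** 2 / var[None]).sum(axis=2)
    a = np.log(w)[None, :] + log_comp                            # (T, K)
    m = a.max(axis=1, keepdims=True)                             # log-sum-exp over components
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def gmm_sample(rng, n, w, mu, var):
    """Draw n frames from a diagonal-covariance GMM."""
    k = rng.choice(len(w), size=n, p=w)
    return mu[k] + rng.standard_normal((n, mu.shape[1])) * np.sqrt(var[k])


def kl_mc(p, q, rng, n=2000):
    """Assumed Monte Carlo estimate of KL(p || q) = E_p[log p(x) - log q(x)]."""
    x = gmm_sample(rng, n, *p)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))


def symmetric_kl(p, q, rng):
    """KL is asymmetric, so its symmetrized form serves as the clustering distance."""
    return kl_mc(p, q, rng) + kl_mc(q, p, rng)


def _assign(D, medoids):
    labels = np.argmin(D[:, medoids], axis=1)
    labels[medoids] = np.arange(len(medoids))  # a medoid always stays in its own class
    return labels


def cluster_models(models, n_classes, rng, iters=20):
    """Hypothetical k-medoids clustering of speaker models on symmetric KL distances."""
    S = len(models)
    D = np.zeros((S, S))
    for i in range(S):
        for j in range(i + 1, S):
            D[i, j] = D[j, i] = symmetric_kl(models[i], models[j], rng)
    medoids = rng.choice(S, size=n_classes, replace=False)
    for _ in range(iters):
        labels = _assign(D, medoids)
        new = medoids.copy()
        for c in range(n_classes):
            members = np.flatnonzero(labels == c)
            # New medoid: the member with the smallest total distance to its class.
            new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return _assign(D, medoids), medoids


def identify(x, models, labels, medoids, top_classes=1, reject_threshold=None):
    """Two-stage search; returns a speaker index, or None if rejected as out-of-set."""
    # Stage 1: score only one representative (medoid) model per class.
    class_scores = np.array([gmm_logpdf(x, *models[m]).mean() for m in medoids])
    best = np.argsort(class_scores)[::-1][:top_classes]
    # Hypothetical class-level confidence test: reject early, before stage 2.
    if reject_threshold is not None and class_scores[best[0]] < reject_threshold:
        return None
    # Stage 2: score only the speakers belonging to the selected classes.
    cands = np.flatnonzero(np.isin(labels, best))
    scores = [gmm_logpdf(x, *models[s]).mean() for s in cands]
    return int(cands[int(np.argmax(scores))])
```

In this sketch, stage 1 scores only `n_classes` models and stage 2 only the members of the selected classes, so fewer models are scored than in an exhaustive search over every enrolled speaker. Raising `top_classes` scores more candidates, trading some of that saving for a lower chance of pruning the true speaker. An utterance that is rejected at stage 1 never reaches the per-speaker scoring at all.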