Abstract: In keyword spotting systems based on dynamic match lattice spotting (DMLS), the minimum edit distance serves as the confidence measure, so raising the detection rate also raises the false alarm rate. To address this problem, an approach that integrates a posterior probability confidence measure with DMLS is proposed. Firstly, the lattice-based posterior probability is introduced into the indexing stage of DMLS. Secondly, data-driven phone substitution, insertion and deletion costs are incorporated to allow more flexible phone sequence matching. Finally, the minimum edit distance and the posterior probability confidence measure are combined to detect all occurrences of the keywords. The experimental results show that the minimum edit distance and the posterior probability confidence measure are complementary to some degree, and that combining them yields a relative reduction in the equal error rate.
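The matching and combination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the cost callables, the normalization of the edit distance by keyword length, and the linear interpolation with `weight` are all assumptions made for illustration, since the abstract does not state how the two measures are combined. In the proposed approach the costs are data-driven, while the values used here are placeholders.

```python
from typing import Callable, Sequence


def weighted_edit_distance(
    keyword: Sequence[str],
    candidate: Sequence[str],
    sub_cost: Callable[[str, str], float],
    ins_cost: Callable[[str], float],
    del_cost: Callable[[str], float],
) -> float:
    """Minimum edit distance between two phone sequences with per-phone costs."""
    n, m = len(keyword), len(candidate)
    # d[i][j] = cheapest way to turn keyword[:i] into candidate[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(keyword[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(candidate[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(keyword[i - 1], candidate[j - 1]),
                d[i - 1][j] + del_cost(keyword[i - 1]),      # keyword phone missing
                d[i][j - 1] + ins_cost(candidate[j - 1]),    # extra phone in lattice path
            )
    return d[n][m]


def blended_confidence(
    med: float, posterior: float, keyword_len: int, weight: float = 0.5
) -> float:
    """Hypothetical linear blend of an edit-distance score and a posterior.

    The edit distance is mapped to [0, 1] (1 = exact match) before blending;
    both the mapping and the interpolation are assumptions of this sketch.
    """
    med_score = max(0.0, 1.0 - med / max(keyword_len, 1))
    return weight * med_score + (1.0 - weight) * posterior


# Placeholder costs: confusable vowels are cheap to substitute.
def example_sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return 0.3 if {a, b} == {"ae", "eh"} else 1.0


def unit_cost(_phone: str) -> float:
    return 1.0
```

A candidate phone sequence from the lattice would be accepted as a keyword occurrence when its blended score exceeds a threshold; sweeping that threshold trades detections against false alarms.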