Abstract: Based on the spectral characteristics of fricatives, a fricative detection method using energy spectrum entropy is proposed. First, phone boundaries are detected from the spectral differences between phonemes. Then, the spectral entropy of each speech segment is computed, and segments whose entropy exceeds a threshold are selected as candidates. Finally, post-processing removes insertion errors using the segment length and the abrupt energy changes at segment starts and ends. Experimental results show that the proposed method reaches an accuracy of 96.9% under clean conditions with a tolerance of 20 ms.
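The pipeline described in the abstract selects candidates by entropy, then post-processes them by duration and energy jump. It can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The phone boundaries are taken as given, so the boundary detector is not reproduced. A segment's entropy is assumed to be the mean of its frame-level normalized spectral entropies, $H = -\sum_k p_k \log p_k / \log K$ with $p_k = |X(k)|^2 / \sum_j |X(j)|^2$ over $K$ bins. The function names and all parameter values (`entropy_thr`, `min_dur`, `jump_db`, frame sizes) are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Energy spectrum |X(k)|^2 for k = 0..N/2, via a naive O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame: Sequence[float], eps: float = 1e-12) -> float:
    """Entropy of the normalized energy spectrum, scaled to [0, 1].

    Flat, noise-like spectra score near 1; spectra dominated by a few
    strong components (e.g. a single harmonic) score near 0.
    """
    spectrum = power_spectrum(frame)
    total = sum(spectrum) + eps
    probs = [p / total for p in spectrum]
    h = -sum(p * math.log(p) for p in probs if p > eps)
    return h / math.log(len(probs))


def _mean_energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x) / len(x) if len(x) else 0.0


def _energy_jump_db(signal: Sequence[float], idx: int, width: int,
                    eps: float = 1e-10) -> float:
    """Absolute energy change in dB between `width` samples before and after idx."""
    before = _mean_energy(signal[max(0, idx - width):idx])
    after = _mean_energy(signal[idx:idx + width])
    return abs(10.0 * math.log10((after + eps) / (before + eps)))


def detect_fricatives(
    signal: Sequence[float],
    boundaries: Sequence[int],  # sample indices; assumed to come from a prior boundary detector
    fs: int = 16000,
    frame_len: int = 256,
    hop: int = 128,
    entropy_thr: float = 0.75,  # hypothetical entropy threshold
    min_dur: float = 0.03,      # hypothetical minimum segment length (s)
    jump_db: float = 6.0,       # hypothetical "abrupt energy change" threshold (dB)
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) of segments kept as fricatives."""
    detected = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        seg = signal[start:end]
        frames = [seg[i:i + frame_len]
                  for i in range(0, len(seg) - frame_len + 1, hop)]
        if not frames:
            continue  # segment shorter than one analysis frame
        # Candidate selection (assumption: segment entropy = mean frame entropy).
        mean_h = sum(spectral_entropy(f) for f in frames) / len(frames)
        if mean_h <= entropy_thr:
            continue
        # Post-processing 1: reject candidates that are too short.
        if (end - start) / fs < min_dur:
            continue
        # Post-processing 2: require an abrupt energy change at the start or end
        # (assumption: one boundary is enough).
        jump = max(_energy_jump_db(signal, start, frame_len),
                   _energy_jump_db(signal, end, frame_len))
        if jump < jump_db:
            continue
        detected.append((start / fs, end / fs))
    return detected
```

A noise-like frame spreads its energy across all bins and pushes the normalized entropy toward 1. A voiced frame concentrated in a few harmonics scores near 0, and the threshold relies on that contrast. The naive DFT only keeps the sketch dependency-free. A practical version would use a windowed FFT.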
LI Li-Yong, ZHANG Lian-Hai. An English Fricative Detection Method Based on Energy Spectrum Entropy. Pattern Recognition and Artificial Intelligence, 2014, 27(6): 554-560.