Abstract: The chaotic characteristics of speech are demonstrated by calculating the largest Lyapunov exponents of 38 Mandarin phonemes. The physical significance of three nonlinear features of human speech, namely the largest Lyapunov exponent, the second-order dynamical entropy, and the fractal dimension, is studied. A speaker recognition system based on the Gaussian mixture model is established. At the decision layer, the recognition results obtained from MFCC features and from nonlinear dynamics features are combined in a serial manner to improve performance. Experimental results show that the nonlinear dynamics coefficients can distinguish between different speakers and improve on speaker identification that uses MFCC features alone.