3D Visualization Method for Tongue Movements in Pronunciation
LI Rui1,2,3, YU Jun2,3, LUO Changwei2,3, WANG Zengfu1,2,3
1. Laboratory of Nuclear Environment Telerobot, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031
2. National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027
3. School of Information Science and Technology, University of Science and Technology of China, Hefei 230027
Abstract: This paper studies the 3D visualization of tongue movements during pronunciation. First, a precise 3D tongue model is built from magnetic resonance imaging (MRI) scan data. Electromagnetic articulometer (EMA) data collected from three points on the tongue dorsum surface then serve as the driving data for this model, and a mass-spring technique is used to produce realistic tongue movements during pronunciation. To evaluate the modeling and synthesis methods, computer graphics techniques are employed to render the tongue movements in detail. Finally, the simulation results are compared with an X-ray video of the motion characteristics of articulators for Mandarin Chinese, recorded by a pronunciation specialist. The experimental results show that the proposed method produces precise and realistic 3D tongue movements and has broad application prospects.
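The idea of driving a mass-spring model from a few measured points can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 2D chain of seven nodes instead of an MRI-derived 3D mesh, uses arbitrary units, and replaces real EMA recordings with a made-up ramp. The function name `step_mass_spring`, the node indices, and all parameter values (`k`, `damping`, `mass`, `dt`) are hypothetical choices made only for illustration.

```python
import math


def step_mass_spring(pos, vel, springs, driven,
                     dt=0.001, k=200.0, damping=2.0, mass=0.01):
    """Advance a 2D mass-spring chain by one semi-implicit Euler step.

    pos, vel : lists of [x, y] per node, modified in place
    springs  : list of (i, j, rest_length) tuples
    driven   : dict mapping node index -> (x, y) target position,
               standing in for a measured EMA coil position (assumption)
    """
    forces = [[0.0, 0.0] for _ in pos]

    # Accumulate Hooke's-law forces along each spring.
    for i, j, rest in springs:
        dx = pos[j][0] - pos[i][0]
        dy = pos[j][1] - pos[i][1]
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # direction undefined for coincident nodes
        f = k * (length - rest)
        fx, fy = f * dx / length, f * dy / length
        forces[i][0] += fx
        forces[i][1] += fy
        forces[j][0] -= fx
        forces[j][1] -= fy

    for n in range(len(pos)):
        if n in driven:
            # Driven nodes follow the measured trajectory exactly.
            pos[n][0], pos[n][1] = driven[n]
            vel[n][0] = vel[n][1] = 0.0
            continue
        # Free nodes: update velocity first, then position.
        for a in range(2):
            acc = (forces[n][a] - damping * vel[n][a]) / mass
            vel[n][a] += acc * dt
            pos[n][a] += vel[n][a] * dt


# Seven nodes on a straight line, unit rest length between neighbours.
pos = [[float(i), 0.0] for i in range(7)]
vel = [[0.0, 0.0] for _ in range(7)]
springs = [(i, i + 1, 1.0) for i in range(6)]

# Three driven nodes (1, 3, 5); node 3 is raised from y=0 to y=1
# over the first 500 steps and then held there.
for step in range(2000):
    target_y = min(1.0, step / 500)
    driven = {1: (1.0, 0.0), 3: (3.0, target_y), 5: (5.0, 0.0)}
    step_mass_spring(pos, vel, springs, driven)

for n, (x, y) in enumerate(pos):
    print(f"node {n}: x={x:.3f}, y={y:.3f}")
```

In this sketch the driven nodes are treated as hard constraints, so the undriven nodes between them settle wherever the spring forces balance. A full tongue model would additionally need a volumetric mesh and collision handling against the palate and teeth, which are outside the scope of this example.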
LI Rui, YU Jun, LUO Changwei, WANG Zengfu. 3D Visualization Method for Tongue Movements in Pronunciation. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2016, 29(5): 385-392.