3D Visualization Method for Tongue Movements in Pronunciation
LI Rui1,2,3, YU Jun2,3, LUO Changwei2,3, WANG Zengfu1,2,3
1. Laboratory of Nuclear Environment Telerobot, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031
2. National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027
3. School of Information Science and Technology, University of Science and Technology of China, Hefei 230027
|
|
Abstract  The problem of 3D visualization of tongue movements in pronunciation is studied. Firstly, a precise 3D tongue model is built from magnetic resonance imaging (MRI) scan data. Based on this model, electromagnetic articulometer (EMA) data collected from three points on the tongue dorsum surface are used as the driving data, and a mass-spring technique is applied to produce realistic tongue movements in pronunciation. To evaluate the modeling and synthesis methods, computer graphics techniques are employed to render the detailed effect of the tongue movements. Finally, the simulation results are compared with an X-ray video of the motion characteristics of the articulators for Mandarin Chinese recorded by a pronunciation specialist. The experimental results show that the proposed method yields precise and realistic 3D tongue movements and has wide application prospects.
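The abstract describes the synthesis pipeline only at a high level: a mass-spring tongue mesh whose motion is constrained by a few EMA sensor positions. The sketch below is a minimal illustration of that idea, assuming a generic mass-spring system with semi-implicit Euler integration in which the vertices corresponding to the tongue-dorsum sensors are pinned to the measured EMA data each frame. All names, array shapes and parameter values (step_mass_spring, k, c, dt, ema_idx, etc.) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def step_mass_spring(pos, vel, springs, rest_len, ema_idx, ema_pos,
                     k=80.0, c=0.5, mass=1.0, dt=1e-3):
    """Advance tongue-mesh vertices one time step (illustrative sketch only).

    pos      : (N, 3) current vertex positions
    vel      : (N, 3) current vertex velocities
    springs  : (M, 2) index pairs of vertices joined by springs
    rest_len : (M,)   rest lengths of the springs
    ema_idx  : indices of vertices pinned to EMA sensor positions
    ema_pos  : (len(ema_idx), 3) measured EMA positions for this frame
    """
    force = np.zeros_like(pos)

    # Hooke's law spring forces with damping along each spring axis.
    i, j = springs[:, 0], springs[:, 1]
    d = pos[j] - pos[i]
    length = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
    direction = d / length
    stretch = length[:, 0] - rest_len
    rel_vel = np.sum((vel[j] - vel[i]) * direction, axis=1)
    f = (k * stretch + c * rel_vel)[:, None] * direction
    np.add.at(force, i, f)       # force on vertex i pulls it toward j when stretched
    np.add.at(force, j, -f)      # equal and opposite force on vertex j

    # Semi-implicit Euler integration.
    vel = vel + dt * force / mass
    pos = pos + dt * vel

    # Hard-constrain the vertices corresponding to the EMA sensors
    # (e.g. three points on the tongue dorsum) to the measured data.
    pos[ema_idx] = ema_pos
    vel[ema_idx] = 0.0
    return pos, vel

# Tiny made-up example: 4 vertices, 3 springs, 1 vertex driven by EMA data.
pos = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
vel = np.zeros_like(pos)
springs = np.array([[0, 1], [1, 3], [2, 3]])
rest_len = np.linalg.norm(pos[springs[:, 1]] - pos[springs[:, 0]], axis=1)
pos, vel = step_mass_spring(pos, vel, springs, rest_len,
                            ema_idx=np.array([0]),
                            ema_pos=np.array([[0.0, 0.0, 0.1]]))
```

In such a scheme the EMA trajectories act as kinematic constraints while the springs propagate the motion to the unobserved vertices of the mesh; the stiffness, damping and step size would have to be tuned against the recorded data.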
|
Received: 10 September 2015
|
About the authors:
LI Rui, born in 1989, Ph.D. candidate. Her research interests include computer graphics, audio-visual speech processing and human-machine interaction. E-mail: ruili89@mail.ustc.edu.cn.
YU Jun, born in 1982, Ph.D., associate professor. His research interests include human-machine interaction, computer graphics and audio-visual speech processing. E-mail: harryjun@ustc.edu.cn.
LUO Changwei, born in 1985, Ph.D. His research interests include computer graphics, human-machine interaction and video tracking. E-mail: luocw@mail.ustc.edu.cn.
WANG Zengfu (corresponding author), born in 1960, Ph.D., professor. His research interests include computer vision, pattern recognition, audio-visual speech processing, human-machine interaction and intelligent robots. E-mail: zfwang@ustc.edu.cn.
|
|
|
|
|
|