An Automatic Approach to Lip Localization, Contour Extraction and Tracking
WANG XiaoPing (1), HAO YuFeng (2), FU DeGang (1), YUAN ChunWei (1)
1. State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096
2. Beijing InfoQuick SinoVoice Speech Technology Corporation, Beijing 100094
Abstract: An automatic approach to lip localization, contour extraction and tracking is presented, which combines the CbCr color space, the Fisher transform and deformable templates. First, a skin-color model is constructed in the CbCr color space for skin detection, so that the approximate lip region can be obtained from the geometric features of the human face. Then, the color difference between lip and skin is enhanced by the Fisher transform. Brightness preprocessing is carried out before segmentation, and the segmentation threshold is obtained by the Otsu method. Next, a lip-color model is used to validate the segmentation result for accurate localization, and deformable templates are used to extract the lip contour. Based on the segmentation result, a curve-fitting method for the edges of the inner mouth is presented to extract the inner contour robustly. Finally, the localization result of the previous frame is used as the predicted lip region for the next frame, in which lip localization and contour extraction are executed again to track the lips.
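
The enhancement and thresholding steps named in the abstract can be illustrated with a minimal sketch. It assumes the Fisher transform is a two-class Fisher linear discriminant applied to (Cb, Cr) pairs, and it omits the brightness preprocessing and the lip-color validation. The function names and the sample values are hypothetical and do not come from the paper.

```python
# Sketch only: Fisher linear discriminant on (Cb, Cr) pairs, then Otsu
# thresholding of the projected scores. All names and data are assumptions.

def mean2(samples):
    """Mean of a list of 2-D samples."""
    n = len(samples)
    return (sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n)

def scatter2(samples, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean m."""
    sxx = sum((s[0] - m[0]) ** 2 for s in samples)
    sxy = sum((s[0] - m[0]) * (s[1] - m[1]) for s in samples)
    syy = sum((s[1] - m[1]) ** 2 for s in samples)
    return sxx, sxy, syy

def fisher_direction(lip, skin):
    """w = Sw^-1 (m_lip - m_skin), with Sw the within-class scatter matrix."""
    m1, m2 = mean2(lip), mean2(skin)
    a1, b1, c1 = scatter2(lip, m1)
    a2, b2, c2 = scatter2(skin, m2)
    a, b, c = a1 + a2, b1 + b2, c1 + c2
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Inverse of [[a, b], [b, c]] is (1 / det) * [[c, -b], [-b, a]].
    return ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)

def project(pixels, w):
    """Project each (Cb, Cr) pair onto the Fisher direction w."""
    return [w[0] * p[0] + w[1] * p[1] for p in pixels]

def otsu_threshold(values, bins=256):
    """Threshold that maximizes the between-class variance of a histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_var, best_bin = -1.0, 0
    for i in range(bins):
        w0 += hist[i]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += i * hist[i]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    # Upper edge of the best bin, mapped back to the score scale.
    return lo + (best_bin + 1) * width

# Hypothetical labeled (Cb, Cr) samples; real ones would come from images.
lip_samples = [(110, 160), (112, 165), (108, 162), (111, 158)]
skin_samples = [(120, 145), (118, 140), (122, 143), (121, 147)]

w = fisher_direction(lip_samples, skin_samples)
scores = project(lip_samples + skin_samples, w)
t = otsu_threshold(scores)
lip_mask = [s >= t for s in scores]  # True where a pixel is classed as lip
```

Because the direction is computed as the lip mean minus the skin mean, lip pixels project to higher scores, so the mask keeps scores at or above the threshold. In practice the direction would be learned once from labeled training pixels and then applied to every pixel of the approximate lip region.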
Received: 16 January 2006