Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
2021, Vol. 34, Issue 6: 572-580. DOI: 10.16451/j.cnki.issn1003-6059.202106009
Researches and Applications
Speech Driven Talking Face Video Generation via Landmarks Representation
NIAN Fudong1,2, WANG Wentao1, WANG Yan1, ZHANG Jingjing1, HU Guiheng3, LI Teng1
1. School of Artificial Intelligence, Anhui University, Hefei 230601
2. School of Advanced Manufacturing Engineering, Hefei University, Hefei 230601
3. School of Information Engineering, Anhui Business and Technology College, Hefei 231131

Abstract  Existing speech-driven talking face video generation methods ignore the speaker's head motion. To address this problem, a speech-driven talking face video generation method based on a facial landmark representation is proposed. The speaker's head motion and lip motion are represented by facial contour landmarks and lip landmarks, respectively. Speech is converted into facial landmarks by a parallel multi-branch network, and the final talking face video is synthesized from the continuous lip landmark sequence, the head landmark sequence and a template image. Quantitative and qualitative experiments are conducted. The results show that the talking face videos with head motion synthesized by the proposed method are clear and natural, and that the method performs better than the compared methods.
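The abstract describes a representation in which head motion and lip motion are carried by two separate landmark subsets, each predicted from the same speech input by its own branch. The following is a minimal sketch of that data flow only, under loudly stated assumptions: it assumes the common 68-point face annotation (jaw contour at indices 0-16, lips at 48-67), which the abstract does not specify, and it replaces each branch of the paper's network with a hypothetical linear map from an audio feature vector to landmark offsets. The names `split_landmarks`, `linear_branch` and `predict_sequence` are illustrative, not the authors' code, and the final image synthesis from landmarks plus a template image is not modeled.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# ASSUMPTION: common 68-point face layout (iBUG 300-W / dlib style), where
# points 0-16 trace the jaw contour and points 48-67 trace the lips.
# The paper's actual landmark set may differ; these ranges are hypothetical.
CONTOUR_IDX = range(0, 17)
LIP_IDX = range(48, 68)


def split_landmarks(face: Sequence[Point]) -> Tuple[List[Point], List[Point]]:
    """Split one frame of 68 landmarks into a head (contour) set and a lip set."""
    if len(face) != 68:
        raise ValueError(f"expected 68 landmarks, got {len(face)}")
    contour = [face[i] for i in CONTOUR_IDX]
    lips = [face[i] for i in LIP_IDX]
    return contour, lips


def linear_branch(features: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  n_points: int) -> List[Point]:
    """Hypothetical stand-in for one network branch: a linear map from an
    audio feature vector to n_points (dx, dy) offsets.
    Row 2p of `weights` gives dx of point p; row 2p + 1 gives dy."""
    if len(weights) != 2 * n_points:
        raise ValueError("need two weight rows per landmark")
    offsets = []
    for p in range(n_points):
        dx = sum(w * f for w, f in zip(weights[2 * p], features))
        dy = sum(w * f for w, f in zip(weights[2 * p + 1], features))
        offsets.append((dx, dy))
    return offsets


def shift(points: Sequence[Point], offsets: Sequence[Point]) -> List[Point]:
    """Add per-point offsets to a list of landmarks."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]


def predict_sequence(audio_frames: Sequence[Sequence[float]],
                     template: Sequence[Point],
                     head_w: Sequence[Sequence[float]],
                     lip_w: Sequence[Sequence[float]]):
    """For each audio frame, run the head and lip branches independently
    (parallel in the sense that neither consumes the other's output) and
    displace the template's contour and lip landmarks accordingly.
    Returns one (contour, lips) pair per audio frame."""
    contour, lips = split_landmarks(template)
    sequence = []
    for feat in audio_frames:
        head_off = linear_branch(feat, head_w, len(contour))
        lip_off = linear_branch(feat, lip_w, len(lips))
        sequence.append((shift(contour, head_off), shift(lips, lip_off)))
    return sequence


# Toy usage: the i-th template landmark sits at (i, i); audio features are 2-D.
# The head branch predicts no motion; the lip offset equals the first feature.
template = [(float(i), float(i)) for i in range(68)]
head_w = [[0.0, 0.0]] * (2 * 17)
lip_w = [[1.0, 0.0]] * (2 * 20)
frames = predict_sequence([[1.0, 0.5], [0.0, 0.5]], template, head_w, lip_w)
```

Keeping the two branches separate means the head landmark sequence and the lip landmark sequence can be produced and inspected independently before being combined with the template image, which matches the abstract's description of synthesizing the video from both sequences.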
Key words: Talking Face; Facial Landmark; Lip Action; Head Action; Face Video
Received: 03 March 2021     
CLC number: TP 391.4
Fund: University Synergy Innovation Program of Anhui Province (No. GXXT-2019-007), National Natural Science Foundation of China (No. 61902104), Natural Science Foundation of Anhui Province (No. 2008085QF295), University Natural Science Research Project of Anhui Province (No. KJ2020A0651), Talent Research Foundation of Hefei University (No. 18-19RC54)
Corresponding author: NIAN Fudong, Ph.D., associate professor. His research interests include computer vision and multimedia computing.
About the authors: WANG Wentao, master's student. His research interests include image generation.
WANG Yan, Ph.D. candidate. Her research interests include convolutional neural networks and multimodal fusion.
ZHANG Jingjing, Ph.D., associate professor. Her research interests include computer vision.
HU Guiheng, master's degree, lecturer. His research interests include software technology and artificial intelligence.
LI Teng, Ph.D., professor. His research interests include computer vision and multimedia computing.
Cite this article:
NIAN Fudong, WANG Wentao, WANG Yan, et al. Speech Driven Talking Face Video Generation via Landmarks Representation[J]. Pattern Recognition and Artificial Intelligence, 2021, 34(6): 572-580.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202106009 or http://manu46.magtech.com.cn/Jweb_prai/EN/Y2021/V34/I6/572
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China. Tel: 0551-65591176. Fax: 0551-65591176. Email: bjb@iim.ac.cn