模式识别与人工智能
Pattern Recognition and Artificial Intelligence  2024, Vol. 37 Issue (1): 73-84    DOI: 10.16451/j.cnki.issn1003-6059.202401006
Researches and Applications
Lipreading Based on Multiple Visual Attention
XIE Yincen1, XUE Feng2, CAO Mingwei3
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601;
2. School of Software, Hefei University of Technology, Hefei 230601;
3. School of Computer Science and Technology, Anhui University, Hefei 230601

Abstract  Lipreading translates the silent video of a single speaker's lip motion into text. Because lip movements have a small amplitude, lipreading models tend to discriminate features poorly and generalize weakly. To address this issue, the purification of lipreading visual features is studied along three dimensions: time, space, and channel. A lipreading method based on a multiple visual attention network (LipMVA) is proposed. First, channel attention adaptively calibrates channel-level features to mitigate interference from uninformative channels. Then, two spatio-temporal attention modules with different granularities suppress the influence of unimportant pixels or frames. Experiments on the CMLR and GRID datasets show that LipMVA reduces the error rate, which verifies its effectiveness.
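The channel-calibration step mentioned in the abstract can be illustrated with a minimal sketch in the squeeze-and-excitation style. This is an assumption made for illustration only: the abstract does not specify how LipMVA's channel attention is built, and the function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy values are all hypothetical.

```python
import math


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel gate (illustrative, not LipMVA itself).

    features: list of C channels, each a flat list of activations
              (all frames and pixels of that channel).
    w1: hidden x C weight matrix; w2: C x hidden weight matrix (hypothetical).
    Returns (rescaled_features, gates).
    """
    # Squeeze: one descriptor per channel via global average pooling.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibrate: scale every activation of a channel by that channel's gate.
    rescaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return rescaled, gates


# Toy input: 3 channels with 2 activations each, and 1 hidden unit.
features = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w1 = [[0.5, 0.0, 0.5]]
w2 = [[1.0], [-1.0], [0.0]]
rescaled, gates = channel_attention(features, w1, w2)
```

Channels whose gate is close to 0 are suppressed and channels whose gate is close to 1 pass through almost unchanged, which is the sense in which such a gate can down-weight uninformative channels. In a trained network the weights would be learned rather than fixed by hand.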
Key words: Lipreading; Visual Speech Recognition; Attention Mechanism; Deep Neural Network; Feature Extraction
Received: 26 September 2023     
CLC number (ZTFLH): TP391.41
Fund: National Natural Science Foundation of China (No.62272143), Anhui Provincial Major Science and Technology Project (No.202203a05020025), University Synergy Innovation Program of Anhui Province (No.GXXT-2022-054), The Seventh Special Support Plan for Innovation and Entrepreneurship in Anhui Province
Corresponding Author: XUE Feng, Ph.D., professor. His research interests include artificial intelligence, multimedia analysis and recommendation systems.
About authors: XIE Yincen, master's student. His research interests include computer vision. CAO Mingwei, Ph.D., associate professor. His research interests include 3D reconstruction and virtual reality.
Cite this article:   
XIE Yincen, XUE Feng, CAO Mingwei. Lipreading Based on Multiple Visual Attention[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(1): 73-84.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202401006      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2024/V37/I1/73