Lipreading Based on Multiple Visual Attention
XIE Yincen¹, XUE Feng², CAO Mingwei³
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. School of Software, Hefei University of Technology, Hefei 230601; 3. School of Computer Science and Technology, Anhui University, Hefei 230601
Abstract Lipreading is a technology that translates silent video of a single speaker's lip motion into text. Because lip movements have small amplitude, both the feature discrimination ability and the generalization ability of lipreading models are weak. To address this issue, the purification of visual features for lipreading is studied along three dimensions: time, space and channel, and a lipreading method based on a multiple visual attention network (LipMVA) is proposed. Firstly, channel attention adaptively calibrates channel-level features to mitigate interference from meaningless channels. Then, two spatio-temporal attention modules of different granularities suppress the influence of unimportant pixels and frames. Experiments on the CMLR and GRID datasets demonstrate that LipMVA reduces the error rate, verifying its effectiveness.
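The channel calibration and spatio-temporal suppression summarized above can be sketched in a few lines of PyTorch. The code below is a hedged illustration, not the paper's actual LipMVA implementation: it pairs a squeeze-and-excitation style channel attention with a CBAM-like spatio-temporal branch over 5D lip-clip features, and the module names, reduction ratio and kernel size are all assumptions made for the sketch.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation style gating: global-pool the spatio-temporal
    # dimensions, then learn a per-channel weight that recalibrates features
    # and damps meaningless channels.
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        b, c = x.shape[:2]
        w = self.fc(x.mean(dim=(2, 3, 4)))   # (b, c) channel weights
        return x * w.view(b, c, 1, 1, 1)

class SpatioTemporalAttention(nn.Module):
    # CBAM-like branch: pool across channels (mean and max), then a 3D
    # convolution yields one weight per (t, h, w) location, suppressing
    # unimportant pixels and frames.
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv3d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

if __name__ == "__main__":
    clip = torch.randn(2, 64, 16, 44, 44)    # a batch of 16-frame lip crops
    feats = SpatioTemporalAttention()(ChannelAttention(64)(clip))
    print(feats.shape)                        # torch.Size([2, 64, 16, 44, 44])

In a full model, modules of this kind would typically sit inside the visual front-end, after convolutional blocks and before the sequence back-end that decodes text.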
Received: 26 September 2023
Fund: National Natural Science Foundation of China (No. 62272143), Anhui Provincial Major Science and Technology Project (No. 202203a05020025), University Synergy Innovation Program of Anhui Province (No. GXXT-2022-054), and the Seventh Special Support Plan for Innovation and Entrepreneurship in Anhui Province
Corresponding Author:
XUE Feng, Ph.D., professor. His research interests include artificial intelligence, multimedia analysis and recommendation systems.
About authors: XIE Yincen, Master student. His research interests include computer vision. CAO Mingwei, Ph.D., associate professor. His research interests include 3D reconstruction and virtual reality.