Scene Text Removal Based on Multi-scale Attention Mechanism

doi:10.16451/j.cnki.issn1003-6059.202207004


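Abstract: Natural scene text removal can be applied to privacy protection in image communication, image editing, and related fields. However, when facing scene images with complex backgrounds and large variations in text scale, current scene text removal methods struggle to extract robust text features, which leads to incomplete text detection and incomplete background repair. To address these problems, a scene text removal framework based on a multi-scale attention mechanism is proposed. The framework consists mainly of a background repair network and a text detection network, which share a backbone network. In the background repair network, a texture adaptive module is designed. It encodes the original features along both the channel and spatial dimensions and adaptively integrates local and global features, effectively repairing the shadows left by reconstructing text regions. In the text detection network, a context-aware module is designed to learn the discriminative relationship between text and non-text regions in the image, so that the two are effectively distinguished and text detection improves. In addition, to enlarge the receptive field of the network and improve the removal of text at different scales, a multi-scale feature loss function is proposed that optimizes the background repair network and the text detection network simultaneously. Experiments on the SCUT-SYN and SCUT-EnsText datasets show that the proposed framework achieves superior text removal performance.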
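To make the idea of encoding features along the channel and spatial dimensions concrete, here is a minimal sketch of generic channel-then-spatial gating. It is not the texture adaptive module itself, whose design the abstract does not specify. The function names (`channel_gate`, `spatial_gate`, `channel_spatial_attention`) are hypothetical, the gates have no learned weights, and a feature map is assumed to be a nested list of shape channels x height x width.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feat):
    """Scale each channel by the sigmoid of its global average.

    Assumption: no learned weights; a trained module would normally pass
    the pooled statistics through a small network before gating.
    """
    gated = []
    for ch in feat:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        w = sigmoid(mean)
        gated.append([[v * w for v in row] for row in ch])
    return gated


def spatial_gate(feat):
    """Scale each position by the sigmoid of its mean across channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gate = [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
             for j in range(w)]
            for i in range(h)]
    return [[[feat[k][i][j] * gate[i][j] for j in range(w)]
             for i in range(h)]
            for k in range(c)]


def channel_spatial_attention(feat):
    """Apply the channel gate first, then the spatial gate."""
    return spatial_gate(channel_gate(feat))
```

The channel gate reweights whole feature maps using global statistics, while the spatial gate reweights individual positions, which is one simple way to combine global and local cues.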

Keywords: scene text removal, text segmentation, attention mechanism, multi-scale features, end-to-end method

Abstract: Scene text removal is of great significance for privacy protection and image editing in image communication. However, existing scene text removal models struggle to extract robust features from images with complex backgrounds and multi-scale text, resulting in incomplete text detection and incomplete background repair. To solve this problem, a scene text removal framework based on a multi-scale attention mechanism is proposed for robust background repair and text detection. The proposed framework is mainly composed of a background repair network and a text detection network, which share a backbone network. In the background repair network, a texture adaptive module is designed to encode features along the channel and spatial dimensions and to adaptively integrate local and global features, effectively repairing the shadows left by text reconstruction. To improve text detection, a context-aware module is designed to learn discriminative features that separate text from non-text regions in the image. In addition, to enlarge the receptive field of the network and improve the removal of multi-scale text, a multi-scale feature loss function is designed to optimize the background repair and text detection networks jointly. Experimental results on the SCUT-SYN and SCUT-EnsText datasets show that the proposed method achieves state-of-the-art performance in text removal.
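As an illustration of what a loss computed at several scales can look like, the sketch below sums a weighted L1 error between a prediction and a target at full resolution and at successively halved resolutions. It is an assumption-laden toy, not the loss defined in the paper: the abstract does not give the formula, and the function names (`avg_pool2`, `l1`, `multi_scale_l1`), the 2x2 average pooling, and the default weights are all hypothetical choices made for this example.

```python
def avg_pool2(img):
    """Downsample a 2D grid by 2x2 average pooling (even height and width assumed)."""
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, len(img[0]), 2)]
            for i in range(0, len(img), 2)]


def l1(a, b):
    """Mean absolute difference between two grids of the same shape."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def multi_scale_l1(pred, target, weights=(1.0, 0.5, 0.25)):
    """Weighted sum of L1 errors at full, 1/2, and 1/4 resolution.

    Assumption: the weights are illustrative, and the input side length
    must be divisible by 2 ** (len(weights) - 1).
    """
    total = 0.0
    for k, w in enumerate(weights):
        total += w * l1(pred, target)
        if k < len(weights) - 1:
            # Move to the next coarser scale for both grids.
            pred, target = avg_pool2(pred), avg_pool2(target)
    return total
```

Coarser scales penalize errors that persist over large regions, while the full-resolution term penalizes fine detail, so a single scalar reflects both small and large text. In a joint setup, one such term per network output could be summed to train both networks together.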

Key words: Scene Text Erasure, Text Segmentation, Attention Mechanism, Multi-scale Features, End-to-End Method

Received: 2022-05-30

CLC number: TP 391

Funding: Supported by the National Natural Science Foundation of China (No. 61936003, 61721004)

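Corresponding author: LIU Chenglin, Ph.D., professor. Research interests: pattern recognition, computer vision, and document image analysis and recognition. E-mail: liucl@nlpr.ia.ac.cn.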

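About the authors: HE Ping, master's student. Research interests: scene text style transfer. E-mail: ping.he@nlpr.ia.ac.cn.
ZHANG Heng, Ph.D., associate researcher. Research interests: document image analysis and recognition. E-mail: heng.zhang@ia.ac.cn.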

Cite this article:

HE Ping, ZHANG Heng, LIU Chenglin. Scene Text Removal Based on Multi-scale Attention Mechanism[J]. Pattern Recognition and Artificial Intelligence, 2022, 35(7): 614-624.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202207004 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2022/V35/I7/614