RGB-D Salient Object Detection Based on Spatial Constrained and Self-Mutual Attention

doi:10.16451/j.cnki.issn1003-6059.202206005


Abstract: To address RGB-D salient object detection, a method based on pyramid spatially constrained self-mutual attention is proposed. First, a spatially constrained self-mutual attention module is introduced. It exploits the complementarity of multi-modal features to learn multi-modal feature representations with spatial context awareness. For both modalities, the module computes the pairwise relationships between each query position and its surrounding region, integrating self-attention and mutual attention and thereby aggregating the contextual features of the two modalities. Then, to obtain more complementary information, a pyramid structure is applied to a set of spatially constrained self-mutual attention modules, so that features with different receptive fields under different spatial constraints are accommodated and both local and global feature representations are learned. Finally, the multi-modal fusion module is embedded into a two-branch encoder-decoder network to perform RGB-D salient object detection. Experiments on four public benchmark datasets show that the proposed method is highly competitive on the RGB-D salient object detection task.

Keywords: RGB-D Salient Object Detection, Multi-modal Fusion, Self-Attention Mechanism, Convolutional Neural Network
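
The attention scheme summarized in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how the two modalities' affinities are combined or how the pyramid levels are merged, so the code assumes (hypothetically) that the local affinity logits of the RGB and depth features are summed before a softmax, that the spatial constraint is a square window of a given `radius`, and that pyramid levels are averaged. All function names are illustrative.

```python
import numpy as np

def local_logits(feat, radius):
    """Scaled dot-product affinity between each position and its window."""
    h, w, c = feat.shape
    k = 2 * radius + 1
    padded = np.pad(feat, ((radius, radius), (radius, radius), (0, 0)))
    valid = np.pad(np.ones((h, w), dtype=bool), radius)  # False on padding
    logits = np.full((h, w, k * k), -np.inf)
    idx = 0
    for dy in range(k):
        for dx in range(k):
            neigh = padded[dy:dy + h, dx:dx + w]
            score = (feat * neigh).sum(-1) / np.sqrt(c)
            # Neighbours that fall outside the image are masked out.
            logits[..., idx] = np.where(valid[dy:dy + h, dx:dx + w], score, -np.inf)
            idx += 1
    return logits

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def gather(feat, weights, radius):
    """Weighted sum of each position's window using the given attention."""
    h, w, _ = feat.shape
    k = 2 * radius + 1
    padded = np.pad(feat, ((radius, radius), (radius, radius), (0, 0)))
    out = np.zeros(feat.shape, dtype=float)
    idx = 0
    for dy in range(k):
        for dx in range(k):
            out += weights[..., idx:idx + 1] * padded[dy:dy + h, dx:dx + w]
            idx += 1
    return out

def self_mutual_attention(rgb, depth, radius):
    # Assumption: self and mutual attention are fused by summing the logits.
    fused = softmax(local_logits(rgb, radius) + local_logits(depth, radius))
    return gather(rgb, fused, radius), gather(depth, fused, radius)

def pyramid_attention(rgb, depth, radii=(1, 2, 3)):
    # Assumption: pyramid levels (different window sizes) are averaged.
    outs = [self_mutual_attention(rgb, depth, r) for r in radii]
    rgb_out = sum(o[0] for o in outs) / len(radii)
    depth_out = sum(o[1] for o in outs) / len(radii)
    return rgb_out, depth_out
```

Calling `pyramid_attention(rgb, depth)` on two arrays of shape `(H, W, C)` returns two context-aggregated arrays of the same shape. Small radii capture local context, while larger radii approach global attention, which is the intuition behind applying several spatial constraints in a pyramid.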

Received: 2021-08-27

CLC number: TP 391

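Funding: Supported by the National Natural Science Foundation of China (No. 62076004, 62006002), the Natural Science Foundation of Anhui Province, Youth Project (No. 1908085QF264), and the Anhui University Collaborative Innovation Project (No. GXXT-2020-013).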

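Corresponding author: JIANG Bo, Ph.D., associate professor. Research interests include image feature extraction and matching, and graph data representation and learning. E-mail: jiangbo@ahu.edu.cn.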

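About the authors: YUAN Xiao, master's student. Research interests include saliency detection. E-mail: yuanx25@163.com.
XIAO Yun, Ph.D., associate professor. Research interests include salient object detection and multi-modal analysis. E-mail: xiaoyun@ahu.edu.cn.
TANG Jin, Ph.D., professor. Research interests include image and video representation and recognition, and multi-modal analysis. E-mail: tangjin@ahu.edu.cn.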

Cite this article:

YUAN Xiao, XIAO Yun, JIANG Bo, TANG Jin. RGB-D Salient Object Detection Based on Spatial Constrained and Self-Mutual Attention[J]. Pattern Recognition and Artificial Intelligence, 2022, 35(6): 526-535.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202206005 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2022/V35/I6/526
