RGB-D Salient Object Detection Based on Spatial Constrained and Self-Mutual Attention
YUAN Xiao¹, XIAO Yun², JIANG Bo¹,³, TANG Jin¹
1. Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601
2. School of Artificial Intelligence, Anhui University, Hefei 230601
3. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088
Abstract: To address RGB-D salient object detection, a method based on pyramid spatially constrained self-mutual attention is proposed. First, a spatially constrained self-mutual attention module is introduced to learn multi-modal feature representations with spatial context awareness by exploiting the complementarity of the RGB and depth features. For each query position, the module computes pairwise relationships with the positions in its surrounding area and integrates self-attention with mutual attention, so that the contextual features of the two modalities are aggregated. Then, to obtain more complementary information, a pyramid structure is built over a set of spatially constrained self-mutual attention modules. Each module operates under a different spatial constraint and therefore adapts to features with a different receptive field, so that both local and global feature representations are learned. Finally, the resulting multi-modal fusion module is embedded into a two-branch encoder-decoder network to solve the RGB-D salient object detection task. Experiments on four benchmark datasets show that the proposed method is highly competitive for RGB-D salient object detection.
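The abstract describes the fusion module only at a high level. The NumPy sketch below illustrates one plausible reading of a spatially constrained self-mutual attention step and its pyramid arrangement. Every concrete choice here is an assumption, not the authors' formulation: the square (2r+1) x (2r+1) window used as the spatial constraint, the random projection matrices standing in for learned 1x1 convolutions, the mixing weight `alpha` that adds the other modality's affinities (mutual attention) to a modality's own affinities (self-attention), the residual connection, and the plain averaging of pyramid levels with radii (1, 2, 4). All function names are hypothetical.

```python
from itertools import product

import numpy as np


def neighbourhood_mask(h, w, radius):
    """Boolean mask of shape (N, H, W), N = (2*radius+1)**2: True where the
    neighbour at a given window offset lies inside the image."""
    valid = np.pad(np.ones((h, w), dtype=bool), radius)  # padded border is False
    offsets = product(range(2 * radius + 1), repeat=2)
    return np.stack([valid[dy:dy + h, dx:dx + w] for dy, dx in offsets])


def local_affinity(q, k, radius):
    """Scaled dot products between each query pixel and every key pixel in its
    (2*radius+1) x (2*radius+1) window; q and k have shape (C, H, W)."""
    c, h, w = q.shape
    kp = np.pad(k, ((0, 0), (radius, radius), (radius, radius)))
    offsets = product(range(2 * radius + 1), repeat=2)
    return np.stack([(q * kp[:, dy:dy + h, dx:dx + w]).sum(axis=0) / np.sqrt(c)
                     for dy, dx in offsets])


def local_aggregate(weights, v, radius):
    """Attention-weighted sum of the values in each window (same offset order)."""
    _, h, w = v.shape
    vp = np.pad(v, ((0, 0), (radius, radius), (radius, radius)))
    out = np.zeros_like(v)
    for n, (dy, dx) in enumerate(product(range(2 * radius + 1), repeat=2)):
        out += weights[n] * vp[:, dy:dy + h, dx:dx + w]
    return out


def masked_softmax(logits, mask):
    """Softmax over the window axis; out-of-image neighbours get zero weight."""
    logits = np.where(mask, logits, -np.inf)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)


def self_mutual_attention(f_rgb, f_dep, proj, radius, alpha=1.0):
    """One spatially constrained self-mutual attention step (hypothetical sketch).
    proj maps "rgb"/"dep" to (W_q, W_k, W_v) matrices of shape (C, C)."""
    feats = {"rgb": f_rgb, "dep": f_dep}
    qkv = {m: [np.einsum("oc,chw->ohw", w_, feats[m]) for w_ in proj[m]] for m in feats}
    aff = {m: local_affinity(qkv[m][0], qkv[m][1], radius) for m in feats}
    _, h, w = f_rgb.shape
    mask = neighbourhood_mask(h, w, radius)
    out = {}
    for m, other in (("rgb", "dep"), ("dep", "rgb")):
        # self-attention logits of modality m plus mutual-attention logits of the other
        attn = masked_softmax(aff[m] + alpha * aff[other], mask)
        out[m] = feats[m] + local_aggregate(attn, qkv[m][2], radius)  # residual
    return out["rgb"], out["dep"]


def pyramid_smsa(f_rgb, f_dep, proj, radii=(1, 2, 4), alpha=1.0):
    """Runs the step under several spatial constraints and averages the results
    (averaging is a placeholder for a learned fusion of the pyramid levels)."""
    outs = [self_mutual_attention(f_rgb, f_dep, proj, r, alpha) for r in radii]
    return (np.mean([o[0] for o in outs], axis=0),
            np.mean([o[1] for o in outs], axis=0))


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
f_rgb, f_dep = rng.standard_normal((2, C, H, W))
proj = {m: [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3)]
        for m in ("rgb", "dep")}
rgb_out, dep_out = pyramid_smsa(f_rgb, f_dep, proj)
print(rgb_out.shape, dep_out.shape)  # (8, 16, 16) (8, 16, 16)
```

Restricting each query to a window of radius r costs O(HW(2r+1)^2) instead of the O((HW)^2) of global non-local attention; combining several radii is what lets the pyramid cover both local and wider context.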