Deep Snake with 2D-Circular Convolution and Difficulty-Sensitive Contour-IoU Loss
LI Hao1, YUAN Guanglin1, LI Congli2, QIN Xiaoyan1, ZHU Hong1
1. Department of Information Engineering, Army Academy of Artillery and Air Defense of People's Liberation Army of China, Hefei 230031; 2. Department of Ordnance Engineering, Army Academy of Artillery and Air Defense of People's Liberation Army of China, Hefei 230031
摘要 (Abstract): Deep Snake deforms an initial bounding box to the object contour end-to-end and improves instance segmentation performance, but it is sensitive to the initial bounding box and regresses the contour parameters independently. Therefore, Deep Snake with 2D-circular convolution and a difficulty-sensitive contour-IoU loss is proposed. First, 2D-circular convolution is designed from the spatial context information of the contour to resolve the sensitivity to the initial bounding box. Then, a difficulty-sensitive contour-IoU loss function is proposed based on the geometric meaning of the definite integral and sample difficulty, so that the contour parameters are regressed as a whole. Finally, instance segmentation is performed using the 2D-circular convolution and the difficulty-sensitive contour-IoU loss function. Experiments on the Cityscapes, KINS and SBD datasets demonstrate that the proposed method achieves superior instance segmentation accuracy.
Abstract: The initial bounding box is deformed to the object contour end-to-end by Deep Snake, and the performance of instance segmentation is significantly improved. However, Deep Snake is sensitive to the initial bounding box and regresses the contour parameters independently. To address these issues, Deep Snake with 2D-circular convolution and a difficulty-sensitive contour intersection-over-union (contour-IoU) loss is proposed. Firstly, 2D-circular convolution is designed based on the spatial context information of the contour to solve the sensitivity to the initial bounding box. Secondly, a difficulty-sensitive contour-IoU loss function is proposed according to the geometric meaning of the definite integral and the difficulty of the sample, regressing the contour parameters as a whole. Finally, instance segmentation is accomplished with the proposed 2D-circular convolution and difficulty-sensitive contour-IoU loss function. Experiments on the Cityscapes, KINS and SBD datasets show that the proposed method achieves better segmentation accuracy.
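The 2D-circular convolution named in the abstract extends the circular convolution that Deep Snake applies along a closed contour, where each vertex's neighborhood wraps around so that vertex 0 is adjacent to vertex N-1. A minimal sketch of that 1D building block follows; the function name, the single output channel, and the explicit loop are illustrative simplifications and are not the paper's implementation:

```python
import numpy as np

def circular_conv1d(features, kernel):
    """Convolve per-vertex features along a closed contour with wrap-around.

    features: (N, C) array, one C-dimensional feature per contour vertex.
    kernel:   (K, C) array with K odd, applied to each vertex's circular
              neighborhood (K // 2 vertices on either side).
    Returns an (N,) array of responses (a single output channel, for brevity).
    """
    n, _ = features.shape
    k = kernel.shape[0]
    r = k // 2
    # Wrap the contour so vertex 0's neighbors include vertex N-1, and vice versa.
    padded = np.concatenate([features[-r:], features, features[:r]], axis=0)
    out = np.empty(n)
    for i in range(n):
        # Weighted sum over the circular neighborhood of vertex i.
        out[i] = np.sum(padded[i:i + k] * kernel)
    return out
```

Because the padding wraps, the operation is equivariant to cyclic shifts of the vertex ordering, which is why the starting vertex of the contour does not matter.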
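The contour-IoU loss is said to draw on the geometric meaning of the definite integral: the area enclosed by a closed contour is the discrete line integral (1/2)∮(x dy − y dx), i.e. the shoelace formula over the polygon's vertices. A minimal sketch of that area computation is shown below; the paper's full IoU construction and difficulty weighting are not reproduced, and the function name is illustrative:

```python
import numpy as np

def shoelace_area(contour):
    """Signed area of a closed polygon via the shoelace formula.

    contour: (N, 2) array of vertices in order (counter-clockwise gives
    a positive area, clockwise a negative one).
    """
    x, y = contour[:, 0], contour[:, 1]
    # Discrete form of (1/2) * integral of (x dy - y dx) around the contour.
    return 0.5 * np.sum(x * np.roll(y, -1) - y * np.roll(x, -1))
```

A loss built on such areas can compare predicted and ground-truth contours as whole shapes rather than penalizing each vertex coordinate independently, which matches the abstract's stated goal of regressing the contour parameters as a unit.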