Instance-Level Sketch-Based Image Retrieval Based on Two Stream Multi-granularity Local Alignment Network
HAN Xuekun¹,², MIAO Duoqian¹,², ZHANG Hongyun¹,², ZHANG Qixian¹,²
1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804; 2. Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804
Abstract: Instance-level sketch-based image retrieval aims to retrieve the specific image depicted by a query sketch. Sketches and images are separated by a significant modality gap, and their features suffer from misalignment. Existing methods cannot effectively reduce the modality gap between sketches and images, and they capture information at only a single granularity, so features cannot be aligned effectively. To address these issues, a two-stream multi-granularity local alignment network (TSMLA) is proposed. A two-stream feature extractor is introduced to extract both modality-shared and modality-specific local features. Both kinds of features are used jointly to compute the distance between a sketch and an image, which reduces the differences between the two modalities. Moreover, a multi-granularity local alignment module pools the distance matrix at several granularities, so that local features are aligned at different scales and the feature misalignment problem is effectively addressed. TSMLA fully exploits the information in both sketches and real images while effectively leveraging the relationships among features of different granularities. Experiments on multiple datasets validate the effectiveness of TSMLA.
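The distance computation outlined in the abstract can be illustrated with a minimal Python sketch. It is not the TSMLA implementation: the abstract does not specify the distance metric, the pooling operator, how the shared and specific distances are combined, or which granularities are used. This sketch therefore assumes Euclidean distances between part features, non-overlapping average pooling of the distance matrix, an unweighted sum of the two streams' distance matrices, nearest-region matching within each pooled matrix, and granularities 1, 2 and 4. All function names are hypothetical, and the local features are assumed to be given rather than produced by the two feature-extraction streams.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


def local_distance_matrix(sketch_parts: Sequence[Vector],
                          image_parts: Sequence[Vector]) -> Matrix:
    """Euclidean distance between every sketch part i and every image part j (assumed metric)."""
    return [[math.dist(s, g) for g in image_parts] for s in sketch_parts]


def add_matrices(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum (assumption: the two streams are combined without weights)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def avg_pool(dist: Matrix, k: int) -> Matrix:
    """Non-overlapping k x k average pooling of a square distance matrix (assumed operator)."""
    n = len(dist)
    if n % k != 0 or any(len(row) != n for row in dist):
        raise ValueError("expected a square matrix whose size is divisible by k")
    pooled = []
    for bi in range(0, n, k):
        pooled_row = []
        for bj in range(0, n, k):
            block = [dist[i][j] for i in range(bi, bi + k) for j in range(bj, bj + k)]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled


def aligned_distance(dist: Matrix) -> float:
    """Match each sketch region to its nearest image region, then average.

    Taking the row minimum (rather than the diagonal) tolerates regions
    that are shifted between the sketch and the image.
    """
    return sum(min(row) for row in dist) / len(dist)


def multi_granularity_distance(dist: Matrix,
                               granularities: Tuple[int, ...] = (1, 2, 4)) -> float:
    """Average the aligned distance over several pooling granularities (assumed set)."""
    return sum(aligned_distance(avg_pool(dist, k)) for k in granularities) / len(granularities)


def two_stream_distance(shared_sketch, shared_image, specific_sketch, specific_image,
                        granularities: Tuple[int, ...] = (1, 2, 4)) -> float:
    """Combine shared- and specific-stream local distances, then align them."""
    dist = add_matrices(local_distance_matrix(shared_sketch, shared_image),
                        local_distance_matrix(specific_sketch, specific_image))
    return multi_granularity_distance(dist, granularities)


if __name__ == "__main__":
    # Toy 1-D "part features": the image parts are the sketch parts shifted by one slot.
    sketch = [[0.0], [1.0], [2.0], [3.0]]
    image = [[1.0], [2.0], [3.0], [0.0]]
    d = local_distance_matrix(sketch, image)
    print(aligned_distance(d))              # finest granularity only; prints 0.0
    print(multi_granularity_distance(d))    # averaged over granularities 1, 2, 4; prints 0.75
```

In the toy input the image parts are a cyclic shift of the sketch parts, so a position-by-position (diagonal) comparison would report a non-zero distance, whereas nearest-region matching at the finest granularity reports zero. Coarser granularities average over neighbouring regions and so add context that a single granularity cannot capture.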