Hash Image Retrieval Based on Category Similarity Feature Expansion and Center Triplet Loss
PAN Lili¹, MA Junyong¹, XIONG Siyu¹, DENG Zhimao¹, HU Qinghua²
1. College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004; 2. College of Intelligence and Computing, Tianjin University, Tianjin 300350
Abstract: Existing deep hashing image retrieval methods mainly employ convolutional neural networks, and the deep features they extract provide insufficient similarity representation. In addition, triplet deep hashing typically constructs local triplets from mini-batch data, so the number of triplet samples is small and their distribution lacks global coverage, which leaves the network under-trained and slow to converge. To address these issues, a model of hash image retrieval based on category similarity feature expansion and center triplet loss (HRFT-Net) is proposed. A hash feature extraction module based on Vision Transformer (HViT) is designed to extract global feature information with stronger representation ability. To enlarge the mini-batch training data, a similar feature expansion module based on category constraint (SFEC) is put forward, in which new features are generated from the similarity among samples of the same category to enrich the triplet training samples. To strengthen the global nature of the triplet loss, a center triplet loss function based on Hadamard (CTLH) is constructed: Hadamard codes are used to establish a global hash center constraint for each class, and combining the local triplet constraint with the global center constraint accelerates network learning and convergence and improves retrieval accuracy. Experiments on the CIFAR10 and NUS-WIDE datasets show that HRFT-Net achieves better mean average precision for retrieval with hash codes of different bit lengths, demonstrating its effectiveness.
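To make the ViT-based hash feature extraction concrete, a minimal sketch is given below: a pretrained Vision Transformer supplies the global image feature, and a linear hashing head with a tanh relaxation produces continuous codes that are binarized by sign at retrieval time. The class name HashViT, the choice of torchvision's vit_b_16 backbone, and the 768-dimensional feature size are illustrative assumptions, not the exact HViT architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

class HashViT(nn.Module):
    """Hypothetical ViT backbone with a k-bit hashing head (tanh relaxation of sign)."""

    def __init__(self, num_bits: int = 64):
        super().__init__()
        self.backbone = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
        self.backbone.heads = nn.Identity()          # keep the 768-d class-token feature
        self.hash_head = nn.Linear(768, num_bits)    # project to the hash code length

    def forward(self, x):
        feat = self.backbone(x)                      # global feature from the transformer
        return torch.tanh(self.hash_head(feat))      # continuous codes in (-1, 1) for training

    @torch.no_grad()
    def binarize(self, x):
        return torch.sign(self.forward(x))           # {-1, +1} hash codes at retrieval time
```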
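The category-constrained feature expansion can be pictured as synthesizing extra embeddings from pairs of same-class features inside a mini-batch, so that more triplets are available for training. The sketch below is an assumed implementation; the function name expand_same_class_features and the simple midpoint interpolation rule are illustrative choices rather than the exact SFEC procedure.

```python
import torch

def expand_same_class_features(features: torch.Tensor, labels: torch.Tensor):
    """Hypothetical SFEC-style expansion: synthesize new features from same-class pairs.

    features: (N, D) mini-batch embeddings; labels: (N,) integer class ids.
    Returns the batch concatenated with midpoint interpolations of same-class pairs.
    """
    new_feats, new_labels = [], []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:
            continue                                  # need at least two samples of the class
        a, b = features[idx[:-1]], features[idx[1:]]  # pair up consecutive same-class samples
        new_feats.append(0.5 * (a + b))               # interpolations stay near the class region
        new_labels.append(labels.new_full((idx.numel() - 1,), int(c)))
    if new_feats:
        features = torch.cat([features, *new_feats], dim=0)
        labels = torch.cat([labels, *new_labels], dim=0)
    return features, labels
```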
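For the Hadamard-based hash centers and the center triplet idea, a rough sketch follows: rows of a Hadamard matrix serve as fixed ±1 centers, one per class, and the loss combines a batch-hard triplet term (local constraint) with a pull of each code toward its class center (global constraint). The margin, the weight lambda_center, and the use of scipy.linalg.hadamard are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
from scipy.linalg import hadamard

def build_hash_centers(num_classes: int, num_bits: int) -> torch.Tensor:
    """One fixed {-1, +1} hash center per class, taken from rows of a Hadamard matrix.

    num_bits must be a power of two and num_classes <= num_bits.
    """
    H = hadamard(num_bits)                            # (num_bits, num_bits) matrix of +/-1 entries
    return torch.from_numpy(H[:num_classes]).float()

def center_triplet_loss(codes, labels, centers, margin=0.5, lambda_center=1.0):
    """Hypothetical center triplet loss: batch-hard triplet term + global center term.

    codes: (N, K) relaxed hash codes in (-1, 1); labels: (N,); centers: (C, K).
    """
    dist = torch.cdist(codes, codes)                  # pairwise Euclidean distances in the batch
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = dist.masked_fill(~same, 0.0).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    triplet = F.relu(hardest_pos - hardest_neg + margin).mean()   # local triplet constraint
    center = F.mse_loss(codes, centers[labels])                   # pull toward the class hash center
    return triplet + lambda_center * center
```

As a usage note, for CIFAR10 with 64-bit codes, build_hash_centers(10, 64) yields ten centers; any two distinct rows of a Hadamard matrix differ in exactly half their positions, so the centers are mutually separated by a Hamming distance of 32.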
PAN Lili, MA Junyong, XIONG Siyu, DENG Zhimao, HU Qinghua. Hash Image Retrieval Based on Category Similarity Feature Expansion and Center Triplet Loss. Pattern Recognition and Artificial Intelligence, 2023, 36(8): 685-700.