Abstract: Face images captured in unconstrained environments have complex backgrounds and a wide range of scales. Current face detectors have two weaknesses: in label assignment, the number of anchors matched to each face is imbalanced, and in feature extraction, receptive field growth is limited by the convolution kernels. Both make fine-grained optimization of the network difficult. To address these problems, a fine-grained face detection method based on anchor loss optimization (FALO) is proposed. First, the relationship between the number of anchors matched to a face and the loss is analyzed, and an anchor loss optimization algorithm is proposed to adjust the classification and localization losses at a fine granularity during training. Second, a context feature fusion module is designed to extract multi-scale features effectively from the background. Finally, convolutional neural networks and self-attention mechanisms are combined to construct a self-attention auxiliary branch, which supplements the detector's receptive field and improves attention to faces with different aspect ratios. Experiments on multiple datasets show that FALO balances real-time computational efficiency with high detection accuracy and has some advantage in hard sample mining.
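The imbalance in matched anchors that the abstract describes can be illustrated with a toy IoU-based label assignment. The sketch below is not the FALO algorithm, whose formulation the abstract does not give. The anchor grid (stride 8, side 16), the IoU threshold of 0.35, and the inverse-count weighting in `balanced_weights` are all illustrative assumptions, and every function name is hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_anchors(image_size, stride, size):
    """Square anchors of side `size`, centred on a grid with step `stride`."""
    half = size / 2
    centres = range(stride // 2, image_size, stride)
    return [(cx - half, cy - half, cx + half, cy + half)
            for cy in centres for cx in centres]


def match_counts(faces, anchors, thr=0.35):
    """For each face, count the anchors whose IoU with it reaches `thr` (assumed value)."""
    return [sum(1 for a in anchors if iou(f, a) >= thr) for f in faces]


def balanced_weights(counts):
    """Hypothetical per-anchor loss weights: every matched face sums to weight 1."""
    return [1.0 / c if c > 0 else 0.0 for c in counts]


if __name__ == "__main__":
    anchors = make_anchors(image_size=64, stride=8, size=16)
    faces = [(16, 16, 32, 32),   # 16x16 face lying between grid centres
             (39, 39, 49, 49)]   # 10x10 face inside a single anchor
    counts = match_counts(faces, anchors)
    print(counts)                    # [4, 1]
    print(balanced_weights(counts))  # [0.25, 1.0]
```

In this toy setting the larger face matches four anchors and the smaller face only one, so an unweighted sum over positive anchors would give the larger face four times the influence on the loss. Inverse-count weighting is one simple way to equalize that influence. How FALO actually adjusts the classification and localization losses is specified in the paper, not here.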
LIU Jialong, LI Guanghui, DAI Chenglong. Fine-Grained Face Detection Method Based on Anchor Loss Optimization[J]. Pattern Recognition and Artificial Intelligence, 2025, 38(5): 457-471.