Fine-Grained Face Detection Method Based on Anchor Loss Optimization
LIU Jialong1, LI Guanghui1, DAI Chenglong1
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122

Abstract In unconstrained environments, face images have complex backgrounds and widely varying scales. Current face detectors have two weaknesses: during label assignment, the number of anchors matched to each face is imbalanced, and during feature extraction, receptive field growth is limited by the convolutional kernels. Both make fine-grained optimization of the network difficult. To address these issues, a fine-grained face detection method based on anchor loss optimization (FALO) is proposed. First, the relationship between the number of anchors matched to a face and the loss is analyzed, and an anchor loss optimization algorithm is introduced to fine-tune the classification and localization losses during training. Second, a context feature fusion module is designed to extract multi-scale features from the background. Finally, convolutional neural networks and self-attention mechanisms are combined: a self-attention auxiliary branch is constructed to supplement the receptive field of the detector and improve its attention to faces with different aspect ratios. Experiments on multiple datasets show that FALO achieves real-time computational efficiency together with high-precision detection, and that it offers advantages in hard sample mining.
Received: 10 April 2025
Fund: National Natural Science Foundation of China (No. 62372214), Suzhou Science and Technology Project (No. SGC2021070)
Corresponding Author: LI Guanghui, Ph.D., professor. His research interests include the Internet of Things, edge computing, non-destructive testing, and integrated circuit design verification.
About the authors: LIU Jialong, master's student. His research interests include face detection and model compression. DAI Chenglong, Ph.D., associate professor. His research interests include brain-computer interfaces, EEG signal processing, and data mining.