Lightweight Face Detection Algorithm with Multi-scale Feature Fusion
WANG Jian1, SONG Xiaoning1
1. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122
Abstract: Due to the limited computing capacity and storage resources of mobile devices, designing an efficient, high-precision face detector remains an open challenge. This paper proposes a lightweight face detection algorithm with multi-scale feature fusion (LFDMF). The multi-level detection structure, usually regarded as the core component of face detection, is removed. First, an existing lightweight backbone feature extraction network encodes the input image. Then, the proposed neck network expands the receptive field of the feature map and fuses multi-scale information from different receptive fields into a single-level feature map. Finally, the proposed multi-task sensitive detection head performs face classification, regression and key point detection on this single-level feature map. Compared with the RetinaFace and DSFD face detectors, LFDMF achieves higher accuracy with a lower computational burden. LFDMF is built in three sizes. The large model, LFDMF-L, achieves state-of-the-art performance on the WIDER FACE dataset, while the medium model, LFDMF-M, and the small model, LFDMF-S, achieve strong performance with few model parameters and little computation.
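The abstract does not state how the neck network expands the receptive field. As a rough illustration only, the sketch below assumes stacked 3x3 dilated convolutions with stride 1, a common way to enlarge the receptive field without downsampling, and computes the receptive field (in feature-map cells) after each layer. The dilation rates and function names are hypothetical and are not taken from the paper.

```python
from typing import List


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span, in feature-map cells, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def cumulative_receptive_fields(dilations: List[int], kernel_size: int = 3) -> List[int]:
    """Receptive field after each layer of a stack of stride-1 dilated convolutions.

    Each layer widens the receptive field by (effective kernel size - 1) cells.
    """
    fields = []
    rf = 1  # a single feature-map cell before any neck layer is applied
    for dilation in dilations:
        rf += effective_kernel(kernel_size, dilation) - 1
        fields.append(rf)
    return fields


# Hypothetical dilation rates, chosen for illustration only.
assumed_dilations = [1, 2, 4, 8]
for d, rf in zip(assumed_dilations, cumulative_receptive_fields(assumed_dilations)):
    print(f"dilation {d}: receptive field {rf} x {rf} cells")
```

Under these assumptions, the outputs of the successive layers see progressively larger neighbourhoods of the same feature map, so combining them yields one single-level map that carries information at several scales, which is the kind of fusion the abstract describes.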