Few-Shot Deepfake Face Detection Method Based on Vision-Language Model
YANG Hongyu1,2, LI Xinghang1, CHENG Xiang3, HU Ze1
1. School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300; 2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300; 3. College of Information Engineering, Yangzhou University, Yangzhou 225127
Abstract Aiming at the limitations of existing deepfake face detection methods in model complexity, sample size requirements, and adaptability to new deepfake techniques, a few-shot deepfake face detection method based on a vision-language model (FDFD-VLM) is proposed. FDFD-VLM is built upon contrastive language-image pre-training (CLIP). Visual features are optimized through a face region extraction and high-frequency feature enhancement module. Prompt adaptability is improved by a class-agnostic differentiated prompt optimization module, while multimodal feature representation is strengthened by a CLIP encoding attention optimization module. Additionally, a triplet loss function is introduced to improve the model's discriminative capability. Experimental results demonstrate that FDFD-VLM outperforms existing methods on multiple deepfake face datasets and achieves efficient detection performance in few-shot deepfake face detection scenarios.
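The triplet loss mentioned in the abstract follows the standard FaceNet-style hinge formulation: pull an anchor embedding toward a same-class (positive) embedding and push it away from an opposite-class (negative) embedding by at least a margin. The sketch below is a minimal illustration of that objective only; the margin value, embedding dimensions, and the use of squared Euclidean distance are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss over batches of embeddings.

    In a deepfake-detection setting, the anchor and positive might be
    embeddings of two real faces and the negative an embedding of a
    forged face (margin=0.2 is an illustrative assumption).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance anchor->positive
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance anchor->negative
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy 2-D embeddings: anchor is close to the positive and far from the
# negative, so the hinge is inactive and the loss is zero.
a = np.array([[1.0, 0.0]])
p = np.array([[0.9, 0.1]])
n = np.array([[-1.0, 0.0]])
print(triplet_loss(a, p, n))  # prints 0.0
```

Minimizing this loss pushes real and fake embeddings apart by at least the margin, which is what sharpens the decision boundary when only a few labeled samples are available.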
Fund: National Natural Science Foundation of China (No. U2433205), National Natural Science Foundation of China (No. 62201576, U1833107), Jiangsu Provincial Basic Research Program Natural Science Foundation-Youth Fund (No. BK20230558)
Corresponding Author: YANG Hongyu, Ph.D., professor. His research interests include network and system security.
About authors: LI Xinghang, Master's student. His research interests include AI security. CHENG Xiang, Ph.D., lecturer. His research interests include network and system security. HU Ze, Ph.D., lecturer. His research interests include natural language processing.
YANG Hongyu, LI Xinghang, CHENG Xiang, et al. Few-Shot Deepfake Face Detection Method Based on Vision-Language Model[J]. Pattern Recognition and Artificial Intelligence, 2025, 38(3): 205-220.