Few-Shot Deepfake Face Detection Method Based on Vision-Language Model
YANG Hongyu1,2, LI Xinghang1, CHENG Xiang3, HU Ze1
1. School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300; 2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300; 3. College of Information Engineering, Yangzhou University, Yangzhou 225127
Abstract: Aiming at the limitations of existing deepfake face detection methods in model complexity, sample size requirements, and adaptability to new deepfake techniques, a few-shot deepfake face detection method based on a vision-language model (FDFD-VLM) is proposed. FDFD-VLM is built upon contrastive language-image pre-training (CLIP). Visual features are optimized through a face region extraction and high-frequency feature enhancement module. Prompt adaptability is improved by a class-name-free differentiated prompt optimization module, while the multimodal feature representation is strengthened by a CLIP encoding optimization module. Additionally, a triplet loss function is introduced to improve the model's discriminative capability. Experimental results demonstrate that FDFD-VLM outperforms existing methods on multiple deepfake face datasets and achieves efficient detection with few training samples.
YANG Hongyu, LI Xinghang, CHENG Xiang, HU Ze. Few-Shot Deepfake Face Detection Method Based on Vision-Language Model. Pattern Recognition and Artificial Intelligence, 2025, 38(3): 205-220.
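The abstract's triplet loss operates on CLIP-style embeddings: it pulls an anchor toward a same-class (positive) sample and pushes it away from an other-class (negative) sample by at least a margin. A minimal NumPy sketch of this mechanism is below; the embeddings are random placeholders standing in for CLIP image features, not outputs of the paper's actual model, and the margin value is an illustrative assumption.

```python
import numpy as np

def l2_normalize(x):
    # Normalize feature vectors to unit length, as CLIP does before similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet margin loss on Euclidean distances: the anchor should be
    # closer to the positive (same class) than to the negative (other class)
    # by at least `margin`; violations contribute to the loss.
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy unit-norm embeddings standing in for CLIP image features.
rng = np.random.default_rng(0)
real_a = l2_normalize(rng.normal(size=(4, 8)))                   # anchors ("real")
real_b = l2_normalize(real_a + 0.05 * rng.normal(size=(4, 8)))   # nearby positives
fake = l2_normalize(-real_a + 0.05 * rng.normal(size=(4, 8)))    # distant negatives

loss_easy = triplet_loss(real_a, real_b, fake)  # margin satisfied -> zero loss
loss_hard = triplet_loss(real_a, fake, real_b)  # roles swapped -> large loss
print(loss_easy, loss_hard)
```

In training, minimizing this loss shapes the embedding space so that real and forged faces form separable clusters, which is what makes a nearest-prompt decision effective with few samples.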