Diffusion Models Based Unconditional Counterfactual Explanations Generation

doi:10.16451/j.cnki.issn1003-6059.202411006

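Abstract: Counterfactual explanations reveal the key factors behind a model's decisions by applying minimal, interpretable changes to the input that alter the model's output. Existing diffusion-based counterfactual explanation methods rely on conditional generation and therefore need additional classification-related semantic information; the quality of that information is hard to guarantee, and obtaining it increases computational cost. To address these problems, this paper proposes an unconditional counterfactual explanation generation method built on denoising diffusion implicit models (DDIMs), a class of generative diffusion models. First, the consistency that DDIMs exhibit during the reverse denoising process is exploited: the noisy image is treated as a latent variable that controls the generated output, which makes the diffusion model usable in an unconditional counterfactual explanation pipeline. Then, the strength of DDIMs in filtering high-frequency noise and out-of-distribution perturbations is used to reshape the unconditional counterfactual explanation pipeline so that it produces interpretable semantic changes. Experiments on different datasets show that the proposed method achieves better values on multiple metrics.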

Keywords: deep learning, interpretability, counterfactual explanation, diffusion model, adversarial attack

Abstract: Counterfactual explanations alter a model's output by applying minimal, interpretable modifications to the input data, thereby revealing the key factors that influence the model's decisions. Existing counterfactual explanation methods based on diffusion models rely on conditional generation and require additional semantic information related to classification. However, the quality of this semantic information is difficult to guarantee, and obtaining it increases computational cost. To address these issues, an unconditional counterfactual explanation generation method based on the denoising diffusion implicit model (DDIM) is proposed. First, by leveraging the consistency exhibited by DDIM during the reverse denoising process, noisy images are treated as latent variables that control the generated outputs, making the diffusion model suitable for an unconditional counterfactual explanation generation workflow. Then, the advantages of DDIM in filtering high-frequency noise and out-of-distribution perturbations are exploited to reshape the unconditional counterfactual explanation workflow so that it generates semantically interpretable modifications. Experiments on different datasets demonstrate that the proposed method achieves superior results on multiple metrics.

Key words: Deep Learning, Interpretability, Counterfactual Explanation, Diffusion Model, Adversarial Attack
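The consistency property mentioned in the abstract can be illustrated with a minimal sketch of the deterministic DDIM reverse process (the eta = 0 case). This is not the authors' implementation: the closed-form `toy_eps_model` is a hypothetical stand-in for a trained noise-prediction network, it assumes the clean data follow a simple Gaussian, and the names `make_alpha_bars`, `toy_eps_model`, and `ddim_sample` as well as the linear beta schedule are assumptions made only for this example.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def toy_eps_model(x_t, alpha_bar, mu=2.0, sigma=0.5):
    """Hypothetical stand-in for a trained noise-prediction network.

    Returns E[eps | x_t] in closed form, assuming the clean data follow
    N(mu, sigma^2 I), so no training is needed for this illustration.
    """
    var_t = alpha_bar * sigma**2 + (1.0 - alpha_bar)
    return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * mu) / var_t


def ddim_sample(x_T, alpha_bars):
    """Deterministic DDIM reverse process (eta = 0): no noise is injected."""
    x = np.asarray(x_T, dtype=np.float64)
    for t in range(len(alpha_bars) - 1, -1, -1):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = toy_eps_model(x, a_t)
        # Estimate of the clean sample implied by the current noise prediction.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Move to the previous timestep along the same predicted noise direction.
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x


alpha_bars = make_alpha_bars()
rng = np.random.default_rng(0)
z = rng.standard_normal(8)  # the noisy starting point, used as the latent variable

out_a = ddim_sample(z, alpha_bars)
out_b = ddim_sample(z, alpha_bars)
out_c = ddim_sample(z + 1e-3, alpha_bars)  # slightly nudged latent

print("same latent, same output:", np.array_equal(out_a, out_b))
print("max change after nudging the latent:", np.max(np.abs(out_c - out_a)))
```

The first line prints `True` because no randomness enters the reverse pass, so the starting noise fully determines the result. In this toy setting a small nudge of the latent also produces only a small change in the output, which is the sense in which the noisy image can act as a control variable; how the paper actually optimizes that latent against a classifier is not specified in the abstract and is not modeled here.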

Received: 2024-09-12

CLC number: TP 391

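Funding: Supported by the National Natural Science Foundation of China (No. 61772284, 62406148, 62306339) and the Natural Science Foundation of Jiangsu Province (No. SBK2024047556)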

Corresponding author: LI Yun, Ph.D., professor; main research interest: trustworthy artificial intelligence. E-mail: liyun@njupt.edu.cn

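About the authors:
ZHONG Zhi, master's student; main research interest: machine learning. E-mail: 1022040906@njupt.edu.cn
WANG Yu, Ph.D., lecturer; main research interests: machine learning and natural language processing. E-mail: wangyu@cpu.edu.cn
ZHU Ziye, Ph.D., lecturer; main research interests: machine learning and natural language processing. E-mail: zhuziye@njupt.edu.cn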

Cite this article:

ZHONG Zhi, WANG Yu, ZHU Ziye, LI Yun. Diffusion Models Based Unconditional Counterfactual Explanations Generation[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(11): 1010-1021.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202411006 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2024/V37/I11/1010
