Image Generation Method for Cognizing Image Attribute Features from the Perspective of Disentangled Representation Learning
CAI Jianghai1,2, HUANG Chengquan1,2,3, WANG Shunxia2, LUO Senyan2, YANG Guiyan2, ZHOU Lihua2
1. Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou Province, Guizhou Minzu University, Guiyang 550025; 2. School of Data Sciences and Information Engineering, Guizhou Minzu University, Guiyang 550025; 3. Engineering Training Center, Guizhou Minzu University, Guiyang 550025
Abstract: In generative artificial intelligence, research on disentangled representation learning has advanced image generation methods. However, existing disentanglement methods focus on the low-dimensional representations used for image generation and ignore the inherent interpretable factors of the target variation image. As a result, generated images are susceptible to the influence of irrelevant attribute features. To address this issue, an image generation method for cognizing image attribute features from the perspective of disentangled representation learning is proposed. Firstly, starting from the latent space of the generative model, candidate traversal directions for the target variation image are obtained by training. Secondly, an unsupervised semantic decomposition strategy is constructed, and the interpretable directions embedded in the latent space are jointly discovered from the candidate traversal directions. Finally, a contrast simulator and a variation space are constructed using disentangled encoders and contrastive learning, so that the disentangled representations of the target variation image are extracted from the interpretable directions and the image is generated. Extensive experiments on five popular disentanglement datasets demonstrate the superior performance of the proposed method.
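As a rough illustration of two ideas named in the abstract, the sketch below shows (a) traversing a latent code along a candidate direction and (b) an InfoNCE-style contrastive loss of the kind commonly used in contrastive learning. It is a minimal sketch under stated assumptions, not the proposed method: the function names `traverse` and `info_nce`, the toy 4-dimensional latent code, the cosine similarity, and the temperature value are all hypothetical choices made for this example, and no generator or encoder network is modelled.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def traverse(z, direction, steps):
    """Return the latent codes z + a * d for each step size a in `steps`,
    where d is the unit-normalized candidate direction (hypothetical helper)."""
    d = normalize(direction)
    return [[zi + a * di for zi, di in zip(z, d)] for a in steps]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: small when the anchor is more similar to the
    positive than to any negative. The temperature is an assumed value."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy usage: move a 4-dimensional latent code along one candidate direction.
z = [0.0, 0.0, 0.0, 0.0]
codes = traverse(z, [3.0, 4.0, 0.0, 0.0], [-1.0, 0.0, 1.0])

# Toy usage: variations that share a direction act as a positive pair,
# while a variation along a different direction acts as a negative.
loss = info_nce([1.0, 0.0], [1.0, 0.1], [[0.0, 1.0]])
```

In a real pipeline the traversed codes would be passed through a pretrained generator and the resulting image variations through an encoder before any contrastive loss is computed; those components are omitted here.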
CAI Jianghai, HUANG Chengquan, WANG Shunxia, LUO Senyan, YANG Guiyan, ZHOU Lihua. Image Generation Method for Cognizing Image Attribute Features from the Perspective of Disentangled Representation Learning. Pattern Recognition and Artificial Intelligence, 2024, 37(7): 638-651.