Toward Extreme Image Compression with One-Step Diffusion and Quantization Semantics

doi:10.16451/j.cnki.issn1003-6059.202602005


Abstract: In recent years, diffusion-based extreme image compression methods have significantly outperformed traditional methods at extremely low bitrates. However, because these methods rely on the step-by-step denoising strategy of diffusion models, decoding typically requires multiple sampling steps, which forces a trade-off between reconstruction fidelity and inference efficiency. Existing methods also struggle to accurately preserve terrain structure and fine detail in aerial scenes. To address these issues, an extreme image compression method based on one-step diffusion and quantized semantics is proposed. A one-step diffusion strategy is designed that starts from compressed latent features rather than pure noise, so high-quality image reconstruction is achieved with only a single sampling step. Quantized contrastive language-image pretraining (CLIP) features are introduced to replace text conditions, balancing semantic expressiveness against transmission efficiency and providing fine-grained, stable semantic constraints for reconstruction. In addition, a pixel-level loss is incorporated into training, combining optimization in the latent feature space with supervision in the pixel domain to alleviate the distribution discrepancy and further improve reconstruction quality. Extensive experiments demonstrate that the proposed method achieves superior reconstruction quality with only a single sampling step.

Key words: Image Compression, Image Reconstruction, Extremely Low Bitrates, Quantized CLIP Semantic Features
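
The two mechanisms named in the abstract can be illustrated with a minimal sketch. Nothing below is taken from the paper. The nearest-neighbor codebook lookup is one common way to quantize a feature vector, and the single-step estimate uses the standard DDPM relation between a noisy latent, a predicted noise, and the clean latent. The function names, the toy codebook, and the idea that the compressed latent plays the role of `z_t` are all assumptions made for illustration.

```python
import math


def quantize(feature, codebook):
    """Return (index, codeword) of the codebook entry nearest to `feature`.

    Assumption: squared L2 distance and a flat codebook; the paper's
    actual quantizer for CLIP features is not described in the abstract.
    """
    best_index = min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(feature, codebook[i])),
    )
    return best_index, codebook[best_index]


def bits_per_index(codebook_size):
    """Bits needed to transmit one codebook index with a fixed-length code."""
    return math.ceil(math.log2(codebook_size))


def one_step_estimate(z_t, eps_pred, alpha_bar_t):
    """Standard DDPM clean-latent estimate from a single noise prediction:

        x0_hat = (z_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t)

    Assumption: `z_t` stands in for the compressed latent the decoder
    starts from, and `eps_pred` for the output of a hypothetical denoiser.
    """
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(z - noise_scale * e) / scale for z, e in zip(z_t, eps_pred)]


if __name__ == "__main__":
    # Toy 2-D codebook with four entries (hypothetical values).
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    index, codeword = quantize([0.9, 0.1], codebook)
    print(index, codeword, bits_per_index(len(codebook)))

    # Recover a clean latent in one step from a noisy one.
    print(one_step_estimate([1.1, -1.3], [0.5, 0.5], 0.64))
```

Only the codebook index needs to be transmitted, which is why a quantized semantic feature can be cheap to send: with a codebook of 512 entries, each index costs 9 bits under a fixed-length code.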

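Received: 2025-12-23

CLC number: TP 391

Funding: Supported by the National Natural Science Foundation of China (No.62503379, U24A20265), the Natural Science Basic Research Program of Shaanxi Province (No.2025SYS-SYSZD-021), the Sichuan Science and Technology Program (No.2025ZNSFSC1501), and the China Postdoctoral Science Foundation (No.2025M771538)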

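Corresponding author: AN Ning, Ph.D., researcher. Research interests: 3D perception and computer vision. E-mail: ning.an.010@foxmail.com.

About the authors: ZHANG Zhouhong, master's student. Research interests: computer vision. E-mail: zaczhang@stu.xjtu.edu.cn.
QIAO Xin, Ph.D., assistant professor. Research interests: 3D perception and computer vision. E-mail: wudiqx@xjtu.edu.cn.
LI Zhiyuan, master's student. Research interests: computer vision. E-mail: lizhiyuan2839@163.com.
KONG He, Ph.D., associate professor. Research interests: intelligent perception and decision-making for robots. E-mail: kongh@sustech.edu.cn.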

Cite this article:

ZHANG Zhouhong, QIAO Xin, LI Zhiyuan, AN Ning, KONG He. Toward Extreme Image Compression with One-Step Diffusion and Quantization Semantics. Pattern Recognition and Artificial Intelligence, 2026, 39(2): 157-169.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202602005 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2026/V39/I2/157
