Toward Extreme Image Compression with One-Step Diffusion and Quantization Semantics
ZHANG Zhouhong1, QIAO Xin1,2, LI Zhiyuan1, AN Ning3,4, KONG He5
1. State Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049; 2. Sichuan Digital Economy Industry Development Research Institute, Chengdu 610037; 3. Institute of Mining Artificial Intelligence, Chinese Institute of Coal Science, Beijing 100013; 4. State Key Laboratory of Intelligent Coal Mining and Strata Control, Beijing 100013; 5. School of Automation and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen 518055
Abstract: Diffusion-based extreme image compression methods show significant performance advantages at extremely low bitrates. However, existing methods rely on the step-by-step denoising strategy of diffusion models and typically decode an image only after multiple sampling steps, which forces a trade-off between reconstruction fidelity and inference efficiency. To address this issue, an extreme image compression method with one-step diffusion and quantization semantics is proposed. First, a one-step diffusion strategy is designed that starts from compressed latent features rather than pure noise, so that high-quality image reconstruction is achieved with only a single sampling step. Second, quantized contrastive language-image pretraining (CLIP) features are introduced to replace text as the semantic condition, providing more fine-grained and reliable semantic guidance for reconstruction. Finally, a pixel-level loss is added during training to alleviate the distribution discrepancy caused by optimizing in the latent feature space, further improving reconstruction quality. Extensive experiments demonstrate that the proposed method achieves superior reconstruction quality with only a single sampling step.
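The abstract does not give the parameterization of the one-step strategy, so the following is only a minimal sketch of how a single sampling step can yield a clean estimate. It assumes the compressed latent is treated as a noisy latent z_t at some timestep t and that a network predicts the noise, in which case the standard DDPM noise-prediction identity gives the clean-latent estimate directly. The function name `one_step_estimate`, the toy values, and the use of the true noise as a stand-in for a trained predictor are all hypothetical.

```python
import math

def one_step_estimate(z_t, eps_pred, alpha_bar_t):
    """Clean-latent estimate from a noisy latent in one step.

    Uses the standard DDPM identity
        x0_hat = (z_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t)
    applied element-wise to flat lists of floats.
    """
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(z - noise_scale * e) / scale for z, e in zip(z_t, eps_pred)]

# Toy check with made-up numbers: build z_t from a known x0 and noise,
# then confirm a perfect noise prediction recovers x0 in one step.
x0 = [0.5, -1.0, 2.0]
eps = [0.1, -0.3, 0.2]
alpha_bar = 0.6  # assumed cumulative noise-schedule value at timestep t
z_t = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
       for x, e in zip(x0, eps)]
x0_hat = one_step_estimate(z_t, eps, alpha_bar)
```

In an actual system the noise prediction would come from a conditioned denoising network and the estimate would then be decoded to pixels; neither component is modeled here.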