Tianyu Zhang, Xin Luo, Li Li, Dong Liu
University of Science and Technology of China
⭐ If StableCodec is helpful to you, please star this repo. Thanks! 🤗
[2025/12/30] Release all source code. Leave it in 2025!
[2025/12/29] Release additional checkpoints for training and inference.
[2025/08/21] Training logs and reported results are now available, see results/.
- Repo release
- Update paper link
- Demo
- Pretrained models
- Inference
- Training
Diffusion-based image compression has shown remarkable potential for achieving ultra-low bitrate coding (less than 0.05 bits per pixel) with high realism. However, current approaches (1) require a large number of denoising steps at the decoder to generate realistic results under extreme bitrate constraints, and (2) sacrifice reconstruction fidelity, as diffusion models typically fail to guarantee pixel-level consistency. To address these challenges, we introduce StableCodec, which enables one-step diffusion for high-fidelity and high-realism extreme image compression with improved coding efficiency. To achieve ultra-low bitrates, we first develop an efficient Deep Compression Latent Codec that transmits a noisy latent representation for a single-step denoising process. We then propose a Dual-Branch Coding Structure, consisting of a paired auxiliary encoder and decoder, to enhance reconstruction fidelity. Furthermore, we adopt end-to-end optimization with joint bitrate and pixel-level constraints. StableCodec outperforms existing methods in FID, KID and DISTS by a significant margin, even at bitrates as low as 0.005 bits per pixel, while maintaining (1) strong fidelity and (2) inference speeds comparable to mainstream transform coding schemes.
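The one-step idea rests on the standard diffusion parameterization: a noisy latent x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps can be mapped back to x_0 in a single step once the noise is predicted. A toy numeric check of that identity (plain diffusion math with an oracle noise estimate, not the repo's actual code):

import torch

alpha_bar = torch.tensor(0.25)  # hypothetical cumulative alpha at the chosen timestep
x0 = torch.randn(4)             # "clean" latent
eps = torch.randn(4)            # Gaussian noise

# Noisy latent of the kind the codec transmits.
x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps

# One-step recovery; a trained denoiser approximates eps from x_t.
x0_hat = (x_t - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()
assert torch.allclose(x0, x0_hat, atol=1e-5)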
Rate-distortion-perception comparison on benchmarks.
Compressing high-resolution images by more than 1000x: relative to uncompressed 24-bit RGB (24 bpp), 0.02 bpp corresponds to a 1200x compression ratio.
conda create -n stablecodec python=3.10
conda activate stablecodec
pip install -r requirements.txt
Step 1: Prepare your datasets for inference
<PATH_TO_DATASET>/*.png
In our paper, we adopt the following test datasets:
- Kodak: Contains 24 natural images of 512x768 pixels.
- DIV2K Validation Set: Contains 100 2K-resolution images.
- CLIC 2020 Test Set: Contains 428 2K-resolution images.
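A quick sanity check that a dataset folder matches the flat *.png layout expected above (an illustrative helper, not part of the repo):

from pathlib import Path

def check_dataset(root: str) -> None:
    # The scripts expect .png files directly under the dataset folder.
    pngs = sorted(Path(root).glob("*.png"))
    assert pngs, f"no .png files found directly under {root}"
    print(f"{len(pngs)} images found, e.g. {pngs[0].name}")

check_dataset("<PATH_TO_DATASET>")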
Step 2: Download pretrained models
- Download SD-Turbo (see the download snippet after the checkpoint list below).
- Download checkpoints for StableCodec and Auxiliary Encoder (ELIC):
Checkpoint list:
stablecodec_base.pkl # A base model for Stage 2 finetuning
stablecodec_ft2.pkl # ~ 0.035bpp on Kodak
stablecodec_ft3.pkl # ~ 0.029bpp on Kodak
stablecodec_ft4.pkl # ~ 0.025bpp on Kodak
stablecodec_ft6.pkl # ~ 0.020bpp on Kodak
stablecodec_ft8.pkl # ~ 0.017bpp on Kodak
stablecodec_ft12.pkl # ~ 0.013bpp on Kodak
stablecodec_ft16.pkl # ~ 0.010bpp on Kodak
stablecodec_ft24.pkl # ~ 0.008bpp on Kodak
stablecodec_ft32.pkl # ~ 0.005bpp on Kodak
elic_official.pth # Pretrained ELIC model for Auxiliary Encoder
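For the SD-Turbo weights referenced above, one option is pulling the official stabilityai/sd-turbo repository from the Hugging Face Hub (this assumes huggingface_hub is installed and that you use the default repo id):

from huggingface_hub import snapshot_download

# Fetch SD-Turbo into a local folder that can be passed as --sd_path.
snapshot_download(repo_id="stabilityai/sd-turbo", local_dir="<PATH_TO_SD_TURBO>/sd-turbo")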
Step 3: Inference for StableCodec
Please modify the paths in compress.sh:
python src/compress.py \
--sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
--elic_path="<PATH_TO_ELIC>/elic_official.pth" \
--img_path="<PATH_TO_DATASET>/" \
--rec_path="<PATH_TO_SAVE_OUTPUTs>/rec/" \
--bin_path="<PATH_TO_SAVE_OUTPUTs>/bin/" \
--codec_path="<PATH_TO_STABLECODEC>/stablecodec_ft2.pkl" \
# --color_fix
Note: Color fix is recommended when running tiled inference on high-resolution images (e.g., DIV2K, CLIC 2020).
Then run:
bash compress.sh
You may find your bitstreams in the specified bin_path and reconstructions in rec_path 🤗.
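To double-check bitrates, bits-per-pixel can be computed from the bitstream size and the source resolution. A minimal sketch, assuming one .bin file per image with matching stems (the repo's actual naming may differ):

from pathlib import Path
from PIL import Image

def bpp(bin_file: str, image_file: str) -> float:
    # Bits in the bitstream divided by the pixel count of the source image.
    bits = Path(bin_file).stat().st_size * 8
    w, h = Image.open(image_file).size
    return bits / (w * h)

# Hypothetical file names, for illustration only:
print(f"{bpp('<PATH_TO_SAVE_OUTPUTs>/bin/kodim01.bin', '<PATH_TO_DATASET>/kodim01.png'):.4f} bpp")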
Compute reconstruction metrics with src/evaluate.py by running the evaluation script:
bash eval_folders.sh
Please make sure recon_dir and gt_dir are specified.
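If you want to reproduce the perceptual metrics independently, the sketch below computes FID and KID over two folders of same-named images with torchmetrics (requires torch-fidelity; the repo's src/evaluate.py may use a different protocol, e.g., patch-based statistics):

from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

def load_uint8(path: Path) -> torch.Tensor:
    # (1, 3, H, W) uint8 tensor, the default input format for these metrics.
    img = np.array(Image.open(path).convert("RGB"))
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=8)  # subset_size must not exceed the image count

gt_dir, recon_dir = Path("<GT_DIR>"), Path("<RECON_DIR>")
for gt_path in sorted(gt_dir.glob("*.png")):
    real, fake = load_uint8(gt_path), load_uint8(recon_dir / gt_path.name)
    fid.update(real, real=True)
    fid.update(fake, real=False)
    kid.update(real, real=True)
    kid.update(fake, real=False)

# Note: FID/KID over only a few dozen images is statistically noisy.
print("FID:", fid.compute().item())
kid_mean, _ = kid.compute()
print("KID:", kid_mean.item())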
Preparations
We perform lightweight training on 2x RTX 3090 (24 GB) GPUs. Consider adjusting train_batch_size and gradient_accumulation_steps in src/my_utils/training_utils.py for faster or better training performance (the effective batch size is num_gpus x train_batch_size x gradient_accumulation_steps).
Our training data includes:
- Flickr2K: Contains 2560 2K-resolution images.
- DIV2K Training Set: Contains 800 2K-resolution images.
- CLIC: Contains 585 (CLIC 2020 Training) + 41 (CLIC 2020 Validation) + 60 (CLIC 2021 Test) 2K-resolution images.
We use h5py to organize training data. To construct a .hdf5 training file, please refer to src/my_utils/build_h5.py.
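For reference, a minimal way to pack images into an .hdf5 file with h5py (an illustrative sketch; the layout produced by src/my_utils/build_h5.py may differ, and the fixed 512x512 center crop is an assumption):

from pathlib import Path

import h5py
import numpy as np
from PIL import Image

PATCH = 512  # hypothetical fixed patch size

paths = sorted(Path("<PATH_TO_TRAIN_IMAGES>").glob("*.png"))
with h5py.File("dataset.hdf5", "w") as f:
    dset = f.create_dataset("images", shape=(len(paths), PATCH, PATCH, 3), dtype="uint8")
    for i, p in enumerate(paths):
        img = np.array(Image.open(p).convert("RGB"))
        h, w = img.shape[:2]
        top, left = (h - PATCH) // 2, (w - PATCH) // 2  # center crop; assumes h, w >= PATCH
        dset[i] = img[top:top + PATCH, left:left + PATCH]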
Note: Empirically, adding extra training data in Stage 1 improves stability. We adopt the first 10K images from LSDIR.
Stage 1: Train a base model with relaxed bitrates
Note: You may skip Stage 1 with our pretrained stablecodec_base.pkl.
Please modify the paths in train.sh:
accelerate launch --num_processes=2 --gpu_ids="0,1," --main_process_port 29300 src/train.py \
--sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
--elic_path="<PATH_TO_ELIC>/elic_official.pth" \
--train_dataset="<PATH_TO_DATASET>/dataset.hdf5" \
--test_dataset="<PATH_TO_DATASET>/Kodak/" \
--output_dir="<PATH_TO_SAVE_OUTPUTS>/" \
--max_train_steps 120000 \
--lambda_rate 0.5
Then run:
bash train.sh
Stage 2: Finetune the base model with a GAN loss at target extreme bitrates
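The --lambda_rate values used below mirror the checkpoint names: stablecodec_ft{λ}.pkl corresponds to --lambda_rate λ, from ft2 (~0.035 bpp on Kodak) down to ft32 (~0.005 bpp). Assuming the usual rate-distortion formulation, a larger λ weights the rate term more heavily and drives the model toward lower bitrates:

$$\mathcal{L} = D(x, \hat{x}) + \lambda_{\text{rate}} \cdot R$$

(This is a standard RD objective sketch; Stage 2 additionally uses a GAN loss, so the exact objective in src/finetune.py may include further terms.)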
Please modify the paths in finetune.sh:
accelerate launch --num_processes=2 --gpu_ids="0,1," --main_process_port 29300 src/finetune.py \
--sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
--elic_path="<PATH_TO_ELIC>/elic_official.pth" \
--codec_path="<PATH_TO_STABLECODEC>/stablecodec_base.pkl" \
--train_dataset="<PATH_TO_DATASET>/dataset.hdf5" \
--test_dataset="<PATH_TO_DATASET>/Kodak/" \
--output_dir="<PATH_TO_SAVE_OUTPUTS>/" \
--max_train_steps 21000 \
--lambda_rate 2 # choose from [2, 3, 4, 6, 8, 12, 16, 24, 32]
Then run:
bash finetune.sh
If you find our work inspiring, please consider citing:
@InProceedings{Zhang_2025_ICCV,
author = {Zhang, Tianyu and Luo, Xin and Li, Li and Liu, Dong},
title = {StableCodec: Taming One-Step Diffusion for Extreme Image Compression},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {17379-17389}
}
This work is licensed under the MIT License.
Our implementation builds on CompressAI, ELIC-Unofficial, StableSR, StableDiffusion and DCVC. Thanks for their awesome work!
If you have any questions, please feel free to drop me an email:
- zhangtianyu[at]mail.ustc.edu.cn