Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu
⭐ If you find DLF helpful, please consider starring this repository. Thank you! 🤗
- One-Step Diffusion-Based Image Compression with Semantic Distillation (NeurIPS 2025)
- Generative Latent Coding for Ultra-Low Bitrate Image Compression (CVPR 2024)
Recent studies in extreme image compression have achieved remarkable performance by compressing the tokens from generative tokenizers. However, these methods often prioritize clustering common semantics within the dataset, while overlooking the diverse details of individual objects. This leads to suboptimal reconstruction fidelity, especially at low bitrates. To address this issue, we introduce a Dual-generative Latent Fusion (DLF) paradigm. DLF decomposes the latent into semantic and detail elements, compressing them through two distinct branches. The semantic branch clusters high-level information into compact tokens, while the detail branch encodes perceptually critical details to enhance the overall fidelity. Additionally, we propose a cross-branch interactive design to reduce redundancy between the two branches, thereby minimizing the overall bit cost. Experimental results demonstrate the impressive reconstruction quality of DLF even below 0.01 bits per pixel (bpp). On the CLIC2020 test set, our method achieves bitrate savings of up to 27.93% on LPIPS and 53.55% on DISTS compared to MS-ILLM. Furthermore, DLF surpasses recent diffusion-based codecs in visual fidelity while maintaining a comparable level of generative realism. Code will be available later.
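For context on the numbers above, bits per pixel is simply the compressed size in bits divided by the pixel count. A quick sketch (the image size and byte count below are hypothetical examples, not measured results):

```python
# Compute bits per pixel (bpp) for a compressed image.
# The dimensions and file size below are hypothetical examples.

def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """bpp = total compressed bits / total pixels."""
    return num_bytes * 8 / (width * height)

# A 768x512 image compressed to 480 bytes sits just under 0.01 bpp.
example_bpp = bits_per_pixel(480, 768, 512)
print(f"{example_bpp:.6f} bpp")  # 0.009766 bpp
```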
1. Create environment & install dependencies
conda create -n DLF python=3.10
conda activate DLF
pip install -r requirements.txt
Notes:
- If necessary, skip PyTorch packages in requirements.txt and install PyTorch 2.2.0 manually for your CUDA version.
- You may need to downgrade pip to v24.0.
- Installing ninja may be required for torchac compilation.
2. Build the entropy coder for detail branch
sudo apt-get install cmake g++
cd src
mkdir build
cd build
conda activate DLF
cmake ../cpp -DCMAKE_BUILD_TYPE=Release  # or Debug
make -j
3. Run inference
- Download our model weights.
- Run the inference script:
cd src
python test.py \
--base_config ./config/config_test.yaml \
--ckpt_path [checkpoint path] \
--dataset_dir [your image folder] \
--save_dir [output folder] \
--gpu_idx 0
4. Training
- Prepare pretrained models: Semantic Tokenizer and VQGAN Tokenizer.
- Prepare the dataset:
- Download Open Images v4 dataset and randomly sample 400,000 images. (Other high-quality datasets or larger samples may further improve results if storage and GPU resources allow.)
- Prepare your validation dataset.
- Generate training/validation text lists:
find [image folder] -name "*.png" > [output txt path] # or .jpg
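If you'd rather script the list generation than use `find` (for example, to filter multiple extensions), a minimal Python equivalent could look like this; the directory and output paths are placeholders:

```python
# Build a newline-separated image list, equivalent to the `find` command
# above. Paths are placeholders; extend `exts` for other formats.
from pathlib import Path

def write_file_list(image_dir: str, out_txt: str, exts=(".png", ".jpg")) -> int:
    """Write the paths of matching images to out_txt; return the count."""
    paths = sorted(
        str(p) for p in Path(image_dir).rglob("*") if p.suffix.lower() in exts
    )
    Path(out_txt).write_text("\n".join(paths) + "\n")
    return len(paths)
```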
- Update the training config (./src/config/train) with tokenizer paths and dataset file paths. See notes in config for more details.
- Start training. In the config filename, `qp` indicates the compression level and 256/512 the training resolution; pick the one matching your needs:
cd src
python train.py \
  --outdir [your output path] \
  --name [your save name] \
  --base ./config/train/config_qp3_256train.yaml \
  --gpus 0,1,2,3
Notes:
- Begin with 256×256 patches for faster initialization (e.g., xxx_256train.yaml).
- Then resume from the 256×256 checkpoint and continue with 512×512 training (e.g., xxx_512train.yaml) for high-resolution adaptation.
- (Optional) Finally, fine-tune pretrained semantic weights jointly with a smaller learning rate (see notes in config file).
- Adjust the lambda strategy in config to reach your target bitrate.
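For intuition about the last note: in learned compression, lambda typically weights the rate term of a rate-distortion objective of the form L = D + λ·R, so raising it pushes training toward lower bitrates. A toy sketch with made-up numbers (the actual loss terms live in the training code):

```python
# Toy rate-distortion objective: L = D + lambda * R.
# Numbers are illustrative only; the real loss terms are in the repo.

def rd_loss(distortion: float, rate_bpp: float, lam: float) -> float:
    """Larger lam penalizes bitrate more, favoring lower-rate operating points."""
    return distortion + lam * rate_bpp

# With a large lambda, a lower-rate (but more distorted) reconstruction wins.
low_rate = rd_loss(distortion=0.30, rate_bpp=0.01, lam=100.0)
high_rate = rd_loss(distortion=0.20, rate_bpp=0.05, lam=100.0)
print(low_rate < high_rate)  # True
```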
We sincerely thank the following outstanding works, which greatly inspired and supported our research:
- An Image is Worth 32 Tokens for Reconstruction and Generation
- Taming Transformers for High-Resolution Image Synthesis
- DCVC family
If you find our work inspiring, please cite:
@InProceedings{xue2025dlf,
author={Xue, Naifu and Jia, Zhaoyang and Li, Jiahao and Li, Bin and Zhang, Yuan and Lu, Yan},
title={DLF: Extreme Image Compression with Dual-generative Latent Fusion},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {Oct},
year={2025},
}

This work is licensed under the Apache 2.0 License.

