dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching

Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache).

🔥 News

  • [2025/06/15] Our dLLM-Cache is compatible with MMaDA.
  • [2025/05/31] Our dLLM-Cache is integrated into LLaDA-V.
  • [2025/05/23] The code of our paper has been released.
  • [2025/05/17] Our paper has been released.

✨️ Key Highlights


  • Currently supported models: LLaDA, Dream, LLaDA-V and MMaDA.
  • Speedup: Achieves up to 9.1x speedup over standard dLLM pipelines, with no performance loss on most tasks.
  • Evaluation: Benchmarked on LLaDA 8B and Dream 7B.
  • Latency: Approaches the inference speed of autoregressive models (ARMs) in many scenarios.

🚀 Pipeline

Here's an overview of the process behind our dLLM-Cache method (see the pipeline figure).
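At a high level, the paper observes that prompt-token features change very little across denoising steps, while only a small fraction of response-token features change significantly at each step. dLLM-Cache therefore refreshes prompt features at a long interval and adaptively refreshes only the most-changed response features, reusing the cache for the rest. The snippet below is a minimal conceptual sketch of that refresh schedule and selection rule; the function name, parameters (`prompt_interval`, `response_interval`, `update_ratio`), and similarity measure are illustrative assumptions, not the repository's actual API, and a real implementation skips computation for cached tokens instead of recomputing everything first.

```python
import torch

def cached_step_features(step, fresh, cache, prompt_len,
                         prompt_interval=8, response_interval=2, update_ratio=0.25):
    """Illustrative caching logic for one denoising step.

    `fresh` stands in for the features the transformer would compute at this
    step; `cache` holds features carried over from earlier steps. In the real
    method the cached tokens are skipped entirely (that is where the speedup
    comes from); here everything is computed so the selection rule is easy to see.
    """
    # Long-interval refresh: prompt features barely change, so recompute them rarely.
    if cache is None or step % prompt_interval == 0:
        return fresh.clone()

    out = cache.clone()
    # Short-interval, partial refresh of response features.
    if step % response_interval == 0:
        new_resp, old_resp = fresh[prompt_len:], cache[prompt_len:]
        # Tokens whose features drifted most from the cache get recomputed.
        sim = torch.cosine_similarity(new_resp, old_resp, dim=-1)
        k = max(1, int(update_ratio * sim.numel()))
        idx = torch.topk(sim, k, largest=False).indices
        out[prompt_len + idx] = new_resp[idx]
    return out

# Toy usage: 16 prompt tokens + 48 response tokens, 64-dim features, 32 denoising steps.
cache = None
for step in range(32):
    fresh = torch.randn(64, 64)
    cache = cached_step_features(step, fresh, cache, prompt_len=16)
```

In the actual repository the caching is applied inside the transformer layers (e.g., attention and FFN outputs) rather than to a single feature tensor; the sketch only conveys the two-interval schedule and the adaptive selection of response tokens to update.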

🛠️ Installation

To get started with dLLM-Cache, follow the installation instructions below.

  1. Clone the Repository:
git clone https://github.com/maomaocun/dLLM-Cache.git
cd dLLM-Cache
  2. Set Up the Environment: Create a Python environment with conda or virtualenv and install the dependencies:
bash install.sh
  3. Run a Demo:
python demo_{model_name}.py
  4. Run Experiments: Launch experiments with the provided scripts:
bash scripts/run_{model_name}_{task_name}_base.sh

📘 Example Usage

  1. GSM8K with LLaDA:
bash scripts/run_LLaDA_gsm8k_base.sh
  2. BBH with Dream:
bash scripts/run_Dream_bbh_base.sh

📮 Contact

If you have any questions, please email yangyicun187@gmail.com.

🎉 Acknowledgements

This repository builds on LLaDA, Dream, LLaDA-V, MMaDA, and lm-evaluation-harness.

📌 Citation

If you find dLLM-Cache useful for your research and applications, please cite using this BibTeX:

@article{liu2025dllm,
  title={dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching},
  author={Liu, Zhiyuan and Yang, Yicun and Zhang, Yaojie and Chen, Junjie and Zou, Chang and Wei, Qingyuan and Wang, Shaobo and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2506.06295},
  year={2025}
}

🌟 Star History

Star History Chart
