DiffBIR v2.1
News 📰
- A new model, based on SD2.1-zsnr, trained on the Unsplash dataset with captions generated by LLaVA v1.5. It offers better reconstruction quality and a better understanding of prompts. Please follow the README to give it a try 😄!
- New samplers have been added that can produce results in as few as 10 steps, including DDIM, DPM-Solver, and several EDM samplers from k-diffusion (see the sampling sketch after this list).
- Two captioners are available to automatically generate prompts: LLaVA and RAM.
- Noise augmentation has been added: you can now enhance the model's creativity by adding noise to the condition (see the noise-augmentation sketch after this list).
- Supports fp16/bf16 inference and batch processing.
- Supports tiled inference for the stage-1 model, and fixes the brightness shifts caused by a naively implemented tiled VAE by integrating Tiled VAE (see the tiling sketch after this list).
- A Gradio demo is provided, allowing you to quickly experience all the updates mentioned above!
- Fixes a bug in the previous version affecting the '--better_start' option: the input image to cldm.vae_encode should lie in [-1, 1] rather than [0, 1] (see the range-conversion sketch after this list).
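
For a concrete picture of 10-step sampling, here is a minimal sketch using k-diffusion's Karras schedule with the DPM-Solver++(2M) sampler. The `toy_denoiser`, latent shape, and sigma range are illustrative stand-ins, not DiffBIR's actual denoiser wrapper or settings.

```python
# Minimal sketch: 10-step sampling with a k-diffusion EDM-style sampler.
# `toy_denoiser` is an illustrative stand-in, NOT DiffBIR's denoiser wrapper.
import torch
from k_diffusion.sampling import get_sigmas_karras, sample_dpmpp_2m

def toy_denoiser(x, sigma):
    # A k-diffusion denoiser maps (noisy latent, sigma) -> denoised latent.
    # This one just shrinks the input so the example runs end to end.
    return x / (1.0 + sigma.view(-1, 1, 1, 1))

# Karras noise schedule with only 10 steps (sigma range is illustrative).
sigmas = get_sigmas_karras(n=10, sigma_min=0.03, sigma_max=14.6)

# Start from pure noise at the largest sigma, then run the sampler.
x = torch.randn(1, 4, 64, 64) * sigmas[0]
sample = sample_dpmpp_2m(toy_denoiser, x, sigmas)
print(sample.shape)  # torch.Size([1, 4, 64, 64])
```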
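
The noise-augmentation idea can be sketched in a few lines: perturb the condition with Gaussian noise at a user-chosen strength before it is fed to the diffusion model, trading fidelity for creativity. The function and names below are a simplified illustration, not the repository's actual implementation.

```python
import torch

def augment_condition(cond: torch.Tensor, noise_level: float) -> torch.Tensor:
    """Add Gaussian noise to a condition tensor (illustrative helper, not DiffBIR's API).

    noise_level = 0 leaves the condition untouched (maximum fidelity);
    larger values give the diffusion model more freedom (more creativity).
    """
    if noise_level <= 0:
        return cond
    return cond + noise_level * torch.randn_like(cond)

# Example: mildly perturb a dummy 4-channel condition latent.
cond = torch.randn(1, 4, 64, 64)
noisy_cond = augment_condition(cond, noise_level=0.2)
```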
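
The tiling idea, in simplified form: process overlapping crops and blend them back with a feathered weight mask so seams are averaged away. This sketch only shows the crop-and-blend part; the integrated Tiled VAE additionally shares normalization statistics across tiles, which is how it avoids the per-tile brightness shifts mentioned above. Names and tile sizes are illustrative.

```python
import torch

def tiled_apply(fn, image: torch.Tensor, tile: int = 256, overlap: int = 32) -> torch.Tensor:
    """Apply `fn` to overlapping tiles of a (1, C, H, W) tensor and blend the results.

    Illustrative crop-and-blend only; it does not reproduce Tiled VAE's shared
    normalization statistics.
    """
    _, _, h, w = image.shape
    out = torch.zeros_like(image)
    weight = torch.zeros(1, 1, h, w)
    step = tile - overlap
    for top in range(0, h, step):
        for left in range(0, w, step):
            bottom, right = min(top + tile, h), min(left + tile, w)
            result = fn(image[:, :, top:bottom, left:right])
            # Feathered mask: weight ramps up with distance from the tile border.
            th, tw = bottom - top, right - left
            ry = torch.minimum(torch.arange(th) + 1, torch.arange(th).flip(0) + 1)
            rx = torch.minimum(torch.arange(tw) + 1, torch.arange(tw).flip(0) + 1)
            mask = (ry.clamp(max=overlap)[:, None] * rx.clamp(max=overlap)[None, :]).float()
            mask = mask.view(1, 1, th, tw)
            out[:, :, top:bottom, left:right] += result * mask
            weight[:, :, top:bottom, left:right] += mask
    return out / weight

# Example: an identity "model" round-trips the image exactly.
img = torch.rand(1, 3, 512, 512)
assert torch.allclose(tiled_apply(lambda t: t, img), img, atol=1e-5)
```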
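
The '--better_start' fix boils down to a range conversion: the stage-1 output, if stored in [0, 1], has to be mapped to [-1, 1] before it is passed to cldm.vae_encode. A minimal illustration (the helper name and dummy tensor are ours, not the repository's):

```python
import torch

def to_vae_range(image_01: torch.Tensor) -> torch.Tensor:
    """Map a [0, 1] image to the [-1, 1] range expected by the VAE encoder."""
    return image_01 * 2.0 - 1.0

# Example: a dummy stage-1 output in [0, 1] becomes a [-1, 1] tensor.
clean = torch.rand(1, 3, 512, 512)
rescaled = to_vae_range(clean)
print(float(rescaled.min()), float(rescaled.max()))  # within [-1.0, 1.0]
```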
With tiled inference, DiffBIR v2.1 can run on graphics cards with 8GB of VRAM. If you encounter any issues, feel free to open an issue. We hope you enjoy using it :D
TODO
- Support multi-GPU inference.
- A more stable online application.
- ComfyUI, Replicate, diffusers, etc.
- Continue to optimize the model's performance while keeping the model architecture unchanged.