Double diffusion for more detailed upscaling

TLDR: use a diffusion model instead of the VQ-GAN for decoding = more detail

The latent decoder in Stable Diffusion (a VQ-GAN) works well in general, but it still produces noticeable artifacts. These artifacts become especially apparent when the image is upscaled with a traditional upscaler.

What we really want is to generate a new image loosely based on the low-res one, re-creating the details without the artifacts.

prompt "the saint of astronauts wears an ornate ceremonial space suit"

[image: astro]

Real-ESRGAN 2x:

[image: astro-realesrgan]

GFPGAN 2x:

[image: astro-gfpgan]

It's possible to decode the LDM latents with another diffusion model instead of a GAN; in this case, a model fine-tuned from OpenAI's 256x256 unconditional guided diffusion model.
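Concretely, the decoder can be a standard DDPM whose UNet sees the upsampled latents as extra input channels at every denoising step (the pretrained model's input conv is widened to accept them before fine-tuning). Below is a minimal sketch of that sampling loop in PyTorch, assuming an epsilon-prediction `unet(x, t)` already fine-tuned this way; `decode_latents`, `make_schedule`, and the schedule constants are illustrative assumptions, not the actual glid-3 code.

```python
import torch
import torch.nn.functional as F

def make_schedule(num_steps=1000, device="cpu"):
    # Standard linear beta schedule from the DDPM paper.
    betas = torch.linspace(1e-4, 0.02, num_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    return betas, alphas, alpha_bars

@torch.no_grad()
def decode_latents(unet, latents, image_size=256, num_steps=1000):
    """Sample an image from pure noise, conditioned on the LDM latents."""
    device = latents.device
    betas, alphas, alpha_bars = make_schedule(num_steps, device)
    b = latents.shape[0]
    # Upsample the latents to pixel resolution so they can be concatenated
    # with the noisy image as extra channels at every step.
    cond = F.interpolate(latents, size=(image_size, image_size), mode="nearest")
    x = torch.randn(b, 3, image_size, image_size, device=device)
    for t in reversed(range(num_steps)):
        ts = torch.full((b,), t, device=device, dtype=torch.long)
        # The conditioning enters purely through the concatenated channels.
        eps = unet(torch.cat([x, cond], dim=1), ts)
        # DDPM posterior mean for x_{t-1}.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x.clamp(-1, 1)
```

Because the decoder never sees the VQ-GAN output at all, its artifacts can't leak through; the latents only constrain the overall structure, and the diffusion model fills in fresh detail.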

glid-3 upscale:

[image: astro500001]

This being a diffusion model, the results can be a bit random and will be different for each random seed (non-cherrypicked examples):

[images: astro500003, astro500002, astro500001, astro500000]
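Each seed changes both the initial noise and the per-step noise, so re-running the decoder produces a different plausible set of details every time. A quick way to generate a batch of non-cherrypicked samples, reusing the illustrative `decode_latents`, `unet`, and `latents` names from the sketch above:

```python
import torch
from torchvision.utils import save_image

for seed in range(4):
    torch.manual_seed(seed)  # fixes the initial noise and all per-step noise
    img = decode_latents(unet, latents)
    save_image(img * 0.5 + 0.5, f"astro_seed{seed}.png")  # map [-1, 1] -> [0, 1]
```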
