
Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models


Würstchen


What is this?

Würstchen is a new framework for training text-conditional models that moves the computationally expensive text-conditional stage into a highly compressed latent space. Common approaches use a single compression stage, while Würstchen adds a second stage for even more compression. In total, Stages A & B are responsible for compressing images, and Stage C learns the text-conditional part in the low-dimensional latent space. With that, Würstchen achieves a 42x compression factor while still reconstructing images faithfully, which makes training Stage C fast and computationally cheap. We refer to the paper for details.
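As a back-of-the-envelope illustration (assuming a 512×512 training resolution, which is not stated in this README), the 42x factor corresponds to the per-side reduction from pixel space down to the 12x12 latent grid that Stage C operates on:

```python
# Illustrative check of the ~42x compression factor.
# Assumption: a 512x512 input image is encoded by Stages A & B down to
# the 12x12 latent grid mentioned below (per-side ratio 512/12).
image_side = 512   # assumed input resolution (not stated in this README)
latent_side = 12   # Stage C latent resolution

factor = image_side / latent_side
print(f"per-side compression: ~{factor:.1f}x")  # ~42.7x
```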

Use Würstchen

You can use the model through the notebooks provided here. The Stage B notebook is for reconstruction only, and the Stage C notebook is for text-conditional generation. You can also try text-to-image generation on Google Colab.

Train your own Würstchen

Training Würstchen is considerably faster and cheaper than training other text-to-image models, as it operates in a much smaller 12x12 latent space. We provide training scripts for both Stage B and Stage C.
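To get a feel for why training in the 12x12 latent space is cheap, compare the number of spatial positions Stage C must model against pixel space (again assuming a 512×512 input, which is an assumption and not stated in this README):

```python
# Illustrative sketch: spatial positions in pixel space vs. the 12x12
# latent space Stage C trains in (the 512x512 input is an assumption).
pixel_positions = 512 * 512    # 262,144 positions per image
latent_positions = 12 * 12     # 144 positions per latent
print(pixel_positions // latent_positions)  # ~1820x fewer positions
```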

Download Models

| Model        | Download    | Parameters                                    | Conditioning |
|--------------|-------------|-----------------------------------------------|--------------|
| Würstchen v1 | Huggingface | 1B (Stage C) + 600M (Stage B) + 19M (Stage A) | CLIP-H-Text  |

Acknowledgment

Special thanks to Stability AI for providing compute for our research.
