This repo contains the logic to run inference for the popular Stable Diffusion deep learning model in C#. Stable Diffusion models take a text prompt and create an image that represents the text.

For the example prompt below, the CLIP model creates a text embedding that connects the text to an image. A random noise latent is created and then denoised by the `unet` model and a scheduler algorithm over a series of steps so that it comes to represent the text prompt. Lastly, the decoder model `vae_decoder` converts that latent into the final image.
"make a picture of green tree with flowers around it and a red sky"
| Auto Generated Random Latent Seed Input | Resulting image output |
| --- | --- |
| *(random latent seed image)* | *(image generated for the prompt above)* |
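The flow above maps onto the ONNX Runtime C# API roughly as sketched below. This is a minimal sketch rather than the project's actual code: the model file names, tensor input names (`input_ids`, `sample`, `timestep`, `encoder_hidden_states`, `latent_sample`), tensor dtypes, the CLIP tokenizer, and the scheduler step are assumptions based on the standard diffusers ONNX export and are simplified or stubbed out here.

```csharp
// Minimal sketch of the inference flow: text encoder -> denoising loop -> VAE decoder.
// Tensor names, shapes, and dtypes are assumptions from the typical diffusers ONNX export.
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class PipelineSketch
{
    static void Main()
    {
        // One InferenceSession per exported model folder (file name "model.onnx" assumed).
        using var textEncoder = new InferenceSession(@"text_encoder\model.onnx");
        using var unet        = new InferenceSession(@"unet\model.onnx");
        using var vaeDecoder  = new InferenceSession(@"vae_decoder\model.onnx");

        // 1) CLIP text encoder: token ids -> text embedding.
        //    A real CLIP tokenizer is required; the all-zero ids below are placeholders.
        var tokenIds = new DenseTensor<int>(new[] { 1, 77 });
        var textInputs = new[] { NamedOnnxValue.CreateFromTensor("input_ids", tokenIds) };
        using var textOut = textEncoder.Run(textInputs);
        var textEmbedding = textOut.First().AsTensor<float>();

        // 2) Start from a random latent (1 x 4 x 64 x 64 for a 512x512 image).
        //    The real pipeline samples Gaussian noise; uniform values stand in here.
        var latents = new DenseTensor<float>(new[] { 1, 4, 64, 64 });
        var random = new Random();
        for (int i = 0; i < latents.Length; i++)
            latents.SetValue(i, (float)random.NextDouble());

        // 3) Denoising loop: the UNet predicts the noise at each timestep and the
        //    scheduler uses that prediction to step the latents toward the image.
        foreach (var timestep in new long[] { 999, 979, 959 }) // placeholder timesteps
        {
            var unetInputs = new[]
            {
                NamedOnnxValue.CreateFromTensor("sample", latents),
                NamedOnnxValue.CreateFromTensor("timestep",
                    new DenseTensor<long>(new long[] { timestep }, new[] { 1 })),
                NamedOnnxValue.CreateFromTensor("encoder_hidden_states", textEmbedding)
            };
            using var unetOut = unet.Run(unetInputs);
            var noisePrediction = unetOut.First().AsTensor<float>();
            // latents = Scheduler.Step(noisePrediction, timestep, latents); // scheduler math omitted
        }

        // 4) VAE decoder: latents -> final RGB image tensor (1 x 3 x 512 x 512).
        var decodeInputs = new[] { NamedOnnxValue.CreateFromTensor("latent_sample", latents) };
        using var decodeOut = vaeDecoder.Run(decodeInputs);
        var image = decodeOut.First().AsTensor<float>();
        Console.WriteLine($"Decoded image tensor with {image.Length} values.");
    }
}
```

The key point is the three-stage structure: one text-encoder call, a loop of UNet calls driven by the scheduler, and a single VAE-decoder call at the end.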
To build and run this project you will need:

- Visual Studio or VS Code
- A GPU-enabled machine with CUDA or DirectML on Windows (a sketch of enabling either execution provider in code follows this list):
  - Configure the CUDA EP: follow this tutorial to configure CUDA and cuDNN for GPU with ONNX Runtime and C# on Windows 11.
  - Windows comes with DirectML support, so no additional configuration is needed. Be sure to clone the `direct-ML-EP` branch of this repo if you choose this option.
  - This was built on an RTX 3070 and has not been tested on anything smaller.
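For reference, switching between the two execution providers in code looks roughly like the sketch below. It assumes the `Microsoft.ML.OnnxRuntime.Gpu` (CUDA) or `Microsoft.ML.OnnxRuntime.DirectML` NuGet package is referenced; the repo's branches already wire this up, so this is illustrative only.

```csharp
// Illustrative only: choosing the execution provider when creating an InferenceSession.
// Assumes the Microsoft.ML.OnnxRuntime.Gpu (CUDA) or Microsoft.ML.OnnxRuntime.DirectML
// NuGet package is installed; this is not the repo's exact code.
using Microsoft.ML.OnnxRuntime;

static class ExecutionProviderSketch
{
    public static InferenceSession CreateSession(string modelPath, bool useDirectML)
    {
        var options = new SessionOptions();
        if (useDirectML)
        {
            // DirectML EP: requires the Microsoft.ML.OnnxRuntime.DirectML package (Windows).
            options.AppendExecutionProvider_DML(0);
        }
        else
        {
            // CUDA EP: requires the Microsoft.ML.OnnxRuntime.Gpu package plus CUDA and cuDNN.
            options.AppendExecutionProvider_CUDA(0);
        }
        return new InferenceSession(modelPath, options);
    }
}
```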
- Clone this repo:

```bash
git clone https://github.com/cassiebreviu/StableDiffusion.git
```
- Download the ONNX Stable Diffusion models from Hugging Face. Once you have selected a model version repo, click `Files and Versions`, then select the `onnx` branch. If there isn't an ONNX model branch available, use the `main` branch and convert it to ONNX. See the ONNX conversion tutorial for PyTorch for more information.
- Clone the model repo:

```bash
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 -b onnx
```
- Copy the folders with the ONNX files to the C# project folder `\StableDiffusion\StableDiffusion`. The folders to copy are: `unet`, `vae_decoder`, `text_encoder`, `safety_checker`.
- Set Build for x64.
- Hit `F5` to run the project in Visual Studio, or run `dotnet run` in the terminal to run the project in VS Code.