
Simple Latent Diffusion Model

🌐 README in Korean: KR 한국어 버전 (Korean version)

This repository provides a lightweight and modular implementation of a Latent Diffusion Model (LDM), which performs efficient image generation by operating in a lower-dimensional latent space instead of pixel space.
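
To make the efficiency claim concrete, here is a back-of-the-envelope sketch in plain PyTorch. The shapes are hypothetical (a generic 256x256 RGB image and a 4x32x32 latent) and are not taken from this repository's configs; the point is only that the denoising network, which runs at every diffusion step, processes far fewer values in latent space than in pixel space.

import torch

pixel_space = torch.randn(1, 3, 256, 256)   # a hypothetical RGB image
latent_space = torch.randn(1, 4, 32, 32)    # a hypothetical VAE latent for that image

print(pixel_space.numel())    # 196,608 values processed per denoising step in pixel space
print(latent_space.numel())   # 4,096 values per step in latent space (~48x fewer)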

Example results are shown in the repository for three datasets (Swiss-roll, CIFAR-10, and CelebA), illustrating both the generation process of the latents and the final generated data.

Text-to-image synthesis using CLIP-guided latent diffusion

The examples below showcase text-to-image generation guided by CLIP. The dataset used is the Asian Composite Dataset, and the input text is given in Korean (English translations are shown below; the generated images can be viewed in the repository).

English text prompts (with their generated images in the repository):

- A round face with voluminous, slightly long short hair, along with barely visible vocal cords, gives off a more feminine aura than a masculine one. The well-defined eyes and lips enhance the subject's delicate features, making them appear more refined and intellectual.
- The hairstyle appears slightly unpolished, lacking a refined touch. The slightly upturned eyes give off a sharp and somewhat sensitive impression. Overall, they seem to have a slender physique and appear efficient in handling tasks, though their social interactions may not be particularly smooth.
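
To connect these examples to the code in the Usage section: at sampling time the prompt text is passed as the condition y, and gamma sets the guidance strength. Below is a minimal sketch, assuming ldm is a trained LatentDiffusionModel wired with a CLIPEncoder as constructed in the Usage section; the prompt string is only a placeholder.

# Assumes `ldm` and `painter` were built and trained exactly as in the Usage section below.
prompt = '둥근 얼굴에 ...'                     # Korean prompt (placeholder text)
image = ldm(n_samples=1, y=prompt, gamma=3)   # gamma: guidance scale for the CLIP condition
painter.show_images(image)                    # display the result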

Interactive Demo on Hugging Face

This project is live on Hugging Face Spaces. Click the badge below to try it out!

Hugging Face Spaces

Note: This demo runs on the free tier of Hugging Face Spaces, so image generation may take up to 10 minutes.

Tutorials

Usage

The following example demonstrates how to train a Latent Diffusion Model and generate data using the code in this repository.

import torch

from helper.painter import Painter
from helper.trainer import Trainer
from helper.data_generator import DataGenerator
from helper.loader import Loader
from helper.cond_encoder import CLIPEncoder

from auto_encoder.models.variational_auto_encoder import VariationalAutoEncoder
from clip.models.ko_clip import KoCLIPWrapper
from diffusion_model.sampler.ddim import DDIM
from diffusion_model.models.latent_diffusion_model import LatentDiffusionModel
from diffusion_model.network.unet import Unet
from diffusion_model.network.unet_wrapper import UnetWrapper

# Path to the configuration file
CONFIG_PATH = './configs/cifar10_config.yaml'

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Instantiate helper classes
painter = Painter()
loader = Loader()
data_generator = DataGenerator()

# Load CIFAR-10 dataset
data_loader = data_generator.cifar10(batch_size=128)

# Load CLIP model
clip = KoCLIPWrapper() # Any CLIP model from Hugging Face can be used here
cond_encoder = CLIPEncoder(clip, CONFIG_PATH) # Set encoder

# Train the Variational Autoencoder (VAE)
vae = VariationalAutoEncoder(CONFIG_PATH)  # Initialize the VAE model
trainer = Trainer(vae, vae.loss)  # Create a trainer for the VAE
trainer.train(dl=data_loader, epochs=100, file_name='vae', no_label=True)  # Train the VAE

# Train the Latent Diffusion Model (LDM)
sampler = DDIM(CONFIG_PATH)  # Initialize the DDIM sampler
network = UnetWrapper(Unet, CONFIG_PATH, cond_encoder)  # Initialize the U-Net network
ldm = LatentDiffusionModel(network, sampler, vae)  # Initialize the LDM
trainer = Trainer(ldm, ldm.loss)  # Create a trainer for the LDM
trainer.train(dl=data_loader, epochs=100, file_name='ldm', no_label=False)  # Train the LDM; set no_label=True if the dataset has no labels

# Load the trained models
vae = loader.model_load('models/vae', vae, is_ema=True)
ldm = loader.model_load('models/ldm', ldm, is_ema=True)

# Generate samples using the trained latent diffusion model
ldm.eval()
ldm = ldm.to(device)
sample = ldm(n_samples=4, y='...', gamma=3)  # Generate 4 samples; 'y' is the condition (e.g., a text prompt), 'gamma' is the guidance scale
painter.show_images(sample)  # Display the generated images
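
If you want to keep the generated samples on disk rather than only displaying them, a minimal follow-up could save the batch with torchvision (the output path here is an assumption, not part of the repository):

import os
from torchvision.utils import save_image

os.makedirs('samples', exist_ok=True)  # hypothetical output directory
save_image(sample.cpu(), 'samples/ldm_samples.png', nrow=2, normalize=True)  # save the 4 samples as a 2x2 grid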

References