
Commit 31afa44

Initial Commit for LDM
0 parents · commit 31afa44

27 files changed: +2038 −0 lines changed

.gitignore

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Ignore all image files
*.jpg
*.png
*.jpeg

# Ignore pycharm and system files
.DS_Store
*.idea
__pycache__
*.zip

# Ignore dataset files
*.csv
*.json

# Ignore checkpoints
*.pth

# Ignore pickle files
*.pkl

README.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
Stable Diffusion Implementation in PyTorch
========

This repository implements Stable Diffusion.
As of now it only implements unconditional latent diffusion models and trains on the MNIST and CelebHQ datasets.
It will soon also have code for conditional LDM.

For the autoencoder, I provide code for a VAE as well as a VQVAE,
but both stages of training use the VQVAE only. One can easily change that to the VAE if needed.

For the diffusion part, as of now it only implements DDPM with a linear schedule.
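A minimal sketch of how a linear DDPM schedule is typically precomputed from the ```beta_start```/```beta_end``` values in the configs (illustrative names, not necessarily the repo's exact code):

```python
import torch

# Precompute the DDPM quantities for a linear beta schedule, using the
# default values from the configs in this repo.
def linear_beta_schedule(beta_start=0.0015, beta_end=0.0195, num_timesteps=1000):
    betas = torch.linspace(beta_start, beta_end, num_timesteps)
    alphas = 1.0 - betas
    alpha_cum_prod = torch.cumprod(alphas, dim=0)
    return betas, alphas, alpha_cum_prod

# Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise
```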
## Stable Diffusion Videos

## Sample Output for Autoencoder on CelebHQ
Image - Top, Reconstructions - Below

## Sample Output for LDM on CelebHQ

## Data preparation
For setting up the MNIST dataset:

Follow - https://github.com/explainingai-code/Pytorch-VAE#data-preparation

For setting up CelebHQ, simply download the images from the official site
and mention the right path in the configuration.

For training on your own dataset:
* Create your own config and have the path point to your images (look at celebhq.yaml for guidance)
* Create your own dataset class, similar to celeb_dataset.py
* Map the dataset name to the right class in the training code, as in the sketch below this list
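A hypothetical sketch of such a mapping (the actual training scripts may organize this differently):

```python
from dataset.celeb_dataset import CelebDataset
from dataset.mnist_dataset import MnistDataset

# Map the config's dataset name to its class; extend this dict
# with an entry for your own dataset class.
DATASET_CLASSES = {
    'mnist': MnistDataset,
    'celebhq': CelebDataset,
}

def get_dataset(name, **kwargs):
    assert name in DATASET_CLASSES, 'Unknown dataset {}'.format(name)
    return DATASET_CLASSES[name](**kwargs)
```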
# Quickstart
* Create a new conda environment with python 3.8, then run the commands below
* ```git clone https://github.com/explainingai-code/StableDiffusion-PyTorch.git```
* ```cd StableDiffusion-PyTorch```
* ```pip install -r requirements.txt```
* Download the lpips weights from https://github.com/richzhang/PerceptualSimilarity/blob/master/lpips/weights/v0.1/vgg.pth and put them at ```models/weights/v0.1/vgg.pth```
* For training the autoencoder
    * ```python -m tools.train_vqvae --config config/mnist.yaml``` for training the vqvae
    * ```python -m tools.infer_vqvae --config config/mnist.yaml``` for generating reconstructions
* For training the ldm
    * ```python -m tools.train_ddpm_vqvae --config config/mnist.yaml``` for training the ddpm
    * ```python -m tools.sample_ddpm_vqvae --config config/mnist.yaml``` for generating images
## Configuration
The configs allow you to play with different components of the ddpm and autoencoder training
* ```config/mnist.yaml``` - Small autoencoder and ldm that can even be trained on a CPU
* ```config/celebhq.yaml``` - Configuration used for the celebhq dataset

Relevant configuration parameters

Most parameters are self-explanatory, but below I mention a couple that are specific to this repo.
* ```autoencoder_acc_steps``` : For accumulating gradients when the image size is too large to train with larger batch sizes (see the sketch below this list)
* ```save_latents``` : Enable this to save the latents during autoencoder inference, so that ddpm training is faster
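A minimal sketch of what gradient accumulation looks like in PyTorch (```model```, ```compute_loss```, ```optimizer``` and ```data_loader``` are placeholders, not this repo's exact training loop):

```python
# Losses from acc_steps batches are accumulated before each optimizer
# step, emulating a batch acc_steps times larger.
acc_steps = train_config['autoencoder_acc_steps']
optimizer.zero_grad()
for step, im in enumerate(data_loader):
    loss = compute_loss(model, im)
    (loss / acc_steps).backward()  # scale so gradients match the larger batch
    if (step + 1) % acc_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```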
## Output
Outputs will be saved according to the configuration present in the yaml files.

For every run, a folder named after the ```task_name``` key in the config will be created.

During training of the autoencoder the following outputs will be saved
* Latest autoencoder and discriminator checkpoints in the ```task_name``` directory
* Sample reconstructions in ```task_name/vqvae_autoencoder_samples```

During inference of the autoencoder the following outputs will be saved
* Reconstructions for random images in ```task_name```
* Latents in ```task_name/vqvae_latent_dir_name``` if mentioned in the config

During training of the DDPM, the latest checkpoint is saved in the ```task_name``` directory.
During sampling, sampled image grids for all timesteps are saved in ```task_name/samples/*.png```

config/__init__.py

Whitespace-only changes.

config/celebhq.yaml

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
dataset_params:
  im_path: 'data/celeba_hq_256'
  im_channels: 3
  im_size: 256
  name: 'celebhq'

diffusion_params:
  num_timesteps: 1000
  beta_start: 0.0015
  beta_end: 0.0195

ldm_params:
  down_channels: [256, 384, 512, 768]
  mid_channels: [768, 512]
  down_sample: [True, True, True]
  attn_down: [True, True, True]
  time_emb_dim: 512
  norm_channels: 32
  num_heads: 16
  conv_out_channels: 128
  num_down_layers: 2
  num_mid_layers: 2
  num_up_layers: 2

autoencoder_params:
  z_channels: 3
  codebook_size: 8192
  down_channels: [64, 128, 256, 256]
  mid_channels: [256, 256]
  down_sample: [True, True, True]
  attn_down: [False, False, False]
  norm_channels: 32
  num_heads: 4
  num_down_layers: 2
  num_mid_layers: 2
  num_up_layers: 2

train_params:
  seed: 1111
  task_name: 'celebhq'
  ldm_batch_size: 16
  autoencoder_batch_size: 4
  disc_start: 15000
  disc_weight: 0.5
  codebook_weight: 1
  commitment_beta: 0.2
  perceptual_weight: 1
  kl_weight: 0.000005
  ldm_epochs: 100
  autoencoder_epochs: 20
  num_samples: 1
  num_grid_rows: 1
  ldm_lr: 0.000005
  autoencoder_lr: 0.00001
  autoencoder_acc_steps: 4
  autoencoder_img_save_steps: 64
  save_latents: False
  vae_latent_dir_name: 'vae_latents'
  vqvae_latent_dir_name: 'vqvae_latents'
  ldm_ckpt_name: 'ddpm_ckpt.pth'
  vqvae_autoencoder_ckpt_name: 'vqvae_autoencoder_ckpt.pth'
  vae_autoencoder_ckpt_name: 'vae_autoencoder_ckpt.pth'
  vqvae_discriminator_ckpt_name: 'vqvae_discriminator_ckpt.pth'
  vae_discriminator_ckpt_name: 'vae_discriminator_ckpt.pth'
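The training tools presumably read these files with PyYAML (requirements.txt is not shown here, so this is an assumption); a minimal loading sketch:

```python
import yaml

# Load a config and pull out a couple of sections.
with open('config/celebhq.yaml', 'r') as f:
    config = yaml.safe_load(f)

dataset_config = config['dataset_params']
train_config = config['train_params']
print(dataset_config['im_size'], train_config['ldm_batch_size'])  # 256 16
```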

config/mnist.yaml

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
dataset_params:
  im_path: '/Users/tusharkumar/PycharmProjects/explainingai-repos/StableDiffusion-Pytorch/data/train/images'
  im_channels: 1
  im_size: 28
  name: 'mnist'

diffusion_params:
  num_timesteps: 1000
  beta_start: 0.0015
  beta_end: 0.0195

ldm_params:
  down_channels: [128, 256, 256, 256]
  mid_channels: [256, 256]
  down_sample: [False, False, False]
  attn_down: [True, True, True]
  time_emb_dim: 256
  norm_channels: 32
  num_heads: 16
  conv_out_channels: 128
  num_down_layers: 2
  num_mid_layers: 2
  num_up_layers: 2

autoencoder_params:
  z_channels: 3
  codebook_size: 20
  down_channels: [32, 64, 128]
  mid_channels: [128, 128]
  down_sample: [True, True]
  attn_down: [False, False]
  norm_channels: 32
  num_heads: 16
  num_down_layers: 1
  num_mid_layers: 1
  num_up_layers: 1

train_params:
  seed: 1111
  task_name: 'mnist'
  ldm_batch_size: 64
  autoencoder_batch_size: 64
  disc_start: 1000
  disc_weight: 0.5
  codebook_weight: 1
  commitment_beta: 0.2
  perceptual_weight: 1
  kl_weight: 0.000005
  ldm_epochs: 100
  autoencoder_epochs: 10
  num_samples: 25
  num_grid_rows: 5
  ldm_lr: 0.00001
  autoencoder_lr: 0.0001
  autoencoder_acc_steps: 1
  autoencoder_img_save_steps: 8
  save_latents: False
  vae_latent_dir_name: 'vae_latents'
  vqvae_latent_dir_name: 'vqvae_latents'
  ldm_ckpt_name: 'ddpm_ckpt.pth'
  vqvae_autoencoder_ckpt_name: 'vqvae_autoencoder_ckpt.pth'
  vae_autoencoder_ckpt_name: 'vae_autoencoder_ckpt.pth'
  vqvae_discriminator_ckpt_name: 'vqvae_discriminator_ckpt.pth'
  vae_discriminator_ckpt_name: 'vae_discriminator_ckpt.pth'
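With `down_sample: [True, True]` in `autoencoder_params`, the 28x28 MNIST images are encoded into latents downsampled by a factor of 4; a quick sanity check, assuming each `True` entry halves the spatial resolution (the usual convention):

```python
# autoencoder_params from config/mnist.yaml
im_size = 28
down_sample = [True, True]
z_channels = 3

latent_size = im_size // (2 ** sum(down_sample))  # each True halves H and W
print((z_channels, latent_size, latent_size))     # (3, 7, 7)
```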

dataset/__init__.py

Whitespace-only changes.

dataset/celeb_dataset.py

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
import glob
import os

import torchvision
from PIL import Image
from torch.utils.data.dataset import Dataset

from utils.diffusion_utils import load_latents


class CelebDataset(Dataset):
    r"""
    Celeb dataset will by default resize the images.
    This can be replaced by any other dataset, as long as all the images
    are under one directory.
    """

    def __init__(self, split, im_path, im_size=256, im_channels=3, im_ext='jpg',
                 use_latents=False, latent_path=None):
        self.split = split
        self.im_size = im_size
        self.im_channels = im_channels
        self.im_ext = im_ext
        self.latent_maps = None
        self.use_latents = False
        self.images = self.load_images(im_path)

        # Whether to load images or to load latents
        if use_latents and latent_path is not None:
            latent_maps = load_latents(latent_path)
            if len(latent_maps) == len(self.images):
                self.use_latents = True
                self.latent_maps = latent_maps
                print('Found {} latents'.format(len(self.latent_maps)))
            else:
                print('Latents not found')

    def load_images(self, im_path):
        r"""
        Gets all image file paths from the directory specified.
        """
        assert os.path.exists(im_path), "images path {} does not exist".format(im_path)
        ims = []
        for ext in ('png', 'jpg', 'jpeg'):
            ims += glob.glob(os.path.join(im_path, '*.{}'.format(ext)))
        print('Found {} images'.format(len(ims)))
        return ims

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        if self.use_latents:
            # Latents are keyed by the image path they were generated from
            latent = self.latent_maps[self.images[index]]
            return latent
        else:
            im = Image.open(self.images[index])
            im_tensor = torchvision.transforms.Compose([
                torchvision.transforms.Resize(self.im_size),
                torchvision.transforms.CenterCrop(self.im_size),
                torchvision.transforms.ToTensor(),
            ])(im)
            im.close()

            # Convert input to -1 to 1 range.
            im_tensor = (2 * im_tensor) - 1
            return im_tensor
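A usage sketch wiring the dataset into a DataLoader (paths and batch size are illustrative):

```python
from torch.utils.data import DataLoader

from dataset.celeb_dataset import CelebDataset

# Images come back as (C, H, W) tensors scaled to [-1, 1].
dataset = CelebDataset(split='train', im_path='data/celeba_hq_256',
                       im_size=256, im_channels=3)
loader = DataLoader(dataset, batch_size=4, shuffle=True)
im = next(iter(loader))
print(im.shape)  # torch.Size([4, 3, 256, 256])
```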

dataset/mnist_dataset.py

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
import glob
import os

import torchvision
from PIL import Image
from torch.utils.data.dataset import Dataset
from tqdm import tqdm

from utils.diffusion_utils import load_latents


class MnistDataset(Dataset):
    r"""
    Nothing special here. Just a simple dataset class for mnist images.
    Created a dataset class rather than using torchvision to allow
    replacement with any other image dataset.
    """

    def __init__(self, split, im_path, im_size, im_channels,
                 use_latents=False, latent_path=None):
        r"""
        Init method for initializing the dataset properties
        :param split: train/test to locate the image files
        :param im_path: root folder of images
        :param im_size: image size
        :param im_channels: number of image channels
        """
        self.split = split
        self.latent_maps = None
        self.use_latents = False
        self.images, self.labels = self.load_images(im_path)
        # Whether to load images or to load latents
        if use_latents and latent_path is not None:
            latent_maps = load_latents(latent_path)
            if len(latent_maps) == len(self.images):
                self.use_latents = True
                self.latent_maps = latent_maps
                print('Found {} latents'.format(len(self.latent_maps)))
            else:
                print('Latents not found')

    def load_images(self, im_path):
        r"""
        Gets all image file paths from the per-class
        sub-directories under the path specified.
        """
        assert os.path.exists(im_path), "images path {} does not exist".format(im_path)
        ims = []
        labels = []
        for d_name in tqdm(os.listdir(im_path)):
            for ext in ('png', 'jpg', 'jpeg'):
                for fname in glob.glob(os.path.join(im_path, d_name, '*.{}'.format(ext))):
                    ims.append(fname)
                    # Labels are unused for unconditional generation
                    # labels.append(int(d_name))
        print('Found {} images for split {}'.format(len(ims), self.split))
        return ims, labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        if self.use_latents:
            latent = self.latent_maps[self.images[index]]
            return latent
        else:
            im = Image.open(self.images[index])
            im_tensor = torchvision.transforms.ToTensor()(im)

            # Convert input to -1 to 1 range.
            im_tensor = (2 * im_tensor) - 1
            return im_tensor
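Both dataset classes depend on ```load_latents``` from ```utils/diffusion_utils.py```, which is not part of the files shown here. From how it is used, it must return a dict keyed by image path. A purely hypothetical sketch of a compatible implementation, assuming autoencoder inference saved latents as pickled path-to-tensor dicts:

```python
import glob
import os
import pickle

# Hypothetical sketch of the contract the datasets rely on: the real
# implementation lives in utils/diffusion_utils.py.
def load_latents(latent_path):
    latent_maps = {}
    for fname in glob.glob(os.path.join(latent_path, '*.pkl')):
        with open(fname, 'rb') as f:
            latent_maps.update(pickle.load(f))
    return latent_maps
```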

models/__init__.py

Whitespace-only changes.
