This is an unofficial implementation of E-LatentLPIPS, proposed in Diffusion2GAN (Distilling Diffusion Models into Conditional GANs).
```python
import torch
from elatentlpips import ELatentLPIPS

# Latents encoded by the SD 1.5 VAE (4 channels, 64x64 for a 512x512 image)
latent0 = torch.randn(1, 4, 64, 64).to("cuda")  # encoded image 1
latent1 = torch.randn(1, 4, 64, 64).to("cuda")  # encoded image 2

elatentlpips = ELatentLPIPS(pretrained_latent_lpips_path="/path/to/model.safetensors", aug_type="bgcfnc").to("cuda")
distance = elatentlpips(latent0, latent1)
```
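The returned distance is differentiable, so it can also serve as a training loss. A minimal follow-up sketch (assuming the forward pass keeps the autograd graph, as LPIPS-style losses normally do):

```python
# Hypothetical use as a training loss, e.g. on a generator output
generated_latent = latent0.clone().requires_grad_(True)
loss = elatentlpips(generated_latent, latent1).mean()
loss.backward()  # gradients flow back into generated_latent
```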
Quick Use:

```bash
pip install -e .
```

or

```bash
pip install git+https://github.com/SNU-VGILab/e-latentlpips
```
Training:

```bash
pip install -e ".[train]"
```
Download the pretrained model from here and pass its path to pretrained_latent_lpips_path in the initialization code above.
- In our experiments, initial loss values (0.1–0.3) were smaller than those reported in the original paper (0.5+).
- Toy (reconstruction) experiment results improved as we added more augmentation types (bgcfnc > bgcf_cutout > bgc_cutout > bg_cutout >> none).
- Careful selection of augmentations is important to avoid undesired artifacts when training an actual model (a minimal sweep over aug_type is sketched below).
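As a rough illustration of the augmentation note above, one could sweep the presets on a toy pair of latents and compare the resulting distances. The preset names are taken from the comparison above; whether "none" is an accepted value for aug_type is an assumption, and the checkpoint path is a placeholder:

```python
import torch
from elatentlpips import ELatentLPIPS

latent_a = torch.randn(1, 4, 64, 64, device="cuda")
latent_b = latent_a + 0.05 * torch.randn_like(latent_a)  # slightly perturbed copy

for aug in ["none", "bg_cutout", "bgc_cutout", "bgcf_cutout", "bgcfnc"]:
    metric = ELatentLPIPS(
        pretrained_latent_lpips_path="/path/to/model.safetensors",  # placeholder
        aug_type=aug,
    ).to("cuda")
    print(aug, metric(latent_a, latent_b).item())
```

Since the ensembled augmentations are sampled randomly, averaging over several calls gives a more stable comparison.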
- Pretrained models for the SD 1.5 VAE encoder, with LPIPS-like loss calculation via the elatentlpips package.
- Training code for the Latent VGG Network on ImageNet.
- Training code for LPIPS-style linear calibration on BAPPS.
- Training code for toy (overfitting) experiments using 1–8 images.
- ImageNet Validation Scores for the Latent VGG Network (SD 1.5 VAE encoder):

| Top-1 Acc. (%) | VGG16 | VGG16-bn | VGG16-gn (reproduced) |
|---|---|---|---|
| Pixels | 71.59 | 73.36 | - |
| Latent | 64.25 | 68.26 | 72.632 |
- BAPPS 2AFC Scores:

| Perceptual Metric | BAPPS 2AFC Traditional | BAPPS 2AFC CNN | BAPPS 2AFC Real |
|---|---|---|---|
| LPIPS | 73.36 | 82.20 | 63.23 |
| LatentLPIPS | 74.29 | 81.99 | 63.21 |
| LatentLPIPS (reproduced) | 75.24 | 82.33 | 63.40 |
- While the original E-LatentLPIPS uses a batch-norm variant of VGG16, we used GroupNorm for ease of distributed training.
- The paper provides few details about the latent VGG network beyond the removal of three max-pooling layers, so we modified the VGG16 layers manually to reflect this (a rough sketch of one possible construction is shown below).
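For reference, here is a minimal sketch of how such a network could be built from torchvision's VGG16, assuming GroupNorm after every convolution, a 4-channel latent input, and the first three max-pooling layers removed. The group count and the exact layers modified in this repo are assumptions, so treat the repo's model code as the reference:

```python
import torch.nn as nn
from torchvision.models import vgg16

def latent_vgg16_gn(in_channels: int = 4, num_groups: int = 32) -> nn.Sequential:
    """Hypothetical GroupNorm VGG16 feature extractor operating on VAE latents."""
    layers, pools_skipped = [], 0
    for layer in vgg16(weights=None).features:
        if isinstance(layer, nn.MaxPool2d) and pools_skipped < 3:
            pools_skipped += 1  # drop the first three max-pooling layers
            continue
        if isinstance(layer, nn.Conv2d):
            in_ch = in_channels if not layers else layer.in_channels
            layers += [
                nn.Conv2d(in_ch, layer.out_channels, kernel_size=3, padding=1),
                nn.GroupNorm(num_groups, layer.out_channels),  # replaces batch norm; group count is a guess
            ]
            continue
        layers.append(layer)  # keep ReLUs and the remaining pooling layers
    return nn.Sequential(*layers)  # classifier head for ImageNet training omitted
```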
- Check here for the list of supported augmentations.
- Inputs to the VAE are expected to be scaled properly (check utils.py for how they should be scaled; a rough encoding sketch is shown below).
- The code uses mixed precision by default (fp16 for the VAE, except for FLUX, which uses bf16).
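As a rough illustration of the two notes above (input scaling and precision), encoding images with the SD 1.5 VAE via diffusers might look like the sketch below. The checkpoint path is a placeholder, and whether the latent scaling factor is applied here or inside the package is something to verify against utils.py:

```python
import torch
from diffusers import AutoencoderKL

# Placeholder path/ID for an SD 1.5 checkpoint; fp16 as in the default setup
vae = AutoencoderKL.from_pretrained(
    "/path/to/sd15", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def encode(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, H, W) in [0, 1] -> SD 1.5 latents of shape (N, 4, H/8, W/8)."""
    images = images.to("cuda", torch.float16) * 2.0 - 1.0  # the VAE expects [-1, 1]
    latents = vae.encode(images).latent_dist.sample()
    return latents * vae.config.scaling_factor  # 0.18215 for the SD 1.5 VAE
```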
- The Latent VGG Network was trained on 8x A6000 GPUs for ~5 days.
- The linear layers for LPIPS were trained for ~10 hours on 4x A6000 GPUs.
- Toy experiments take about 40 minutes on a single A6000.
- While the code should handle other encoder variants as well (e.g., SDXL, SD3, and FLUX), we have only tested SD 1.5 due to resource constraints.
- We welcome results for other VAE variants as well! (You can just change the vae_type argument.)
- Check sample_scripts.sh for how to run the code.
- The augmentation code is adapted from the StyleGAN2-ADA repo.
- Some parts of the model code are adapted from the original LPIPS repo.
- Huge thanks to the authors of Diffusion2GAN for the amazing work!
If you find this code useful, please cite both this repository and the original paper:
```bibtex
@inproceedings{kang2024diffusion2gan,
  author    = {Kang, Minguk and Zhang, Richard and Barnes, Connelly and Paris, Sylvain and Kwak, Suha and Park, Jaesik and Shechtman, Eli and Zhu, Jun-Yan and Park, Taesung},
  title     = {{Distilling Diffusion Models into Conditional GANs}},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024},
}
```