scopic-diffusion

Project to fine-tune Stable Diffusion with images from the Ecuadorian artist Oswaldo Guayasamín.

This repo includes the auxiliary Jupyter notebooks and configuration files used for the project.

Hugging Face

The following resources are published on Hugging Face.

The dataset is here.
The model is here.
The web app is here.

Generating dataset

The images in the dataset are scraped using BS4 and captioned using BLIP. The dataset is then published on Hugging Face.

Instructions

Run the following notebooks in order:

scraping_OG.ipynb: scrapes and downloads the artist's images into the folder ./images/
captioning_images.ipynb: uses BLIP to generate captions for the artist's paintings.
publishing_dataset.ipynb: publishes the dataset ztjona/oswaldo-guayasamin-blip-captions-v2 on Hugging Face.
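
The scraping step boils down to collecting image URLs from a gallery page before downloading them. The sketch below is only an illustration of that idea: it swaps BS4 for the standard-library `html.parser` so it runs without extra dependencies, and the HTML fragment, the `example.org` URLs, and the names `ImageSrcParser` and `extract_image_urls` are hypothetical, not taken from scraping_OG.ipynb.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collects the absolute URL of every <img> tag that has a src attribute."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html: str, base_url: str) -> List[str]:
    """Return the image URLs found in an HTML document, in page order."""
    parser = ImageSrcParser(base_url)
    parser.feed(html)
    return parser.image_urls


# Hypothetical page fragment; the real gallery markup is not shown in this README.
sample_html = """
<div class="gallery">
  <img src="/works/painting-01.jpg" alt="Painting 1">
  <img src="https://example.org/works/painting-02.jpg">
  <img alt="image without a source">
</div>
"""

urls = extract_image_urls(sample_html, "https://example.org/gallery/")
for url in urls:
    print(url)
```

With BS4, the same extraction is usually written as `[img.get("src") for img in BeautifulSoup(html, "html.parser").find_all("img")]`; the notebook remains the reference for how the images are actually located and saved.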

Fine tune

The training is done in Google Colab Pro using an A100 GPU (40 GB).

Change the configurations in the local art-ecu.yaml file.
Run the notebook scopic_diffusion.ipynb.
- NOTE: the dependencies are installed in the first cell of the notebook.

Changes in configuration file

The configuration file used as a template was pokemon.yaml. The changes made are:

# ...
# line 9
timesteps: 11 # Scopic-Diffusion: reduce to 1 epoch

# ...
# line 74
    batch_size: 3 # Scopic-Diffusion: reduced to avoid running out of GPU memory
    num_workers: 4
    num_val_workers: 0 # Avoid a weird val dataloader issue
    train:
      target: ldm.data.simple.hf_dataset
      params:
        name: ztjona/oswaldo-guayasamin-blip-captions-v2 # Scopic-Diffusion: pointing to our dataset
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 512
            interpolation: 3
        - target: torchvision.transforms.RandomCrop
          params:
            size: 512
        - target: torchvision.transforms.RandomHorizontalFlip
    validation:
      target: ldm.data.simple.TextOnly
      params:
        captions:
        - "A woman with fire in her hands" # Scopic-Diffusion: changing validation prompt
        - " Painting of clouds over a city" # Scopic-Diffusion: changing validation prompt
        - "Yoda"
        - "An epic landscape photo of a mountain"
        output_size: 512
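
Before starting a long training run, it can help to confirm that the edited values are the ones the config really contains. The following is a minimal sketch of such a check, assuming the data section has already been loaded into a nested dict; the dict below is typed in by hand from the fragment above, and `get_path`, `expected`, and `mismatches` are hypothetical names introduced for illustration.

```python
from functools import reduce

# Part of the data section listed above, written as a plain dict.
# Assumption: a YAML loader would produce an equivalent nested structure.
data_params = {
    "batch_size": 3,
    "num_workers": 4,
    "num_val_workers": 0,
    "train": {
        "target": "ldm.data.simple.hf_dataset",
        "params": {"name": "ztjona/oswaldo-guayasamin-blip-captions-v2"},
    },
    "validation": {
        "target": "ldm.data.simple.TextOnly",
        "params": {"output_size": 512},
    },
}


def get_path(config, dotted_path):
    """Look up a nested value using a dotted key path such as 'train.params.name'."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), config)


# Values this project expects to find after editing the template.
expected = {
    "batch_size": 3,
    "train.params.name": "ztjona/oswaldo-guayasamin-blip-captions-v2",
}

# Collect every path whose actual value differs from the expected one.
mismatches = {
    path: get_path(data_params, path)
    for path, wanted in expected.items()
    if get_path(data_params, path) != wanted
}

print("config OK" if not mismatches else f"unexpected values: {mismatches}")
```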

Model description

Evaluation metrics: FID, LPIPS
Model complexity

  | Name              | Type               | Params
---------------------------------------------------------
0 | model             | DiffusionWrapper   | 859 M 
1 | model_ema         | LitEma             | 0     
2 | first_stage_model | AutoencoderKL      | 83.7 M
3 | cond_stage_model  | FrozenCLIPEmbedder | 123 M 
---------------------------------------------------------
859 M     Trainable params
206 M     Non-trainable params
1.1 B     Total params
4,264.941 Total estimated model params size (MB)
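
The totals in the summary can be cross-checked with a little arithmetic. The sketch below uses only the rounded figures printed above and assumes 4 bytes per parameter (float32) and 1 MB = 10^6 bytes; those two assumptions are not stated in the summary itself, so treat the result as a consistency check rather than an exact count.

```python
# Parameter counts in millions, copied from the summary table above.
params_m = {
    "model": 859.0,             # DiffusionWrapper, trainable
    "first_stage_model": 83.7,  # AutoencoderKL, frozen
    "cond_stage_model": 123.0,  # FrozenCLIPEmbedder, frozen
}

trainable_m = params_m["model"]
non_trainable_m = params_m["first_stage_model"] + params_m["cond_stage_model"]
total_m = trainable_m + non_trainable_m

# Assumption: float32 weights, i.e. 4 bytes per parameter, and 1 MB = 1e6 bytes.
BYTES_PER_PARAM = 4
reported_size_mb = 4264.941
implied_total_m = reported_size_mb / BYTES_PER_PARAM

print(f"non-trainable: {non_trainable_m:.1f} M")
print(f"total from table: {total_m:.1f} M")
print(f"total implied by size: {implied_total_m:.1f} M")
```

The two totals differ by less than one million parameters, which is consistent with the per-module counts in the table being rounded.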

Installation

### Download the code
git clone https://github.com/ztjona/scopic-diffusion.git
cd scopic-diffusion

pip install -r requirements.txt

Execute each notebook in the order previously described.
