@manolo-lolo manolo-lolo commented Dec 8, 2021

This PR adds the code for scene image generation as done in High-Resolution Complex Scene Synthesis with Transformers. It also adds the Open Images dataset; COCO had been added in this PR. In addition, I added pre-trained models and provided the first 100 layouts so that people can try it out easily.

I did not update the project main page to announce this new piece of code; this should probably be done before merging.

I propose the following announcement for the project page:

Scene Image Synthesis

[Image: teaser]

Scene image generation based on bounding box conditionals as done in High-Resolution Complex Scene Synthesis with Transformers (see talk on workshop page). Supports the COCO and Open Images datasets.

Training

Download the first-stage models COCO-8k-VQGAN for COCO or COCO/Open-Images-8k-VQGAN for Open Images.
Change ckpt_path in configs/coco_scene_images_transformer.yaml and configs/open_images_scene_images_transformer.yaml to point to the downloaded first-stage models.
Download the full COCO/OI datasets and adapt data_path in the same files, unless the 100 files provided for training and validation already suit your needs.
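
As a rough illustration of the two config steps above, here is a minimal sketch of how one could locate and overwrite ckpt_path (or data_path) in a config that has been loaded into nested dicts. The nested layout in the example is a made-up stand-in, not the actual contents of the YAML files, and find_key_paths / set_by_path are hypothetical helpers that are not part of this PR.

```python
from typing import Any, List, Tuple

Path = Tuple[Any, ...]


def find_key_paths(node: Any, key: str, prefix: Path = ()) -> List[Path]:
    """Return the path to every occurrence of `key` in nested dicts/lists."""
    paths: List[Path] = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                paths.append(prefix + (k,))
            paths.extend(find_key_paths(v, key, prefix + (k,)))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            paths.extend(find_key_paths(v, key, prefix + (i,)))
    return paths


def set_by_path(node: Any, path: Path, value: Any) -> None:
    """Overwrite, in place, the entry addressed by `path`."""
    for step in path[:-1]:
        node = node[step]
    node[path[-1]] = value


# Hypothetical config layout, for illustration only (assumption).
config = {
    "model": {"params": {"first_stage_config": {"params": {"ckpt_path": "CHANGE_ME"}}}},
    "data": {"params": {"train": {"params": {"data_path": "CHANGE_ME"}}}},
}

# Point every ckpt_path entry at the downloaded first-stage model.
for path in find_key_paths(config, "ckpt_path"):
    set_by_path(config, path, "/path/to/downloaded/first_stage.ckpt")
```

Listing the paths first makes it easy to check which entries would be touched before anything is overwritten; data_path can be handled the same way.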

Code can be run with
python main.py --base configs/coco_scene_images_transformer.yaml -t True --gpus 0,
or
python main.py --base configs/open_images_scene_images_transformer.yaml -t True --gpus 0,
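
For anyone who prefers to launch training from a script, the two commands above could be wrapped roughly as follows. This is only a sketch: build_train_command and run_training are hypothetical helpers, not part of this PR, and the arguments are copied verbatim from the commands above (including the trailing comma in the --gpus value).

```python
import subprocess
from typing import List

# Config paths exactly as used in the commands above.
CONFIGS = {
    "coco": "configs/coco_scene_images_transformer.yaml",
    "open_images": "configs/open_images_scene_images_transformer.yaml",
}


def build_train_command(dataset: str, gpus: str = "0,") -> List[str]:
    """Build the argument list for training on the given dataset."""
    if dataset not in CONFIGS:
        raise ValueError(f"unknown dataset {dataset!r}, expected one of {sorted(CONFIGS)}")
    return ["python", "main.py", "--base", CONFIGS[dataset], "-t", "True", "--gpus", gpus]


def run_training(dataset: str, gpus: str = "0,") -> int:
    """Run training from the repository root; raises if the process fails."""
    return subprocess.run(build_train_command(dataset, gpus), check=True).returncode
```

Calling run_training("coco") from the repository root would then be equivalent to the first command.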

Sampling

Train a model as described above or download a pre-trained model:

  • Open Images 1 billion parameter model, trained for 100 epochs. On 256x256 pixels, FID 41.48±0.21, SceneFID 14.60±0.15, Inception Score 18.47±0.27. The model was trained with 2D crops of images and is thus well-prepared for the task of generating high-resolution images, e.g. 512x512.
  • Open Images distilled version of the above model with 125 million parameters, which allows for sampling on smaller GPUs (4 GB is enough for sampling 256x256 px images). The model was trained for 60 epochs with 10% soft loss, 90% hard loss. On 256x256 pixels, FID 43.07±0.40, SceneFID 15.93±0.19, Inception Score 17.23±0.11.
  • COCO 30 epochs
  • COCO 60 epochs (find model statistics for both COCO versions in assets/coco_scene_images_training.svg)
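
The "10% soft loss, 90% hard loss" weighting mentioned for the distilled model can be illustrated with a small sketch. It assumes the usual knowledge-distillation setup, where the hard loss is the cross-entropy against the ground-truth token and the soft loss is the cross-entropy against the teacher's predicted distribution. The exact formulation used for training (e.g. any temperature) is not stated here, so distillation_loss below is an illustrative function, not the training code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    soft_weight: float = 0.1,  # 10% soft loss, 90% hard loss
) -> float:
    """Weighted sum of soft (teacher) and hard (ground-truth) cross-entropy."""
    student_logp = log_softmax(student_logits)
    teacher_p = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    # Hard loss: negative log-likelihood of the ground-truth token.
    hard = -student_logp[target]
    # Soft loss: cross-entropy of the student under the teacher distribution.
    soft = -sum(p * lp for p, lp in zip(teacher_p, student_logp))
    return soft_weight * soft + (1.0 - soft_weight) * hard
```

With soft_weight=0.0 this reduces to plain cross-entropy training; increasing it shifts weight towards matching the teacher's distribution.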

When downloading a pre-trained model, remember to change ckpt_path in configs/*project.yaml to point to your downloaded first-stage model (see Training).

Scene image generation can be run with
python scripts/make_scene_samples.py --outdir=/some/outdir -r /path/to/pretrained/model --resolution=512,512

@manolo-lolo manolo-lolo requested a review from rromb December 8, 2021 16:52
@rromb rromb merged commit 09298dc into master Jan 13, 2022
rromb added a commit that referenced this pull request Jan 13, 2022
Adding the description of scene-synthesis models as proposed in #127.
rromb commented Jan 13, 2022

Great, thanks! I just added the announcement you suggested to the README (see 29b803f).
