MGAD-multimodal-guided-artwork-diffusion

Official Pytorch implementation of "Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion" (ACM Multimedia 2022 Accepted) paper https://arxiv.org/abs/2209.13360

Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion
ACM Multimedia 2022

Abstract

Digital art creation is getting more attention in the multimedia community for providing effective engagement of the public with art. Current digital art generation methods usually use single modality inputs as guidance, limiting the expressiveness of the model and the diversity of generated results. To solve this problem, we propose the multimodal guided artwork diffusion (MGAD) model, a diffusion-based digital artwork generation method that utilizes multimodal prompts as guidance to control the classifier-free diffusion model. Additionally, the contrastive language-image pretraining (CLIP) model is used to unify text and image modalities. However, the semantic content of multimodal prompts may conflict with each other, which leads to a collapse in generating progress. Extensive experimental results on the quality and quantity of the generated digital art paintings confirm the effectiveness of the combination of the diffusion model and multimodal guidance.

Environment

Pytorch 1.9.0, Python 3.9
NVIDIA A40
512 defaults: 38 GB

conda create -n mgad python=3.9
conda activate mgad
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Install dependencies

git clone https://github.com/openai/CLIP
pip install -e ./CLIP
pip install -e ./guided-diffusion
pip install lpips

Download the diffusion model

curl -OL --http1.1 'https://the-eye.eu/public/AI/models/512x512_diffusion_unconditional_ImageNet/512x512_diffusion_uncond_finetune_008100.pt'

Download the model checkpoint

An unconditional model trained by Katherine Crowson(https://twitter.com/RiversHaveWings) on a 33 million image original resolution subset of Yahoo Flickr Creative Commons 100 Million.

curl -OL --http1.1 'https://the-eye.eu/public/AI/models/v-diffusion/yfcc_1.pth'

Run

python mgad.py -p "A stunning natural landscape painting is created by an artist Paul Cezanne in post-impressionism style." --image_prompts "./image_prompts/1.jpg" -t 2000 -ds 2000 -tvs 300 -o "./results/PC-landscape/PC-landscape"

Acknowledgments

This code borrows heavily from v-diffusion-pytorch and CLIP-Guided-Diffusion. We also thank CLIP and guided-diffusion.

License

The codes and the pretrained model in this repository are under the MIT license as specified by the LICENSE file.

Citation

If you find our work is useful in your research, please consider citing:

@inproceedings{huang2022draw,
  title={Draw your art dream: Diverse digital art synthesis with multimodal guided diffusion},
  author={Huang, Nisha and Tang, Fan and Dong, Weiming and Xu, Changsheng},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={1085--1094},
  year={2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
guided-diffusion		guided-diffusion
image_prompts		image_prompts
LICENSE		LICENSE
README.md		README.md
mgad.py		mgad.py
teaser.jpg		teaser.jpg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MGAD-multimodal-guided-artwork-diffusion

Official Pytorch implementation of "Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion" (ACM Multimedia 2022 Accepted) paper https://arxiv.org/abs/2209.13360

Abstract

Environment

Install dependencies

Download the diffusion model

Download the model checkpoint

Run

Acknowledgments

License

Citation

About

Releases

Packages

Languages

License

haha-lisa/MGAD-multimodal-guided-artwork-diffusion

Folders and files

Latest commit

History

Repository files navigation

MGAD-multimodal-guided-artwork-diffusion

Official Pytorch implementation of "Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion" (ACM Multimedia 2022 Accepted) paper https://arxiv.org/abs/2209.13360

Abstract

Environment

Install dependencies

Download the diffusion model

Download the model checkpoint

Run

Acknowledgments

License

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages