
SyncTweedies: A General Generative Framework Based on Synchronized Diffusions

Jaihoon Kim*, Juil Koo*, Kyeongmin Yeo*, Minhyuk Sung (* Denotes equal contribution)

| Website | Paper | arXiv |


Introduction

This repository contains the official implementation of SyncTweedies. SyncTweedies can be applied to various downstream applications, including ambiguous image generation, wide image generation, 360° panorama generation, and texturing of 3D meshes and 3D Gaussians. More results can be found on our project webpage.

We introduce a general diffusion synchronization framework for generating diverse visual content, including ambiguous images, panorama images, 3D mesh textures, and 3D Gaussian splats textures, using a pretrained image diffusion model. We first present an analysis of various scenarios for synchronizing multiple diffusion processes through a canonical space. Based on the analysis, we introduce a novel synchronized diffusion method, SyncTweedies, which averages the outputs of Tweedie’s formula while conducting denoising in multiple instance spaces. Compared to previous work that achieves synchronization through finetuning, SyncTweedies is a zero-shot method that does not require any finetuning, preserving the rich prior of diffusion models trained on Internet-scale image datasets without overfitting to specific domains. We verify that SyncTweedies offers the broadest applicability to diverse applications and superior performance compared to the previous state-of-the-art for each application.
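
The synchronization step described above can be illustrated with a toy sketch. It is not the repository's implementation: the canonical space is a 1-D list, each "view" is a plain crop of it, the noise predictions are supplied by the caller, and all function names are hypothetical. It shows only the averaging of Tweedie outputs; the denoising update that would follow in each instance space is omitted.

```python
import math


def tweedie_x0(x_t, eps_pred, alpha_bar):
    """Tweedie's formula for a variance-preserving diffusion:
    x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_pred) / sqrt(alpha_bar)."""
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    return [(x - s * e) / a for x, e in zip(x_t, eps_pred)]


def project(canonical, start, size):
    """Canonical -> instance space. Here a view is just a crop (assumption)."""
    return canonical[start:start + size]


def unproject_and_average(instances, starts, length):
    """Instance -> canonical space: scatter each view back and average overlaps."""
    total = [0.0] * length
    count = [0] * length
    for inst, start in zip(instances, starts):
        for i, value in enumerate(inst):
            total[start + i] += value
            count[start + i] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


def synchronize_tweedie_outputs(x_ts, eps_preds, starts, length, alpha_bar):
    """Average the per-view Tweedie outputs in the canonical space, then
    project the result back so every view sees a consistent x0 estimate."""
    x0_hats = [tweedie_x0(x, e, alpha_bar) for x, e in zip(x_ts, eps_preds)]
    canonical = unproject_and_average(x0_hats, starts, length)
    return [project(canonical, s, len(x)) for s, x in zip(starts, x_ts)]


if __name__ == "__main__":
    # Two overlapping views of width 4 on a canonical signal of length 6.
    x_ts = [[0.0] * 4, [2.0] * 4]
    eps_preds = [[0.0] * 4, [0.0] * 4]
    synced = synchronize_tweedie_outputs(x_ts, eps_preds, [0, 2], 6, alpha_bar=1.0)
    print(synced)  # [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 2.0, 2.0]]
```

After synchronization the two views agree on the region they share, which is the property the averaging is meant to enforce.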


Environment Setup

Software Requirements

  • Python 3.8
  • CUDA 11.7
  • PyTorch 2.0.0

git clone https://github.com/KAIST-Visual-AI-Group/SyncTweedies
conda env create -f environment.yml
pip install git+https://github.com/openai/CLIP.git
pip install -e .

3D Mesh Texturing (PyTorch3D)

conda install pytorch3d -c pytorch3d
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu117_pyt200/download.html

3D Gaussians Texturing (Differentiable 3D Gaussian Rasterizer - gsplat)

pip install synctweedies/renderer/gaussian_splatting/submodules/simple-knn
cd synctweedies/renderer/gaussian/gsplat
python setup.py install
pip install .
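
After installation, a quick check such as the following can confirm which packages are importable. This is a minimal sketch using only the standard library; the module names (`torch`, `clip`, `pytorch3d`, `gsplat`, `simple_knn`) are assumptions inferred from the install commands above, and the script is not part of the repository.

```python
import importlib.util
import sys

# Import names are assumptions based on the install commands above.
CORE_MODULES = ["torch", "clip"]
OPTIONAL_MODULES = {
    "pytorch3d": "3D mesh texturing",
    "gsplat": "3D Gaussians texturing",
    "simple_knn": "3D Gaussians texturing",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print the Python version and any modules that are not installed."""
    print("Python", ".".join(str(v) for v in sys.version_info[:3]))
    for name in missing_modules(CORE_MODULES):
        print("missing core module:", name)
    for name in missing_modules(list(OPTIONAL_MODULES)):
        print("missing optional module:", name, "(needed for", OPTIONAL_MODULES[name] + ")")


if __name__ == "__main__":
    report()
```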

Data

3D Mesh Texturing

We used 3D mesh and prompt pairs from Text2Tex and TEXTure. Text2Tex uses a subset of the Objaverse dataset.

For 3D mesh texture editing, we used 3D meshes generated with Luma AI.

  • Texture generation - turtle.obj (TEXTure), clutch_bag.obj (Text2Tex)
  • Texture editing - lantern/* (Luma AI)

3D Gaussians Texturing

Download the Synthetic NeRF dataset and reconstruct 3D Gaussians using gsplat. You can also use outdoor scenes, such as those in the Mip-NeRF360 dataset.


Inference

Run the commands below to launch each application.

Ambiguous Image

1-to-1 Projection

python main.py --app ambiguous_image --case_num 2

1-to-n Projection

python main.py --app ambiguous_image --case_num 2 --views_names identity inner_rotate

n-to-1 Projection

python main.py --app ambiguous_image --case_num 2 --optimize_inverse_mapping

--prompts

Text prompts to guide the generation process. (Provide a prompt per view)

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--seed

Random seed.

--views_names

View transformation applied to each denoising process.

--rotate_angle

Rotation angle for rotation transformations.

--initialize_xt_from_zt

Initialize the initial random noise by projecting from the canonical space.

--optimize_inverse_mapping

Use optimization for projection operation. (n-to-1 projection)
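
To make the projection terminology concrete, here is a minimal sketch of a 1-to-1 projection, using a 90° rotation as the view transformation. A rotation by 90° only permutes pixels, so projecting to the instance space and unprojecting back recovers the canonical image exactly. The function names are hypothetical and the nested-list image is only for illustration; the repository's own view classes are not shown here.

```python
def rotate_cw(image):
    """Project: rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_ccw(image):
    """Unproject: rotate by 90 degrees counter-clockwise (inverse of rotate_cw)."""
    return [list(row) for row in zip(*image)][::-1]


if __name__ == "__main__":
    canonical = [[1, 2],
                 [3, 4]]
    instance = rotate_cw(canonical)
    print(instance)              # [[3, 1], [4, 2]]
    print(rotate_ccw(instance))  # [[1, 2], [3, 4]]
```

Because every canonical pixel maps to exactly one instance pixel, no averaging or optimization is needed to invert this view, in contrast to the 1-to-n and n-to-1 cases.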

Wide Image

python main.py --app wide_image --prompt "A photo of a mountain range at twilight" --save_top_dir ./output --case_num 2 --seed 0 --sampling_method ddim --num_inference_steps 50 --panorama_height 512 --panorama_width 3072 --mvd_end 1.0 --initialize_xt_from_zt

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--seed

Random seed.

--sampling_method

Denoising sampling method.

--num_inference_steps

Number of sampling steps.

--panorama_height

The height of the image to generate.

--panorama_width

The width of the image to generate.

--mvd_end

Step at which to stop the synchronization, given as a fraction of the sampling process. (1.0 - Synchronize all timesteps, 0.0 - No synchronization)

--initialize_xt_from_zt

Initialize the initial random noise by projecting from the canonical space.
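
As an illustration of how `--mvd_end` relates to `--num_inference_steps`, the sketch below lists which denoising steps would be synchronized. It assumes that the value is the fraction of steps, counted from the start of sampling, during which synchronization is active; the function name is hypothetical and the repository may round or index differently.

```python
def synchronized_steps(num_inference_steps, mvd_end):
    """Indices of the denoising steps (0 = first step) that are synchronized.

    Assumption: synchronization is active for the first
    round(num_inference_steps * mvd_end) steps and disabled afterwards.
    """
    if not 0.0 <= mvd_end <= 1.0:
        raise ValueError("mvd_end must be in [0.0, 1.0]")
    cutoff = round(num_inference_steps * mvd_end)
    return list(range(cutoff))


if __name__ == "__main__":
    print(len(synchronized_steps(50, 1.0)))  # 50: every step is synchronized
    print(len(synchronized_steps(50, 0.0)))  # 0: no synchronization
```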

3D Mesh

python main.py --app mesh --prompt "A hand carved wood turtle" --output ./output --prefix mesh_tex --case_num 2 --mesh ./data/mesh/turtle.obj --seed 0 --sampling_method ddim --initialize_xt_from_zt

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--mesh

Path to input 3D mesh.

--seed

Random seed.

--sampling_method

Denoising sampling method.

--initialize_xt_from_zt

Initialize the initial random noise by projecting from the canonical space.

--steps

Number of sampling steps.

3D Mesh Texture Editing

python main.py --app mesh --prompt "lantern" --output ./output --prefix mesh_edit --case_num 2 --mesh ./data/mesh/sdedit/mesh.obj --seed 0 --sampling_method ddim --initialize_xt_from_zt --sdedit --sdedit_prompt "A Chinese style lantern" --sdedit_timestep 0.2

--sdedit

Enable 3D mesh texture editing.

--sdedit_prompt

Target editing prompt. This overrides the original prompt.

--sdedit_timestep

Timestep to add noise. (1.0 - x_0, 0.0 - x_T)
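
The editing mode follows the SDEdit idea of noising an existing result to an intermediate level and denoising from there. The sketch below is a hedged illustration of that idea, not the repository's code: it assumes `--sdedit_timestep` is a position along the denoising trajectory (0.0 at x_T, 1.0 at x_0, as the description above suggests), and the function names and the caller-supplied `alpha_bar` value are hypothetical.

```python
import math
import random


def sdedit_start_index(num_inference_steps, sdedit_timestep):
    """First denoising step to run, counting 0 = x_T and num_inference_steps = x_0.

    Assumption: the option is a fraction of the trajectory already skipped.
    """
    if not 0.0 <= sdedit_timestep <= 1.0:
        raise ValueError("sdedit_timestep must be in [0.0, 1.0]")
    return round(num_inference_steps * sdedit_timestep)


def add_noise(x0, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sdedit_start_index(50, 0.2))  # 10: start denoising at step 10 of 50
    # alpha_bar for the chosen step would come from the sampler's noise schedule.
    print(add_noise([1.0, 2.0, 3.0], alpha_bar=0.5, rng=rng))
```

Under this reading, a value of 0.2 keeps only coarse structure from the original texture, while values closer to 1.0 preserve more of it.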

360° Panorama

python main.py --app panorama --tag panorama --prompt "An old looking library" --depth_data_path ./data/cf726b6c0144425282245b34fc4efdca_depth.dpt --case_num 2 --average_rgb --initialize_xt_from_zt --model controlnet --save_top_dir ./output

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--depth_data_path

Path to depth map image.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--mesh

Path to input 3D mesh.

--seed

Random seed.

--sampling_method

Denoising sampling method.

--initialize_xt_from_zt

Initialize the initial random noise by projecting from the canonical space.

--steps

Number of sampling steps.

--canonical_rgb_h

Resolution (height) of the RGB canonical space.

--canonical_rgb_w

Resolution (width) of the RGB canonical space.

--canonical_latent_h

Resolution (height) of the latent canonical space.

--canonical_latent_w

Resolution (width) of the latent canonical space.

--instance_latent_size

Resolution of the latent instance space.

--instance_rgb_size

Resolution of the RGB instance space.

--theta_range

Azimuthal range (0-360).

--theta_interval

Interval of the azimuth.

--FOV

Field of view of each instance-space view.

--average_rgb

Perform averaging in the RGB domain (Only valid for Case 2 and Case 5).
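
The relationship between `--theta_range`, `--theta_interval`, and `--FOV` can be sketched as follows. This is an illustration under assumptions, not the repository's code: the range is treated as a (start, end) pair in degrees with the end excluded, the field of view is taken to be horizontal and in degrees, all views share one elevation, and the default values and function names are hypothetical.

```python
def view_azimuths(theta_range=(0, 360), theta_interval=45):
    """Azimuth angles (degrees) of the views placed around the panorama."""
    start, end = theta_range
    if theta_interval <= 0:
        raise ValueError("theta_interval must be positive")
    # The end is excluded because 0 and 360 degrees are the same direction.
    return list(range(start, end, theta_interval))


def horizontal_overlap(fov, theta_interval):
    """Degrees of azimuth shared by two neighbouring views (0 if none)."""
    return max(0, fov - theta_interval)


if __name__ == "__main__":
    print(view_azimuths())             # [0, 45, 90, 135, 180, 225, 270, 315]
    print(horizontal_overlap(72, 45))  # 27
```

Neighbouring views need a positive overlap for synchronization in the canonical space to tie them together, so the field of view should exceed the azimuth interval.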

3D Gaussians Texturing

python main.py --app gs --output ./output --prompt "A pink chair" --source_path ${DATA_PATH} --model_path ${3DGS_PATH} --dataset_type blender --case_num 2 --guidance_scale 35

--prompts

Text prompts to guide the generation process.

--save_top_dir

Directory to save intermediate/final outputs.

--tag

Tag output directory.

--save_dir_now

Save output directory with current time.

--case_num

Denoising case number (Case 2 is SyncTweedies). Refer to the main paper for the other cases.

--source_path

Path to input dataset (Refer to 3D Gaussian Splatting repo for data format).

--plyfile

Path to 3D Gaussians model plyfile.

--dataset_type

Input dataset type {colmap, blender}.

--zt_init

Initialize the initial random noise by projecting from the canonical space.


Citation

@article{kim2024synctweedies,
  title={SyncTweedies: A General Generative Framework Based on Synchronized Diffusions},
  author={Kim, Jaihoon and Koo, Juil and Yeo, Kyeongmin and Sung, Minhyuk},
  journal={arXiv preprint arXiv:2403.14370},
  year={2024}
}

Acknowledgement

This repository is based on Visual Anagrams, SyncMVD, and gsplat. We thank the authors for publicly releasing their code.
