Hao He
*: Equal contribution. **: Corresponding author.
Paper PDF (arXiv) | Project Page | Model Weights | Gradio Demo (Coming Soon)
- 2024-11-08: We added 3 new preprocessed example cases for 256x256 resolution inputs. You can now run "dr_strange", "superman", and "minions_stuart" using demo.sh
- 2024-11-04: LucidFusion now supports 512x512 resolution inputs. Demo results released, and we will release the model soon!
We present a flexible end-to-end feed-forward framework, named LucidFusion, that reconstructs high-resolution 3D Gaussians from unposed, sparse, and arbitrary numbers of multi-view images.
Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, current reconstruction methods often rely on explicit camera pose estimation or fixed viewpoints, restricting their flexibility and practical applicability. We reformulate 3D reconstruction as image-to-image translation and introduce the Relative Coordinate Map (RCM), which aligns multiple unposed images to a “main” view without pose estimation. While RCM simplifies the process, its lack of global 3D supervision can yield noisy outputs. To address this, we propose Relative Coordinate Gaussians (RCG) as an extension to RCM, which treats each pixel’s coordinates as a Gaussian center and employs differentiable rasterization for consistent geometry and pose recovery. Our LucidFusion framework handles an arbitrary number of unposed inputs, producing robust 3D reconstructions within seconds and paving the way for more flexible, pose-free 3D pipelines.
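As a rough intuition for how RCG turns pixels into splats, here is a minimal, illustrative sketch (hypothetical names and a simplified interface, not the repository's actual API): the network predicts a 3D coordinate for every foreground pixel in the main view's frame, and those coordinates directly become Gaussian centers.

```python
# Illustrative sketch of the RCG idea (hypothetical, simplified interface).
import torch

def pixels_to_gaussian_centers(coord_map: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """coord_map: (V, H, W, 3) per-pixel 3D coordinates, all expressed in the
    "main" view's frame (the RCM). mask: (V, H, W) foreground mask.
    Returns (N, 3) Gaussian centers gathered from every foreground pixel."""
    return coord_map[mask.bool()]

coords = torch.randn(4, 256, 256, 3)   # e.g. 4 unposed input views
fg = torch.rand(4, 256, 256) > 0.5     # dummy foreground mask
centers = pixels_to_gaussian_centers(coords, fg)
print(centers.shape)  # (N, 3); opacity, scale, and color are likewise predicted per pixel
```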
Our inference code is now released! We will release our training code soon!
```bash
conda create -n LucidFusion python=3.9.19
conda activate LucidFusion

# For example, we use torch 2.3.1 + cuda 11.8, and have also tested the latest torch (2.4.1), which works with the latest xformers (0.0.28).
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118

# xformers is required! Please refer to https://github.com/facebookresearch/xformers for details.
# [linux only] cuda 11.8 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118

# For 3D Gaussian Splatting, we use the modified version from LGM; for details, please refer to https://github.com/3DTopia/LGM
git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization

# Other dependencies
pip install -r requirements.txt
```
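Optionally, you can sanity-check the environment before moving on (a minimal sketch; it assumes the rasterizer exposes the usual `GaussianRasterizer` entry point):

```python
# Quick sanity check: the key dependencies import cleanly and see the GPU.
import torch
import xformers

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("xformers", xformers.__version__)

# The LGM-modified rasterizer installed above; the import name is an assumption.
from diff_gaussian_rasterization import GaussianRasterizer  # noqa: F401
print("diff-gaussian-rasterization OK")
```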
Our pre-trained weights are now released! Please check the weights.
To download the pre-trained weights, simply run:
```bash
python download.py
```
A shell script is provided with example files. Please make sure the pre-trained weights are downloaded into the "pretrained" folder:
```bash
cd LucidFusion
mkdir -p output/demo
```
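Optionally, a quick check that the checkpoints are in place before running the demos (a minimal sketch; the exact file names depend on the release):

```python
# List whatever landed in ./pretrained after `python download.py`.
from pathlib import Path

pretrained = Path("pretrained")
ckpts = sorted(pretrained.iterdir()) if pretrained.is_dir() else []
assert ckpts, "pretrained/ is missing or empty - run `python download.py` first"
for p in ckpts:
    print(p.name, f"{p.stat().st_size / 1e6:.1f} MB")
```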
We have also provided some preprocessed examples.
For GSO files, the example objects are "alarm", "chicken", "hat", "lunch_bag", "mario", and "shoe1".
To run the GSO demo:
```bash
# You can adjust the "DEMO" field inside gso_demo.sh to load other examples.
bash scripts/gso_demo.sh
```
To run the image demo (masks are obtained using preprocess.py; the example objects are "nutella_new", "monkey_chair", and "dog_chair"):
```bash
bash scripts/demo.sh
```
To run the diffusion demo as a single-image-to-multi-view setup, we use the pixel diffusion model trained in CRM, as described in the paper. You can also use other multi-view diffusion models to generate multi-view outputs from a single image.
For dependency issues, please check https://github.com/thu-ml/CRM

We also provide LGM's imagegen diffusion; simply set --crm=false in diffusion_demo.sh. You can change the --seed option to try different seeds.
```bash
bash scripts/diffusion_demo.sh
```
You can also try your own example! To do that:
- Obtain images and place them in the examples folder:
```
LucidFusion
├── examples/
│   ├── your-obj-name/
│   │   ├── image_01.png
│   │   ├── image_02.png
│   │   ├── ...
```
- Run preprocess.py to extract the recentered images and their masks (a quick verification sketch follows this list):
```bash
# Running the following will create two folders (images, masks) inside the "your-obj-name" folder.
# You can check whether the extracted masks are correct.
python preprocess.py examples/your-obj-name --outdir examples/your-obj-name
```
- Modify demo.sh to set DEMO="examples/your-obj-name", then run the script:
```bash
bash scripts/demo.sh
```
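To verify the preprocessing step above, here is a small check sketch (it assumes, per the description, that preprocess.py writes one mask per image into the two folders):

```python
# Confirm preprocess.py produced matching images/ and masks/ folders.
from pathlib import Path

obj = Path("examples/your-obj-name")
images = sorted((obj / "images").glob("*"))
masks = sorted((obj / "masks").glob("*"))
print(f"{len(images)} images, {len(masks)} masks")
assert images and len(images) == len(masks), "every image should have a corresponding mask"
```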
For the Gradio demo (test version), simply run:
```bash
python app.py
```
Please note this demo is still under development; check back later for the full version!
- Release the inference codes
- Release our weights
- Release our high resolution input model weights
- Release the Gradio Demo
- Release the Stage 1 and 2 training codes
If you find our work useful, please consider citing our paper.
```bibtex
@misc{he2024lucidfusion,
    title={LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images},
    author={Hao He and Yixun Liang and Luozhou Wang and Yuanhao Cai and Xinli Xu and Hao-Xiang Guo and Xiang Wen and Yingcong Chen},
    year={2024},
    eprint={2410.15636},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2410.15636},
}
```
This work is built on many amazing research works and open-source projects:
Thanks for their excellent work and great contributions to the 3D generation area.