forked from microsoft/TRELLIS
Release inference code and TRELLIS-image-large
1 parent
ba4fcc0
commit 334d3b2
Showing
130 changed files
with
9,201 additions
and
2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[submodule "trellis/representations/mesh/flexicubes"] | ||
path = trellis/representations/mesh/flexicubes | ||
url = https://github.com/MaxtirError/FlexiCubes.git |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,205 @@ | ||
## Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation". | ||
<img src="assets/logo.webp" width="100%" align="center"> | ||
<h1 align="center">Structured 3D Latents<br>for Scalable and Versatile 3D Generation</h1> | ||
<p align="center"><a href="https://arxiv.org/abs/2412.01506"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a> | ||
<a href='https://trellis3d.github.io'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a> | ||
<a href='https://huggingface.co/spaces/JeffreyXiang/TRELLIS'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live_Demo-blue'></a> | ||
</p> | ||
<p align="center"><img src="assets/teaser.png" width="100%"></p> | ||
|
||
<span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> is a large 3D asset generation model. It takes in text or image prompts and generates high-quality 3D assets in various formats, such as Radiance Fields, 3D Gaussians, and meshes. The cornerstone of <span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> is a unified Structured LATent (<span style="font-size: 16px; font-weight: 600;">SL</span><span style="font-size: 12px; font-weight: 700;">AT</span>) representation that allows decoding to different output formats and Rectified Flow Transformers tailored for <span style="font-size: 16px; font-weight: 600;">SL</span><span style="font-size: 12px; font-weight: 700;">AT</span> as the powerful backbones. We provide large-scale pre-trained models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. <span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> significantly surpasses existing methods, including recent ones at similar scales, and showcases flexible output format selection and local 3D editing capabilities which were not offered by previous models. | ||
|
||
***Check out our [Project Page](https://trellis3d.github.io) for more videos and interactive demos!*** | ||
|
||
<!-- Features --> | ||
## 🌟 Features | ||
- **High Quality**: It produces diverse 3D assets at high quality with intricate shape and texture details. | ||
- **Versatility**: It takes text or image prompts and can generate various final 3D representations including but not limited to Radiance Fields, 3D Gaussians, and meshes, accommodating diverse downstream requirements. | ||
- **Flexible Editing**: It allows easy editing of generated 3D assets, such as generating variants of the same object or locally editing a 3D asset. | ||
|
||
<!-- TODO List --> | ||
## 🚧 TODO List | ||
- [x] Release inference code and TRELLIS-image-large model | ||
- [ ] Release TRELLIS-text model series | ||
- [ ] Release training code and data | ||
|
||
<!-- Installation --> | ||
## 📦 Installation | ||
|
||
### Prerequisites | ||
- Linux is recommended for running the code. The code is not tested on other platforms. | ||
- [Conda](https://docs.anaconda.com/miniconda/install/#quick-command-line-install) is recommended for managing the dependencies. | ||
- Python 3.8 or higher is required. | ||
- NVIDIA GPU with more than 16GB memory is required. The code has been tested on NVIDIA A100 and A6000 GPUs. | ||
- [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) is required to compile some of the submodules. We tested the code on CUDA 11.8 and 12.2. | ||
|
||
### Installation Steps | ||
1. Clone the repo: | ||
```sh | ||
git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git | ||
cd TRELLIS | ||
``` | ||
|
||
2. Install the dependencies: | ||
|
||
**Before running the following command, there are some things to note:** | ||
- By adding `--new-env`, a new conda environment named `trellis` will be created. If you want to use an existing conda environment, please remove this flag. | ||
- By default, the `trellis` environment uses PyTorch 2.4.0 with CUDA 11.8. If you want to use a different version of CUDA (e.g., if you have CUDA Toolkit 12.2 installed and do not want to install another 11.8 version for submodule compilation), you can remove the `--new-env` flag and manually install the required dependencies. Refer to [PyTorch](https://pytorch.org/get-started/previous-versions/) for the installation command. | ||
- If you have multiple CUDA Toolkit versions installed, `PATH` should be set to the correct version before running the command. For example, if you have CUDA Toolkit 11.8 and 12.2 installed, you should run `export PATH=/usr/local/cuda-11.8/bin:$PATH` before running the command. | ||
- By default, the code uses the `flash-attn` backend for attention. For GPUs that do not support `flash-attn` (e.g., NVIDIA V100), you can remove the `--flash-attn` flag to install `xformers` only, and set the `ATTN_BACKEND` environment variable to `xformers` before running the code. See the [Minimal Example](#minimal-example) for more details. | ||
- The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time. | ||
- If you encounter any issues during the installation, feel free to open an issue or contact us. | ||
|
||
Create a new conda environment named `trellis` and install the dependencies: | ||
```sh | ||
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast | ||
``` | ||
The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`. | ||
```sh | ||
Usage: setup.sh [OPTIONS] | ||
Options: | ||
    -h, --help              Display this help message | ||
    --new-env               Create a new conda environment | ||
    --basic                 Install basic dependencies | ||
    --xformers              Install xformers | ||
    --flash-attn            Install flash-attn | ||
    --diffoctreerast        Install diffoctreerast | ||
    --vox2seq               Install vox2seq | ||
    --spconv                Install spconv | ||
    --mipgaussian           Install mip-splatting | ||
    --kaolin                Install kaolin | ||
    --nvdiffrast            Install nvdiffrast | ||
    --demo                  Install all dependencies for demo | ||
``` | ||
|
||
<!-- Pretrained Models --> | ||
## 🤖 Pretrained Models | ||
|
||
We provide the following pretrained models: | ||
|
||
| Model | Description | #Params | Download | | ||
| --- | --- | --- | --- | | ||
| TRELLIS-image-large | Large image-to-3D model | 1.2B | [Download](https://huggingface.co/JeffreyXiang/TRELLIS-image-large) | | ||
| TRELLIS-text-base | Base text-to-3D model | 342M | Coming Soon | | ||
| TRELLIS-text-large | Large text-to-3D model | 1.1B | Coming Soon | | ||
| TRELLIS-text-xlarge | Extra-large text-to-3D model | 2.0B | Coming Soon | | ||
|
||
The models are hosted on Hugging Face. You can directly load the models with their repository names in the code: | ||
```python | ||
TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large") | ||
``` | ||
|
||
If you prefer to load the model from a local folder, you can download the model files from the links above and load the model with the folder path (the folder structure should be maintained): | ||
```python | ||
TrellisImageTo3DPipeline.from_pretrained("/path/to/TRELLIS-image-large") | ||
``` | ||
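
The two loading options above can be combined so that a local copy is used when present and the Hugging Face repository name is used otherwise. This is a minimal sketch under stated assumptions: `resolve_model_source` is a hypothetical helper, not a TRELLIS API, and it only checks that the folder exists and is non-empty; it does not verify the folder structure.

```python
from pathlib import Path


def resolve_model_source(local_dir, repo_id="JeffreyXiang/TRELLIS-image-large"):
    """Return the local folder if it exists and is non-empty, else the repo name.

    Hypothetical helper: the returned string is meant to be passed to
    TrellisImageTo3DPipeline.from_pretrained(...).
    """
    path = Path(local_dir).expanduser()
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return repo_id


source = resolve_model_source("/path/to/TRELLIS-image-large")
print(source)
# With TRELLIS installed, the result would be used as:
# pipeline = TrellisImageTo3DPipeline.from_pretrained(source)
```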
|
||
<!-- Usage --> | ||
## 💡 Usage | ||
|
||
### Minimal Example | ||
|
||
Here is an [example](example.py) of how to use the pretrained models for 3D asset generation. | ||
|
||
```python | ||
import os | ||
# os.environ['ATTN_BACKEND'] = 'xformers' # Can be 'flash-attn' or 'xformers', default is 'flash-attn' | ||
os.environ['SPCONV_ALGO'] = 'native' # Can be 'native' or 'auto', default is 'auto'. | ||
# 'auto' is faster but will do benchmarking at the beginning. | ||
# Recommended to set to 'native' if run only once. | ||
import imageio | ||
from PIL import Image | ||
from trellis.pipelines import TrellisImageTo3DPipeline | ||
from trellis.utils import render_utils, postprocessing_utils | ||
# Load a pipeline from a model folder or a Hugging Face model hub. | ||
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large") | ||
pipeline.cuda() | ||
# Load an image | ||
image = Image.open("assets/example_image/T.png") | ||
# Run the pipeline | ||
outputs = pipeline.run( | ||
    image, | ||
    # Optional parameters | ||
    seed=1, | ||
    # sparse_structure_sampler_params={ | ||
    #     "steps": 12, | ||
    #     "cfg_strength": 7.5, | ||
    # }, | ||
    # slat_sampler_params={ | ||
    #     "steps": 12, | ||
    #     "cfg_strength": 3, | ||
    # }, | ||
) | ||
# outputs is a dictionary containing generated 3D assets in different formats: | ||
# - outputs['gaussian']: a list of 3D Gaussians | ||
# - outputs['radiance_field']: a list of radiance fields | ||
# - outputs['mesh']: a list of meshes | ||
# Render the outputs | ||
video = render_utils.render_video(outputs['gaussian'][0])['color'] | ||
imageio.mimsave("sample_gs.mp4", video, fps=30) | ||
video = render_utils.render_video(outputs['radiance_field'][0])['color'] | ||
imageio.mimsave("sample_rf.mp4", video, fps=30) | ||
video = render_utils.render_video(outputs['mesh'][0])['normal'] | ||
imageio.mimsave("sample_mesh.mp4", video, fps=30) | ||
# GLB files can be extracted from the outputs | ||
glb = postprocessing_utils.to_glb( | ||
    outputs['gaussian'][0], | ||
    outputs['mesh'][0], | ||
    # Optional parameters | ||
    simplify=0.95,          # Ratio of triangles to remove in the simplification process | ||
    texture_size=1024,      # Size of the texture used for the GLB | ||
) | ||
glb.export("sample.glb") | ||
``` | ||
|
||
After running the code, you will get the following files: | ||
- `sample_gs.mp4`: a video showing the 3D Gaussian representation | ||
- `sample_rf.mp4`: a video showing the Radiance Field representation | ||
- `sample_mesh.mp4`: a video showing the mesh representation | ||
- `sample.glb`: a GLB file containing the extracted textured mesh | ||
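
The optional sampler parameters that are commented out in the example can be collected in one place and passed to `pipeline.run` as keyword arguments. The sketch below is illustrative only: `build_run_kwargs` is a hypothetical helper, the parameter names and the values 12, 7.5, and 3 are taken from the commented-out lines in the example, and whether those values match the pipeline's built-in defaults is an assumption.

```python
def build_run_kwargs(seed=1, ss_steps=12, ss_cfg=7.5, slat_steps=12, slat_cfg=3.0):
    """Build keyword arguments for pipeline.run (hypothetical helper).

    ss_*   -> sparse_structure_sampler_params
    slat_* -> slat_sampler_params
    """
    return {
        "seed": seed,
        "sparse_structure_sampler_params": {
            "steps": ss_steps,
            "cfg_strength": ss_cfg,
        },
        "slat_sampler_params": {
            "steps": slat_steps,
            "cfg_strength": slat_cfg,
        },
    }


# Example: a different seed and more sampling steps for the second stage.
kwargs = build_run_kwargs(seed=42, slat_steps=20)
print(kwargs)
# With a loaded pipeline and image, this would be used as:
# outputs = pipeline.run(image, **kwargs)
```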
|
||
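
The `ATTN_BACKEND` variable that is commented out at the top of the example can also be set automatically. This is a rough sketch, not the project's own logic: `pick_attn_backend` is a hypothetical helper that selects `flash-attn` when the `flash_attn` Python module is importable and `xformers` otherwise. Importability does not guarantee that the GPU supports `flash-attn` (e.g., NVIDIA V100), so on such hardware the variable should still be set to `xformers` by hand.

```python
import importlib.util
import os


def pick_attn_backend(module_name="flash_attn"):
    """Return 'flash-attn' if module_name is importable, else 'xformers'.

    Hypothetical helper; it checks only that the package is installed,
    not that the current GPU can run it.
    """
    found = importlib.util.find_spec(module_name) is not None
    return "flash-attn" if found else "xformers"


# Set the variable before importing trellis, as in the example above.
os.environ["ATTN_BACKEND"] = pick_attn_backend()
print(os.environ["ATTN_BACKEND"])
```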
|
||
### Web Demo | ||
|
||
[app.py](app.py) provides a simple web demo for 3D asset generation. Since this demo is based on [Gradio](https://gradio.app/), additional dependencies are required: | ||
```sh | ||
. ./setup.sh --demo | ||
``` | ||
|
||
After installing the dependencies, you can run the demo with the following command: | ||
```sh | ||
python app.py | ||
``` | ||
|
||
Then, you can access the demo at the address shown in the terminal. | ||
|
||
***The web demo is also available on [Hugging Face Spaces](https://huggingface.co/spaces/JeffreyXiang/TRELLIS)!*** | ||
|
||
|
||
<!-- License --> | ||
## ⚖️ License | ||
|
||
TRELLIS models and the majority of the code are licensed under the [MIT License](LICENSE). Some submodules may have different licenses: | ||
- [**diffoctreerast**](https://github.com/JeffreyXiang/diffoctreerast): The CUDA-based real-time differentiable octree renderer we developed to render radiance fields in this project. It is a derivative of the [diff-gaussian-rasterization](https://github.com/graphdeco-inria/diff-gaussian-rasterization) and is licensed under [LICENSE](https://github.com/JeffreyXiang/diffoctreerast/blob/master/LICENSE). | ||
- [**Modified Flexicubes**](https://github.com/MaxtirError/FlexiCubes): The modified version of [Flexicubes](https://github.com/nv-tlabs/FlexiCubes) used in this project to support vertex attributes. It is licensed under [LICENSE](https://github.com/nv-tlabs/FlexiCubes/blob/main/LICENSE.txt). | ||
|
||
|
||
<!-- Citation --> | ||
## 📜 Citation | ||
|
||
If you find this work helpful, please consider citing our paper: | ||
|
||
```bibtex | ||
@article{xiang2024structured, | ||
title = {Structured 3D Latents for Scalable and Versatile 3D Generation}, | ||
author = {Xiang, Jianfeng and Lv, Zelong and Xu, Sicheng and Deng, Yu and Wang, Ruicheng and Zhang, Bowen and Chen, Dong and Tong, Xin and Yang, Jiaolong}, | ||
journal = {arXiv preprint arXiv:2412.01506}, | ||
year = {2024} | ||
} | ||
``` | ||
|
||
Code and models will be released by December 10th. Stay tuned. |