[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI

baaivision/Uni3D

[Figure: overview]

We present Uni3D, a unified and scalable 3D pretraining framework for large-scale 3D representation learning, and explore its limits at the scale of one billion parameters. Uni3D uses a 2D-initialized ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features. With this simple architecture and pretext task, Uni3D can use abundant 2D pretrained models as initialization and image-text aligned models as the target, bringing the potential of 2D models and scaling-up strategies to the 3D world. We efficiently scale Uni3D up to one billion parameters and set new records on a broad range of 3D tasks.
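The overview says that point cloud features are aligned with image-text aligned features, but it does not spell out the objective. One common choice for this kind of alignment is a CLIP-style symmetric contrastive loss. The sketch below assumes that choice and is not taken from the Uni3D code; the function names and the temperature value are illustrative, and plain Python lists stand in for real tensors.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def contrastive_loss(point_feats, target_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    point_feats[i] and target_feats[i] are assumed to describe the same object.
    The temperature is an illustrative value, not one from the repository.
    """
    p = [normalize(v) for v in point_feats]
    t = [normalize(v) for v in target_feats]
    n = len(p)
    # logits[i][j]: scaled cosine similarity between point cloud i and target j
    logits = [[sum(a * b for a, b in zip(p[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # Mean negative log-softmax of the diagonal (matching) entries.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the point-to-target and target-to-point directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))

# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings with the pairing swapped.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a real training loop the target embeddings would come from a frozen image-text model and the point embeddings from the 3D encoder being trained; here both are hand-written toy vectors.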

Schedule

We are committed to open-sourcing Uni3D related materials, including:

  • Extended Uni3D to a 3D metric (Uni3D-score) for enhanced semantic coherence in text-to-3D tasks. For details, see GeoDream.
  • Model weights ranging from 6M to 1B parameters.
  • Evaluation code
  • Evaluation data
  • Pretraining code
  • Pretraining data

We hope to foster the growth of our community through open-sourcing and promoting collaboration👬. Let's step towards multimodal intelligence together🍻.

Installation

Clone this repository and install the required packages:

git clone https://github.com/baaivision/Uni3D.git
cd Uni3D

conda create -n uni3d python=3.8
conda activate uni3d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

pip install -r requirements.txt

# install pointnet2 extensions from https://github.com/erikwijmans/Pointnet2_PyTorch
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
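After the install steps, a quick check can confirm that the expected packages are visible to the interpreter. This is a minimal sketch, not part of the repository: the module list is an assumption based on the commands above and the Acknowledgement section, so adjust it to match requirements.txt.

```python
import importlib.util

# Import names assumed from the install steps and the Acknowledgement section;
# they are not read from requirements.txt.
ASSUMED_MODULES = ["torch", "timm", "open_clip", "deepspeed", "pointnet2_ops"]

def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

The check only looks the modules up without importing them, so it will not catch a broken CUDA build of the pointnet2 extension.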

Core packages:

Model Zoo

| Model | Training Data | Objaverse-LVIS Top1 (Top5) | ModelNet40 Top1 (Top5) | ScanObjectNN Top1 (Top5) |
|---|---|---|---|---|
| Uni3d-B | Ensembled w/o LVIS | 45.9 (74.8) | 86.1 (98.7) | 61.7 (89.5) |
| Uni3d-B | Ensembled | 51.7 (80.8) | 86.3 (97.9) | 63.8 (90.2) |
| Uni3d-L | Ensembled w/o LVIS | 46.2 (74.7) | 86.6 (97.8) | 58.4 (90.1) |
| Uni3d-L | Ensembled | 53.1 (81.5) | 86.3 (98.3) | 58.2 (89.4) |
| Uni3d-g | Ensembled w/o LVIS | 47.2 (76.1) | 86.8 (98.4) | 66.5 (90.1) |
| Uni3d-g | Ensembled | 53.5 (82.0) | 87.3 (99.2) | 63.9 (91.7) |
| Uni3d-g 🔥 | Ensembled | 55.3 (82.9) | 88.2 (99.3) | 65.3 (92.7) |

Evaluation of Zero-shot 3D classification

We evaluate the zero-shot 3D classification performance on three datasets: Objaverse-LVIS, ModelNet40 and ScanObjectNN.

  1. Please refer to DATASETS.md for evaluation dataset preparation.
  2. [Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
  3. Download model zoo weights and put them in /path/to/checkpoints folder.
  4. Run bash scripts/inference.sh [scale] to evaluate the model on the above datasets, e.g., bash scripts/inference.sh giant.
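To make the Top1 (Top5) numbers in the Model Zoo concrete, here is a minimal sketch of how zero-shot classification accuracy is typically computed: each shape embedding is compared with one text embedding per class name, and a prediction counts as correct if the true class is among the k most similar. The helper names and toy vectors are hypothetical; in practice the embeddings would come from the Uni3D point encoder and the CLIP text encoder, and scripts/inference.sh handles this end to end.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_classes(point_feat, class_text_feats):
    """Return class indices sorted from most to least similar."""
    scores = [cosine(point_feat, t) for t in class_text_feats]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def topk_accuracy(point_feats, labels, class_text_feats, k=1):
    """Percentage of shapes whose true label is among the k best matches."""
    hits = sum(
        label in rank_classes(feat, class_text_feats)[:k]
        for feat, label in zip(point_feats, labels)
    )
    return 100.0 * hits / len(labels)

# Toy data: three classes and three shapes (hand-written, not real embeddings).
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shape_feats = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.1, 0.5]]
labels = [0, 1, 2]

top1 = topk_accuracy(shape_feats, labels, text_feats, k=1)
top2 = topk_accuracy(shape_feats, labels, text_feats, k=2)
print(f"Top1 {top1:.1f} (Top2 {top2:.1f})")
```

The third toy shape is closest to class 0 but has class 2 as its second-best match, so it is a miss at k=1 and a hit at k=2.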

Pre-training

  1. Please refer to DATASETS.md for pre-training dataset preparation.
  2. [Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
  3. [Recommended 🤗] Download the initialization model and put it in /path/to/init_model folder.
  4. Run bash scripts/pretrain.sh to pre-train the model on ensemble datasets.

Visualization

Open-world Understanding

[Figure: scene]

One-shot Part Segmentation

[Figure: part segmentation]

Point Cloud Painting

[Figure: editing]

Cross-modal Retrieval

[Figure: retrieval from text]

[Figure: retrieval]

Acknowledgement

Uni3D is built using the awesome EVA, OpenCLIP, timm, DeepSpeed, ULIP and OpenShape.

Citation

@inproceedings{zhou2023uni3d,
  title={Uni3d: Exploring unified 3d representation at scale},
  author={Zhou, Junsheng and Wang, Jinsheng and Ma, Baorui and Liu, Yu-Shen and Huang, Tiejun and Wang, Xinlong},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}