Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation

The official implementation of JM3D.

What is JM3D

JM3D is a Joint Multi-modal framework for unified 3D understanding, which constructs structured vision and language information for joint modeling to improve the representation learning of 3D models without extra latency.

What can we do

Cross-modal Retrieval

JM3D retrieves more accurate results in more challenging scenes, such as the head of a plane or the back of a laptop.

Zero-shot on ModelNet40

Zero-shot classification on ModelNet40 with 8192 points.

Model	Top-1 (%)	Top-5 (%)
PointNet2(ssg)	62.2	79.3
PointMLP	65.8	82.1
PointBERT	61.8	81.7
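
For readers unfamiliar with the metric, the sketch below shows how top-1 and top-k accuracy are typically computed in CLIP-style zero-shot classification: each point cloud embedding is compared with one text embedding per category, and a prediction counts as correct if the true category is among the k most similar ones. This is a minimal illustration in plain Python, not the repository's evaluation code; the function names and the toy embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def topk_accuracy(pc_feats, text_feats, labels, k):
    """Percentage of samples whose true label is among the k most similar classes."""
    hits = 0
    for feat, label in zip(pc_feats, labels):
        sims = [cosine(feat, t) for t in text_feats]
        # Class indices sorted from most to least similar.
        ranked = sorted(range(len(sims)), key=lambda c: sims[c], reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)

# Toy data (hypothetical): 3 classes, 2-D embeddings.
text_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pc_feats = [[0.9, 0.1], [0.6, 0.8], [-0.2, 1.0]]
labels = [0, 0, 2]

top1 = topk_accuracy(pc_feats, text_feats, labels, k=1)
top2 = topk_accuracy(pc_feats, text_feats, labels, k=2)
```

In this toy example only the first sample is ranked correctly at k=1, while all three true labels fall within the two most similar classes.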

Instruction

Environments

The code was trained with CUDA>=11.0 and PyTorch>=1.10.1. Set up the environment with the following commands:

conda create -n jm3d python=3.7.15
conda activate jm3d
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
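
After installation, a quick sanity check can confirm that the installed versions satisfy the minimums stated above. The following is a minimal sketch; `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical helpers, not part of the repository. It relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import re

def version_tuple(version):
    """Turn '1.10.1+cu113' into (1, 10, 1), ignoring local build suffixes."""
    core = version.split("+")[0]
    nums = [int(part) for part in re.findall(r"\d+", core)[:3]]
    return tuple(nums + [0] * (3 - len(nums)))  # pad '11' to (11, 0, 0)

def meets_minimum(version, minimum):
    """True if `version` is at least `minimum` (both dotted version strings)."""
    return version_tuple(version) >= version_tuple(minimum)

def check_environment():
    """Report whether PyTorch and CUDA satisfy the minimums stated in this README."""
    try:
        import torch
    except ImportError:
        return {"torch": False, "cuda": False}
    cuda_version = torch.version.cuda  # None for CPU-only builds
    cuda_ok = (
        bool(cuda_version)
        and meets_minimum(cuda_version, "11.0")
        and torch.cuda.is_available()
    )
    return {"torch": meets_minimum(torch.__version__, "1.10.1"), "cuda": cuda_ok}

if __name__ == "__main__":
    print(check_environment())
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.9.0" would sort after "1.10.1".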

Training PointMLP requires pointnet2_ops_lib. Download it and install it with:

pip install pointnet2_ops_lib/.

PointBERT requires the extensions below and a few other libraries. Download them and install them with:

# Chamfer Distance
cd extensions/chamfer_dist
python setup.py install --user
cd ../..  # return to the repository root

# EMD
cd extensions/emd
python setup.py install --user
cd ../..  # return to the repository root

# GPU KNN
pip install Ninja
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl

Dataset

Download the datasets into ./data. The directory should then look like this:

./data
|-- ScanObjectNN
|   |-- ScanObjectNN_shape_names.txt
|   |-- main_split
|   |-- main_split_nobg
|-- modelnet40_normal_resampled
|   |-- modelnet10_test_1024pts.dat
|   |-- modelnet10_test_1024pts_fps.dat
|   |-- modelnet10_train_1024pts.dat
|   |-- modelnet10_train_1024pts_fps.dat
|   |-- modelnet40_shape_names.txt
|   |-- modelnet40_shape_names_modified.txt
|   |-- modelnet40_test.txt
|   |-- modelnet40_test_1024pts.dat
|   |-- modelnet40_test_1024pts_fps.dat
|   |-- modelnet40_test_8192pts_fps.dat
|   |-- modelnet40_train.txt
|   |-- modelnet40_train_1024pts.dat
|   |-- modelnet40_train_1024pts_fps.dat
|   |-- modelnet40_train_8192pts_fps.dat
|-- shapenet-55
|   |-- rendered_images
|   |-- shapenet_pc
|   |-- taxonomy.json
|   |-- test.txt
|   |-- train.txt
|-- initialize_models
|   |-- point_bert_pretrained.pt
|   |-- slip_base_100ep.pt
|-- ModelNet40.yaml
|-- ScanObjectNN.yaml
|-- ShapeNet-55.yaml
|-- dataset_3d.py
|-- dataset_catalog.json
|-- labels.json
|-- templates.json

ModelNet40, ShapeNet55, and initialize_models can be downloaded from here, and ScanObjectNN can be downloaded from here.
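
Before training, it can help to verify that the data directory matches the layout above. The sketch below checks a handful of the listed paths with `pathlib`. The helper name `missing_entries` and the choice of which entries to check are assumptions made for illustration, so adjust the list to the datasets you actually use.

```python
from pathlib import Path

# A subset of the entries shown in the tree above, relative to the data root.
EXPECTED = [
    "ScanObjectNN/main_split",
    "modelnet40_normal_resampled/modelnet40_shape_names.txt",
    "modelnet40_normal_resampled/modelnet40_test_8192pts_fps.dat",
    "shapenet-55/shapenet_pc",
    "shapenet-55/rendered_images",
    "initialize_models/point_bert_pretrained.pt",
    "initialize_models/slip_base_100ep.pt",
    "labels.json",
    "templates.json",
]

def missing_entries(data_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]

if __name__ == "__main__":
    missing = missing_entries("./data")
    if missing:
        print("Missing under ./data:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```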

If you store the data elsewhere, update the path settings in the corresponding (dataset).yaml. To use a custom dataset, add a new *.yaml file and a new loading function in dataset_3d.py.

Train

Note: The default number of points is 8192, matching previous work. You can change the number of points to suit your setup. We use farthest point sampling (FPS) to downsample the point clouds, which is time-consuming. Using the pre-processed data speeds up training at the cost of a small performance drop.
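
To make the cost of FPS concrete, here is a minimal pure-Python sketch of farthest point sampling. It is not the repository's implementation; the function name and the choice of index 0 as the starting point are assumptions. Each of the m selection steps scans all n points, so the cost grows as O(n·m), which is why caching the downsampled clouds saves time.

```python
def farthest_point_sampling(points, m, start=0):
    """Select m point indices so that each new point is farthest from those already chosen."""
    n = len(points)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and the number of points")
    # Squared distance from every point to its nearest selected point so far.
    min_dist = [float("inf")] * n
    selected = [start]
    for _ in range(m - 1):
        last = points[selected[-1]]
        # Update nearest-selected distances using the most recently added point.
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # The next sample is the point farthest from the current selection.
        selected.append(max(range(n), key=lambda i: min_dist[i]))
    return selected

# Toy cloud (hypothetical): four spread-out points and one close to the first.
cloud = [(0, 0, 0), (0.1, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
indices = farthest_point_sampling(cloud, 3)
```

On the toy cloud, the point at (0.1, 0, 0) is picked last because it lies almost on top of the starting point, which illustrates how FPS favors even coverage over density.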

Three backbones are supported: PointMLP, PointNet++(ssg), and PointBERT. You can edit the scripts to configure the GPUs, text usage, and the other settings passed as args.

# The scripts are named after their corresponding 3D backbone.
bash ./scripts/(choose your pre-train script)

Zero-shot Test

As with training, run the test script for your backbone:

bash ./scripts/(choose your test script) /path/to/your/checkpoint.pt

Cross-modal Retrieval

Run the same command as for the zero-shot test, but first replace the call to test_zeroshot_3d_core() inside test_zeroshot_3d() with cross_retrived().
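
Conceptually, cross-modal retrieval ranks the items of one modality by their similarity to a query embedding from another modality. The sketch below illustrates that ranking step with cosine similarity in plain Python. It is a generic illustration, not the body of cross_retrived(); the `retrieve` helper, the gallery names, and the toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def retrieve(query, gallery, k=3):
    """Return the k gallery keys most similar to the query, best match first."""
    q = normalize(query)
    scores = {
        name: sum(a * b for a, b in zip(q, normalize(feat)))
        for name, feat in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical 3D-shape embeddings queried with an image embedding.
gallery = {
    "airplane_01": [0.9, 0.1, 0.0],
    "laptop_07": [0.0, 1.0, 0.2],
    "chair_03": [0.1, 0.2, 0.9],
}
image_query = [0.8, 0.3, 0.1]
top2 = retrieve(image_query, gallery, k=2)
```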

Pretrained Models

We provide the pretrained models for PointMLP, PointNet++(ssg), and PointBERT. The models can be downloaded from here.

TODO

More supported backbones and v2 are coming soon!

Acknowledgements

Thanks to the code base from ULIP.

