This is the official implementation of the NeurIPS 2024 D&B track paper "Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning". Real-world code can be found in RealRobot.
In robot learning, the observation space is crucial due to the distinct characteristics of different modalities, which can potentially become a bottleneck alongside policy design. In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. We introduce OBSBench, a benchmark comprising two simulators and 125 tasks, along with standardized pipelines for various encoders and policy baselines. Extensive experiments on diverse contact-rich manipulation tasks reveal a notable trend: point cloud-based methods, even those with the simplest designs, frequently outperform their RGB and RGB-D counterparts. This trend persists in both scenarios: training from scratch and utilizing pre-training. Furthermore, our findings demonstrate that point cloud observations often yield better policy performance and significantly stronger generalization capabilities across various geometric and visual conditions. These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks. We also suggest that incorporating both appearance and coordinate information can enhance the performance of point cloud methods. We hope our work provides valuable insights and guidance for designing more generalizable and robust robotic models.
- Project Structure
- Installation
- Data Preparation
- Training and Evaluation
- Gotchas
- Troubleshooting
- License
- Acknowledgement
- Citation
Our codebase draws significant inspiration from the excellent Lightning Hydra Template. The directory structure of this project is organized as follows:
├── .github                              <- Github Actions workflows
│
├── configs                              <- Hydra configs
│   ├── callbacks                        <- Callbacks configs
│   ├── data                             <- Data configs
│   ├── debug                            <- Debugging configs
│   ├── exp_maniskill2_act_policy        <- ManiSkill2 w. ACT policy experiment configs
│   ├── exp_maniskill2_diffusion_policy  <- ManiSkill2 w. diffusion policy experiment configs
│   ├── extras                           <- Extra utilities configs
│   ├── hydra                            <- Hydra configs
│   ├── local                            <- Local configs
│   ├── logger                           <- Logger configs
│   ├── model                            <- Model configs
│   ├── paths                            <- Project paths configs
│   ├── trainer                          <- Trainer configs
│   │
│   └── train.yaml                       <- Main config for training
│
├── data                                 <- Project data, e.g. ManiSkill2 replayed trajectories
│
├── logs                                 <- Logs generated by hydra and lightning loggers
│
├── scripts                              <- Shell scripts
│
├── src                                  <- Source code
│   ├── data                             <- Data scripts
│   ├── models                           <- Model scripts
│   ├── utils                            <- Utility scripts
│   │
│   ├── validate.py                      <- Run evaluation
│   └── train.py                         <- Run training
│
├── .gitignore                           <- List of files ignored by git
├── .project-root                        <- File for inferring the position of project root directory
├── requirements.txt                     <- File for installing python dependencies
├── setup.py                             <- File for installing project as a package
└── README.md
Basics
# clone project
git clone https://github.com/HaoyiZhu/PointCloudMatters.git
cd PointCloudMatters
# create conda environment
conda create -n pcm python=3.11 -y
conda activate pcm
# install PyTorch, please refer to https://pytorch.org/ for other CUDA versions
# e.g. cuda 11.8:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# install basic packages
pip3 install -r requirements.txt
Point cloud related
# please install with your PyTorch and CUDA version
# e.g. torch 2.3.0 + cuda 118:
pip install torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-2.3.0+cu118
Note: spconv must match your CUDA version; see the official GitHub repository for more information.
# e.g. for CUDA 11.8:
pip3 install spconv-cu118
# build FPS sampling operations (CUDA required)
cd libs/pointops
# docker & multi GPU arch: replace "ARCH LIST" with your GPU compute capabilities
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 2000 series; 8.0: A100; 8.6: RTX 3000 series. More available at: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..
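If you are unsure which values to pass in TORCH_CUDA_ARCH_LIST, the following is a minimal sketch that derives them from the GPUs PyTorch can see. The helper names format_arch_list and detect_capabilities are illustrative and not part of this repository; the only PyTorch calls used are torch.cuda.is_available, torch.cuda.device_count and torch.cuda.get_device_capability.

```python
from typing import Iterable, List, Tuple


def format_arch_list(capabilities: Iterable[Tuple[int, int]]) -> str:
    """Turn (major, minor) compute capabilities into a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))  # drop duplicates from identical GPUs
    return " ".join(f"{major}.{minor}" for major, minor in unique)


def detect_capabilities() -> List[Tuple[int, int]]:
    """Query visible GPUs via PyTorch; returns [] if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    print(format_arch_list(detect_capabilities()) or "no CUDA device detected")
```

The printed string can be pasted between the quotes of TORCH_CUDA_ARCH_LIST when building libs/pointops. Inside a Docker build without GPU access the detection returns nothing, so the list has to be written by hand in that case.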
ManiSkill2
pip install mani-skill2==0.5.3 && pip cache purge
You can test whether ManiSkill2 is installed successfully by running:
python -m mani_skill2.examples.demo_random_action
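In addition to the demo above, a short import check can show which of the packages installed so far are visible to the current Python environment. This is only a sketch: the import names in REQUIRED_MODULES are assumed to correspond to the pip packages listed in this section, and missing_modules is an illustrative helper, not part of this repository.

```python
import importlib.util

# Assumed import names for the packages installed in the steps above.
REQUIRED_MODULES = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "spconv",
    "mani_skill2",
]


def missing_modules(names):
    """Return the subset of top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("all found" if not missing else f"missing: {', '.join(missing)}")
```

The check only confirms that a module can be located; it does not verify that the installed build matches your PyTorch or CUDA version.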
RLBench
Note: Installing RLBench can be challenging. We recommend referring to PerAct's installation guides for further assistance.
Follow instructions from the official PyRep repo; reproduced here for convenience:
PyRep requires version 4.1 of CoppeliaSim. Download:
Once you have downloaded CoppeliaSim, you can pull PyRep from git:
cd <install_dir>
git clone https://github.com/stepjam/PyRep.git
cd PyRep
Add the following to your ~/.bashrc file: (NOTE: the 'EDIT ME' in the first line)
export COPPELIASIM_ROOT=<EDIT ME>/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
Remember to source your bashrc (source ~/.bashrc) or zshrc (source ~/.zshrc) after this.
Warning: CoppeliaSim might cause conflicts with ROS workspaces.
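Before installing PyRep, it may help to confirm that the three variables exported above are set consistently in the current shell. The following is a minimal sketch; check_coppeliasim_env is an illustrative helper, not part of PyRep or this repository, and it compares paths as plain strings (so a trailing slash counts as a difference).

```python
import os


def check_coppeliasim_env(env):
    """Return a list of human-readable problems with the CoppeliaSim variables in `env`."""
    problems = []
    root = env.get("COPPELIASIM_ROOT", "")
    if not root:
        problems.append("COPPELIASIM_ROOT is not set")
        return problems
    if not os.path.isdir(root):
        problems.append(f"COPPELIASIM_ROOT is not a directory: {root}")
    # LD_LIBRARY_PATH is a list of directories joined by os.pathsep (":" on Linux).
    if root not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append("LD_LIBRARY_PATH does not contain COPPELIASIM_ROOT")
    if env.get("QT_QPA_PLATFORM_PLUGIN_PATH") != root:
        problems.append("QT_QPA_PLATFORM_PLUGIN_PATH differs from COPPELIASIM_ROOT")
    return problems


if __name__ == "__main__":
    for line in check_coppeliasim_env(os.environ) or ["environment looks consistent"]:
        print(line)
```

An empty result only means the variables agree with each other; it does not check the CoppeliaSim version.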
Finally install the python library:
pip install -r requirements.txt
pip install .
You should be good to go! You could try running one of the examples in the examples/ folder.
If you encounter errors, please use the PyRep issue tracker.
We use PerAct's RLBench fork.
cd <install_dir>
git clone -b peract https://github.com/MohitShridhar/RLBench.git # note: 'peract' branch
cd RLBench
pip install -r requirements.txt
python setup.py develop
For running in headless mode, task setups, and other issues, please refer to the official repo.
ManiSkill2
You can simply run the following to download and replay demonstrations:
bash scripts/download_and_replay_maniskill2.sh
RLBench
PerAct provides pre-generated RLBench demonstrations for the 18 tasks it uses. Each task contains 100 episodes for training and 25 each for validation and testing. Please download and extract them into ./data/rlbench/raw. Your data directory structure should look like the following:
├── data
│   ├── ...
│   ├── rlbench
│   │   ├── raw
│   │   │   ├── train
│   │   │   │   ├── close_jar
│   │   │   │   │   ├── all_variations
│   │   │   │   │   │   ├── episodes
│   │   │   │   │   │   │   ├── episode0
│   │   │   │   │   │   │   ├── episode1
│   │   │   │   │   │   │   ├── ...
│   │   │   │   ├── open_drawer
│   │   │   │   ├── ...
│   │   │   ├── val
│   │   │   │   ├── ...
│   │   │   ├── test
│   │   │   │   ├── ...
│   ├── ...
To speed up data loading during training, we provide a script to pre-process the raw data. Running the following example command generates processed data under ./data/rlbench/processed.
# e.g. to pre-process task turn_tap with front camera:
python scripts/preprocess_rlbench.py --task_names turn_tap --camera_views front
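If pre-processing fails or produces fewer samples than expected, a quick look at the raw layout can help. The sketch below counts episode directories per task, assuming the train/val/test, all_variations and episodes folder names from the directory tree in this section; count_episodes is an illustrative helper and not part of this repository.

```python
from pathlib import Path


def count_episodes(raw_root, split="train"):
    """Map task name -> number of episode directories under <raw_root>/<split>."""
    split_dir = Path(raw_root) / split
    counts = {}
    if not split_dir.is_dir():
        return counts
    for task_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        episodes_dir = task_dir / "all_variations" / "episodes"
        if episodes_dir.is_dir():
            counts[task_dir.name] = sum(
                1 for p in episodes_dir.glob("episode*") if p.is_dir()
            )
        else:
            counts[task_dir.name] = 0  # task folder present but not extracted as expected
    return counts


if __name__ == "__main__":
    for task, n in count_episodes("./data/rlbench/raw", "train").items():
        print(f"{task}: {n} episodes")
```

A task that reports 0 episodes usually points to an archive extracted one level too deep or too shallow.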
You can also generate your own data for all tasks supported by RLBench.
Coming soon.
ManiSkill2
- Train with RGB(-D) image observation:
# ACT policy example:
python src/train.py exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_maniskill2_diffusion_policy=base exp_maniskill2_diffusion_policy/maniskill2_task@maniskill2_task=${task} exp_maniskill2_diffusion_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
- Train with point cloud observation:
# ACT policy example:
python src/train.py exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_maniskill2_diffusion_policy=base exp_maniskill2_diffusion_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_diffusion_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
- Evaluate a checkpoint:
python src/validate.py exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} ckpt_path=${path/to/checkpoint} seed=${seed}
- Zero-shot generalization evaluation:
  - To evaluate camera view generalization experiments, run scripts/run_maniskill2_camera_view.sh. The script evaluates the given checkpoint of the given model on the given task with four different camera views, using the specified seed. See the script for more details. For example:
bash scripts/run_maniskill2_camera_view.sh ${path/to/checkpoint} ${task} ${model} ${seed}
  - To evaluate visual changes generalization experiments, run scripts/run_maniskill2_visual_changes.sh. The script evaluates the given checkpoint of the given model with different lighting conditions, noise levels and background colors, using the specified seed. See the script for more details. Note that currently only the StackCube task is supported. For example:
bash scripts/run_maniskill2_visual_changes.sh ${path/to/checkpoint} ${model} ${seed}
Detailed configurations can be found in configs/exp_maniskill2_act_policy and configs/exp_maniskill2_diffusion_policy.
Currently supported tasks can be found in configs/exp_maniskill2_act_policy/maniskill2_task, configs/exp_maniskill2_act_policy/maniskill2_pcd_task, configs/exp_maniskill2_diffusion_policy/maniskill2_task and configs/exp_maniskill2_diffusion_policy/maniskill2_pcd_task.
Currently supported models can be found in configs/exp_maniskill2_act_policy/maniskill2_model and configs/exp_maniskill2_diffusion_policy/maniskill2_model.
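The ManiSkill2 commands above all follow the same pattern of Hydra overrides, so launching many runs from a script is straightforward. The sketch below assembles that pattern in Python; build_maniskill2_command is an illustrative helper that is not part of this repository, and the task and model names in the example are placeholders to be replaced with config names from the folders listed above.

```python
import shlex


def build_maniskill2_command(policy, task, model, seed,
                             point_cloud=False, script="src/train.py"):
    """Assemble the Hydra overrides used in the commands above into an argument list."""
    if policy not in ("act", "diffusion"):
        raise ValueError(f"unknown policy: {policy}")
    exp = f"exp_maniskill2_{policy}_policy"
    # Point cloud runs select their task from the *_pcd_task config group.
    task_group = "maniskill2_pcd_task" if point_cloud else "maniskill2_task"
    return [
        "python", script,
        f"{exp}=base",
        f"{exp}/{task_group}@{task_group}={task}",
        f"{exp}/maniskill2_model@maniskill2_model={model}",
        f"seed={seed}",
    ]


if __name__ == "__main__":
    # "my_task" and "my_model" are placeholders, not real config names.
    cmd = build_maniskill2_command("diffusion", "my_task", "my_model", 0, point_cloud=True)
    print(shlex.join(cmd))
```

Returning a list rather than a string makes it easy to pass the result to subprocess.run without shell quoting issues.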
RLBench
- Train with RGB(-D) image observation:
# ACT policy example:
python src/train.py exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_rlbench_diffusion_policy=base rlbench_task=${task} exp_rlbench_diffusion_policy/rlbench_model@rlbench_model=${model} seed=${seed}
- Train with point cloud observation:
# ACT policy example:
python src/train.py exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed}
# Diffusion policy example:
python src/train.py exp_rlbench_diffusion_policy=base rlbench_task=${task} exp_rlbench_diffusion_policy/rlbench_model@rlbench_model=${model} seed=${seed}
- Evaluate a checkpoint:
# ACT policy example:
python src/test_rlbench_act.py exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed} ckpt_path=${path/to/checkpoint}
- Zero-shot camera-view generalization evaluation: To evaluate camera view generalization experiments, run scripts/run_rlbench_camera_view.sh. The script evaluates the given checkpoint of the given policy and model on the given task with four different camera views, using the specified seed. See the script for more details. For example:
# policy: either diffusion or act
bash scripts/run_rlbench_camera_view.sh ${policy} ${path/to/checkpoint} ${task} ${model} ${seed}
Detailed configurations can be found in configs/exp_rlbench_act_policy and configs/exp_rlbench_diffusion_policy.
Currently supported models can be found in configs/exp_rlbench_act_policy/rlbench_model and configs/exp_rlbench_diffusion_policy/rlbench_model.
Override any config parameter from command line
This codebase is based on Hydra, which allows for convenient configuration overriding:
python src/train.py trainer.max_epochs=20 seed=300
Note: You can also add new parameters with the + sign.
python src/train.py +some_new_param=some_new_value
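The difference between a plain override and a + override can be pictured with a toy model on nested dictionaries. This is not Hydra's implementation and covers only the two forms shown above: a plain key=value must target an existing key, while +key=value must introduce a new one. The function apply_override is hypothetical and keeps every value as a string, whereas Hydra parses types.

```python
def apply_override(config, override):
    """Apply one 'a.b=value' or '+a.b=value' override to a nested dict, in place."""
    add_new = override.startswith("+")
    text = override[1:] if add_new else override
    key, _, value = text.partition("=")
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        if add_new:
            node = node.setdefault(name, {})  # create missing parents when adding
        else:
            node = node[name]  # KeyError if an intermediate key is missing
    if add_new and leaf in node:
        raise KeyError(f"'{key}' already exists; drop the '+' to override it")
    if not add_new and leaf not in node:
        raise KeyError(f"'{key}' does not exist; prefix it with '+' to add it")
    node[leaf] = value
    return config


if __name__ == "__main__":
    cfg = {"trainer": {"max_epochs": "10"}, "seed": "42"}
    apply_override(cfg, "trainer.max_epochs=20")
    apply_override(cfg, "+some_new_param=some_new_value")
    print(cfg)
```

In practice this means a typo in a plain override fails loudly instead of silently adding an unused key.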
Train on CPU, GPU, multi-GPU and TPU
# train on CPU
python src/train.py trainer=cpu
# train on 1 GPU
python src/train.py trainer=gpu
# train on TPU
python src/train.py +trainer.tpu_cores=8
# train with DDP (Distributed Data Parallel) (4 GPUs)
python src/train.py trainer=ddp trainer.devices=4
# train with DDP (Distributed Data Parallel) (8 GPUs, 2 nodes)
python src/train.py trainer=ddp trainer.devices=4 trainer.num_nodes=2
# simulate DDP on CPU processes
python src/train.py trainer=ddp_sim trainer.devices=2
# accelerate training on mac
python src/train.py trainer=mps
Train with mixed precision
# train with pytorch native automatic mixed precision (AMP)
python src/train.py trainer=gpu +trainer.precision=16
Use different tricks available in PyTorch Lightning
# gradient clipping may be enabled to avoid exploding gradients
python src/train.py trainer.gradient_clip_val=0.5
# run validation loop 4 times during a training epoch
python src/train.py +trainer.val_check_interval=0.25
# accumulate gradients
python src/train.py trainer.accumulate_grad_batches=10
# terminate training after 12 hours
python src/train.py +trainer.max_time="00:12:00:00"
Note: PyTorch Lightning provides about 40+ useful trainer flags.
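The trainer.max_time string used above follows a DD:HH:MM:SS layout (days, hours, minutes, seconds). As a small sketch for sanity-checking a value before launching a long job, the hypothetical helper parse_max_time below converts such a string into a timedelta; it is not part of Lightning or this repository.

```python
from datetime import timedelta


def parse_max_time(value):
    """Convert a 'DD:HH:MM:SS' string into a timedelta."""
    parts = value.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected DD:HH:MM:SS, got {value!r}")
    days, hours, minutes, seconds = (int(p) for p in parts)
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    budget = parse_max_time("00:12:00:00")
    print(f"training budget: {budget.total_seconds() / 3600:.1f} hours")
```

Note that "00:12:00:00" is twelve hours, not twelve minutes, because the first field counts days.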
Easily debug
# runs 1 epoch in default debugging mode
# changes logging directory to `logs/debugs/...`
# sets level of all command line loggers to 'DEBUG'
# enforces debug-friendly configuration
python src/train.py debug=default
# run 1 train, val and test loop, using only 1 batch
python src/train.py debug=fdr
# print execution time profiling
python src/train.py debug=profiler
# try overfitting to 1 batch
python src/train.py debug=overfit
# raise exception if there are any numerical anomalies in tensors, like NaN or +/-inf
python src/train.py +trainer.detect_anomaly=true
# use only 20% of the data
python src/train.py +trainer.limit_train_batches=0.2 \
+trainer.limit_val_batches=0.2 +trainer.limit_test_batches=0.2
Note: Visit configs/debug/ for different debugging configs.
Resume training from checkpoint
python src/train.py ckpt_path="/path/to/ckpt/name.ckpt"
Note: The checkpoint can be either a path or a URL.
Note: Currently, loading a checkpoint doesn't resume the logger experiment, but this will be supported in a future Lightning release.
Create a sweep over hyperparameters
# this will run 9 experiments one after the other,
# each with different combination of seed and learning rate
python src/train.py -m seed=100,200,300 model.optimizer.lr=0.0001,0.00005,0.00001
Note: Hydra composes configs lazily at job launch time. If you change code or configs after launching a job/sweep, the final composed configs might be impacted.
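The count of 9 experiments comes from the Cartesian product of the comma-separated values. The sketch below enumerates that product in plain Python so the size of a sweep can be estimated before launching it; expand_sweep is an illustrative helper, not Hydra's sweeper, and the order in which Hydra actually runs the jobs may differ.

```python
from itertools import product


def expand_sweep(sweep):
    """Expand {'key': 'v1,v2,...'} into one list of overrides per job."""
    keys = list(sweep)
    choices = [sweep[key].split(",") for key in keys]
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*choices)
    ]


if __name__ == "__main__":
    jobs = expand_sweep({
        "seed": "100,200,300",
        "model.optimizer.lr": "0.0001,0.00005,0.00001",
    })
    print(f"{len(jobs)} jobs")
    for overrides in jobs:
        print(" ".join(overrides))
```

Adding a third swept parameter multiplies the job count again, so sweeps grow quickly.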
Execute all experiments from folder
python src/train.py -m 'exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=glob(*)'
Note: Hydra provides special syntax for controlling behavior of multiruns. Learn more here. The command above executes all task experiments from configs/exp_maniskill2_act_policy/maniskill2_task.
Execute run for multiple different seeds
python src/train.py -m seed=100,200,300 trainer.deterministic=True
Note: trainer.deterministic=True makes PyTorch more deterministic but may reduce performance.
For more instructions, refer to the official documentation for PyTorch Lightning, Hydra, and Lightning Hydra Template.
See TroubleShooting.md.
This repository is released under the MIT license.
Our code is primarily built upon PyTorch Lightning, Hydra, Lightning Hydra Template, ManiSkill2, RLBench, PerAct, ACT, Diffusion Policy, TIMM, PonderV2, MultiMAE, Pointcept, VC1, and R3M. We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.
@article{zhu2024point,
title={Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning},
author={Zhu, Haoyi and Wang, Yating and Huang, Di and Ye, Weicai and Ouyang, Wanli and He, Tong},
journal={arXiv preprint arXiv:2402.02500},
year={2024}
}