
[NeurIPS 2024 D&B] Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning





Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning


Project Page | Arxiv

Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He


This is the official implementation of the NeurIPS 2024 D&B track paper "Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning". Real-world code can be found in RealRobot.

In robot learning, the observation space is crucial due to the distinct characteristics of different modalities, which can potentially become a bottleneck alongside policy design. In this study, we explore the influence of various observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. We introduce OBSBench, a benchmark comprising two simulators and 125 tasks, along with standardized pipelines for various encoders and policy baselines. Extensive experiments on diverse contact-rich manipulation tasks reveal a notable trend: point cloud-based methods, even those with the simplest designs, frequently outperform their RGB and RGB-D counterparts. This trend persists in both scenarios: training from scratch and utilizing pre-training. Furthermore, our findings demonstrate that point cloud observations often yield better policy performance and significantly stronger generalization capabilities across various geometric and visual conditions. These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks. We also suggest that incorporating both appearance and coordinate information can enhance the performance of point cloud methods. We hope our work provides valuable insights and guidance for designing more generalizable and robust robotic models.


🔭 Project Structure

Our codebase draws significant inspiration from the excellent Lightning Hydra Template. The directory structure of this project is organized as follows:

├── .github                   <- GitHub Actions workflows
├── configs                   <- Hydra configs
│   ├── callbacks                         <- Callbacks configs
│   ├── data                              <- Data configs
│   ├── debug                             <- Debugging configs
│   ├── exp_maniskill2_act_policy         <- ManiSkill2 w. ACT policy experiment configs
│   ├── exp_maniskill2_diffusion_policy   <- ManiSkill2 w. diffusion policy experiment configs
│   ├── extras                            <- Extra utilities configs
│   ├── hydra                             <- Hydra configs
│   ├── local                             <- Local configs
│   ├── logger                            <- Logger configs
│   ├── model                             <- Model configs
│   ├── paths                             <- Project paths configs
│   ├── trainer                           <- Trainer configs
│   │
│   └── train.yaml            <- Main config for training
├── data                   <- Project data, e.g. ManiSkill2 replayed trajectories
├── logs                   <- Logs generated by hydra and lightning loggers
├── scripts                <- Shell scripts
├── src                    <- Source code
│   ├── data                     <- Data scripts
│   ├── models                   <- Model scripts
│   ├── utils                    <- Utility scripts
│   │
│   ├──              <- Run evaluation
│   └──                 <- Run training
├── .gitignore                <- List of files ignored by git
├── .project-root             <- File for inferring the position of project root directory
├── requirements.txt          <- File for installing python dependencies
├──                  <- File for installing project as a package

🔨 Installation

# clone project
git clone
cd PointCloudMatters

# create conda environment
conda create -n pcm python=3.11 -y
conda activate pcm

# install PyTorch; refer to the official PyTorch installation instructions for other CUDA versions
# e.g. cuda 11.8:
pip3 install torch torchvision torchaudio --index-url
# install basic packages
pip3 install -r requirements.txt
Point cloud related
# please install with your PyTorch and CUDA version
# e.g. torch 2.3.0 + cuda 118:
pip install torch-scatter torch-sparse torch-cluster -f

Note: spconv must match your CUDA version; see the official GitHub repository for more information.
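As a rough illustration of the note above, the package name can be derived from a CUDA version string. This is only a sketch: it assumes the `spconv-cu<major><minor>` naming pattern seen in the `spconv-cu118` example, the helper name `spconv_package` is hypothetical, and whether a wheel actually exists for a given CUDA version still has to be checked against the spconv release list.

```python
def spconv_package(cuda_version: str) -> str:
    """Return the assumed pip package name for a CUDA version such as '11.8'."""
    # Only the major and minor components are used; a patch component is ignored.
    major, minor = cuda_version.split(".")[:2]
    return f"spconv-cu{int(major)}{int(minor)}"


print(spconv_package("11.8"))  # spconv-cu118
```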

# e.g. for CUDA 11.8:
pip3 install spconv-cu118
# build FPS sampling operations (CUDA required)
cd libs/pointops
# docker & multi GPU arch
# e.g. 7.5: RTX 2000 series; 8.0: A100; 8.6: RTX 3000 series. More available in:
TORCH_CUDA_ARCH_LIST="7.5 8.0" python install
cd ../..
pip install mani-skill2==0.5.3 && pip cache purge

You can test whether your ManiSkill2 is installed successfully by running:

python -m mani_skill2.examples.demo_random_action

Note: Installing RLBench can be challenging. We recommend following PerAct's installation guide for additional help.

1. PyRep and Coppelia Simulator

Follow instructions from the official PyRep repo; reproduced here for convenience:

PyRep requires version 4.1 of CoppeliaSim. Download:

Once you have downloaded CoppeliaSim, you can pull PyRep from git:

cd <install_dir>
git clone
cd PyRep

Add the following to your ~/.bashrc file: (NOTE: the 'EDIT ME' in the first line)


Remember to source your bashrc (source ~/.bashrc) or zshrc (source ~/.zshrc) after this.

Warning: CoppeliaSim might cause conflicts with ROS workspaces.

Finally install the python library:

pip install -r requirements.txt
pip install .

You should be good to go! You could try running one of the examples in the examples/ folder.

If you encounter errors, please use the PyRep issue tracker.

2. RLBench

We use PerAct's RLBench fork.

cd <install_dir>
git clone -b peract # note: 'peract' branch

cd RLBench
pip install -r requirements.txt
python develop

For running in headless mode, task setup, and other issues, please refer to the official repo.

🔍 Data Preparation


You can simply run the following to download and replay demonstrations:

bash scripts/

1. Quick Start with PerAct's Pre-generated Datasets

PerAct provides pre-generated RLBench demonstrations for the 18 tasks it uses. Each task contains 100 episodes for training and 25 each for validation and testing. Please download and extract them into ./data/rlbench/raw. Your data directory structure should look like the following:

├── data
│   ├── ...
│   ├── rlbench
│   │   ├── raw
│   │   │   ├── train
│   │   │   │   ├── close_jar
│   │   │   │   │   ├── all_variations
│   │   │   │   │   │   ├── episodes
│   │   │   │   │   │   │   ├── episode0
│   │   │   │   │   │   │   ├── episode1
│   │   │   │   │   │   │   ├── ...
│   │   │   │   ├── open_drawer
│   │   │   │   ├── ...
│   │   │   ├── val
│   │   │   │   ├── ...
│   │   │   ├── test
│   │   │   │   ├── ...
│   └── ...
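
Before pre-processing, it can help to confirm that the extracted data matches the layout above. The following is a minimal sketch, not part of this repository: the function name `count_episodes` is hypothetical, and it assumes only the `<split>/<task>/all_variations/episodes/episodeN` layout shown in the tree.

```python
from pathlib import Path


def count_episodes(raw_root: str) -> dict:
    """Map (split, task) to the number of episode directories found under raw_root."""
    root = Path(raw_root)
    counts = {}
    if not root.is_dir():
        return counts  # nothing extracted yet
    for split_dir in sorted(root.iterdir()):
        if not split_dir.is_dir():
            continue
        for task_dir in sorted(split_dir.iterdir()):
            episodes = task_dir / "all_variations" / "episodes"
            if episodes.is_dir():
                counts[(split_dir.name, task_dir.name)] = sum(
                    1
                    for p in episodes.iterdir()
                    if p.is_dir() and p.name.startswith("episode")
                )
    return counts


# Prints one line per (split, task); prints nothing if the directory is missing.
for (split, task), n in count_episodes("./data/rlbench/raw").items():
    print(f"{split}/{task}: {n} episodes")
```

With the PerAct datasets, one would expect 100 episodes per task under train and 25 under each of val and test.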

To speed up data loading during training, we provide a script to pre-process the raw data. Running the following example command generates processed data under ./data/rlbench/processed.

# e.g. to pre-process task turn_tap with front camera:
python scripts/ --task_names turn_tap --camera_views front

2. Generating Your Own Data

You can also generate your own data for any task that RLBench supports.

Coming soon.

🚀 Training and Evaluation

  • Train with RGB(-D) image observation:

    # ACT policy example:
    python src/ exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
    # Diffusion policy example:
    python src/ exp_maniskill2_diffusion_policy=base exp_maniskill2_diffusion_policy/maniskill2_task@maniskill2_task=${task} exp_maniskill2_diffusion_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
  • Train with point cloud observation:

    # ACT policy example:
    python src/ exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
    # Diffusion policy example:
    python src/ exp_maniskill2_diffusion_policy=base exp_maniskill2_diffusion_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_diffusion_policy/maniskill2_model@maniskill2_model=${model} seed=${seed}
  • Evaluate a checkpoint:

    python src/ exp_maniskill2_act_policy=base exp_maniskill2_act_policy/maniskill2_pcd_task@maniskill2_pcd_task=${task} exp_maniskill2_act_policy/maniskill2_model@maniskill2_model=${model} ckpt_path=${path/to/checkpoint} seed=${seed}
  • Zero-shot generalization evaluation:

    • To evaluate camera view generalization experiments, run scripts/ The script evaluates the given checkpoint of the given model on the given task with four different camera views, using the specified seed. See the script for more details. For example:
    bash scripts/ ${path/to/checkpoint} ${task} ${model} ${seed}
    • To evaluate visual changes generalization experiments, run scripts/ The script evaluates the given checkpoint of the given model with different lighting conditions, noise levels and background colors, using the specified seed. See the script for more details. Note that currently only StackCube task is supported. For example:
    bash scripts/ ${path/to/checkpoint} ${model} ${seed}

Detailed configurations can be found in configs/exp_maniskill2_act_policy and configs/exp_maniskill2_diffusion_policy.

Currently supported tasks can be found in configs/exp_maniskill2_act_policy/maniskill2_task, configs/exp_maniskill2_act_policy/maniskill2_pcd_task, configs/exp_maniskill2_diffusion_policy/maniskill2_task and configs/exp_maniskill2_diffusion_policy/maniskill2_pcd_task.

Currently supported models can be found in configs/exp_maniskill2_act_policy/maniskill2_model and configs/exp_maniskill2_diffusion_policy/maniskill2_model.
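
The ManiSkill2 commands above all follow the same override pattern, which can be sketched as a small helper for scripting many runs. This is an illustrative sketch only: the function name `build_overrides` and the placeholder task and model names are hypothetical, and the returned strings simply mirror the overrides shown in the commands above. They still need to be passed to the training script.

```python
def build_overrides(policy: str, task: str, model: str, seed: int,
                    point_cloud: bool = False) -> list:
    """Assemble the Hydra overrides used by the ManiSkill2 training commands."""
    if policy not in ("act", "diffusion"):
        raise ValueError(f"unknown policy: {policy}")
    exp = f"exp_maniskill2_{policy}_policy"
    # Point cloud runs select from the *_pcd_task config group instead.
    task_group = "maniskill2_pcd_task" if point_cloud else "maniskill2_task"
    return [
        f"{exp}=base",
        f"{exp}/{task_group}@{task_group}={task}",
        f"{exp}/maniskill2_model@maniskill2_model={model}",
        f"seed={seed}",
    ]


# 'my_task' and 'my_model' are placeholders for names found in the config folders.
print(" ".join(build_overrides("diffusion", "my_task", "my_model", 0, point_cloud=True)))
```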

  • Train with RGB(-D) image observation:

    # ACT policy example:
    python src/ exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed}
    # Diffusion policy example:
    python src/ exp_rlbench_diffusion_policy=base rlbench_task=${task} exp_rlbench_diffusion_policy/rlbench_model@rlbench_model=${model} seed=${seed}
  • Train with point cloud observation:

    # ACT policy example:
    python src/ exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed}
    # Diffusion policy example:
    python src/ exp_rlbench_diffusion_policy=base rlbench_task=${task} exp_rlbench_diffusion_policy/rlbench_model@rlbench_model=${model} seed=${seed}
  • Evaluate a checkpoint:

    # ACT policy example:
    python src/ exp_rlbench_act_policy=base rlbench_task=${task} exp_rlbench_act_policy/rlbench_model@rlbench_model=${model} seed=${seed} ckpt_path=${path/to/checkpoint}
  • Zero-shot camera-view generalization evaluation: To evaluate camera view generalization experiments, run scripts/ The script evaluates the given checkpoint of the given policy and model on the given task with four different camera views, using the specified seed. See the script for more details. For example:

    # policy: either diffusion or act
    bash scripts/ ${policy} ${path/to/checkpoint} ${task} ${model} ${seed}

Detailed configurations can be found in configs/exp_rlbench_act_policy and configs/exp_rlbench_diffusion_policy.

Currently supported models can be found in configs/exp_rlbench_act_policy/rlbench_model and configs/exp_rlbench_diffusion_policy/rlbench_model.

🎉 Gotchas

Override any config parameter from command line

This codebase is based on Hydra, which allows for convenient configuration overriding:

python src/ trainer.max_epochs=20 seed=300

Note: You can also add new parameters with the + sign.

python src/ +some_new_param=some_new_value
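
The difference between `key=value` and `+key=value` can be illustrated with a simplified model of the behaviour. The sketch below is not Hydra's implementation: it works on a plain nested dict, keeps all values as strings (Hydra parses types), and does not model the `++` prefix. The function name `apply_override` and the starting config values are hypothetical.

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply one 'key=value' or '+key=value' override to a nested dict in place."""
    key, value = override.split("=", 1)
    adding = key.startswith("+")
    name = key.lstrip("+")
    path = name.split(".")
    node = cfg
    for part in path[:-1]:
        # '+' may create missing parents; a plain override requires them to exist.
        node = node.setdefault(part, {}) if adding else node[part]
    leaf = path[-1]
    if adding and leaf in node:
        raise KeyError(f"'{name}' already exists; drop the '+' to override it")
    if not adding and leaf not in node:
        raise KeyError(f"'{name}' is not in the config; use '+{name}' to add it")
    node[leaf] = value


cfg = {"trainer": {"max_epochs": 10}, "seed": 42}
for item in ["trainer.max_epochs=20", "seed=300", "+some_new_param=some_new_value"]:
    apply_override(cfg, item)
print(cfg)
```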

Train on CPU, GPU, multi-GPU and TPU

# train on CPU
python src/ trainer=cpu

# train on 1 GPU
python src/ trainer=gpu

# train on TPU
python src/ +trainer.tpu_cores=8

# train with DDP (Distributed Data Parallel) (4 GPUs)
python src/ trainer=ddp trainer.devices=4

# train with DDP (Distributed Data Parallel) (8 GPUs, 2 nodes)
python src/ trainer=ddp trainer.devices=4 trainer.num_nodes=2

# simulate DDP on CPU processes
python src/ trainer=ddp_sim trainer.devices=2

# accelerate training on mac
python src/ trainer=mps
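
For the DDP examples above, the total number of processes is the product of `trainer.devices` and `trainer.num_nodes`, which is why 4 devices on 2 nodes gives 8 GPUs. The arithmetic can be sketched as follows; the function names and the batch size of 32 are hypothetical, and the effective batch size formula assumes each process draws its own batch, as is usual with DDP.

```python
def world_size(devices: int, num_nodes: int = 1) -> int:
    """Total number of DDP processes: devices per node times number of nodes."""
    return devices * num_nodes


def effective_batch_size(per_device_batch: int, devices: int, num_nodes: int = 1,
                         accumulate_grad_batches: int = 1) -> int:
    """Samples contributing to one optimizer step across all processes."""
    return per_device_batch * world_size(devices, num_nodes) * accumulate_grad_batches


print(world_size(4, 2))                # 8
print(effective_batch_size(32, 4, 2))  # 256
```

When scaling from one GPU to several, the effective batch size grows accordingly, so the learning rate may need to be revisited.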

Train with mixed precision

# train with pytorch native automatic mixed precision (AMP)
python src/ trainer=gpu +trainer.precision=16

Use different tricks available in PyTorch Lightning

# gradient clipping may be enabled to avoid exploding gradients
python src/ trainer.gradient_clip_val=0.5

# run validation loop 4 times during a training epoch
python src/ +trainer.val_check_interval=0.25

# accumulate gradients
python src/ trainer.accumulate_grad_batches=10

# terminate training after 12 hours
python src/ +trainer.max_time="00:12:00:00"

Note: PyTorch Lightning provides 40+ useful trainer flags.
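
The `max_time` string in the example above uses the `DD:HH:MM:SS` format, so `"00:12:00:00"` means zero days and twelve hours. A minimal sketch of how such a string maps to a duration is shown below; `parse_max_time` is a hypothetical helper for illustration, not a Lightning function.

```python
from datetime import timedelta


def parse_max_time(value: str) -> timedelta:
    """Convert a 'DD:HH:MM:SS' string into a timedelta."""
    # Raises ValueError if the string does not have exactly four integer fields.
    days, hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


print(parse_max_time("00:12:00:00"))  # 12:00:00
```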

Easily debug

# runs 1 epoch in default debugging mode
# changes logging directory to `logs/debugs/...`
# sets level of all command line loggers to 'DEBUG'
# enforces debug-friendly configuration
python src/ debug=default

# run 1 train, val and test loop, using only 1 batch
python src/ debug=fdr

# print execution time profiling
python src/ debug=profiler

# try overfitting to 1 batch
python src/ debug=overfit

# raise exception if there are any numerical anomalies in tensors, like NaN or +/-inf
python src/ +trainer.detect_anomaly=true

# use only 20% of the data
python src/ +trainer.limit_train_batches=0.2 \
+trainer.limit_val_batches=0.2 +trainer.limit_test_batches=0.2

Note: Visit configs/debug/ for different debugging configs.
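
To get a feel for what a fractional `limit_*_batches` value means in practice, the sketch below estimates the number of batches that would be used. It is a simplified model based on the assumption that a float is treated as a fraction of the available batches (rounded down) and an int as an absolute cap; `limited_batches` is a hypothetical helper, not a Lightning function.

```python
def limited_batches(total_batches: int, limit) -> int:
    """Estimate how many batches run for a given limit_*_batches value."""
    if isinstance(limit, float):
        if not 0.0 <= limit <= 1.0:
            raise ValueError("a float limit must be between 0.0 and 1.0")
        return int(total_batches * limit)  # fraction of the dataset, rounded down
    return min(total_batches, limit)       # absolute number of batches


print(limited_batches(1000, 0.2))  # 200
print(limited_batches(1000, 50))   # 50
```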

Resume training from checkpoint

python src/ ckpt_path="/path/to/ckpt/name.ckpt"

Note: The checkpoint can be either a path or a URL.

Note: Currently, loading a checkpoint doesn't resume the logger experiment, but this will be supported in a future Lightning release.

Create a sweep over hyperparameters

# this will run 9 experiments one after the other,
# each with a different combination of seed and learning rate
python src/ -m seed=100,200,300,0.00005,0.00001

Note: Hydra composes configs lazily at job launch time. If you change code or configs after launching a job/sweep, the final composed configs might be impacted.
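
A multirun sweep launches one job per combination of the comma-separated values, so three seeds and three learning rates give the nine experiments mentioned in the comment above. The sketch below shows the enumeration with `itertools.product`; the key name `lr` and the value `1e-4` are placeholders chosen for illustration, and the ordering of jobs is an assumption rather than a guarantee about Hydra's sweeper.

```python
from itertools import product


def expand_sweep(sweep: dict) -> list:
    """Return one override dict per combination of the swept values."""
    keys = list(sweep)
    return [dict(zip(keys, combo)) for combo in product(*(sweep[k] for k in keys))]


jobs = expand_sweep({"seed": [100, 200, 300], "lr": [1e-4, 5e-5, 1e-5]})
print(len(jobs))  # 9
for job in jobs[:3]:
    print(job)
```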

Execute all experiments from folder

python src/ -m 'exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=glob(*)'

Note: Hydra provides special syntax for controlling behavior of multiruns. Learn more here. The command above executes all task experiments from configs/exp_maniskill2_act_policy/maniskill2_task.

Execute run for multiple different seeds

python src/ -m seed=100,200,300 trainer.deterministic=True

Note: trainer.deterministic=True makes PyTorch more deterministic but can slow down training.

For more instructions, refer to the official documentation for PyTorch Lightning, Hydra, and Lightning Hydra Template.

💡 Troubleshooting


📚 License

This repository is released under the MIT license.

✨ Acknowledgement

Our code is primarily built upon PyTorch Lightning, Hydra, Lightning Hydra Template, ManiSkill2, RLBench, PerAct, ACT, Diffusion Policy, TIMM, PonderV2, MultiMAE, Pointcept, VC1, and R3M. We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.

📝 Citation

  title={Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning},
  author={Zhu, Haoyi and Wang, Yating and Huang, Di and Ye, Weicai and Ouyang, Wanli and He, Tong},
  journal={arXiv preprint arXiv:2402.02500},