Wenlong Huang1, Igor Mordatch2, Pieter Abbeel1, Deepak Pathak3
1University of California, Berkeley, 2Google Brain, 3Carnegie Mellon University
This is a PyTorch implementation of our Geometry-Aware Multi-Task Policy. The codebase also includes a suite of dexterous manipulation environments with 114 diverse real-world objects built upon Gym and MuJoCo.
We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects and generalize to new objects with unseen shape or size. Interestingly, we find that multi-task learning with object point cloud representations not only generalizes better but even outperforms the single-object specialist policies on both training as well as held-out test objects.
If you find this work useful in your research, please cite using the following BibTeX:
@article{huang2021geometry,
title={Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning},
author={Huang, Wenlong and Mordatch, Igor and Abbeel, Pieter and Pathak, Deepak},
journal={arXiv preprint arXiv:2111.03062},
year={2021}
}
- Python=3.6.9
- CUDA=10.2
- CUDNN=7.6.5
- MuJoCo=1.50 (Installation Instructions)
Note: MuJoCo now comes with a free license.
git clone https://github.com/huangwl18/geometry-dex.git
cd geometry-dex/
conda create --name geometry-dex-env python=3.6.9
conda activate geometry-dex-env
pip install --upgrade pip
pip install -r requirements.txt
bash install-baselines.sh
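If you want to verify the setup before training, a quick import check like the one below (a minimal sketch, not part of the repository) should succeed inside the activated conda environment:

```bash
# Minimal sanity check (not part of the repo): confirms the MuJoCo Python
# bindings and PyTorch import correctly and reports whether CUDA is visible.
python -c "import mujoco_py, torch; print('mujoco_py OK; CUDA available:', torch.cuda.is_available())"
```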
Below are some flags and parameters for `run_ddpg.py` that you may find useful for reference:
| Flags and Parameters | Description |
| --- | --- |
| `--expID <INT>` | Experiment ID |
| `--train_names <List of STRING>` | List of environments for training, separated by spaces |
| `--test_names <List of STRING>` | List of environments for zero-shot testing, separated by spaces |
| `--point_cloud` | Use geometry-aware policy |
| `--pointnet_load_path <INT>` | Experiment ID from which to load the pre-trained PointNet; required for `--point_cloud` |
| `--video_count <INT>` | Number of videos to generate for each env per cycle; only up to 1 is currently supported; 0 to disable |
| `--n_test_rollouts <INT>` | Total number of collected rollouts across all train + test envs for each evaluation run; should be a multiple of `len(train_names) + len(test_names)` |
| `--num_rollouts <INT>` | Total number of collected rollouts across all train envs for one training cycle; should be a multiple of `len(train_names)` |
| `--num_parallel_envs <INT>` | Number of parallel envs to create for `vec_env`; should be a multiple of `len(train_names)` |
| `--chunk_size <INT>` | Number of parallel envs assigned to each worker in `SubprocChunkVecEnv`; 0 to disable and use `SubprocVecEnv` |
| `--num_layers <INT>` | Number of layers in the MLP for all policies |
| `--width <INT>` | Width of each layer in the MLP for all policies |
| `--seed <INT>` | Seed for Gym, PyTorch, and NumPy |
| `--eval` | Perform only evaluation using the latest checkpoint |
| `--load_path <INT>` | Experiment ID from which to load the checkpoint for DDPG; required for `--eval` |
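For reference, a hypothetical invocation combining several of these flags might look like the following. The environment names are placeholders, not actual identifiers from the benchmark suite, and the numeric values are illustrative only:

```bash
# Placeholder example: <TRAIN_ENV_*> / <TEST_ENV_1> stand in for actual
# environment names from the suite; values are illustrative only.
# --num_rollouts is a multiple of len(train_names);
# --n_test_rollouts is a multiple of len(train_names) + len(test_names).
python run_ddpg.py --expID 0 \
    --train_names <TRAIN_ENV_1> <TRAIN_ENV_2> \
    --test_names <TEST_ENV_1> \
    --num_rollouts 2 \
    --n_test_rollouts 3 \
    --num_layers 3 --width 256 \
    --video_count 0 --seed 0
```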
The code also uses WandB. You may wish to run `wandb login` in the terminal to log results to your account, or choose to run anonymously.
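For example, either of the following works (a sketch assuming a reasonably recent wandb CLI; offline runs can be uploaded later with `wandb sync`):

```bash
# Log runs to your own WandB account:
wandb login
# Or disable syncing and keep logs local only:
wandb offline
```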
WARNING: Due to the large number of total environments, generating videos during training can be slow and memory-intensive. You may wish to train the policy without generating videos by passing `--video_count 0`. After training completes, simply run `run_ddpg.py` with the flags `--eval` and `--video_count 1` to visualize the policy. See the example below.
To train the Vanilla Multi-Task DDPG policy:
python run_ddpg.py --expID 1 --video_count 0 --n_cycles 40000 --chunk 10
To train the Geometry-Aware Multi-Task DDPG policy, first pretrain the PointNet encoder:
python train_pointnet.py --expID 2
Then train the policy:
python run_ddpg.py --expID 3 --video_count 0 --n_cycles 40000 --chunk 10 --point_cloud --pointnet_load_path 2 --no_save_buffer
Note that we do not save the replay buffer here because it contains sampled point clouds, which makes saving slow. If you wish to resume training in the future, do not pass `--no_save_buffer` above.
To evaluate a trained policy and generate video visualizations, run the same command used to train the policy, but add the flags `--eval --video_count=<VIDEO_COUNT> --load_path=<LOAD_EXPID>`. Replace `<VIDEO_COUNT>` with `1` to enable visualization and `0` otherwise. Replace `<LOAD_EXPID>` with the Experiment ID of the trained policy. For a Geometry-Aware Multi-Task DDPG policy trained with the command above, run the following for evaluation and visualization:
python run_ddpg.py --expID 4 --video_count 1 --n_cycles 40000 --chunk 10 --point_cloud --pointnet_load_path 2 --no_save_buffer --eval --load_path 3
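By the same pattern, evaluating the Vanilla Multi-Task DDPG policy trained above (Experiment ID 1) would look roughly like this; the new `--expID` is simply an unused ID chosen for the evaluation run:

```bash
# Evaluation of the Vanilla policy from expID 1, following the same pattern;
# expID 5 is an arbitrary unused experiment ID for this eval run.
python run_ddpg.py --expID 5 --video_count 1 --n_cycles 40000 --chunk 10 --eval --load_path 1
```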
We will be releasing trained model files for our Geometry-Aware Policy and single-task oracle policies for each individual object. Stay tuned! Early access can be requested via email.
The code is adapted from this open-sourced implementation of DDPG + HER. The object meshes are from the YCB Dataset and the ContactDB Dataset. We use `SubprocChunkVecEnv` from this pull request of OpenAI Baselines to speed up vectorized environments.