Skip to content

gist-ailab/3d-foundation-policy

 
 

Repository files navigation

FP3: A 3D Foundation Policy for Robotic Manipulation

3D Foundation Policy (FP3) is a 1.3B 3D point cloud-based language-visuomotor policy pre-trained on 60k episodes from the DROID dataset. FP3 supports data-efficient fine-tuning for downstream tasks, while demonstrating superior generalizability to unseen environments and novel objects.

[Homepage][Paper]

Updates

Note: An improved version of FP3 is under development, incorporating enhancements such as point cloud fusion and data augmentation. We will release the updated version shortly.

We will continuously improve this repository. Major updates will be tracked here:

  • [2025-06-05] Weights of FP3 pre-trained on DROID release.
  • [2025-06-05] Added support for DP2 and DP3 configuration generation for training Diffusion Policy and 3D Diffusion Policy.

Installation

See INSTALL.md for details.

Workflow

1. Data Collection

  • Calibrate cameras using the Droid GUI, following the guide here: Calibrating Cameras. Since we use point clouds as input, ensuring proper camera calibration is crucial.
  • Collect data using the Droid GUI, following the instructions here: Data Collection.

2. Data processing

  • The collected image information is stored in .svo files. We need to further process and combine this information into .h5 files. Navigate to the project folder and run the following command:

    python droid_policy_learning/robomimic/scripts/conversion/convert_droid.py --folder <your_data_folder>
  • After that, run this command to gather all your data and create a dataset.json file:

    python droid_policy_learning/robomimic/scripts/conversion/set_manifest_file.py --folder <your_data_folder> --lang <language_insruction>
  • With dataset.json, we can then encode the language with the pretrained encoder and merge the feature into .h5 files. Run the following command:

    python droid_policy_learning/robomimic/scripts/conversion/add_lang_to_converted_data.py --manifest_file droid_policy_learning/dataset.json

3. Download Pre-trained Weights

  • You can download the checkpoint pre-trained on DROID by this link: checkpoint on DROID. It can be used for finetuning or zero-shot evaluation.

4. Model Training

  • Our training scripts using deepspeed to accelerate training, so you need to configure it properly.

  • Add language embedding to dataset by running:

    python droid_policy_learning/robomimic/scripts/conversion/add_lang_to_converted_data.py --manifest_file droid_policy_learning/dataset.json
  • Generate config by running:

    python droid_policy_learning/robomimic/scripts/generate_config.py --dataset <your manifest file> --exp <experiment name>

    If you wish to train Diffusion Policy or 3D Diffusion Policy, you can generate configuration files for DP2 and DP3 in a similar manner.

    python droid_policy_learning/robomimic/scripts/generate_config_dp2.py --dataset <your manifest file> --exp <experiment name>
    python droid_policy_learning/robomimic/scripts/generate_config_dp3.py --dataset <your manifest file> --exp <experiment name>

    Once the configuration files are created, the subsequent training and evaluation procedures are identical to those of FP3.

  • And then run:

    accelerate launch droid_policy_learning/robomimic/scripts/train.py --config <config path> --ckpt <ckpt_path>

    If you want to finetune on our pre-trained model, just download the checkpoint and specify the ckpt argument.

5. Evaluation

  • To evaluate a policy on a robot, you can just run:
    cat <your_language_instruction> > ./eval_params/lang_command.txt
    python droid/scripts/evaluation/evaluate_policy.py

Acknowledgment

Our code is generally built upon DROID, Robomimic. We sincerely appreciate their contribution to the open-source community, which have significantly supported this project.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 82.4%
  • JavaScript 8.2%
  • Python 6.1%
  • C++ 2.3%
  • HTML 0.7%
  • Swift 0.2%
  • Other 0.1%