3D Foundation Policy (FP3) is a 1.3B-parameter, 3D point cloud-based language-visuomotor policy pre-trained on 60k episodes from the DROID dataset. FP3 supports data-efficient fine-tuning for downstream tasks while generalizing well to unseen environments and novel objects.
Note: An improved version of FP3 is under development, incorporating enhancements such as point cloud fusion and data augmentation. We will release the updated version shortly.
We will continuously improve this repository. Major updates will be tracked here:
- [2025-06-05] Weights of FP3 pre-trained on DROID released.
- [2025-06-05] Added support for DP2 and DP3 configuration generation for training Diffusion Policy and 3D Diffusion Policy.
See INSTALL.md for details.
- Calibrate cameras using the DROID GUI, following the guide here: Calibrating Cameras. Since we use point clouds as input, proper camera calibration is crucial.
- Collect data using the DROID GUI, following the instructions here: Data Collection.
- The collected image information is stored in `.svo` files. We need to further process and combine this information into `.h5` files. Navigate to the project folder and run the following command:

  ```bash
  python droid_policy_learning/robomimic/scripts/conversion/convert_droid.py --folder <your_data_folder>
  ```
- After that, run this command to gather all your data and create a `dataset.json` file:

  ```bash
  python droid_policy_learning/robomimic/scripts/conversion/set_manifest_file.py --folder <your_data_folder> --lang <language_instruction>
  ```
- With `dataset.json`, we can then encode the language with the pretrained encoder and merge the feature into the `.h5` files (see the sketch after these steps for the general idea). Run the following command:

  ```bash
  python droid_policy_learning/robomimic/scripts/conversion/add_lang_to_converted_data.py --manifest_file droid_policy_learning/dataset.json
  ```
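For intuition, here is a minimal sketch of what the language-encoding step above does, assuming a DistilBERT-style text encoder from Hugging Face `transformers` and a hypothetical per-episode `.h5` layout; the actual encoder, file names, and dataset keys used by `add_lang_to_converted_data.py` may differ.

```python
# Minimal sketch: embed one language instruction and store it alongside an episode.
# Assumes a DistilBERT encoder and a hypothetical "lang_embedding" dataset key;
# the real conversion script may use a different encoder and .h5 layout.
import h5py
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased").eval()

instruction = "put the cup on the plate"  # example instruction
with torch.no_grad():
    tokens = tokenizer(instruction, return_tensors="pt")
    # Mean-pool the last hidden states into a fixed-size sentence feature.
    embedding = encoder(**tokens).last_hidden_state.mean(dim=1).squeeze(0).numpy()

with h5py.File("episode_0.h5", "a") as f:  # hypothetical episode file
    if "lang_embedding" in f:
        del f["lang_embedding"]
    f.create_dataset("lang_embedding", data=embedding)
```

Storing a fixed-size sentence embedding with each episode keeps the dataset self-contained, so training does not need to run the language encoder online.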
- You can download the checkpoint pre-trained on DROID via this link: checkpoint on DROID. It can be used for fine-tuning or zero-shot evaluation.
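If you want to sanity-check the download before fine-tuning, a minimal sketch like the following can help, assuming the checkpoint is a plain `torch.save` file; the filename and key names below are illustrative only.

```python
# Minimal sanity check for the downloaded checkpoint.
# Assumes a plain torch.save file; the filename and key names are illustrative.
import torch

ckpt = torch.load("fp3_droid_pretrained.pt", map_location="cpu")  # hypothetical filename
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
    # If a state_dict is present, count tensor parameters as a rough integrity check.
    state_dict = ckpt.get("state_dict", ckpt)
    n_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
    print(f"tensor parameters found: {n_params:,}")
```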
- Our training scripts use DeepSpeed to accelerate training, so you need to configure it properly (for example, through `accelerate config`).
- Add the language embedding to the dataset by running:

  ```bash
  python droid_policy_learning/robomimic/scripts/conversion/add_lang_to_converted_data.py --manifest_file droid_policy_learning/dataset.json
  ```
- Generate a config by running:

  ```bash
  python droid_policy_learning/robomimic/scripts/generate_config.py --dataset <your manifest file> --exp <experiment name>
  ```

  If you wish to train Diffusion Policy or 3D Diffusion Policy, you can generate configuration files for DP2 and DP3 in a similar manner:

  ```bash
  python droid_policy_learning/robomimic/scripts/generate_config_dp2.py --dataset <your manifest file> --exp <experiment name>
  python droid_policy_learning/robomimic/scripts/generate_config_dp3.py --dataset <your manifest file> --exp <experiment name>
  ```

  Once the configuration files are created, the subsequent training and evaluation procedures are identical to those of FP3. A hedged sketch of inspecting the generated config is given after these steps.
- Then run:

  ```bash
  accelerate launch droid_policy_learning/robomimic/scripts/train.py --config <config path> --ckpt <ckpt_path>
  ```

  If you want to fine-tune from our pre-trained model, just download the checkpoint and pass it through the `--ckpt` argument.
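Before launching training, it can be handy to inspect or tweak the generated config from Python. The sketch below assumes the generator writes a JSON config; the path and key names are illustrative and may not match the actual FP3 config schema.

```python
# Minimal sketch: inspect and optionally override a generated config before training.
# Assumes the generator writes a JSON config; the path and key names below are
# illustrative and may not match the actual FP3 config schema.
import json
from pathlib import Path

config_path = Path("droid_policy_learning/configs/my_experiment.json")  # hypothetical path
config = json.loads(config_path.read_text())

# Show the top-level sections to see what is configurable.
print("sections:", list(config.keys()))

# Illustrative override: reduce the batch size for a smaller GPU.
config.setdefault("train", {})["batch_size"] = 16

config_path.write_text(json.dumps(config, indent=2))
```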
- To evaluate a policy on a robot, you can just run:

  ```bash
  cat <your_language_instruction> > ./eval_params/lang_command.txt
  python droid/scripts/evaluation/evaluate_policy.py
  ```
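If you prefer to drive evaluation from Python, for example to sweep over several instructions, a minimal sketch along these lines should work; the two paths follow the command above, while the instruction list and looping are illustrative.

```python
# Minimal sketch: write the language command file, then launch the evaluation script.
# The two paths follow the command above; the instruction sweep is illustrative.
import subprocess
from pathlib import Path

instructions = ["put the cup on the plate", "open the drawer"]  # example instructions

for instruction in instructions:
    Path("./eval_params/lang_command.txt").write_text(instruction)
    subprocess.run(
        ["python", "droid/scripts/evaluation/evaluate_policy.py"],
        check=True,
    )
```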
Our code is largely built upon DROID and Robomimic. We sincerely appreciate their contributions to the open-source community, which have significantly supported this project.