
# Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding


Joshua Jones, Oier Mees, Carmelo Sferrazza, Kyle Stachowicz, Pieter Abbeel, Sergey Levine


This repo contains code to Fuse heterogeneous Sensory data (FuSe), such as touch sensing or audio, into generalist robot policies via language grounding. We release both a dataset of 26,866 robot trajectories collected with heterogeneous sensory modalities and checkpoints for our two main models: Octo, a large transformer-based diffusion policy, and a 3B VLA based on PaliGemma. Our code is built on top of the Octo and PaliVLA codebases.

FuSe model
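As a loose illustration of the idea (not the actual FuSe architecture), a policy can fuse heterogeneous sensor streams by projecting each modality into a shared embedding space and conditioning on a language instruction. Everything below — function names, the hash-based "language encoder", and the embedding size — is invented for this sketch:

```python
import math

DIM = 8  # toy embedding size, chosen arbitrarily for the sketch

def embed(values, dim=DIM):
    """Project a raw sensor reading (a list of floats) into a fixed-size unit vector."""
    out = [0.0] * dim
    for i, v in enumerate(values):
        out[i % dim] += v
    norm = math.sqrt(sum(x * x for x in out)) or 1.0
    return [x / norm for x in out]

def embed_text(instruction, dim=DIM):
    """Toy stand-in for a language encoder: hash characters into a unit vector."""
    return embed([float(ord(c) % 13) for c in instruction], dim)

def fuse(observation, instruction):
    """Concatenate per-modality embeddings with the instruction embedding."""
    fused = []
    for name in sorted(observation):  # e.g. audio / image / touch, in a fixed order
        fused.extend(embed(observation[name]))
    fused.extend(embed_text(instruction))
    return fused

obs = {"image": [0.2, 0.5, 0.1], "touch": [0.9, 0.3], "audio": [0.0, 0.4, 0.7]}
tokens = fuse(obs, "pick up the soft object")
print(len(tokens))  # → 32 (3 modalities + 1 instruction, each DIM=8)
```

In the real models, the per-modality projections and the language encoder are learned networks rather than hand-written functions; the sketch only shows the shape of the conditioning.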

## Get Started

Install PaliVLA:

```shell
cd palivla_digit
uv venv
source .venv/bin/activate
uv sync --extra [gpu or tpu]  # choose the extra matching your accelerator
uv pip install -e ../octo_digit --no-deps
uv pip install -e ../bridge_with_digit/widowx_envs
uv pip install -e .
```

Install Octo:

```shell
cd octo_digit
uv venv
source .venv/bin/activate
uv sync --extra [gpu or tpu]  # choose the extra matching your accelerator
uv pip install -e ../bridge_with_digit/widowx_envs
uv pip install -e .
```

## Dataset Download

We provide a dataset containing 26,866 trajectories collected on a WidowX robot at the RAIL lab at UC Berkeley. It contains visual, tactile, audio, and action data collected across several environments, annotated with natural language. You can download the dataset from the HuggingFace Hub.
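Each trajectory pairs several sensor streams with actions and a language annotation. The on-disk format is not described here, so the per-step schema below is purely hypothetical; the sketch only shows how one might sanity-check that every step carries all expected modalities after download:

```python
# Hypothetical per-step schema; the real dataset's format may differ.
REQUIRED_KEYS = {"image", "tactile", "audio", "action", "language_instruction"}

def validate_trajectory(steps):
    """Return a list of (step_index, missing_keys) for steps lacking any modality."""
    problems = []
    for i, step in enumerate(steps):
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems

demo = [
    {"image": b"...", "tactile": b"...", "audio": b"...", "action": [0.0] * 7,
     "language_instruction": "press the squishy button"},
    {"image": b"...", "tactile": b"...", "action": [0.0] * 7,
     "language_instruction": "press the squishy button"},  # audio missing here
]
print(validate_trajectory(demo))  # → [(1, ['audio'])]
```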

## Model Training

For Octo:

```shell
python octo_digit/scripts/finetune_fuse.py --config=scripts/configs/fuse_config.py
```

For PaliVLA:

```shell
python palivla_digit/palivla/train_fuse.py --config=palivla_digit/palivla/configs/fuse_config.py
```

## Inference with Pretrained Models

Install bridge_with_digit on the robot controller and start the action server. Then download the pretrained models from the HuggingFace model hub.

For Octo:

```shell
python octo_digit/eval/fuse_eval.py --checkpoint_weights_path=ckpt.pth
```

For PaliVLA:

```shell
python palivla_digit/eval_palivla.py --checkpoint_dir=ckpt.pth
```

## License

This project is licensed under the MIT License, and PaliVLA under the Apache 2.0 License; see the respective LICENSE files for details.

## Citation

```bibtex
@article{jones2025fuse,
  title={Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding},
  author={Jones, Joshua and Mees, Oier and Sferrazza, Carmelo and Stachowicz, Kyle and Abbeel, Pieter and Levine, Sergey},
  journal={arXiv preprint arXiv:2501.04693},
  year={2025}
}
```