[Arxiv] [Project page]
We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot's proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, observed motion of partially occluded objects can imply physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with “physible geometry,” which best explains the observed object motion through physics. Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the “physible” geometry from the trajectory through implicit contact dynamics optimization. The visible and “physible” geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics does not require pretraining, nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion.
@inproceedings{bianchini2025vysics,
title={Vysics: Object Reconstruction Under Occlusion by Fusing Vision and Contact-Rich Physics},
author={Bibit Bianchini and Minghan Zhu and Mengti Sun and Bowen Jiang and Camillo J. Taylor and Michael Posa},
year={2025},
month={june},
booktitle={Robotics: Science and Systems (RSS)},
website={https://vysics-vision-and-physics.github.io/},
}
- Download the pretrained weights of the segmentation network (XMem), and put them at `./BundleTrack/XMem/saves/XMem-s012.pth`
- Download the pretrained weights of LoFTR (`outdoor_ds.ckpt`), and put them at `./BundleTrack/LoFTR/weights/outdoor_ds.ckpt`
- Build the docker image (this only needs to be done once and can take some time):
cd docker
docker build --network host -t vysics:cu118 .
- Start a docker container the first time:
cd docker && bash run_custom_container.sh
# Inside the docker container, compile the packages that are machine-dependent
cd ${PROJECT_ROOT}
bash build.sh
# Then build pll
cd dair_pll
pip install -e .
cd ${PROJECT_ROOT}
export PYTHONPATH=${PWD}:${PYTHONPATH}
# Configure the shell to use ROS
source /opt/ros/noetic/setup.bash
One Vysics run is associated with one BundleSDF run that produces a set of pose estimates, one PLL run, and one BundleSDF NeRF run whose shape jointly optimizes the visible and physible geometries. To keep track of these results, we use the following labels:
- `{ASSET_TYPE}` and `{EXP_NUM}` make up an `{ASSET_NAME}` via `{ASSET_TYPE}_{EXP_NUM}`; e.g., `robotocc_bottle` and `1` make up the asset `robotocc_bottle_1`. These are the names of the experiments.
- `{BSDF_ID}` and `{NERF_BSDF_ID}` can be any strings. They label a BundleSDF run and a BundleSDF NeRF sub-run, respectively. For one "experiment" (denoted by `{ASSET_NAME}`), there can be multiple "runs" (denoted by `{BSDF_ID}`) for generating results. For one BundleSDF "run" (denoted by `{BSDF_ID}`), there can be multiple "NeRF runs" (denoted by `{NERF_BSDF_ID}`), always including `{BSDF_ID}` itself. This structure is inherited from the BundleSDF repo design.
- `{PLL_ID}` can be any string. It similarly labels a PLL run. Many PLL runs can be associated with the trajectory results of one BundleSDF run.
- `{CYCLE_ITER}` is a number (1 or greater) and represents how many iterations of BundleSDF pose estimates are in an experiment's history. In the Vysics RSS 2025 paper, all results were cycle iteration 1, though it is possible to reach higher cycle iterations by running BundleSDF a second time as a full BundleSDF run (with a new `{BSDF_ID}`), not just a NeRF-only run (a new `{NERF_BSDF_ID}` inheriting an existing `{BSDF_ID}`); this creates a new set of pose estimates that factor in the physics insights from the last associated PLL run. All of the following count as cycle iteration 1 results:
  - Running BundleSDF for the first time.
  - Running PLL on the pose estimates of a BundleSDF cycle iteration 1 result.
  - Running BundleSDF NeRF-only, using PLL cycle iteration 1 results.
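For concreteness, the sketch below composes a hypothetical set of labels into the result paths used later in this README (see the results-locations section); the specific ID strings are placeholders, not required values.

```python
# Hypothetical label values; the three IDs can be any strings.
ASSET_TYPE = "robotocc_bottle"
EXP_NUM = "1"
ASSET_NAME = f"{ASSET_TYPE}_{EXP_NUM}"  # -> "robotocc_bottle_1"

BSDF_ID = "bsdf_run_a"         # a BundleSDF tracking + NeRF run
NERF_BSDF_ID = "nerf_rerun_b"  # a NeRF-only sub-run under BSDF_ID
PLL_ID = "pll_run_c"           # a PLL run on BSDF_ID's trajectory
CYCLE_ITER = 1                 # all Vysics RSS 2025 results use cycle iteration 1

# Example result locations these labels compose into (path templates from this README).
pose_dir = f"results/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/ob_in_cam/"
mesh_path = (f"results/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/"
             f"nerf_runs/bundlesdf_id_{NERF_BSDF_ID}/mesh_cleaned.obj")
print(pose_dir)
print(mesh_path)
```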
Our datasets are converted from rosbags. See later sections for a detailed explanation of that process. To make it easier to get started, we provide sample data that are already processed and ready to use. Download from the following links:
- RGBD data
- Robot data
- Object shape data (for evaluation only, save to somewhere you like for now)
The downloaded data should be extracted and put in the following file structure:
PROJECT_ROOT
├── data
│ └── {ASSET_NAME} (e.g., robotocc_bakingbox_1)
│ ├── rgb/ (PNG files)
│ ├── depth/ (PNG files, stored in mm, uint16 format. Filename same as rgb)
│ ├── masks/ (PNG files. Filename same as rgb. 0 is background. Else is foreground)
│ └── cam_K.txt (3x3 intrinsic matrix, use space and enter to delimit)
└── cnets-data-generation
└── dataset
└── {ASSET_NAME}
├── bundlesdf_timestamps.txt
├── ee_positions.txt
├── joint_angles.txt
├── joint_efforts.txt
├── joint_names.txt
├── joint_times.txt
├── joint_velocities.txt
├── synced_ee_positions.txt
├── synced_joint_angles.txt
├── synced_joint_efforts.txt
└── synced_joint_velocities.txt
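As a quick sanity check on the layout above, the following minimal Python sketch loads one frame. It assumes the formats described above (depth PNGs as uint16 millimeters, masks with 0 as background, a whitespace-delimited 3x3 `cam_K.txt`); the frame filename is a placeholder, and `imageio`/`numpy` are illustrative library choices rather than requirements of this repo.

```python
import numpy as np
import imageio.v2 as imageio

asset = "robotocc_bakingbox_1"  # example {ASSET_NAME}
frame = "00000.png"             # placeholder filename; rgb/depth/masks share filenames

K = np.loadtxt(f"data/{asset}/cam_K.txt")                  # 3x3 intrinsic matrix
rgb = imageio.imread(f"data/{asset}/rgb/{frame}")          # HxWx3 uint8
depth_mm = imageio.imread(f"data/{asset}/depth/{frame}")   # HxW uint16, millimeters
mask = imageio.imread(f"data/{asset}/masks/{frame}") > 0   # 0 = background, else foreground

depth_m = depth_mm.astype(np.float32) / 1000.0             # convert to meters
print(K.shape, rgb.shape, float(depth_m[mask].mean()))     # rough consistency check
```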
There are other data files that are needed for inference, but they are already included in the repo. These include:
PROJECT_ROOT
├── cnets-data-generation
│ ├── table_calibration/table_heights.yaml
│ ├── robot_dynamics
│ │ ├── franka_control_node.yaml
│ │ └── franka_hw_controllers.yaml
│ └── assets
│ ├── true_urdfs/bsdf_mesh_average_dynamics.urdf
│ ├── cam_K_robotocc.txt
│ ├── config.yaml
│ └── realsense_pose_robotocc.yaml
└── dair_pll
└── assets
├── precomputed_vision_functions/...
├── franka_with_ee.urdf
├── franka_without_geom_with_ee.urdf
└── vision_template.urdf
If you create your own dataset, you may need to modify these files.
A script for running a Vysics experiment after obtaining/generating the dataset is provided as `run_all.sh`. It contains 6 steps:
- Run BundleSDF with tracking and NeRF: generates pose estimates and a vision-only shape estimate. Run from the `vysics` repo.
- Convert from BundleSDF to PLL: converts the trajectory to PLL format and extracts vision insights to factor into physics learning. Run from the `cnets-data-generation` folder.
- Run PLL: generates a physics-based shape estimate and other dynamics parameter estimates. Run from the `dair_pll` folder.
- Convert from PLL to BundleSDF: moves physics-based shape information into the vision input directory. Run from the `cnets-data-generation` folder.
- Run BundleSDF with NeRF only: generates a new, vision- and physics-informed shape estimate. Run from the `vysics` repo.
- Convert from BundleSDF to PLL: required step before evaluating. Run from the `cnets-data-generation` folder.
Start BundleSDF runs from the `vysics` repo.
For running a BundleSDF tracking and NeRF run from scratch with BundleSDF ID `{BSDF_ID}`:
python bundlenets/run_custom.py --vision-asset={ASSET_NAME} --run-name={BSDF_ID} --use-gui=0 --debug-level=2
For running a BundleSDF NeRF-only run with NeRF ID `{NERF_BSDF_ID}` under an existing BundleSDF tracking result with ID `{BSDF_ID}`, using supervision from PLL run `{PLL_ID}`:
python bundlenets/run_custom.py --vision-asset={ASSET_NAME} --tracking-run-name={BSDF_ID} --share-tracking --run-name={NERF_BSDF_ID} --pll-id={PLL_ID} --use-gui=0 --debug-level=2
For both, add `--cycle-iteration={CYCLE_ITER}` as necessary.
After iteration 1 of a BundleSDF run with BundleSDF ID `{BSDF_ID}`, run from the `cnets-data-generation` folder:
python conversion_bsdf_to_pll.py --vision-asset={ASSET_NAME} --bundlesdf-id={BSDF_ID}
By default, this assumes `{BSDF_ID}` is a cycle iteration 1 BundleSDF run, and that the NeRF geometry result to convert is under the shared ID `{BSDF_ID}`. To change either, include the flags `--cycle-iteration={CYCLE_ITER}` and/or `--nerf-bundlesdf-id={NERF_BSDF_ID}`.
BundleSDF-to-PLL conversion does the following:
- Pose conversion: Converts the camera-frame, 4x4 homogeneous pose estimate transforms into PLL format, which uses world-frame coordinates, represents orientation with quaternions, includes velocity estimates, and includes Franka states, if the experiment featured any robot interactions.
- Visible geometry insights: Generates sampled points on the visible portion of the BundleSDF shape estimate's convex hull.
There are other flags that may be of interest for faster conversions; see the BundleSDF-to-PLL converter for documentation.
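As a conceptual illustration of the pose conversion described above (not the repo's exact implementation), the sketch below lifts camera-frame `ob_in_cam` poses into world-frame quaternion/position states and finite-differences positions for a crude velocity estimate. It assumes each pose file stores a plain-text 4x4 homogeneous transform, uses placeholder asset/run IDs and frame rate, and treats the world-from-camera extrinsics as a 4x4 matrix (the repo stores extrinsics in `cnets-data-generation/assets/realsense_pose_robotocc.yaml`); `scipy` is an assumed dependency.

```python
import glob
import numpy as np
from scipy.spatial.transform import Rotation as R

X_WC = np.eye(4)  # placeholder world-from-camera extrinsics (load your calibration here)
dt = 1.0 / 30.0   # placeholder frame period

pose_files = sorted(glob.glob(
    "results/robotocc_bottle_1/bundlesdf_iteration_1/bundlesdf_id_demo/ob_in_cam/*.txt"))

states = []
for f in pose_files:
    X_CO = np.loadtxt(f)                 # camera-frame object pose (4x4)
    X_WO = X_WC @ X_CO                   # world-frame object pose
    quat_wxyz = R.from_matrix(X_WO[:3, :3]).as_quat()[[3, 0, 1, 2]]  # reorder xyzw -> wxyz
    states.append(np.concatenate([quat_wxyz, X_WO[:3, 3]]))
states = np.asarray(states)

# Crude finite-difference linear velocities from positions; PLL's format additionally
# includes angular velocities and the synchronized Franka states, omitted here.
lin_vel = np.gradient(states[:, 4:7], dt, axis=0)
print(states.shape, lin_vel.shape)
```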
Start a PLL run from the `dair_pll` folder with:
python examples/contactnets_vision.py --vision-asset={ASSET_NAME} --run-name={PLL_ID} --bundlesdf-id={BSDF_ID}
By default, this uses the pose and geometry estimates from `{BSDF_ID}` as both the BundleSDF and NeRF IDs in cycle iteration 1. Add `--nerf-bundlesdf-id={NERF_BSDF_ID}` and/or `--cycle-iteration={CYCLE_ITER}` as necessary.
There are other flags that may be of interest, such as adjusting loss weights; see the Vysics PLL script `contactnets_vision.py` for documentation.
After iteration 1 of a PLL run with PLL ID `{PLL_ID}`, run from the `cnets-data-generation` folder:
python conversion_pll_to_bsdf.py --vision-asset={ASSET_NAME} --pll-id={PLL_ID}
Add the appropriate `--cycle-iteration={CYCLE_ITER}` if necessary.
PLL-to-BundleSDF conversion does the following:
- Physible geometry insights: Moves geometry outputs from PLL into input directories accessible to future BundleSDF runs.
With the introduced terminology, locations for results are:
- Pose estimates come from running BundleSDF. These are represented as 4x4 homogeneous transforms of the object pose in the camera frame. The estimates are located within a BundleSDF output folder:
results/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/ob_in_cam/
- To be usable by downstream PLL runs, these poses need to be converted into PLL format which will be located at a PLL input folder:
dair_pll/assets/vision_{ASSET_TYPE}/{ASSET_NAME}/toss/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/{EXP_NUM}.pt
- The `toss` subdirectory indicates trajectories where all of the contained dynamics can be explained by object-table and object-robot interactions, compared to the `full` subdirectory, whose trajectories can include unmodeled human interventions. For Vysics RSS 2025 experiments, there were no human interventions, so the `toss` and `full` trajectories are the same in length and content.
- Visible shape estimates come from running a NeRF experiment under a BundleSDF pose tracking experiment. The shape estimate is located within a BundleSDF output folder:
results/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/nerf_runs/bundlesdf_id_{NERF_BSDF_ID}/mesh_cleaned.obj (normalized)
results/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/nerf_runs/bundlesdf_id_{NERF_BSDF_ID}/textured_mesh.obj (true scale)
- To be usable by downstream PLL runs, this visible mesh needs to produce a set of points sampled on the visible portions of the convex hull of the above .obj file, subject to our vision-based PLL supervision loss (paper Eq. (6)). This set of points, along with other geometry information, will be located within a PLL input folder:
dair_pll/assets/vision_{ASSET_TYPE}/{ASSET_NAME}/geom_for_pll/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/
- Physics parameters, including the physible shape, come from running a PLL experiment and are located in a PLL output folder [TODO: `with_bundlesdf_mesh.urdf` is misleadingly named; it actually uses the PLL-estimated mesh.]:
dair_pll/results/vision_{ASSET_TYPE}/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/pll_id_{PLL_ID}/urdfs/
  - with_bundlesdf_mesh.urdf
  - body_best.obj
- To be usable by downstream BundleSDF runs, the PLL geometry estimate needs to produce a set of points and associated signed distances subject to our support point loss (paper Eq. (8)) and a set of points and associated signed distance lower bounds subject to our hyperplane-constrained loss (paper Eq. (11)). These sets of points, along with other geometry information, will be located within a BundleSDF input folder:
geometry/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/pll_id_{PLL_ID}/
- This BundleSDF input directory contains `from_support_points` and `from_mesh_surface` subfolders. In our experience, it worked well to supervise via the support point loss (paper Eq. (8)) with data in `from_support_points` and via the hyperplane-constrained loss (paper Eq. (11)) with data in `from_mesh_surface`. This helps ensure the stricter support point loss is applied on points that confidently hypothesized contact, while the less strict hyperplane-constrained loss can still be applied more broadly without encouraging unnecessary geometric convexity where visible signals may say otherwise.
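For intuition about the "support points" referenced above, the sketch below computes support points of a mesh's convex hull along sampled directions (the vertex farthest along each direction). This is a generic geometric illustration rather than the repo's data-generation code; it assumes `trimesh` and `numpy`, and the mesh filename is a placeholder.

```python
import numpy as np
import trimesh

mesh = trimesh.load("body_best.obj", force="mesh")  # e.g., a PLL-estimated mesh
hull_vertices = mesh.convex_hull.vertices

# Sample unit directions and take the hull vertex maximizing d . x for each direction d.
rng = np.random.default_rng(0)
directions = rng.normal(size=(16, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

support_idx = np.argmax(hull_vertices @ directions.T, axis=0)
support_points = hull_vertices[support_idx]  # one support point per sampled direction
print(support_points.shape)                  # (16, 3)
```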
We use a combination of manual alignment and ICP to align the meshes. You can use the following script to do the manual part via the keyboard, and the ICP will follow automatically:
python mesh_processing.py manual_icp --vision-asset={ASSET_NAME} --bundlesdf-id={BSDF_ID} --nerf-bundlesdf-id={NERF_BSDF_ID} --cycle-iteration=1
Or you can manually align the meshes using external tools (e.g., MeshLab), and put the ground truth mesh that is aligned with the `textured_mesh.obj` output mesh at the following path:
results/{ASSET_NAME}/bundlesdf_iteration_{CYCLE_ITER}/bundlesdf_id_{BSDF_ID}/nerf_runs/bundlesdf_id_{NERF_BSDF_ID}/true_geom_aligned_meshlab.obj
Then run `mesh_processing.py` with an extra flag `--meshlab`, which will only do the ICP part.
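For reference, the ICP refinement step after a rough manual alignment can look like the Open3D sketch below. This illustrates the general operation, not the exact routine in `mesh_processing.py`; the filenames, sample counts, and correspondence threshold are placeholders.

```python
import numpy as np
import open3d as o3d

# Ground-truth mesh after rough (manual) alignment, and the reconstructed mesh.
source = o3d.io.read_triangle_mesh("true_geom_roughly_aligned.obj").sample_points_uniformly(20000)
target = o3d.io.read_triangle_mesh("textured_mesh.obj").sample_points_uniformly(20000)

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.005,  # placeholder threshold, in mesh units
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
print(result.transformation)  # 4x4 refinement to apply on top of the manual alignment
```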
Either way, a folder for saving the evaluation results of this experiment will be created:
cnets-data-generation/evaluation/{ASSET_NAME}_bsdf_{BSDF_ID}_{NERF_BSDF_ID}_{CYCLE_ITER}/
and the aligned ground truth mesh will be saved to `true_geom_aligned_assist.obj` under this folder.
Evaluating a Vysics or BundleSDF run for geometry metrics can be done with the evaluation script via:
python evaluate.py --vision-asset={ASSET_NAME} --bundlesdf-id={BSDF_ID} --nerf-bundlesdf-id={NERF_BSDF_ID} --cycle-iteration={CYCLE_ITER}
Other flags of interest include `--do-videos` for generating helpful videos, among others. See the evaluation script for documentation.
This script saves evaluation results to the evaluation folder mentioned above, including `results.yaml`, which reports the geometric errors.
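For background on the kind of geometric error such an evaluation can report, the sketch below computes a symmetric Chamfer distance between surface samples of two already-aligned meshes. The specific metrics written to `results.yaml` may differ; `trimesh` and `scipy` are assumed here purely for illustration.

```python
import trimesh
from scipy.spatial import cKDTree

est = trimesh.load("textured_mesh.obj", force="mesh")            # reconstructed mesh
gt = trimesh.load("true_geom_aligned_assist.obj", force="mesh")  # aligned ground truth

est_pts = est.sample(10000)  # uniform surface samples
gt_pts = gt.sample(10000)

d_est_to_gt, _ = cKDTree(gt_pts).query(est_pts)
d_gt_to_est, _ = cKDTree(est_pts).query(gt_pts)
chamfer = d_est_to_gt.mean() + d_gt_to_est.mean()  # symmetric Chamfer distance
print(f"Chamfer distance: {chamfer:.5f} (mesh units)")
```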
Generate dynamics predictions for Vysics, BundleSDF, PLL, and/or ground truth geometry with robot interactions using the robot dynamics predictions script via:
python robot_dynamics_predictions.py gen {ASSET_NAME} vysics bsdf pll gt --debug
...where you can specify any subset of the `vysics`, `bsdf`, `pll`, and `gt` flags if you want to run simulation on the output of certain methods.
⚠️ NOTE: This requires editing / adding an entry to the `PLL_BSDF_NERF_IDS_FROM_VISION_ASSET` dictionary in `robot_dynamics_predictions.py` so it knows what experiment runs (specified by `{BSDF_ID}`, `{NERF_BSDF_ID}`, and `{PLL_ID}`) should be associated with `vysics`, `bsdf`, and `pll` for the given `{ASSET_NAME}`.
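The exact schema of that dictionary is defined in `robot_dynamics_predictions.py`; purely as an illustration of the bookkeeping involved (the key/value layout below is a guess, not the repo's actual format), an entry could associate an `{ASSET_NAME}` with the run IDs to simulate:

```python
# Hypothetical layout -- check robot_dynamics_predictions.py for the real schema.
PLL_BSDF_NERF_IDS_FROM_VISION_ASSET = {
    "robotocc_bottle_1": {
        "bsdf_id": "bsdf_run_a",         # BundleSDF tracking run used for the bsdf result
        "nerf_bsdf_id": "nerf_rerun_b",  # vision+physics NeRF run used for the vysics result
        "pll_id": "pll_run_c",           # PLL run used for the pll result
    },
}
```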
This script creates a folder at:
cnets-data-generation/robot_dynamics/{ASSET_NAME}/
...containing numerical and visual simulation results.
Analyze results from many runs with the gather results script via:
python gather_results.py gather
This traverses through all of the subfolders within each of these directories:
cnets-data-generation/evaluation/
cnets-data-generation/robot_dynamics/
...and writes yaml files into the `evaluation` directory for each experiment type, e.g. `bsdf_pll_robotocc.yaml` for Vysics, `bsdf_robotocc.yaml` for BundleSDF, and `pll_robotocc.yaml` for PLL.
⚠️ NOTE: This requires editing / adding lines to the `process_gather_command` function to help it interpret what subfolders of the `evaluation` directory should be included under Vysics, BundleSDF, PLL, or other labels you define. You can use different matching patterns for the experiments and export them to different yaml files as desired.
Generate plots from the results exported by the `gather` command above with the gather results script via:
python gather_results.py plot
This opens every yaml file expected from the `gather` command and generates plots in the plots subdirectory:
cnets-data-generation/plots/
Currently this will make an `MMDD` subdirectory for the given month and day, within which there will be geometry and dynamics plots.
This repo includes a dataset creation script to generate new datasets from ROS bag inputs. There are a few manual steps required in addition to running this script.
- Define a new `{ASSET_NAME}` and annotate the interactions in the dataset by editing the experiment configuration file at assets/config.yaml. This requires adding the following keys/entries to the file:
  dataset:
    {ASSET_TYPE}:
      0:            # Leave toss 0 blank
      1: {BAG_NUM}
      ...           # Repeat for every {EXP_NUM} for the given {ASSET_TYPE}
  tosses:
    {ASSET_TYPE}:
      - toss: 0     # Leave toss 0 details blank
      - toss: 1
        start_time:
          secs: X
          nsecs: Y
        end_time:
          secs: Z
          nsecs: W
      ...           # Repeat for every {EXP_NUM} for the given {ASSET_TYPE}
- Put `raw_{BAG_NUM}.bag` into the `cnets-data-generation/rosbags/` folder. As designed, this ROS bag needs to contain the following topics:
  - `/camera/aligned_depth_to_color/image_raw` for depth images
  - `/camera/color/image_raw` for RGB images
  - `/panda/joint_states` for Franka joint states
For further customization of extracting data from ROS bags, edits can be made to the ROS bag processor and how that functionality gets called from the dataset creator.
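As a quick way to confirm a bag contains the required topics before running the dataset creator, the sketch below uses the ROS 1 `rosbag` Python API (available once the ROS setup above is sourced); the bag filename is a placeholder.

```python
import rosbag

REQUIRED_TOPICS = [
    "/camera/aligned_depth_to_color/image_raw",  # depth images
    "/camera/color/image_raw",                   # RGB images
    "/panda/joint_states",                       # Franka joint states
]

with rosbag.Bag("cnets-data-generation/rosbags/raw_0.bag") as bag:  # placeholder bag name
    topics = bag.get_type_and_topic_info().topics
    for topic in REQUIRED_TOPICS:
        print(topic, "ok" if topic in topics else "MISSING")
```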
- Create files for camera parameters. For Vysics data, we save extrinsics at `cnets-data-generation/assets/realsense_pose_robotocc.yaml` and intrinsics at `cnets-data-generation/assets/cam_K_robotocc.txt`.
- (Optional) Compensate for camera depth offsets. There can be a bias in the depth camera returns that does not fully line up with the camera extrinsic calibration, which usually relies on the RGB sensor. The magnitude of this bias can be a few millimeters. If high precision is desired, you may want to calibrate the depth bias with respect to some external reference measurements (e.g., from TagSLAM) if available.
To calibrate the depth bias, uncomment `inspect_camera_alignments.interactive_offset_adjustment()` in `create_dataset.py` when you generate the data of a new experiment for the first time. Then run:
python create_dataset.py --vision-asset={ASSET_NAME}
An interactive GUI will guide you to find the optimal bias value. Use the result to hardcode `self.depth_offset_mm` in `self._create_images()` in `create_dataset.py`, then comment out `inspect_camera_alignments.interactive_offset_adjustment()` again.
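To make the role of `self.depth_offset_mm` concrete, the sketch below shows the kind of correction such a bias implies for depth PNGs stored as uint16 millimeters. It illustrates the idea only and is not the code in `create_dataset.py`; the offset value and frame path are placeholders.

```python
import numpy as np
import imageio.v2 as imageio

depth_offset_mm = 3  # hypothetical bias found with the interactive GUI
raw = imageio.imread("data/robotocc_bottle_1/depth/00000.png")  # uint16, millimeters

# Shift valid (nonzero) returns by the calibrated bias, staying within uint16 range.
corrected = raw.astype(np.int32)
corrected[raw > 0] += depth_offset_mm
corrected = np.clip(corrected, 0, np.iinfo(np.uint16).max).astype(np.uint16)
```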
- Generate the dataset for an asset from the rosbag via:
python create_dataset.py --vision-asset={ASSET_NAME}
This writes RGB, depth, and camera intrinsics to a `data/{ASSET_NAME}/` folder. It also writes timestamps associated with each image, robot states, and synchronized robot states for each image timestamp to a `cnets-data-generation/dataset/{ASSET_NAME}/` folder.
When creating the dataset, it may call `compute_table_offsets.py`, which estimates the height of the table surface from the data and saves the results to cnets-data-generation/table_calibration/:
  - {ASSET_NAME}.txt
  - {ASSET_NAME}.png
  - {ASSET_NAME}_eps.png
You can also run the script manually with matplotlib-backed GUI guidance:
python compute_table_offsets.py single --vision-asset={ASSET_NAME}
- Combine the table height information into a single file. A subsequent call to:
python compute_table_offsets.py combine
...will combine the individual experiment results compatibly into table_heights.yaml, which contains a key `{ASSET_TYPE}` with sub-keys for every contained `{EXP_NUM}`, each holding a float for the table height to use in that experiment.
You may avoid running `python compute_table_offsets.py single --vision-asset={ASSET_NAME}` for every `{ASSET_NAME}` and reuse a single height for many `{ASSET_NAME}`s by editing the `get_table_height_from_log` function in the offset helper script to detect from the `{ASSET_NAME}` what offset to use. The `combine` command will still be required after these changes in order to write the values to table_heights.yaml.
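Given the structure just described, looking up the table height for one experiment could look like the sketch below (assuming PyYAML; the example labels are placeholders, and the `{EXP_NUM}` keys may be stored as integers or strings depending on how the file was written):

```python
import yaml

with open("cnets-data-generation/table_calibration/table_heights.yaml") as f:
    table_heights = yaml.safe_load(f)  # {ASSET_TYPE} -> {EXP_NUM} -> table height (float)

asset_type, exp_num = "robotocc_bottle", 1  # placeholder labels
height = table_heights[asset_type][exp_num]
print(f"Table height for {asset_type}_{exp_num}: {height}")
```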
- Create object masks. The masks need to be in the `data/{ASSET_NAME}/masks/` folder. This can be done a number of ways; ultimately XMem worked sufficiently well for us (see the BundleSDF repo for reference).
After these steps, experiments can be run on the new `{ASSET_NAME}` via the experiment instructions.
- link to project page
- make submodules public
- rename git submodules in the repo (will require code changes too for filepaths)
- reflect the submodules path changes in the readme
- put all submodule code on main branches
- clean setup of docker/virtual environment
- clean up obsolete files
- publish sample data
- publish the complete dataset
- Documentation:
  - running experiments
  - evaluation
  - dataset creation
- Known issue:
  - Fix mesh overlay rendering misalignment.
Thanks to the great open-source repositories: BundleSDF, PLL.