jingjingqian75/GeoPredict

GeoPredict: Leveraging Predictive Kinematics
and 3D Gaussian Geometry
for Precise VLA Manipulation

🔥 CVPR 2026 Highlight 🔥

Jingjing Qian¹, Boyao Han², Chen Shi¹, Lei Xiao¹, Long Yang¹, Shaoshuai Shi³, Li Jiang¹
¹Chinese University of Hong Kong, Shenzhen   ²Hunan University   ³Voyager Research, Didi Chuxing

arXiv: https://arxiv.org/abs/2512.16811


🔥 News

  • [2026-05] We released the inference code and checkpoints for GeoPredict.
  • [2026-02] Our paper was accepted to CVPR 2026 as a Highlight! 🥳
  • [2025-12] We released the paper and the project page for GeoPredict.

🎄 Overview

GeoPredict is a geometry-aware vision-language-action (VLA) framework for robotic manipulation. Existing methods are often limited by:

  1. 2D-Centric Formulation: operate in 2D image space, lacking explicit 3D spatial modeling.
  2. Reactive Control: map observations reactively, failing to anticipate future physical dynamics.
  3. Geometric Inconsistency: view-independent predictions struggle to enforce 3D consistency.

GeoPredict addresses these limitations with:

  1. Geometry-Aware VLA: augments VLA with predictive kinematic and 3D geometric priors.
  2. Predictive 3D Modeling: forecasts workspace geometry using track-guided 3DGS refinement.
  3. Lightweight Inference: uses predictive modules solely for training, reducing test-time overhead.
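
The third point can be illustrated with a toy sketch: auxiliary prediction heads run only when a training flag is set, so inference pays only for the action head. This is a minimal illustration, not the released GeoPredict code. The class `PolicyWithAuxHeads`, the head names `track` and `geometry`, and the `counting_head` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class PolicyWithAuxHeads:
    """Toy policy (hypothetical): one action head plus training-only auxiliary heads."""

    action_head: Callable[[Vector], Vector]
    aux_heads: Dict[str, Callable[[Vector], Vector]] = field(default_factory=dict)

    def forward(self, features: Vector, training: bool) -> Dict[str, Vector]:
        # The action head always runs.
        outputs = {"action": self.action_head(features)}
        if training:
            # Auxiliary predictions exist only to supply extra training losses,
            # so they are skipped entirely at test time.
            for name, head in self.aux_heads.items():
                outputs[name] = head(features)
        return outputs


# Count how often each auxiliary head is invoked.
calls = {"track": 0, "geometry": 0}


def counting_head(name: str) -> Callable[[Vector], Vector]:
    def head(features: Vector) -> Vector:
        calls[name] += 1
        return [sum(features)]

    return head


policy = PolicyWithAuxHeads(
    action_head=lambda f: [2.0 * x for x in f],
    aux_heads={"track": counting_head("track"), "geometry": counting_head("geometry")},
)

train_out = policy.forward([1.0, 2.0], training=True)
test_out = policy.forward([1.0, 2.0], training=False)
```

After both calls, each auxiliary head has been invoked once (during the training pass), and `test_out` contains only the `action` entry.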

📝 TODO

  • [x] Release paper and project page.
  • [x] Release inference code and checkpoints.
  • [ ] Release training code. Expected in June 2026.
  • [ ] Support more open-source VLA models, such as Pi0.5 and OpenVLA. Expected in July 2026.

📚 Getting Started

  1. Environment Setup & Inference

📖 Framework

[Figure: GeoPredict framework]

GeoPredict consists of three key components:

  • (a) Trajectory-Level Kinematic Prediction: encodes motion history of robot keypoints into compact tokens via a Track Encoder, and predicts multi-step 3D keypoint trajectories using learnable future track queries.
  • (b) Predictive 3D Gaussian Geometry: decodes a coarse 3D spatial query into initial Gaussian primitives to represent workspace geometry, and forecasts how the explicit 3D scene representation evolves across multiple future timesteps.
  • (c) Track-Guided Refinement & Rendering: adaptively increases Gaussian density along predicted trajectories to capture task-relevant interaction regions, and supervises the predictive 3DGS exclusively through future depth-map rendering without color modeling.
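
The idea behind component (c), adding Gaussians where the predicted trajectory passes, can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the method from the paper: Gaussians are reduced to their 3D centers, the trajectory is a list of waypoints, and the function names `distance_to_track` and `densify_along_track` and the `radius` and `offset` parameters are hypothetical. A real 3DGS refinement would also handle scales, rotations, and opacities.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def distance_to_track(p: Point, track: List[Point]) -> float:
    """Distance from p to the nearest track waypoint (waypoints only, not segments)."""
    return min(math.dist(p, q) for q in track)


def densify_along_track(
    centers: List[Point],
    track: List[Point],
    radius: float = 0.1,   # assumed neighborhood size around the track
    offset: float = 0.02,  # assumed displacement of the two children
) -> List[Point]:
    """Split each Gaussian center within `radius` of the track into two centers."""
    refined: List[Point] = []
    for c in centers:
        if distance_to_track(c, track) <= radius:
            # Replace the center with two children shifted along x
            # (an arbitrary axis chosen for this sketch).
            refined.append((c[0] - offset, c[1], c[2]))
            refined.append((c[0] + offset, c[1], c[2]))
        else:
            # Far from the predicted motion: keep the coarse Gaussian as is.
            refined.append(c)
    return refined


centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.05, 0.0)]
track = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
refined = densify_along_track(centers, track)
```

In this example, the first and third centers lie within `radius` of a waypoint and are each split in two, while the second center is left unchanged, so `refined` holds five centers.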

📊 Results

RoboCasa Simulation Benchmark Results

[Figure: RoboCasa simulation benchmark results]

LIBERO Simulation Benchmark Results

[Figure: LIBERO simulation benchmark results]

We report strong performance on both the RoboCasa Human-50 and LIBERO benchmarks, demonstrating the effectiveness of GeoPredict on geometry-intensive and spatially demanding manipulation tasks. Please see the paper for full tables, metrics, and more detailed analysis.


📬 Contact

If you have questions about the paper, feel free to open an issue or contact:

  • Jingjing Qian: jingjingqian.0705@gmail.com

🔗 Citation

If you find our work helpful, please cite:

@misc{qian2025geopredict,
  title={GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation},
  author={Jingjing Qian and Boyao Han and Chen Shi and Lei Xiao and Long Yang and Shaoshuai Shi and Li Jiang},
  year={2025},
  eprint={2512.16811},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.16811},
}
