GeoPredict: Leveraging Predictive Kinematics
and 3D Gaussian Geometry
for Precise VLA Manipulation

🔥 CVPR 2026 🔥

Jingjing Qian¹ Boyao Han² Chen Shi¹ Lei Xiao¹ Long Yang¹ Shaoshuai Shi³ Li Jiang¹
¹Chinese University of Hong Kong, Shenzhen ²Hunan University ³Voyager Research, Didi Chuxing

🔥 News

[2026-05] We released the inference code and checkpoints for GeoPredict.
[2026-02] Our paper was accepted to CVPR 2026 as a Highlight! 🥳
[2025-12] We released the paper and the project page for GeoPredict.

🎄 Overview

GeoPredict is a geometry-aware vision-language-action (VLA) framework for robotic manipulation. Existing methods are often limited by:

2D-Centric Formulation: operate in 2D image space, lacking explicit 3D spatial modeling.
Reactive Control: map observations to actions reactively, failing to anticipate future physical dynamics.
Geometric Inconsistency: view-independent predictions struggle to enforce 3D consistency.

GeoPredict addresses these limitations with:

Geometry-Aware VLA: augments VLA with predictive kinematic and 3D geometric priors.
Predictive 3D Modeling: forecasts workspace geometry using track-guided 3D Gaussian Splatting (3DGS) refinement.
Lightweight Inference: uses predictive modules solely for training, reducing test-time overhead.

📝 TODO

[x] Release paper and project page.
[x] Release inference code and checkpoints.
[ ] Release training code. Expected in June 2026.
[ ] Support more open-source VLA models, such as Pi0.5 and OpenVLA. Expected in July 2026.

📚 Getting Started

Environment Setup & Inference

📖 Framework

GeoPredict consists of three key components:

(a) Trajectory-Level Kinematic Prediction: encodes motion history of robot keypoints into compact tokens via a Track Encoder, and predicts multi-step 3D keypoint trajectories using learnable future track queries.
(b) Predictive 3D Gaussian Geometry: decodes a coarse 3D spatial query into initial Gaussian primitives to represent workspace geometry, and forecasts how the explicit 3D scene representation evolves across multiple future timesteps.
(c) Track-Guided Refinement & Rendering: adaptively increases Gaussian density along predicted trajectories to capture task-relevant interaction regions, and supervises the predictive 3DGS exclusively through future depth-map rendering without color modeling.
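
The idea behind component (c) can be sketched in a few lines of plain Python. This is an illustrative simplification, not the method from the paper: the functions `densify_along_track` and `depth_l1`, the fixed `radius` and `offset` parameters, the split-along-x rule, and the choice of an L1 depth loss are all assumptions made for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def distance_to_track(p: Point, track: List[Point]) -> float:
    """Distance from a Gaussian center to the nearest predicted track point."""
    return min(math.dist(p, q) for q in track)


def densify_along_track(
    centers: List[Point], track: List[Point], radius: float, offset: float
) -> List[Point]:
    """Split every Gaussian within `radius` of the track into two children.

    Gaussians far from the predicted trajectory are kept unchanged, so the
    density increases only in the task-relevant interaction region.
    """
    refined: List[Point] = []
    for c in centers:
        if distance_to_track(c, track) <= radius:
            x, y, z = c
            # Assumed split rule: two children displaced along the x axis.
            refined.append((x - offset, y, z))
            refined.append((x + offset, y, z))
        else:
            refined.append(c)
    return refined


def depth_l1(rendered: List[float], target: List[float]) -> float:
    """Mean absolute error between rendered and target depth (assumed loss)."""
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(target)


coarse = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
predicted_track = [(0.1, 0.0, 0.0), (0.9, 0.0, 0.0)]

refined = densify_along_track(coarse, predicted_track, radius=0.2, offset=0.05)
print(len(coarse), len(refined))

# Supervision uses depth only: there is no color term in this sketch.
loss = depth_l1([1.0, 2.0], [1.5, 2.0])
print(loss)
```

Here the two Gaussians near the predicted trajectory are split while the distant one is left alone, and the only supervision signal is a depth error, mirroring the depth-only rendering described above.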

📊 Results

RoboCasa Simulation Benchmark Results

LIBERO Simulation Benchmark Results

GeoPredict performs strongly on both the RoboCasa Human-50 and LIBERO benchmarks, which cover geometry-intensive and spatially demanding manipulation tasks. Please see the paper for full tables, metrics, and more detailed analysis.

📬 Contact

If you have questions about the paper, feel free to open an issue or contact:

Jingjing Qian: jingjingqian.0705@gmail.com

🔗 Citation

If you find our work helpful, please cite:

@misc{qian2025geopredict,
  title={GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation},
  author={Jingjing Qian and Boyao Han and Chen Shi and Lei Xiao and Long Yang and Shaoshuai Shi and Li Jiang},
  year={2025},
  eprint={2512.16811},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.16811},
}
