
GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

📄 Official Project Page for CVPR 2025 Paper
🎥 Watch the demo video below to see GenManip in action!

GenManip Video

Paper · Project Page · Docs


🧠 Overview

GenManip is a large-scale simulation and evaluation platform for generalist robotic manipulation policies under diverse and realistic instruction-following scenarios.

Built on NVIDIA Isaac Sim, GenManip enables:

  • 🧠 LLM-driven task generation via a novel Task-oriented Scene Graph (ToSG) — see the sketch after this list
  • 🔬 200 curated evaluation scenarios for both modular and end-to-end policy benchmarking
  • 🧱 A scalable asset pool of 10,000+ rigid and 100+ articulated objects, each with multimodal annotations
  • 🧭 Evaluation of spatial, appearance, commonsense, and long-horizon reasoning abilities
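
The paper does not pin down a public API for the ToSG, but a minimal Python sketch of the idea is shown below, assuming nodes carry object descriptions and directed edges carry goal relations. All class and field names here are hypothetical, not from the GenManip codebase:

```python
from dataclasses import dataclass, field

# Hypothetical minimal Task-oriented Scene Graph (ToSG):
# nodes describe objects, directed edges encode goal relations.
@dataclass
class ToSG:
    objects: dict[str, str] = field(default_factory=dict)            # id -> description
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (subj, rel, obj)

    def add_object(self, obj_id: str, description: str) -> None:
        self.objects[obj_id] = description

    def add_relation(self, subj: str, relation: str, obj: str) -> None:
        self.relations.append((subj, relation, obj))

    def to_instruction(self) -> str:
        # Naive linearization of the goal graph into a language
        # instruction; an LLM would paraphrase and diversify this.
        clauses = [
            f"place the {self.objects[s]} {r} the {self.objects[o]}"
            for s, r, o in self.relations
        ]
        return " and ".join(clauses)

# Example: a single pick-and-place goal.
g = ToSG()
g.add_object("mug", "red mug")
g.add_object("plate", "white plate")
g.add_relation("mug", "on", "plate")
print(g.to_instruction())  # place the red mug on the white plate
```

Composing several relation edges in one graph is what yields the compositional, long-horizon instructions the benchmark targets.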

🚀 Recent Highlights

🔹 Oct 2025 — Data & Evaluation Release

The data synthesis pipeline and evaluation toolkit for generalizable pick-and-place tasks are now available.

🔹 Aug 2025 — IROS 2025 Challenge Integration

GenManip serves as the core simulation backbone for the IROS 2025 Challenge: Vision-Language Manipulation in Open Tabletop Environments.

  • Generated 55K+ generalizable pick-and-place tasks across ~14K objects using the ALOHA platform
  • Released 10 expert-designed post-training tasks for dual-arm manipulation
  • Provided diverse pre-training data with randomized objects, scenes, and language instructions to promote cross-domain generalization

📌 Challenge Registration:
https://eval.ai/web/challenges/challenge-page/2626/overview

IROS 2025 Teaser


📂 Dataset Access

| Type | Description | Link |
| --- | --- | --- |
| Pre-training Data | Dual-arm generalizable pick-and-place (55K+ samples) | Hugging Face |
| Post-training Data | Dual-arm manipulation, 10 benchmark tasks | Hugging Face |
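
A minimal sketch for fetching either dataset with the `huggingface_hub` library; the `repo_id` below is a placeholder — substitute the actual dataset id from the links above:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- use the actual GenManip dataset id
# from the Hugging Face links in the table above.
local_dir = snapshot_download(
    repo_id="<org>/<genmanip-dataset>",
    repo_type="dataset",
)
print(f"Dataset downloaded to {local_dir}")
```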

Additional Resources

  • The GenManip Benchmark will be merged into InternManip
  • Datasets are also included in InternData-M1 — a large-scale embodied robotics dataset with ~250K demonstrations and rich annotations (2D/3D boxes, trajectories, grasps, masks)
  • Conversion to LeRobot format is ongoing; all data has been generated and will be fully available soon
  • Scaling data for long-horizon, multi-stage manipulation is in progress 🚀

✨ Key Features

| Feature | Description |
| --- | --- |
| 🎯 ToSG-based Task Synthesis | Graph-based semantic representation for generating compositional tasks |
| 🖼️ Photorealistic Simulation | RTX ray-traced rendering with physically accurate dynamics |
| 📊 Benchmark Suite | 200+ diverse tasks with human-in-the-loop annotation refinement |
| 🧪 Evaluation Toolkit | Supports SR, SPL, ablation studies, and generalization diagnostics (see the metric sketch below) |
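
SR is the per-episode success rate; SPL commonly denotes Success weighted by Path Length (Anderson et al., 2018). A minimal sketch of how these two metrics are typically computed, assuming per-episode success flags plus shortest and executed path lengths; the function names are illustrative, not the GenManip toolkit's API:

```python
def success_rate(successes: list[bool]) -> float:
    """SR: fraction of episodes that achieved the goal."""
    return sum(successes) / len(successes)

def spl(successes: list[bool], shortest: list[float], taken: list[float]) -> float:
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where l_i is the
    shortest-path length and p_i the executed path length of episode i."""
    return sum(
        s * (l / max(p, l))
        for s, l, p in zip(successes, shortest, taken)
    ) / len(successes)

# Example: 3 episodes, two successful.
print(success_rate([True, True, False]))          # ~0.667
print(spl([True, True, False],
          [1.0, 2.0, 1.5],                        # shortest path lengths
          [1.2, 2.0, 3.0]))                       # executed path lengths -> ~0.611
```

SPL penalizes successful episodes that take inefficient paths, which makes it a stricter generalization diagnostic than SR alone.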

🧩 TODO List

  • Website, documentation, and leaderboard
  • Code release for task synthesis, rendering, and evaluation
  • Full GenManip asset pack (10K+ objects)
  • Baseline model implementations (ACT, Seer, InternVLA-M1, etc.)
  • Objaverse scaling pipeline

📚 Citation

If you find our work useful, please cite:

@inproceedings{gao2025genmanip,
  title={GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation},
  author={Gao, Ning and Chen, Yilun and Yang, Shuai and Chen, Xinyi and Tian, Yang and Li, Hao and Huang, Haifeng and Wang, Hanqing and Wang, Tai and Pang, Jiangmiao},
  booktitle={CVPR},
  year={2025}
}
