Stars
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video]
Library for nonlinear optimization, wrapping many algorithms for global and local, constrained and unconstrained optimization.
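A minimal sketch of the NLopt Python bindings (assuming the `nlopt` package; the quadratic objective, bounds, and starting point are illustrative):

```python
import nlopt
import numpy as np

# Minimize f(x) = x0^2 + x1^2 with a gradient-based local algorithm (MMA).
def objective(x, grad):
    if grad.size > 0:          # grad is empty for derivative-free algorithms
        grad[:] = 2.0 * x
    return float(np.dot(x, x))

opt = nlopt.opt(nlopt.LD_MMA, 2)      # algorithm, problem dimension
opt.set_min_objective(objective)
opt.set_lower_bounds([-10.0, -10.0])
opt.set_upper_bounds([10.0, 10.0])
opt.set_xtol_rel(1e-6)
x_opt = opt.optimize([3.0, -4.0])     # initial guess
print(x_opt, opt.last_optimum_value())
```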
[CVPR'24 Oral] Official repository of Point Transformer V3 (PTv3)
[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.
Labeling tool with SAM (Segment Anything Model); supports SAM, SAM2, SAM-HQ, MobileSAM, EdgeSAM, etc. An interactive, semi-automatic image annotation tool.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
Release repo for our SLAM Handbook
A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
GRUtopia: Dream General Robots in a City at Scale
Download market data from Yahoo! Finance's API
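A minimal usage sketch with the yfinance package (the ticker and date range are illustrative):

```python
import yfinance as yf

# Fetch daily OHLCV bars for one ticker over a fixed window.
data = yf.download("AAPL", start="2024-01-01", end="2024-06-30")
print(data["Close"].tail())
```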
MichalZawalski / embodied-CoT
Forked from openvla/openvla. Embodied Chain of Thought: a robotic policy that reasons to solve the task.
This package contains the original 2012 AlexNet code.
NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
A Python framework for high-performance GPU simulation and graphics
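A minimal sketch of Warp's kernel model (assuming the `warp-lang` package; the scaling kernel is illustrative):

```python
import warp as wp

wp.init()  # explicit init; optional in recent Warp versions

@wp.kernel
def scale(a: wp.array(dtype=float), s: float):
    # One GPU/CPU thread per array element.
    i = wp.tid()
    a[i] = a[i] * s

a = wp.full(1024, 1.0, dtype=float)
wp.launch(scale, dim=1024, inputs=[a, 2.0])
print(a.numpy()[:4])  # -> [2. 2. 2. 2.]
```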
Motion imitation with deep reinforcement learning.
[Lumina Embodied AI Community] Embodied-AI-Guide: a technical guide to embodied intelligence
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Image augmentation for machine learning experiments.
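Both libraries compose per-image transforms into a pipeline; a minimal sketch with albumentations (assuming the `albumentations` package; the transforms and dummy image are illustrative):

```python
import albumentations as A
import numpy as np

# Compose a stochastic augmentation pipeline.
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy HWC uint8 image
augmented = transform(image=image)["image"]
print(augmented.shape)
```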
Align Anything: Training All-modality Model with Feedback
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
moojink / openvla-oft
Forked from openvla/openvla. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success