- Shanghai Jiaotong University
- Shanghai
- https://agito555.github.io/

Stars
🌐 A curated collection of vision-language-action (VLA) models for autonomous driving applications
[CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Implement custom operators in PyTorch with CUDA/C++
Several simple examples for popular neural network toolkits calling custom CUDA operators.
[CoRL 2025] CaRL: Learning Scalable Planning Policies with Simple Rewards
[CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"
[NeurIPS 2024] OPUS: Occupancy Prediction Using a Sparse Set
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[ICCV 2025] DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
Adding Scene-Centric Forecasting Control to Occupancy World Model
This repository summarizes recent advances in the VLA + RL paradigm and provides a taxonomic classification of relevant works.
🚀 Efficient implementations of state-of-the-art linear attention models
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
[ECCV 2024] Fully Sparse 3D Occupancy Prediction & RayIoU Evaluation Metric
[Lumina Embodied AI] A technical guide to embodied AI (Embodied-AI-Guide)
A Paper List for Humanoid Robot Learning.
[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"
[ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment).
EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network
🔥 LeetCode solutions in multiple programming languages | Solutions to LeetCode, Coding Interviews (2nd Edition), and Cracking the Coding Interview (6th Edition)
[CVPR 2025 Highlight] OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
A comprehensive list of papers on the definition of World Models and the use of World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, code, and related websites.
[ICLR 2025] Official code implementation for the paper "X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios"
SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
[CVPR 2025] GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping