HKUST(GZ) && @SYSU-STAR
Lists (24)
Aerial Reconstruction
🤓Course
Data Structure
Depth Estimation
Hardware
Large Language Models
Manipulation
Multi-robot System
Object Detection/Segmentation
Object Goal Navigation
🤓Paper list
🤓Phd Survival Guide
Pre-training Models
Reconstruction
Robot Exploration
Robot Simulator
Scene Graph
Semantic Dataset
Semantic Mapping
🤓Tools
Trajectory Planner
Vision Language Action
Vision Language Models
Vision Language Navigation
Stars
Unofficial implementation of YOLO-World + EfficientSAM for ComfyUI
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
[ICRA'25] One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation
Protocol Buffers - Google's data interchange format
The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
PyTorch implementation of paper: GaussNav: Gaussian Splatting for Visual Navigation
Primitive-Swarm: An Ultra-lightweight and Scalable Planner for Large-scale Aerial Swarms
[CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
Integrate the DeepSeek API into popular software
[IEEE RA-L'25] NavRL: Learning Safe Flight in Dynamic Environments (NVIDIA Isaac/Python/ROS1/ROS2)
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, including base models, vertical-domain fine-tunes and applications, datasets, and tutorials.
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Open-sourced code for "HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit".
A frontier collection and survey of vision-language model papers and model GitHub repositories
[RA-L 2025] Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
[IJRR2024] The official repository for the WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Natural Environments
This project aims to share the technical principles behind large language models along with practical experience (LLM engineering and real-world application deployment).
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.