Starred repositories
Official repo and evaluation implementation of VSI-Bench
Up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
Improving 3D Large Language Model via Robust Instruction Tuning
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
A Vision-Language Model for Spatial Affordance Prediction in Robotics
Official Task Suite Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
A Survey of Embodied Learning for Object-Centric Robotic Manipulation
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
[Embodied-AI-Survey-2024] Paper list and projects for Embodied AI
Implementation of Autoregressive Diffusion in PyTorch
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Utilities intended for use with Llama models.
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Datasets for data-driven deep reinforcement learning with Atari (wrapper for datasets released by Google)
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Code for the NeurIPS 2018 paper "Sparse PCA from Sparse Linear Regression"