This repo contains a curated list of 3D vision papers relating to the robotics domain in the era of large models, i.e., LLMs/VLMs, inspired by awesome-computer-vision.
Please feel free to send me pull requests or an email to add papers!
If you find this repository useful, please consider citing and starring this list.
Feel free to share this list with others! List curated and maintained by Zubair Irshad. If you have any questions, please get in touch!
Other relevant survey papers:
- "Neural Fields in Robotics", arXiv, Oct 2024. [Paper]
- "When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models", arXiv, May 2024. [Paper]
- "3D Gaussian Splatting in Robotics: A Survey", arXiv, Oct 2024. [Paper]
- "A Comprehensive Study of 3-D Vision-Based Robot Manipulation", TCYB 2021. [Paper]
- Policy Learning
- Pretraining
- VLM and LLM
- Representations
- Simulations, Datasets and Benchmarks
- Citation
Policy Learning
- 3D Diffuser Actor: "Policy Diffusion with 3D Scene Representations", arXiv, Feb 2024. [Paper] [Webpage] [Code]
- 3D Diffusion Policy: "Generalizable Visuomotor Policy Learning via Simple 3D Representations", RSS 2024. [Paper] [Webpage] [Code]
- DNAct: "Diffusion Guided Multi-Task 3D Policy Learning", arXiv, Mar 2024. [Paper] [Webpage]
- ManiCM: "Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation", arXiv, Jun 2024. [Paper] [Webpage] [Code]
- HDP: "Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation", CVPR 2024. [Paper] [Webpage] [Code]
- Imagination Policy: "Using Generative Point Cloud Models for Learning Manipulation Policies", arXiv, Jun 2024. [Paper] [Webpage]
- PCWM: "Point Cloud Models Improve Visual Robustness in Robotic Learners", ICRA 2024. [Paper] [Webpage]
- RVT: "Robotic View Transformer for 3D Object Manipulation", CoRL 2023. [Paper] [Webpage] [Code]
- Act3D: "3D Feature Field Transformers for Multi-Task Robotic Manipulation", CoRL 2023. [Paper] [Webpage] [Code]
- VIHE: "Transformer-Based 3D Object Manipulation Using Virtual In-Hand View", arXiv, Mar 2024. [Paper] [Webpage] [Code]
- SGRv2: "Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation", arXiv, Jun 2024. [Paper] [Webpage]
- Sigma-Agent: "Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation", arXiv, Jun 2024. [Paper]
- RVT-2: "Learning Precise Manipulation from Few Demonstrations", RSS 2024. [Paper] [Webpage] [Code]
- SAM-E: "Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation", ICML 2024. [Paper] [Webpage] [Code]
- RISE: "3D Perception Makes Real-World Robot Imitation Simple and Effective", arXiv, Apr 2024. [Paper] [Webpage] [Code]
- PolarNet: "3D Point Clouds for Language-Guided Robotic Manipulation", CoRL 2023. [Paper] [Webpage] [Code]
- ChainedDiffuser: "Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation", CoRL 2023. [Paper] [Webpage] [Code]
- Pointcloud_RL: "On the Efficacy of 3D Point Cloud Reinforcement Learning", arXiv, Jun 2023. [Paper] [Code]
- Perceiver-Actor: "A Multi-Task Transformer for Robotic Manipulation", CoRL 2022. [Paper] [Webpage] [Code]
- CLIPort: "What and Where Pathways for Robotic Manipulation", CoRL 2021. [Paper] [Webpage] [Code]
Pretraining
- 3D-MVP: "3D Multiview Pretraining for Robotic Manipulation", arXiv, Jun 2024. [Paper] [Webpage]
- DexArt: "Benchmarking Generalizable Dexterous Manipulation with Articulated Objects", CVPR 2023. [Paper] [Webpage] [Code]
- RoboUniView: "Visual-Language Model with Unified View Representation for Robotic Manipulation", arXiv, Jun 2024. [Paper] [Website] [Code]
- SUGAR: "Pre-training 3D Visual Representations for Robotics", CVPR 2024. [Paper] [Webpage] [Code]
- DPR: "Visual Robotic Manipulation with Depth-Aware Pretraining", arXiv, Jan 2024. [Paper]
- MV-MWM: "Multi-View Masked World Models for Visual Robotic Manipulation", ICML 2023. [Paper] [Code]
- Point Cloud Matters: "Rethinking the Impact of Different Observation Spaces on Robot Learning", arXiv, Feb 2024. [Paper] [Code]
- RL3D: "Visual Reinforcement Learning with Self-Supervised 3D Representations", IROS 2023. [Paper] [Website] [Code]
VLM and LLM
- AHA: "A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation", arXiv, 2024. [Paper] [Website]
- ShapeLLM: "Universal 3D Object Understanding for Embodied Interaction", ECCV 2024. [Paper] [Code] [Website]
- 3D-VLA: "3D Vision-Language-Action Generative World Model", ICML 2024. [Paper] [Website] [Code]
- RoboPoint: "A Vision-Language Model for Spatial Affordance Prediction for Robotics", CoRL 2024. [Paper] [Website] [Demo]
- Open6DOR: "Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach", IROS 2024. [Paper] [Website] [Code]
- ReasoningGrasp: "Reasoning Grasping via Multimodal Large Language Model", CoRL 2024. [Paper]
- SpatialVLM: "Endowing Vision-Language Models with Spatial Reasoning Capabilities", CVPR 2024. [Paper] [Website] [Code]
- SpatialRGPT: "Grounded Spatial Reasoning in Vision Language Model", arXiv, Jun 2024. [Paper] [Website]
- Scene-LLM: "Extending Language Model for 3D Visual Understanding and Reasoning", arXiv, Mar 2024. [Paper]
- ManipLLM: "Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation", CVPR 2024. [Paper] [Website] [Code]
- Manipulate-Anything: "Automating Real-World Robots using Vision-Language Models", CoRL 2024. [Paper] [Website]
- MOKA: "Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting", RSS 2024. [Paper] [Website] [Code]
- Agent3D-Zero: "An Agent for Zero-shot 3D Understanding", arXiv, Mar 2024. [Paper] [Website] [Code]
- MultiPLY: "A Multisensory Object-Centric Embodied Large Language Model in 3D World", CVPR 2024. [Paper] [Website] [Code]
- ThinkGrasp: "A Vision-Language System for Strategic Part Grasping in Clutter", arXiv, Jul 2024. [Paper] [Website]
- VoxPoser: "Composable 3D Value Maps for Robotic Manipulation with Language Models", CoRL 2023. [Paper] [Website] [Code]
- Dream2Real: "Zero-Shot 3D Object Rearrangement with Vision-Language Models", ICRA 2024. [Paper] [Website] [Code]
- LEO: "An Embodied Generalist Agent in 3D World", ICML 2024. [Paper] [Website] [Code]
- SpatialPIN: "Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors", arXiv, Mar 2024. [Paper] [Website]
- SpatialBot: "Precise Spatial Understanding with Vision Language Models", arXiv, Jun 2024. [Paper] [Code]
- COME-robot: "Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V", arXiv, Apr 2024. [Paper] [Website]
- 3D-LLM: "Injecting the 3D World into Large Language Models", NeurIPS 2023. [Paper] [Website] [Code]
- VLMaps: "Visual Language Maps for Robot Navigation", ICRA 2023. [Paper] [Website] [Code]
- MoMa-LLM: "Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation", RA-L 2024. [Paper] [Website] [Code]
- LGrasp6D: "Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance", ECCV 2024. [Paper] [Website]
- OpenAD: "Open-Vocabulary Affordance Detection in 3D Point Clouds", IROS 2023. [Paper] [Website] [Code]
- 3DAPNet: "Language-Conditioned Affordance-Pose Detection in 3D Point Clouds", ICRA 2024. [Paper] [Website] [Code]
- OpenKD: "Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation", ICRA 2024. [Paper] [Code]
- PARIS3D: "Reasoning Based 3D Part Segmentation Using Large Multimodal Model", ECCV 2024. [Paper] [Code]
Representations
- RoVi-Aug: "Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning", CoRL 2024. [Paper] [Webpage]
- Vista: "View-Invariant Policy Learning via Zero-Shot Novel View Synthesis", CoRL 2024. [Paper] [Webpage] [Code]
- GraspSplats: "Efficient Manipulation with 3D Feature Splatting", CoRL 2024. [Paper] [Webpage] [Code]
- RAM: "Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation", CoRL 2024. [Paper] [Webpage] [Code]
- Language-Embedded Gaussian Splats (LEGS): "Incrementally Building Room-Scale Representations with a Mobile Robot", IROS 2024. [Paper] [Webpage]
- Splat-MOVER: "Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting", arXiv, May 2024. [Paper] [Webpage]
- GNFactor: "Multi-Task Real Robot Learning with Generalizable Neural Feature Fields", CoRL 2023. [Paper] [Webpage] [Code]
- ManiGaussian: "Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", ECCV 2024. [Paper] [Webpage] [Code]
- GaussianGrasper: "3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping", arXiv, Mar 2024. [Paper] [Webpage] [Code]
- ORION: "Vision-based Manipulation from Single Human Video with Open-World Object Graphs", arXiv, May 2024. [Paper] [Webpage]
- ConceptGraphs: "Open-Vocabulary 3D Scene Graphs for Perception and Planning", ICRA 2024. [Paper] [Webpage] [Code]
- SparseDFF: "Sparse-View Feature Distillation for One-Shot Dexterous Manipulation", ICLR 2024. [Paper] [Webpage]
- GROOT: "Learning Generalizable Manipulation Policies with Object-Centric 3D Representations", CoRL 2023. [Paper] [Webpage] [Code]
- Distilled Feature Fields: "Enable Few-Shot Language-Guided Manipulation", CoRL 2023. [Paper] [Webpage] [Code]
- SGR: "A Universal Semantic-Geometric Representation for Robotic Manipulation", CoRL 2023. [Paper] [Webpage] [Code]
- OVMM: "Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps", arXiv, Jun 2024. [Paper]
- CLIP-Fields: "Weakly Supervised Semantic Fields for Robotic Memory", RSS 2023. [Paper] [Webpage] [Code]
- NeRF in the Palm of Your Hand: "Corrective Augmentation for Robotics via Novel-View Synthesis", CVPR 2023. [Paper] [Webpage]
- JCR: "Unifying Scene Representation and Hand-Eye Calibration with 3D Foundation Models", arXiv, Apr 2024. [Paper] [Code]
- D3Fields: "Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation", arXiv, Sep 2023. [Paper] [Webpage] [Code]
- SayPlan: "Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning", CoRL 2023. [Paper] [Webpage]
- Dex-NeRF: "Using a Neural Radiance Field to Grasp Transparent Objects", CoRL 2021. [Paper] [Webpage]
Simulations, Datasets and Benchmarks
- The Colosseum: "A Benchmark for Evaluating Generalization for Robotic Manipulation", RSS 2024. [Paper] [Website] [Code]
- OpenEQA: "Embodied Question Answering in the Era of Foundation Models", CVPR 2024. [Paper] [Website] [Code]
- DROID: "A Large-Scale In-the-Wild Robot Manipulation Dataset", RSS 2024. [Paper] [Website] [Code]
- RH20T: "A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot", ICRA 2024. [Paper] [Website] [Code]
- Gen2Sim: "Scaling up Robot Learning in Simulation with Generative Models", ICRA 2024. [Paper] [Website] [Code]
- BEHAVIOR Vision Suite: "Customizable Dataset Generation via Simulation", CVPR 2024. [Paper] [Website] [Code]
- RoboCasa: "Large-Scale Simulation of Everyday Tasks for Generalist Robots", RSS 2024. [Paper] [Website] [Code]
- ARNOLD: "A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes", ICCV 2023. [Paper] [Webpage] [Code]
- VIMA: "General Robot Manipulation with Multimodal Prompts", ICML 2023. [Paper] [Website] [Code]
- ManiSkill2: "A Unified Benchmark for Generalizable Manipulation Skills", ICLR 2023. [Paper] [Website] [Code]
- Robo360: "A 3D Omnispective Multi-Material Robotic Manipulation Dataset", arXiv, Dec 2023. [Paper]
- AR2-D2: "Training a Robot Without a Robot", CoRL 2023. [Paper] [Website] [Code]
- Habitat 2.0: "Training Home Assistants to Rearrange their Habitat", NeurIPS 2021. [Paper] [Website] [Code]
- VL-Grasp: "A 6-DoF Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes", IROS 2023. [Paper] [Code]
- OCID-Ref: "A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding", NAACL 2021. [Paper] [Code]
- ManipulaTHOR: "A Framework for Visual Object Manipulation", CVPR 2021. [Paper] [Website] [Code]
- RoboTHOR: "An Open Simulation-to-Real Embodied AI Platform", CVPR 2020. [Paper] [Website] [Code]
- HabiCrowd: "A High Performance Simulator for Crowd-Aware Visual Navigation", IROS 2024. [Paper] [Website] [Code]
Citation
If you find this repository useful, please consider citing this list:
@misc{irshad2024roboticd3D,
title = {Awesome Robotics 3D - A curated list of resources on 3D vision papers relating to robotics},
author = {Muhammad Zubair Irshad},
howpublished = {GitHub repository},
url = {https://github.com/zubair-irshad/Awesome-Robotics-3D},
year = {2024},
}