This repository collects research papers and open-source resources on knowledge-driven autonomous driving (AD). It will be continuously updated to track the frontier of knowledge-driven AD.
🌟 Stars and contributions (PRs) to this awesome knowledge-driven AD list are welcome! 🌟
- [2023.12.08] New: We released the survey 'Towards Knowledge-driven Autonomous Driving'!
- [2023.10.24] New: We released the awesome knowledge-driven AD list!
The autonomous driving community has witnessed substantial growth in approaches that embrace a knowledge-driven paradigm. Here, we delve into knowledge-driven autonomous driving, exploring motivations, components, challenges, and prospects. More details of knowledge-driven autonomous driving can be found in our paper.
Key components in knowledge-driven AD.
Knowledge-augmented Dataset | Sensors (C: camera, L: LiDAR, R: radar) | Knowledge Form | Tasks | Metrics |
---|---|---|---|---|
BDD-X | C | Explanation | Vehicle Control, Explanation Generation, Scene Captioning | MAE, MDC, BLEU-4, METEOR, CIDEr-D |
Cityscapes-Ref | C | Object Referral, Gaze Heatmap | Object Referring | Acc@1 |
DR(eye)VE | C | Gaze Heatmap | Gaze Prediction | CC, KLD, IG |
HAD | C | Advice | Vehicle Control | MAE, MDC |
Talk2Car | C+L+R | Object Referral | Object Referring | IoU@0.5 |
DADA-2000 | C | Gaze Heatmap, Crash Objects, Accident Window | Gaze Prediction | CC, KLD, NSS, SIM |
HDBD | C | Gaze Heatmap, Takeover Intention | Driver Takeover Detection | AUC |
Refer-KITTI | C+L | Object Referral | Object Referring, Object Tracking | HOTA |
DRAMA | C | Advice, Risk Localization | Motion Planning | L2 Error, Collision Rate |
Rank2Tell | C+L | Object Referral, Importance Ranking | Importance Estimation, Scene Captioning | F1 Score, Accuracy, BLEU-4, METEOR, ROUGE, CIDEr |
DriveLM | C | Scene Captioning, Question Answering | Scene Captioning, Question Answering, Vehicle Control | ADE, FDE, Accuracy, Collision Rate, SPICE, GPT-Score |
NuScenes-QA | C+L+R | Question Answering | Question Answering | Exist, Count, Object, Status, Comparison, Acc |
DESIGN | C+L+R | Scene Captioning, Question Answering | Question Answering, Motion Planning | BLEU-4, METEOR, ROUGE, L2 Error, Collision Rate |
Reason2Drive | C+L | Question Answering | Question Answering | BLEU-4, METEOR, ROUGE, CIDEr |
NuScenes-MQA | C+L+R | Question Answering | Question Answering | BLEU-4, METEOR, ROUGE |
LangAuto | C+L | Navigation Instructions, Notice Instructions | Vehicle Control | RC, IS, DS |
DriveMLM | C+L | Question Answering, User Instructions | Vehicle Control, Decision Explanation | RC, IS, DS, BLEU-4, METEOR, CIDEr |
NuInstruct | C | Scene-, Frame-, Ego-, Instance Information, Question Answering | Question Answering, Scene Captioning | MAE, Accuracy, BLEU-4, mAP |
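Several metrics in the table above have simple closed forms. The sketch below illustrates three of them under commonly used definitions: ADE and FDE for trajectories, and an IoU-thresholded accuracy such as IoU@0.5 for object referring. The function names and input layouts are assumptions made for illustration; each benchmark defines its own evaluation protocol, so the official evaluation code of a dataset remains the authoritative reference.

```python
import math


def ade_fde(pred, gt):
    """Average and final displacement error between two trajectories.

    pred, gt: equal-length, non-empty lists of (x, y) waypoints.
    Returns (ADE, FDE), both in the units of the input coordinates.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-waypoint Euclidean (L2) distance.
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds, gts, threshold=0.5):
    """Fraction of predicted boxes whose IoU with the paired
    ground-truth box reaches the threshold (assumed one box per sample)."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need one prediction per ground-truth box")
    hits = sum(box_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)
```

For example, `ade_fde([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)])` averages the per-waypoint distances 0, 1, and 2 for the ADE and reports the last one as the FDE.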
- UniSim: A Neural Closed-Loop Sensor Simulator [CVPR 2023, Project]
- Neural Lighting Simulation for Urban Scenes [NeurIPS 2023, Project]
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [NeurIPS 2023, Github]
- LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs [CVPR 2024]
- ChatSim: Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration [CVPR 2024, Github, Project]
- Panacea: Panoramic and Controllable Video Generation for Autonomous Driving [CVPR 2024, Github, Project]
- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [CVPR 2024, Project, Github]
- Generalized Predictive Model for Autonomous Driving [CVPR 2024]
- DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving [CVPR 2024]
- NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles [arxiv 2023, Github]
- DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model [arxiv 2023, Project]
- OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving [arxiv 2023, Project]
- ADriver-I: A General World Model for Autonomous Driving [arxiv 2023]
- WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation [arxiv 2023, Github]
- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving [arxiv 2023]
- MagicDrive: Street View Generation with Diverse 3D Geometry Control [arxiv 2023]
- GAIA-1: A Generative World Model for Autonomous Driving [arxiv 2023]
- MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations [arxiv 2023]
- Natural-language-driven Simulation Benchmark and Copilot for Efficient Production of Object Interactions in Virtual Road Scenes [arxiv 2023]
- DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes [arxiv 2023]
- OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields [arxiv 2023]
- Street Gaussians for Modeling Dynamic Urban Scenes [arxiv 2024, Github, Project]
- LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving [arxiv 2024, Github, Project]
- Neural Rendering based Urban Scene Reconstruction for Autonomous Driving [arxiv 2024]
- OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving [arxiv 2024, Github, Project]
- DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation [arxiv 2024, Github, Project]
- SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control [arxiv 2024, Project]
- TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes [arxiv 2024]
- Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior [arxiv 2024]
- CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving [arxiv 2024]
- Probing Multimodal LLMs as World Models for Driving [arxiv 2024, Github]
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving [arxiv 2024]
- Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability [arxiv 2024, Github]
- MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes [arxiv 2024]
- Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation [arxiv 2024, Project]
- Textual explanations for self-driving vehicles [ECCV 2018, Github]
- Grounding human-to-vehicle advice for self-driving vehicles [CVPR 2019]
- ADAPT: Action-aware Driving Caption Transformer [ICRA 2023, Github]
- Talk to the Vehicle: Language Conditioned Autonomous Navigation of Self Driving Cars [IROS 2019]
- Talk2Car: Taking Control of Your Self-Driving Car [EMNLP-IJCNLP 2019, Project]
- Drama: Joint risk localization and captioning in driving [WACV 2023]
- DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models [ICLR 2024, Github]
- Talk2BEV: Language-Enhanced Bird's Eye View (BEV) Maps [ICRA 2024, Project, Github]
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models [CVPR 2024, Github]
- VLP: Vision Language Planning for Autonomous Driving [CVPR 2024]
- Driving Everywhere with Large Language Model Policy Adaptation [CVPR 2024, Github, Project]
- Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models [CVPR 2024, Github]
- Drive Like a Human: Rethinking Autonomous Driving with Large Language Models [WACVW 2024, Github]
- GPT-Driver: Learning to Drive with GPT [NeurIPSW 2023, Github]
- Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving [ICRA 2024, Github]
- NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario [AAAI 2024, Github]
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [arxiv 2023, Project]
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [arxiv 2023, Project]
- Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles [arxiv 2023]
- Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles [arxiv 2023]
- SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model [arxiv 2023]
- Language-Guided Traffic Simulation via Scene-Level Diffusion [arxiv 2023]
- Language Prompt for Autonomous Driving [arxiv 2023, Github]
- BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning [arxiv 2023]
- HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving [arxiv 2023]
- Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving [arxiv 2023]
- OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data [arxiv 2023, Github]
- LangProp: A Code Optimization Framework Using Language Models Applied to Driving [arxiv 2024, Github]
- Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion [openreview 2023]
- Planning with an Ensemble of World Models [openreview 2023]
- Large Language Models Can Design Game-Theoretic Objectives for Multi-Agent Planning [openreview 2023]
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [arxiv 2023]
- BEV-CLIP: Multi-Modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving [arxiv 2023]
- Semantic Anomaly Detection with Large Language Models [arxiv 2023]
- Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving [arxiv 2023]
- 3D Dense Captioning Beyond Nouns: A Middleware for Autonomous Driving [openreview 2023]
- SwapTransformer: Highway Overtaking Tactical Planner Model via Imitation Learning on OSHA Dataset [openreview 2023]
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [arxiv 2023]
- Addressing Limitations of State-Aware Imitation Learning for Autonomous Driving [arxiv 2023]
- A Language Agent for Autonomous Driving [arxiv 2023]
- Human-Centric Autonomous Systems With LLMs for User Command Reasoning [WACVW 2024]
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving [arxiv 2023]
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [arxiv 2023, Github]
- GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models [arxiv 2023, Github]
- ChatGPT as Your Vehicle Co-Pilot: An Initial Attempt [IEEE TIV 2023]
- DriveLLM: Charting The Path Toward Full Autonomous Driving with Large Language Models [IEEE TIV 2023]
- NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations [WACVW 2024, Github]
- Evaluation of Large Language Models for Decision Making in Autonomous Driving [arxiv 2023]
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [arxiv 2023, Github]
- Large Language Models for Autonomous Driving: Real-World Experiments [arxiv 2023]
- LingoQA: Video Question Answering for Autonomous Driving [arxiv 2023, Github]
- DriveLM: Driving with Graph Visual Question Answering [arxiv 2023, Github]
- LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [arxiv 2024, Project]
- BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving [arxiv 2024]
- DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving [arxiv 2024]
- RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model [arxiv 2024, Github, Project]
- DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models [arxiv 2024, Project]
- Embodied Understanding of Driving Scenarios [arxiv 2024, Github]
- Driving Style Alignment for LLM-powered Driver Agent [arxiv 2024]
- Large Language Models Powered Context-aware Motion Prediction [arxiv 2024]
- LORD: Large Models based Opposite Reward Design for Autonomous Driving [arxiv 2024]
- Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs [arxiv 2024]
- AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning [arxiv 2024]
- OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning [arxiv 2024]
- Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes [arxiv 2024]
- AD-H: Autonomous Driving with Hierarchical Agents [arxiv 2024]
- DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences [arxiv 2024]
- PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning [arxiv 2024]
- REvolve: Reward Evolution with Large Language Models for Autonomous Driving [arxiv 2024, Project]
- Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving [arxiv 2024, Github, Project]
- Applications of Large Scale Foundation Models for Autonomous Driving [arxiv 2023]
- A Survey on Multimodal Large Language Models for Autonomous Driving [arxiv 2023]
- A Survey of Large Language Models for Autonomous Driving [arxiv 2023]
- Vision Language Models in Autonomous Driving and Intelligent Transportation Systems [arxiv 2023]
- Choose Your Simulator Wisely: A Review on Open-source Simulators for Autonomous Driving [arxiv 2023]
- Towards Knowledge-driven Autonomous Driving [arxiv 2023]
- Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities [arxiv 2024]
- A Survey for Foundation Models in Autonomous Driving [arxiv 2024]
- World Models for Autonomous Driving: An Initial Survey [arxiv 2024]
- Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [arxiv 2024]
- Prospective Role of Foundation Models in Advancing Autonomous Vehicles [arxiv 2024]
- [WACV2024 Workshop] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
- [Blog] LINGO-1: Exploring Natural Language for Autonomous Driving
- [Blog] Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy
- [Blog] Ghost Gym: A Neural Simulator for Autonomous Driving
If you find our paper useful, please cite it via:
@article{li2023knowledgedriven,
title={Towards Knowledge-driven Autonomous Driving},
author={Li, Xin and Bai, Yeqi and Cai, Pinlong and Wen, Licheng and Fu, Daocheng and Zhang, Bo and Yang, Xuemeng and Cai, Xinyu and Ma, Tao and Guo, Jianfei and Gao, Xing and Dou, Min and Shi, Botian and Liu, Yong and He, Liang and Qiao, Yu},
journal={arXiv preprint arXiv:2312.04316},
year={2023}
}
Awesome Knowledge-driven Autonomous Driving is released under the Apache 2.0 license.