[ECCV 2024] Embodied Understanding of Driving Scenarios

OpenDriveLab/ELM

ELM: Embodied Understanding of Driving Scenarios

Revive driving scene understanding by delving into the embodiment philosophy

ELM: v1.0 License: Apache2.0

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, and Hongyang Li

Highlights

🔥 The first embodied language model for understanding long-horizon driving scenarios in space and time.

🌟 ELM introduces a wide spectrum of new tasks that fully leverage the capability of large language models in an embodied setting, and achieves significant improvements in various applications.

🏆 Interpretable driving models based on language prompting will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!

News

  • 🔥 Interpretable driving model is launched. Please refer to the link for more details.
  • [2024/03] ELM paper released.
  • [2024/03] ELM code and data initially released.

Table of Contents

  1. Highlights
  2. News
  3. TODO List
  4. Installation
  5. Dataset
  6. Training and Inference
  7. License and Citation
  8. Related Resources

TODO List

  • Release fine-tuning code and data
  • Release reference checkpoints
  • Toolkit for label generation

Installation

  1. (Optional) Create a conda environment:
     conda create -n elm python=3.8
     conda activate elm
  2. Install from PyPI:
     pip install salesforce-lavis
  3. Or, for development, build from source:
     git clone https://github.com/OpenDriveLab/ELM.git
     cd ELM
     pip install -e .
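
To confirm that either installation route worked, a quick import check can help. The sketch below is only an illustration and not part of the repository; it assumes the package is importable as `lavis` (the directory name used in the config paths in this README), and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    # find_spec returns None when the module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "lavis" is assumed to be the import name provided by the install step.
    status = "found" if is_installed("lavis") else "missing"
    print(f"lavis: {status}")
```

Run it inside the activated `elm` environment; a "missing" result usually means the install went into a different interpreter.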

Dataset

Pre-training data. We collect driving videos from YouTube, nuScenes, Waymo, and Ego4D. Here we provide a sample of the 🔗 YouTube video list we used. For privacy reasons, the full set of data labels is temporarily kept private. Part of the pre-training data and the reference checkpoints can be found on 💾 Google Drive.

Fine-tuning data. The full set of question-answer pairs for the benchmark can be obtained through this 🔗 data link. You may need to download the corresponding image data from the official nuScenes and Ego4D channels. For a quick verification of the pipeline, we recommend downloading the DriveLM subset and organizing the data in line with the format.

Please make sure to soft-link the nuScenes and Ego4D datasets under the data/xx folder. You may need to run tools/video_clip_processor.py to pre-process the data first. We also provide some scripts used during auto-labeling, which you may use as a reference if you want to customize the data.
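
The soft-linking step can be scripted. The following is a minimal sketch under stated assumptions, not the repository's own tooling: `link_dataset` is a hypothetical helper, and the source path and link name you pass are placeholders to be matched to wherever the datasets live on your machine and to the folder names the configs expect under data/.

```python
import os
from pathlib import Path


def link_dataset(source: Path, data_dir: Path, name: str) -> Path:
    """Create data_dir/name as a symbolic link to source, replacing a stale link."""
    source = source.resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / name
    if link.is_symlink():
        link.unlink()  # drop an old link so the call can be repeated safely
    elif link.exists():
        # Refuse to touch a real file or directory that happens to be in the way.
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, `link_dataset(Path("/path/to/nuscenes"), Path("data"), "nuscenes")` would create `data/nuscenes`; both the path and the link name here are illustrative only.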

Training

# Optionally edit lavis/projects/blip2/train/advqa_t5_elm.yaml first
bash scripts/train.sh

Inference

Set evaluate to True in advqa_t5_elm.yaml, then run the same script:

bash scripts/train.sh
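
If you switch between training and inference often, the flag can be toggled programmatically instead of by hand. This is a sketch under assumptions, not a documented interface: it assumes the option appears in the YAML as a line of the form `evaluate: False` (or `True`), and `set_evaluate` is a hypothetical helper. It edits the text directly so that comments and formatting in the file are preserved.

```python
import re

# Matches an "evaluate: <bool>" line, capturing the key and indentation as group 1.
_EVALUATE_LINE = re.compile(
    r"^([ \t]*evaluate[ \t]*:[ \t]*)(True|False|true|false)\b", re.MULTILINE
)


def set_evaluate(config_text: str, enabled: bool) -> str:
    """Return config_text with the first 'evaluate: <bool>' line set to enabled."""
    new_text, count = _EVALUATE_LINE.subn(
        lambda m: m.group(1) + str(enabled), config_text, count=1
    )
    if count == 0:
        raise ValueError("no 'evaluate: <bool>' line found in the config text")
    return new_text
```

A typical use would read the config file into a string, call `set_evaluate(text, True)`, and write the result back before launching `scripts/train.sh`.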

To evaluate the generated answers, use scripts/qa_eval.py:

python scripts/qa_eval.py <data_root> <log_name>

License and Citation

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes and Ego4D) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

@article{zhou2024embodied,
  title={Embodied Understanding of Driving Scenarios},
  author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
  journal={arXiv preprint arXiv:2403.04593},
  year={2024}
}

Related Resources

We acknowledge all the open-source contributors for the following projects to make this work possible:
