dtc111111/GaussianDWM

GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation

Paper Huggingface

GaussianDWM is the first unified 3D Gaussian-based world model framework that achieves comprehensive scene understanding and scene generation for driving scenarios. It efficiently encodes complex scenes, samples task-relevant information, and handles diverse question-answering tasks. Moreover, by leveraging the extracted world knowledge, our framework guides the generative model to perform accurate spatial and temporal scene generation.


🎯 Overview

GaussianDWM addresses three core challenges in autonomous driving world models:

  • 🔧 Token Extraction & Projection: Novel module for 3D Gaussian scene representations with task-aware, language-guided sampling that overcomes Gaussian alignment and token-length limitations while preserving essential spatial information
  • 🎨 Dual-condition Generation: Multi-modal scene generation framework combining high-level features from world knowledge with low-level features from images
  • 🔗 Unified Understanding & Generation: Bridges the gap between scene comprehension and generation, enabling accurate understanding and coherent future scene prediction
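
To make the idea of task-aware, language-guided sampling concrete, here is a minimal sketch. It is not the released implementation (the code is not yet public); it assumes each Gaussian already carries a feature vector in the same space as a text-query embedding, and it simply keeps the `budget` Gaussians with the highest cosine similarity to the query. The names `cosine`, `sample_gaussians`, and `budget` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def sample_gaussians(features: Sequence[Sequence[float]],
                     query: Sequence[float],
                     budget: int) -> List[int]:
    """Return indices of the `budget` Gaussians most similar to the query.

    Hypothetical stand-in for language-guided sampling: score every Gaussian
    against the query embedding, keep the top `budget`, and return the kept
    indices in their original order so spatial ordering is preserved.
    """
    scores = [cosine(f, query) for f in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    # Toy 2-D features for four Gaussians and a query pointing along +x.
    feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(sample_gaussians(feats, query=[1.0, 0.0], budget=2))
```

A fixed budget like this is one simple way to respect an LLM's token-length limit; the actual selection rule used by GaussianDWM may differ.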

[Figure: Teaser]


✨ Key Features

| Feature | Description |
| --- | --- |
| Unified Framework | First 3D Gaussian-based world model supporting both scene understanding and generation |
| Semantic Space Alignment | Aligns 3D Gaussian features to the semantic space of the LLM for accurate cross-modal understanding |
| Task-aware Sampling | Language-guided sampling strategy selects relevant Gaussians from dense representations |
| Dual-condition Generation | High-level language features and low-level image features jointly guide multi-modal synthesis |
| Spatial & Temporal | Supports novel view synthesis (1 m / 2 m shifts) and future prediction (1 s / 2 s ahead) |
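
As an illustration of what aligning Gaussian features to an LLM's semantic space can look like in the simplest case, the sketch below maps each per-Gaussian feature vector to a token embedding with a single affine projection. This is an assumption for illustration only: the projector actually used by GaussianDWM is not described here, and `project`, `gaussians_to_tokens`, `weight`, and `bias` are hypothetical names with toy values.

```python
from typing import List, Sequence


def project(feature: Sequence[float],
            weight: Sequence[Sequence[float]],
            bias: Sequence[float]) -> List[float]:
    """Affine map: out[j] = sum_i feature[i] * weight[i][j] + bias[j]."""
    if len(weight) != len(feature):
        raise ValueError("weight must have one row per input feature")
    return [
        sum(feature[i] * weight[i][j] for i in range(len(feature))) + bias[j]
        for j in range(len(bias))
    ]


def gaussians_to_tokens(gaussians: Sequence[Sequence[float]],
                        weight: Sequence[Sequence[float]],
                        bias: Sequence[float]) -> List[List[float]]:
    """Project every Gaussian feature vector into the (toy) LLM embedding space."""
    return [project(g, weight, bias) for g in gaussians]


if __name__ == "__main__":
    # Toy sizes: 2-D Gaussian features mapped to 3-D "LLM" embeddings.
    weight = [[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]]
    bias = [0.0, 0.0, 0.5]
    print(gaussians_to_tokens([[2.0, 3.0]], weight, bias))
```

In a trained system the weights would be learned so that projected Gaussian tokens land near the language embeddings they should match; the values here are arbitrary.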

🏗️ Architecture

[Figure: Architecture]


💥 News

  • [2025/12]: Paper and code coming soon!

📚 Citation

If you find our work useful in your research, please consider citing:

@article{deng2025gaussiandwm,
  title={GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation},
  author={Deng, Tianchen and Chen, Xuefeng and Chen, Yi and Chen, Qu and Xu, Yuyao and Yang, Lijin and Xu, Le and Zhang, Yu and Zhang, Bo and Huang, Wuxiong and Wang, Hesheng},
  journal={arXiv preprint},
  year={2025}
}

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


❀️ Acknowledgments

We would like to thank the following open-source projects:

  • Qwen3-VL - Vision-language model foundation
  • Dist4D - Multi-modal scene representation

🌟 Star us on GitHub if you find this project helpful! 🌟
