Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation (ICCV 2025)


🚨 This repository contains download links to our evaluation code and the trained deep models of our work "Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation", ICCV 2025

by Luca Bartolomei1,2, Enrico Mannocci2, Fabio Tosi2, Matteo Poggi1,2, and Stefano Mattoccia1,2

Advanced Research Center on Electronic Systems (ARCES)1 Department of Computer Science and Engineering (DISI)2

University of Bologna

Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation (ICCV 2025)

Project Page | Paper

Proposed Cross-Modal Distillation Strategy. During training, a VFM teacher processes RGB input frames to generate proxy depth labels, which supervise an event-based student model. The student takes aligned event stacks as input and predicts the final depth map.
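
For readers who prefer code, here is a minimal sketch of the distillation step described above, written in PyTorch. The names teacher_vfm, event_student, and ssi_loss are placeholders rather than functions from this repository, and the actual training code and loss may differ.

# Minimal sketch of the cross-modal distillation idea; not the official training code.
# `teacher_vfm`, `event_student`, and `ssi_loss` are placeholders: any frozen image-based
# monocular depth VFM and any event-based depth network with matching output resolution
# would fit this pattern.
import torch

def distillation_step(teacher_vfm, event_student, rgb_frame, event_stack, optimizer):
    """One training step: the frozen RGB teacher produces proxy depth labels that
    supervise the event-based student on spatially aligned inputs."""
    teacher_vfm.eval()
    with torch.no_grad():
        proxy_depth = teacher_vfm(rgb_frame)    # (B, 1, H, W) proxy labels from RGB

    pred_depth = event_student(event_stack)     # (B, 1, H, W) prediction from events

    loss = ssi_loss(pred_depth, proxy_depth)    # e.g. a scale-and-shift-invariant loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def ssi_loss(pred, target, eps=1e-6):
    """Scale-and-shift-invariant L1 loss, a common choice when the teacher predicts
    relative (affine-invariant) depth."""
    def normalize(d):
        flat = d.flatten(1)
        shift = flat.median(dim=1, keepdim=True).values
        scale = (flat - shift).abs().mean(dim=1, keepdim=True) + eps
        return ((flat - shift) / scale).view_as(d)
    return (normalize(pred) - normalize(target)).abs().mean()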

Note: 🚧 Kindly note that this repository is currently in the development phase. We are actively working to add and refine features and documentation. We apologize for any inconvenience caused by incomplete or missing elements and appreciate your patience as we work towards completion.

📑 Table of Contents

  • 🎬 Introduction
  • 📥 Pretrained Models
  • 📝 Code
  • 🛠️ Setup Instructions
  • 💾 Datasets
  • 🚀 Test
  • ✉️ Contacts
  • 🙏 Acknowledgements

🎬 Introduction

Monocular depth perception from cameras is crucial for applications such as autonomous navigation and robotics. While conventional cameras have enabled impressive results, they struggle in highly dynamic scenes and challenging lighting conditions due to limitations like motion blur and low dynamic range. Event cameras, with their high temporal resolution and dynamic range, address these issues but provide sparse information and lack large annotated datasets, making depth estimation difficult.

This project introduces a novel approach to monocular depth estimation with event cameras by leveraging Vision Foundation Models (VFMs) trained on images. The method uses cross-modal distillation to transfer knowledge from image-based VFMs to event-based networks, utilizing spatially aligned data from devices like the DAVIS Camera. Additionally, the project adapts VFMs for event-based depth estimation, proposing both a direct adaptation and a new recurrent architecture. Experiments on synthetic and real datasets demonstrate competitive or state-of-the-art results without requiring expensive depth annotations.
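
The event stacks mentioned above can be built in several ways; below is an illustrative NumPy sketch of a common voxel-grid-style stacking of a DAVIS event stream (x, y, t, polarity). The number of bins and the temporal interpolation are arbitrary choices for this example and may differ from the representation used in this repository.

# Illustrative only: accumulate events into temporal slices with linear interpolation
# along time, producing a (num_bins, H, W) event stack.
import numpy as np

def events_to_stack(x, y, t, p, height, width, num_bins=5):
    """x, y: pixel coordinates (int arrays); t: timestamps; p: polarities in {0, 1}."""
    stack = np.zeros((num_bins, height, width), dtype=np.float32)
    if t.size == 0:
        return stack

    # Normalize timestamps to [0, num_bins - 1] and map polarities to {-1, +1}.
    t_norm = (t - t[0]) / max(t[-1] - t[0], 1e-9) * (num_bins - 1)
    pol = 2.0 * p.astype(np.float32) - 1.0

    # Split each event between its two neighboring temporal bins.
    left = np.floor(t_norm).astype(np.int64)
    right = np.clip(left + 1, 0, num_bins - 1)
    w_right = (t_norm - left).astype(np.float32)
    w_left = 1.0 - w_right

    np.add.at(stack, (left, y, x), pol * w_left)
    np.add.at(stack, (right, y, x), pol * w_right)
    return stack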

Contributions:

  • A novel cross-modal distillation paradigm that leverages the robust proxy labels obtained from image-based VFMs for monocular depth estimation.

  • An adaptation strategy to effortlessly cast existing image-based VFMs into the event domain.

  • A novel recurrent architecture based on an adapted image-based VFM.

  • Evidence that adapting VFMs to the event domain yields state-of-the-art performance, and that our distillation paradigm is competitive with supervision from depth sensors.

🖋️ If you find this code useful in your research, please cite:

@InProceedings{Bartolomei_2025_ICCV,
    author    = {Bartolomei, Luca and Mannocci, Enrico and Tosi, Fabio and Poggi, Matteo and Mattoccia, Stefano},
    title     = {Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}

📥 Pretrained Models

Here you can download the weights of our VFMs adapted to the event domain.

You can download our pretrained models here.

📝 Code

The Test section contains scripts to evaluate depth estimation on MVSEC and DSEC datasets.

Please refer to that section for detailed instructions on setup and execution.

Warning:

  • With the latest updates in PyTorch, slight variations in the quantitative results compared to the numbers reported in the paper may occur.

🛠️ Setup Instructions

  1. Dependencies: Ensure that you have installed all the necessary dependencies. The list of dependencies can be found in the ./requirements.txt file.
  2. Set scripts variables: Each script needs the path to the virtual environment (if any) and to the dataset. Please set those variables before running the script.
  3. Set config variables: Each JSON config file has a datapath key: update it according to your environment (a small helper sketch follows this list).
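
For step 3, the snippet below shows one possible way to update the datapath key of a JSON config and verify the folder exists; the config filename in the usage example is a placeholder, not a file guaranteed to exist in this repository.

# Helper sketch for step 3: point the `datapath` key of a JSON config to a local
# dataset root and check that the folder exists.
import json
from pathlib import Path

def set_datapath(config_file, dataset_root):
    cfg_path = Path(config_file)
    cfg = json.loads(cfg_path.read_text())

    root = Path(dataset_root).expanduser().resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset root not found: {root}")

    cfg["datapath"] = str(root)    # key documented in the setup instructions above
    cfg_path.write_text(json.dumps(cfg, indent=4))

# Example usage (placeholder paths):
# set_datapath("configs/example_config.json", "~/datasets/MVSEC")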

💾 Datasets

We used two datasets for evaluation: MVSEC and DSEC.

MVSEC

Download the processed version of MVSEC here. Thanks to the authors of E2DEPTH for the amazing work.

Unzip the archives, arranging them as shown in the data structure below:

MVSEC
├── test
│   └── mvsec_dataset_day2
└── train
    ├── mvsec_outdoor_day1
    ├── mvsec_outdoor_night1
    ├── mvsec_outdoor_night2
    └── mvsec_outdoor_night3

DSEC

Download Images, Events, Disparities, and Calibration Files from the official website.

Unzip the archives, then you will get a data structure as follows:

DSEC
└── train
    ├── interlaken_00_c
    ...
    └── zurich_city_11_c
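
Before launching the test scripts, you can optionally verify that both dataset roots match the layouts above with a quick check like the one below (not part of the official code; the paths in the usage example are placeholders).

# Optional sanity check: expected top-level subfolders for each dataset,
# as shown in the directory trees above.
from pathlib import Path

EXPECTED = {
    "MVSEC": ["train", "test"],
    "DSEC": ["train"],
}

def check_layout(root, dataset):
    base = Path(root).expanduser()
    missing = [d for d in EXPECTED[dataset] if not (base / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"{dataset}: missing subfolders {missing} under {base}")
    print(f"{dataset} layout at {base} looks fine")

# Example usage (placeholder paths):
# check_layout("~/datasets/MVSEC", "MVSEC")
# check_layout("~/datasets/DSEC", "DSEC")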

🚀 Test

To reproduce the evaluation tables in our paper, use this snippet:

bash scripts/test.sh

You should change the variables inside the script before launching it.

✉️ Contacts

For questions, please send an email to luca.bartolomei5@unibo.it

🙏 Acknowledgements

We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which we have used in our work:

  • E2DEPTH, whose code has been inspirational for our work.
  • DAv2, whose code and models have been inspirational for our work.
