This repository contains the official implementation of the paper:

Amodal Optical Flow
Maximilian Luz*, Rohit Mohan*, Ahmed Rida Sekkat, Oliver Sawade, Elmar Matthes, Thomas Brox, and Abhinav Valada
*Equal contribution.
IEEE International Conference on Robotics and Automation (ICRA) 2024
If you find our work useful, please consider citing our paper via BibTeX.
Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking.
- Create conda environment:
conda create --name amodal-flow python=3.11
- Activate conda environment:
conda activate amodal-flow
- Install PyTorch:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 cudatoolkit=11.7 -c pytorch -c nvidia
- Install OpenCV:
conda install opencv
- Install remaining dependencies:
pip install -r requirements.txt
This code has been developed and tested with PyTorch version 2.0.1 and CUDA version 11.7.
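As a quick check that the environment matches these versions, the following snippet can be run (a convenience check only, not part of the repository):

```python
# Convenience check: confirm PyTorch and CUDA versions match the tested setup.
import torch

print("PyTorch version:", torch.__version__)   # tested with 2.0.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)     # tested with 11.7
```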
We use pre-trained FlowFormer++ weights to initialize the modal base network and use the AmodalSynthDrive dataset for amodal training. Therefore:
- Download the FlowFormer++ `sintel.pth` checkpoint and place it at `./checkpoints/sintel.pth`.
- Download the AmodalSynthDrive dataset and place it at `./datasets/AmSynthDrive`.
The final folder structure should look like this:
```
.
├── checkpoints
│   └── sintel.pth
├── datasets
│   ├── AmSynthDrive
│   │   ├── empty
│   │   ├── full
│   │   └── ...
│   └── amsynthdrive.json
└── ...
```
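To verify that the checkpoint and dataset ended up in the locations the scripts expect, a small sanity check such as the one below can be used (paths taken from the structure above; the snippet itself is not part of the repository):

```python
# Convenience check: confirm the expected checkpoint and dataset locations exist.
from pathlib import Path

expected = [
    Path("checkpoints/sintel.pth"),
    Path("datasets/AmSynthDrive"),
    Path("datasets/amsynthdrive.json"),
]
for path in expected:
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```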
The configuration used for both evaluation and training can be found at `./configs/amsynthdrive.py`.
Training output will be saved in a subdirectory of the `logs` directory.
Our model can be trained by running
python ./train_FlowFormer.py --name amsynthdrive --stage amsynthdrive --validation amsynthdrive
We train our model using 6 GPUs with 48 GB VRAM each.
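Before launching a run, it can be useful to confirm how many GPUs (and how much VRAM) are visible to PyTorch, for example with the snippet below (a convenience check only, not part of the training script):

```python
# Convenience check: list visible GPUs and their memory before training.
import torch

n_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {n_gpus}")
for i in range(n_gpus):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
```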
For a given checkpoint `logs/some_run/model.pth`, the model can be evaluated using
python ./evaluate_FlowFormer_tile.py --eval amsynthdrive_validation --model logs/some_run/model.pth
For evaluation, a single 12GB GPU is enough.
We will provide a pre-trained checkpoint soon.
- The initial training, which resumes from a pre-trained FlowFormer++ checkpoint, will emit a warning that the base checkpoint could not be loaded in strict mode. This is expected: the missing keys are the additional amodal decoder strands added on top of FlowFormer++ (see the first sketch after these notes).
- While AmodalSynthDrive uses `.png` files for storing optical flow, similar to KITTI, the scaling differs. Please refer to `readFlowKITTI()` in `core/utils/frame_utils.py` for how to load AmodalSynthDrive flow files; the second sketch after these notes outlines the general KITTI-style decoding.
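The first note can be illustrated with a minimal, self-contained sketch of non-strict checkpoint loading. The module names below are placeholders and do not correspond to the repository's classes; the point is only that `strict=False` reports the parameters that exist in the extended model but not in the base checkpoint:

```python
# Minimal illustration of non-strict loading: the extended model has parameters
# (standing in for the amodal decoder strands) that the base checkpoint lacks.
import torch.nn as nn

class BaseNet(nn.Module):          # stands in for the modal FlowFormer++ base
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)

class ExtendedNet(BaseNet):        # adds parameters the base checkpoint lacks
    def __init__(self):
        super().__init__()
        self.amodal_decoder = nn.Linear(8, 2)

base_ckpt = BaseNet().state_dict()                 # plays the role of sintel.pth
model = ExtendedNet()
missing, unexpected = model.load_state_dict(base_ckpt, strict=False)
print("Missing keys:", missing)        # the newly added parameters -- expected
print("Unexpected keys:", unexpected)  # none
```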
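The second note refers to the KITTI-style encoding of flow as 16-bit PNGs. For orientation, the sketch below shows the standard KITTI decoding (u/v offset by 2^15 and divided by 64, with the third channel as a validity mask). The scale factor for AmodalSynthDrive differs, so treat `readFlowKITTI()` in `core/utils/frame_utils.py` as the authoritative reference and this snippet only as an outline:

```python
# Outline of KITTI-style flow decoding from a 16-bit PNG. Note: the divisor
# shown here is the standard KITTI value; AmodalSynthDrive uses a different
# scaling -- see readFlowKITTI() in core/utils/frame_utils.py.
import cv2
import numpy as np

def read_flow_kitti_style(path):
    data = cv2.imread(path, cv2.IMREAD_ANYDEPTH | cv2.IMREAD_COLOR)
    data = data[:, :, ::-1].astype(np.float32)   # BGR -> RGB channel order
    flow = (data[:, :, :2] - 2**15) / 64.0       # u, v components
    valid = data[:, :, 2]                        # validity mask
    return flow, valid
```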
For academic use, code for AmodalFlowNet is released under the Apache License, following FlowFormer++. For any commercial usage, please contact the authors.
The code of this project is based on FlowFormer++. Additionally, we use parts of:
In addition, this work was funded by the German Research Foundation (DFG) Emmy Noether Program grant No 468878300 and an academic grant from NVIDIA.