# FuseNet

This repository contains a PyTorch implementation of the FuseNet-SF5 architecture from the paper
[FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture](https://pdfs.semanticscholar.org/9360/ce51ec055c05fd0384343792c58363383952.pdf).

## Installation
Prerequisites:
- python 3.6
- Nvidia GPU + CUDA cuDNN

## Datasets

### [NYU-Depth V2](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html)
- Create a directory named `datasets` in the main project directory, and download the preprocessed dataset (in HDF5 format, with 40 semantic-segmentation classes and 10 scene classes) into that directory: [train + test set](https://vision.in.tum.de/webarchive/hazirbas/fusenet-pytorch/nyu/nyu_class_10_db.h5)
- The preprocessed dataset contains 1449 RGB-D images (train: 795, test: 654) at 320x240 resolution, together with their semantic-segmentation and scene-type annotations.
- Depth image values have been normalized so that they fall into the 0-255 range (a short inspection sketch follows this list).
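
The HDF5 database can be inspected with `h5py` before training. Below is a minimal sketch, assuming the file stores separate train/test arrays for RGB, depth, and labels; the key names used here are assumptions, not taken from this repository, so use `visit()` to discover the real layout first.

```python
import h5py
import numpy as np

# Open the preprocessed NYU-Depth V2 database (downloaded into ./datasets above).
with h5py.File('./datasets/nyu_class_10_db.h5', 'r') as db:
    # Print the stored group/dataset names to see the actual layout.
    db.visit(print)

    # Hypothetical key names -- replace them with whatever db.visit() reports.
    rgb    = np.array(db['rgb_train'])    # expected: 795 training RGB images at 320x240
    depth  = np.array(db['depth_train'])  # depth values already normalized to 0-255
    labels = np.array(db['label_train'])  # 40 semantic-segmentation classes

print('RGB shape:', rgb.shape)
print('Depth range:', depth.min(), '-', depth.max())
print('Label classes present:', np.unique(labels).size)
```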

## Training
- To train FuseNet, run `fusenet_train.py` and provide the dataset path via `--dataroot`.
- If you would like to train a FuseNet model with the classification head, additionally provide `--use_class True`.
- Example training commands can be found below.

### Training from scratch

```
python fusenet_train.py --dataroot ./datasets/nyu_class_10_db.h5 --batch_size 8 --lr 0.005 --num_epochs 125
```

### Resuming training from a checkpoint

```
python fusenet_train.py --dataroot ./datasets/nyu_class_10_db.h5 --resume_train True --batch_size 8 \
       --load_checkpoint ./checkpoints/may27_first_run/nyu/best_model.pth.tar --lr 0.005 --num_epochs 25
```
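
Resuming simply restores the serialized checkpoint dictionary before training continues. The sketch below shows how such a `.pth.tar` checkpoint is typically loaded in PyTorch; the `state_dict`, `optimizer`, and `epoch` keys are assumptions, not necessarily the layout written by `fusenet_train.py`.

```python
import torch

def load_checkpoint(model, optimizer, path):
    """Restore weights, optimizer state, and the last epoch from a .pth.tar file.

    Assumes the checkpoint is a dict with 'state_dict', 'optimizer', and 'epoch'
    entries; the actual keys saved by fusenet_train.py may differ.
    """
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'])      # network weights
    optimizer.load_state_dict(checkpoint['optimizer'])   # momentum buffers, learning rates
    return checkpoint['epoch'] + 1                       # epoch to resume from
```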

## Inference
- The model's semantic-segmentation performance on the given dataset is evaluated with three accuracy measures: global pixel-wise classification accuracy, intersection over union (IoU), and mean accuracy (a reference computation of these measures is sketched after the example command below).
- `--vis_results True` is used to visualize the network's predictions on the test set.
- Example run command:
```
python fusenet_test.py --dataroot ./datasets/nyu_class_10_db.h5 --load_checkpoint ./checkpoints/rgb_only/nyu/best_model.pth.tar --vis_results True
```
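
All three measures can be derived from a per-class confusion matrix accumulated over the test set. The snippet below is a generic reference computation of global pixel-wise accuracy, mean (per-class) accuracy, and mean IoU, not the evaluation code of this repository.

```python
import numpy as np

def segmentation_scores(conf):
    """Compute (global accuracy, mean class accuracy, mean IoU) from a
    num_classes x num_classes confusion matrix, where conf[i, j] counts
    pixels of ground-truth class i predicted as class j."""
    tp = np.diag(conf).astype(np.float64)   # correctly classified pixels per class
    gt = conf.sum(axis=1)                   # ground-truth pixels per class
    pred = conf.sum(axis=0)                 # predicted pixels per class

    global_acc = tp.sum() / conf.sum()
    class_acc  = tp / np.maximum(gt, 1)              # per-class accuracy
    iou        = tp / np.maximum(gt + pred - tp, 1)  # per-class IoU
    return global_acc, class_acc.mean(), iou.mean()

# Toy example with a 3-class confusion matrix.
conf = np.array([[50,  2,  1],
                 [ 3, 40,  5],
                 [ 0,  4, 30]])
print(segmentation_scores(conf))
```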

## Citing FuseNet
Caner Hazirbas, Lingni Ma, Csaba Domokos and Daniel Cremers, _"FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture"_, in proceedings of the 13th Asian Conference on Computer Vision, 2016.

    @inproceedings{fusenet2016accv,
      author    = "C. Hazirbas and L. Ma and C. Domokos and D. Cremers",
      title     = "FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture",
      booktitle = "Asian Conference on Computer Vision",
      year      = "2016",
      month     = "November",
    }