A Faster Implementation of Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Please refer to Scan2Cap for the data preparation and setup details.
For submission to the Scan2Cap benchmark, run the following script to generate predictions:
python benchmark/predict.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --test_split test
Please compress the benchmark_test.json as a .zip or .7z file and follow the instructions to upload your results.
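The compression step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name zip_predictions is hypothetical and not part of this repository, and the location of benchmark_test.json is an assumption that depends on where benchmark/predict.py writes its output. Whether the benchmark server expects a particular archive layout is not stated here, so the sketch simply stores the file at the archive root.

```python
import zipfile
from pathlib import Path


def zip_predictions(json_path, zip_path=None):
    """Compress a prediction file into a .zip archive and return the archive path.

    Hypothetical helper for illustration; not part of the Scan2Cap code base.
    """
    json_path = Path(json_path)
    if not json_path.is_file():
        raise FileNotFoundError(json_path)
    # Default to the same location and stem as the input, with a .zip suffix.
    zip_path = Path(zip_path) if zip_path else json_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name so the archive contains no nested directories.
        archive.write(json_path, arcname=json_path.name)
    return zip_path
```

Calling zip_predictions with the path to your benchmark_test.json writes benchmark_test.zip next to it. Pass a second argument to choose a different archive location.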
Before submitting the results on the test set to the official benchmark, you can also benchmark the performance on the val set. First, run the following script to generate the ground truths (GTs) for the val set:
python scripts/build_benchmark_gt.py --split val
NOTE: don't forget to change the DATA_ROOT in scripts/build_benchmark_gt.py
Generate the predictions on the val set:
python benchmark/predict.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --test_split val
Evaluate the predictions on the val set:
python benchmark/eval.py --split val --path <path to predictions> --verbose
Run the following script to start the end-to-end training of the Scan2Cap model using the multiview features and normals. For more training options, please run scripts/train.py -h:
python scripts/train.py --config config/votenet_scan2cap.yaml
The trained model as well as the intermediate results will be dumped into outputs/<output_folder>. To evaluate the model (@0.5IoU), run the following script and change the <output_folder> accordingly. Note that the arguments must match the ones used for training:
python scripts/eval.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --eval_caption
Evaluating the detection performance:
python scripts/eval.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --eval_detection
You can also evaluate the pretrained object detection backbone.
If you found our work helpful, please kindly cite our paper via:
@inproceedings{chen2021scan2cap,
title={Scan2Cap: Context-aware Dense Captioning in RGB-D Scans},
author={Chen, Zhenyu and Gholami, Ali and Nie{\ss}ner, Matthias and Chang, Angel X},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3193--3203},
year={2021}
}
Scan2Cap is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Copyright (c) 2021 Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang