FasterCaptioning

Introduction

A Faster Implementation of Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Setup

Please refer to Scan2Cap for the data preparation and setup details.

🌟 Benchmark Toolbox 🌟

For submission to the Scan2Cap benchmark, run the following script to generate predictions:

python benchmark/predict.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --test_split test

Please compress the benchmark_test.json as a .zip or .7z file and follow the instructions to upload your results.

Local Benchmarking on Val Set

Before submitting the results on the test set to the official benchmark, you can also benchmark the performance on the val set. Run the following script to generate GTs for val set first:

python scripts/build_benchmark_gt.py --split val

NOTE: don't forget to change the DATA_ROOT in scripts/build_benchmark_gt.py

Generate the predictions on val set:

python benchmark/predict.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --test_split val

Evaluate the predictions on the val set:

python benchmark/eval.py --split val --path <path to predictions> --verbose

Usage

End-to-End training for 3D dense captioning

Run the following script to start the end-to-end training of Scan2Cap model using the multiview features and normals. For more training options, please run scripts/train.py -h:

python scripts/train.py --config config/votenet_scan2cap.yaml

The trained model as well as the intermediate results will be dumped into outputs/<output_folder>. For evaluating the model (@0.5IoU), please run the following script and change the <output_folder> accordingly, and note that arguments must match the ones for training:

python scripts/eval.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --eval_caption

Evaluating the detection performance:

python scripts/eval.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --eval_detection

You can even evaluate the pretraiend object detection backbone:

Citation

If you found our work helpful, please kindly cite our paper via:

@inproceedings{chen2021scan2cap,
  title={Scan2Cap: Context-aware Dense Captioning in RGB-D Scans},
  author={Chen, Zhenyu and Gholami, Ali and Nie{\ss}ner, Matthias and Chang, Angel X},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3193--3203},
  year={2021}
}

License

Scan2Cap is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
benchmark		benchmark
config		config
lib		lib
models		models
pretrained		pretrained
scripts		scripts
slurm		slurm
utils		utils
.gitignore		.gitignore
README.md		README.md
data		data

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

FasterCaptioning

Introduction

Setup

🌟 Benchmark Toolbox 🌟

Local Benchmarking on Val Set

Usage

End-to-End training for 3D dense captioning

Citation

License

About

Uh oh!

Releases

Packages

Languages

Uh oh!

Uh oh!

daveredrum/faster-captioning

Folders and files

Latest commit

History

Repository files navigation

FasterCaptioning

Introduction

Setup

🌟 Benchmark Toolbox 🌟

Local Benchmarking on Val Set

Usage

End-to-End training for 3D dense captioning

Citation

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages