Official PyTorch implementation of FNAC.
Our paper is accepted to CVPR 2023:
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Set up the environment:
pip install -r requirements.txt
Data can be downloaded from:
- Learning to Localize Sound Sources
- Localizing Visual Sounds the Hard Way
- Unheard and Heard
To train FNAC on Flickr, set the dataset path, train set ('flickr_10k' or 'flickr_144k'), and experiment name accordingly, then run:
bash train_flickr.sh
To evaluate, set the dataset path, test set, and experiment name accordingly, then run:
bash test_flickr.sh
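Both scripts follow the same pattern: edit the configuration variables at the top of the script, then run it. A minimal sketch of what such a configuration block could look like (the variable names DATA_PATH, TRAIN_SET, and EXP_NAME are assumptions for illustration, not the actual contents of the shipped scripts):

```shell
#!/bin/bash
# Hypothetical configuration block -- the real variable names in
# train_flickr.sh / test_flickr.sh may differ.
DATA_PATH="/path/to/flickr"   # root directory of the downloaded dataset
TRAIN_SET="flickr_10k"        # or "flickr_144k"
EXP_NAME="fnac_flickr_10k"    # used to tag checkpoints and logs

# The script would pass these on to the training/evaluation entry point;
# here we just echo the resolved configuration.
echo "experiment=${EXP_NAME} train_set=${TRAIN_SET} data=${DATA_PATH}"
```

Checkpoints and logs are typically written under a directory named after the experiment, so using a distinct EXP_NAME per run keeps results from different train sets separate.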
You can follow the same procedure to train and test on VGG-SS and the heard/unheard splits.
Train Set | Test Set | VGG-SS CIoU | VGG-SS AUC | URL |
---|---|---|---|---|
Flickr 10k | VGG-SS | 35.27 | 38.00 | checkpoints |
Flickr 144k | VGG-SS | 33.93 | 37.29 | checkpoints |
VGG-Sound 10k | VGG-SS | 37.29 | 38.99 | checkpoints |
VGG-Sound 144k | VGG-SS | 39.50 | 39.66 | checkpoints |
If you find our work useful, please cite our paper:
@article{sun2023learning,
title={Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning},
author={Sun, Weixuan and Zhang, Jiayi and Wang, Jianyuan and Liu, Zheyuan and Zhong, Yiran and Feng, Tianpeng and Guo, Yandong and Zhang, Yanhao and Barnes, Nick},
journal={arXiv preprint arXiv:2303.11302},
year={2023}
}
We thank EZ-VSL for their great codebase.