[Project Page] [arXiv] [Video]
2.5D Visual Sound
Ruohan Gao1 and Kristen Grauman2
1UT Austin, 2Facebook AI Research
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019
This repository (about 100 GB) contains the FAIR-Play dataset that we collected and used in our CVPR 2019 paper. It contains 1,871 video clips and their corresponding binaural audio clips recorded in a music room. The video clip and the binaural clip with the same index are roughly aligned. The splits directory contains the 10 random splits used in the paper. See PseudoBinaural for 5 more challenging splits, in which the training and testing splits have little or no scene overlap. The code is shared at 2.5D Visual Sound Code.
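Because clips are matched by index, it can be useful to check which indices have both a video and a binaural audio file before training. The function below is a minimal sketch, assuming that each clip's file name (without its extension) is its index; the function name `list_paired_indices` and the directory arguments are hypothetical and should be adapted to the actual layout of your download.

```shell
# list_paired_indices VIDEO_DIR AUDIO_DIR
# Prints every file stem (assumed to be the clip index) that is present
# in both directories, one per line.
# Assumption: matching clips share the same file name apart from the extension.
list_paired_indices() {
    video_dir="$1"
    audio_dir="$2"
    for video in "$video_dir"/*; do
        [ -f "$video" ] || continue
        stem=$(basename "$video")
        stem="${stem%.*}"               # drop the extension
        for audio in "$audio_dir/$stem".*; do
            if [ -f "$audio" ]; then    # guards against an unmatched glob
                echo "$stem"
                break
            fi
        done
    done
}
```

For example, `list_paired_indices path/to/videos path/to/audios | wc -l` would count the paired clips, where both paths are placeholders.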
- The dataset can be downloaded by cloning the repository using git lfs:
brew install git-lfs
git lfs install
git lfs clone git@github.com:facebookresearch/FAIR-Play.git
cd FAIR-Play
git lfs pull
- If you have trouble downloading the dataset through GitHub, you can also download it using the following commands:
wget http://dl.fbaipublicfiles.com/FAIR-Play/videos.tar.gz
wget http://dl.fbaipublicfiles.com/FAIR-Play/audios.tar.gz
wget http://dl.fbaipublicfiles.com/FAIR-Play/splits.tar.gz
- The dataset is also shared at UT Box.
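If you use the wget commands above, the downloaded archives still need to be unpacked. The following is a minimal sketch, assuming the files are gzip-compressed tarballs (as the .tar.gz suffix suggests); the function name `extract_archives` and its destination argument are hypothetical, and the directory names inside each archive are not documented here.

```shell
# extract_archives [DEST]
# Unpacks whichever of the three FAIR-Play archives exist in the current
# directory into DEST (defaults to the current directory).
# Assumption: the archives are gzip-compressed tarballs.
extract_archives() {
    dest="${1:-.}"
    mkdir -p "$dest" || return 1
    for archive in videos.tar.gz audios.tar.gz splits.tar.gz; do
        if [ -f "$archive" ]; then
            tar -xzf "$archive" -C "$dest" || return 1
        else
            echo "skipping $archive (not downloaded)" >&2
        fi
    done
}
```

Running `extract_archives FAIR-Play-data` from the download directory would unpack everything into a `FAIR-Play-data` folder (a placeholder name). Archives that are missing are skipped with a message on stderr, so the function can be rerun after further downloads.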
If you find our data or project useful in your research, please cite:
@inproceedings{gao2019visualsound,
title={2.5D Visual Sound},
author={Gao, Ruohan and Grauman, Kristen},
booktitle={CVPR},
year={2019}
}
We would like to thank Tony Miller, Jacob Donley, Pablo Hoffmann, and Vladimir Tourbabin from Facebook for helpful discussions, and the volunteers who participated in our data collection.
FAIR-Play is CC BY 4.0 licensed, as found in the LICENSE file.