This dataset is used to train the Spatial Language Integrating Model (SLIM) in [1]. It consists of virtual scenes, each observed from ten views. Each scene contains two or three objects placed in a square walled room. Each view is represented by an image and a synthetic or natural language description; the images are 2D renderings of the 3D scene from ten different camera viewpoints.
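For orientation, the sketch below shows one way to think about a single scene record in Python. The field names (`images`, `cameras`, `captions`) and shapes are my own assumptions for illustration, not the actual keys stored in the dataset files.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

# Hypothetical per-scene record; field names and shapes are illustrative
# assumptions, not the actual keys used in the SLIM dataset files.
@dataclass
class SlimScene:
    images: np.ndarray    # assumed shape (10, H, W, 3): one rendering per viewpoint
    cameras: np.ndarray   # assumed shape (10, D): camera pose parameters per view
    captions: List[str]   # one synthetic or natural language description per view

def print_scene(scene: SlimScene) -> None:
    # Each of the ten views pairs an image with its camera pose and description.
    for image, camera, caption in zip(scene.images, scene.cameras, scene.captions):
        print(image.shape, camera.shape, caption)
```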
`gsutil` needs to be installed to download the dataset. The dataset is available from the `gs://slim-dataset` bucket on Google Cloud Storage. Because the dataset is huge, about 600 GB, I downloaded the whole dataset except the `synthetic_data/train` data. Use the command below to download the `turk_data` part of the dataset.
```sh
gsutil -m cp -c -L manifest.log -r \
  "gs://slim-dataset/turk_data/" \
  ./<DATASET_FOLDER>/
```
After downloading finishes, make sure to manually create `test`, `valid`, and `train` directories under `turk_data` and move the respective files into them (a small helper sketch for this step follows the layout below). The dataset directory should then look as shown below.
```
<DATASET_FOLDER>
└── synthetic_data
    └── turk_data
        ├── test
        ├── train
        └── valid
```
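If you prefer not to sort the files by hand, here is a minimal Python sketch of that reorganization step. It assumes each downloaded file carries its split name (`train`, `valid`, or `test`) somewhere in its filename; check the actual filenames before running anything like this, and point the path at wherever `turk_data` lives in your layout.

```python
import shutil
from pathlib import Path

# Point this at the downloaded turk_data directory in your <DATASET_FOLDER>.
turk_data = Path("<DATASET_FOLDER>/synthetic_data/turk_data")

for split in ("train", "valid", "test"):
    split_dir = turk_data / split
    split_dir.mkdir(exist_ok=True)
    # Assumption: each downloaded file contains its split name in the filename;
    # adjust the glob pattern if the files are named differently.
    for f in turk_data.glob(f"*{split}*"):
        if f.is_file():
            shutil.move(str(f), str(split_dir / f.name))
```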
Dataset files are in `tfrecord` format. TFRecord is a binary storage format optimized for use with TensorFlow. Since I will use PyTorch, the dataset files are converted to `pt.gz` format. To convert the dataset, use the following command:
```sh
./convert_slim_dataset.sh <absolute_path/to/dataset_folder>
```
If you want to use the default value, just run `./convert_slim_dataset.sh`.
After conversion, the dataset directory will look as follows:
```
<DATASET_FOLDER>
├── synthetic_data
│   └── turk_data
│       ├── test
│       ├── train
│       └── valid
└── turk_data_torch
    ├── test
    ├── train
    └── valid
```
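To sanity-check the converted data, something like the sketch below should work. It assumes each `pt.gz` file is a gzip-compressed `torch.save` archive; that is an assumption about the output of `convert_slim_dataset.sh`, so verify it against the script, and the structure of the loaded object depends on the script as well.

```python
import gzip
from pathlib import Path

import torch

def load_pt_gz(path):
    """Load one converted record, assuming a gzip-compressed torch.save file."""
    with gzip.open(path, "rb") as f:
        return torch.load(f)

# Example: inspect the first converted test record.
test_dir = Path("<DATASET_FOLDER>/turk_data_torch/test")
first_file = next(test_dir.glob("*.pt.gz"))
record = load_pt_gz(first_file)
print(type(record))  # actual contents depend on convert_slim_dataset.sh
```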
[1] Ramalho, T., Kočiský, T., Besse, F., Eslami, S. M., Melis, G., Viola, F., ... & Hermann, K. M. (2018). Encoding spatial relations from natural language. arXiv preprint arXiv:1807.01670.