Welcome to the unity_synthetic_data data repository.
- Linux machine with NVIDIA GPU
- CUDA 12 compatible system
- Python 3.9 or higher
Preview the bucket content on our data visualization website. To gain access:
- Send \invite-genjax <email> to any channel in the MIT Probcomp Slack.
- Wait 1-2 minutes for processing.
Note: Use the same Google account email associated with your GCP bucket access.
Activate the conda environment you plan to use for loading the data. You can do this by running:
conda activate your_environment_name
Run the Install Script
pip install "datasync @ git+ssh://git@github.com/probcomp/datasync.git"
The package stores data in BUCKET_DATA_PATH/unity_synthetic_data.
By default, BUCKET_DATA_PATH is set to /home/<user>/gcp_assets. To use a different location, set the BUCKET_DATA_PATH environment variable:
Linux/macOS shell:
export BUCKET_DATA_PATH=/path/to/custom/directory
Python:
import os
os.environ['BUCKET_DATA_PATH'] = "/path/to/custom/directory"
Windows Command Prompt:
set BUCKET_DATA_PATH=C:\path\to\custom\directory
Windows PowerShell:
$env:BUCKET_DATA_PATH = "C:\path\to\custom\directory"
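To check from Python which directory the settings above point at, a minimal sketch follows. It only mirrors the documented default (/home/<user>/gcp_assets, taken here as the current user's home directory plus gcp_assets); the function name resolve_data_root is hypothetical and is not part of datasync.

```python
import os
from pathlib import Path

DATASET_NAME = "unity_synthetic_data"


def resolve_data_root(env=None):
    """Return the local dataset directory implied by BUCKET_DATA_PATH.

    Hypothetical helper: it mirrors the documented default and does not
    call into datasync itself.
    """
    env = os.environ if env is None else env
    # Fall back to ~/gcp_assets when the variable is unset or empty.
    base = env.get("BUCKET_DATA_PATH") or str(Path.home() / "gcp_assets")
    return Path(base).expanduser() / DATASET_NAME


print(resolve_data_root())
```

Passing a plain dictionary as env makes it easy to try out a custom location without touching the real environment.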
The data_pull command downloads data from the GCP bucket into your local directory.
By default, data_pull pulls all existing data in the GCP bucket into BUCKET_DATA_PATH/unity_synthetic_data. If the local directory does not already exist, it is created automatically.
Please note that the dataset exceeds 100GB, so ensure you have sufficient storage space.
By default, files whose names already exist in your local BUCKET_DATA_PATH/unity_synthetic_data are not overwritten with the cloud version; the download is skipped for those files.
data_pull # pulls all bucket contents into local, with no overwrite on already-existing filenames
To pull all existing data in the GCP bucket into your local BUCKET_DATA_PATH/unity_synthetic_data with overwrite, use the -ow flag.
Note that, apart from the overwriting enabled by -ow, data_pull never deletes files in your local directory; be sure to delete deprecated local files before pushing your local contents to the bucket with data_push.
data_pull -ow # pulls all bucket contents into local, with overwrite on already-existing filenames
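The skip, overwrite, and never-delete rules can be illustrated with a small sketch. This is not datasync's implementation: plan_pull and local_only_files are hypothetical helpers, and the remote listing is assumed to be a list of bucket-relative paths that you obtained by some other means.

```python
import tempfile
from pathlib import Path


def plan_pull(remote_paths, local_root, overwrite=False):
    """Return the bucket-relative paths a pull would download.

    Without overwrite, paths that already exist locally are skipped.
    """
    local_root = Path(local_root)
    return [p for p in remote_paths if overwrite or not (local_root / p).exists()]


def local_only_files(remote_paths, local_root):
    """Return local files that have no counterpart in the remote listing.

    A pull never removes these, so they are candidates for manual cleanup.
    """
    local_root = Path(local_root)
    remote = set(remote_paths)
    found = []
    for f in local_root.rglob("*"):
        rel = f.relative_to(local_root).as_posix()
        if f.is_file() and rel not in remote:
            found.append(rel)
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.npz").write_bytes(b"")    # already present locally
    (root / "old.npz").write_bytes(b"")  # no longer in the bucket
    remote = ["a.npz", "b/c.npz"]        # assumed remote listing
    print(plan_pull(remote, root))                  # ['b/c.npz']
    print(plan_pull(remote, root, overwrite=True))  # ['a.npz', 'b/c.npz']
    print(local_only_files(remote, root))           # ['old.npz']
```

The last call shows why stale local files deserve attention: nothing in a pull removes old.npz, so it would still be present when the local contents are later pushed.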
Finally, to pull select items (files or directories) from the GCP bucket into your local BUCKET_DATA_PATH/unity_synthetic_data, use the -fn flag, followed by the path of the item relative to the root of the bucket.
All necessary intermediate directories are created automatically if they do not already exist in your local BUCKET_DATA_PATH/unity_synthetic_data.
For example, to pull toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz:
data_pull -fn toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz
# populates BUCKET_DATA_PATH/unity_synthetic_data/toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz
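Once a file such as the one above has been pulled, it can be inspected with NumPy. The sketch below assumes numpy is installed and makes no assumption about which arrays the archive contains; summarize_npz is a hypothetical helper that simply lists whatever is there.

```python
import os
from pathlib import Path

import numpy as np


def summarize_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype)."""
    summary = {}
    # np.load defaults to allow_pickle=False, so an archive that stores
    # object arrays raises an error instead of unpickling data.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


# Documented default location; adjust if BUCKET_DATA_PATH is set differently.
base = os.environ.get("BUCKET_DATA_PATH") or str(Path.home() / "gcp_assets")
npz_path = (
    Path(base)
    / "unity_synthetic_data"
    / "toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz"
)

if npz_path.exists():
    for name, (shape, dtype) in summarize_npz(npz_path).items():
        print(name, shape, dtype)
else:
    print(f"{npz_path} not found; pull it first with data_pull -fn")
```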
For another example, to pull all items in primitives/bouncingcube:
data_pull -fn primitives/bouncingcube
# recursively populates BUCKET_DATA_PATH/unity_synthetic_data/primitives/bouncingcube
You can combine these flags. For example, to pull all items in primitives/bouncingcube with overwrite enabled:
data_pull -ow -fn primitives/bouncingcube
# recursively populates BUCKET_DATA_PATH/unity_synthetic_data/primitives/bouncingcube, with overwrite on already-existing filenames
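If you want to drive these pulls from a Python script, one possible approach is to assemble the command line and hand it to subprocess. The sketch below uses only the -ow and -fn flags described above; build_data_pull_command and run_data_pull are hypothetical wrappers, not part of datasync, and they assume data_pull is on your PATH.

```python
import shlex
import subprocess


def build_data_pull_command(item=None, overwrite=False):
    """Assemble a data_pull argument list from the documented flags."""
    cmd = ["data_pull"]
    if overwrite:
        cmd.append("-ow")
    if item is not None:
        # Path of the item relative to the root of the bucket.
        cmd += ["-fn", item]
    return cmd


def run_data_pull(item=None, overwrite=False):
    """Run data_pull and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_data_pull_command(item, overwrite), check=True)


print(shlex.join(build_data_pull_command("primitives/bouncingcube", overwrite=True)))
# prints: data_pull -ow -fn primitives/bouncingcube
```

Keeping command construction separate from execution lets you print and review the exact command before starting a large download.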
All *.mp4 teasers in the bucket are rendered at 10 fps. The actual *.npz data contains videos at 30 fps.
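The frame-rate difference matters when converting frame counts to seconds. The arithmetic can be sketched as follows; the constant and function names are illustrative, and nothing here assumes how teaser frames map onto the frames stored in the *.npz files.

```python
NPZ_FPS = 30     # frame rate of the videos stored in the *.npz data
TEASER_FPS = 10  # frame rate at which the *.mp4 teasers are rendered


def clip_duration_seconds(num_frames, fps=NPZ_FPS):
    """Duration in seconds of num_frames frames played at the given rate."""
    return num_frames / fps


print(clip_duration_seconds(90))              # 3.0
print(clip_duration_seconds(90, TEASER_FPS))  # 9.0
```

Using the teaser rate for *.npz frame counts would overestimate durations by a factor of three.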