
probcomp/datasync



DataSync

Welcome to the unity_synthetic_data data repository.

Requirements

  • Linux machine with NVIDIA GPU
  • CUDA 12 compatible system
  • Python 3.9 or higher
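To check a machine against these requirements before installing, a rough preflight check might look like the sketch below. It is a hypothetical helper, not part of datasync. It assumes that finding `nvidia-smi` on PATH is a reasonable proxy for an installed NVIDIA GPU driver. It does not verify CUDA 12 compatibility.

```python
import platform
import shutil
import sys


def check_requirements() -> dict:
    """Rough preflight check against the requirements listed above.

    Assumption: `nvidia-smi` on PATH is used as a proxy for an NVIDIA GPU
    driver; CUDA 12 compatibility is NOT verified here.
    """
    return {
        "linux": platform.system() == "Linux",
        "python_3_9_plus": sys.version_info >= (3, 9),
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    # Print one line per requirement so missing pieces are easy to spot.
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```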

Website

Preview the bucket content on our data visualization website. To gain access:

  1. Send \invite-genjax <email> to any channel in the MIT Probcomp Slack.
  2. Wait 1-2 minutes for processing.

Note: Use the same Google account email associated with your GCP bucket access.

Installation instructions

Activate the conda environment you plan to use for loading the data. You can do this by running:

conda activate your_environment_name

Install the package

pip install "datasync @ git+ssh://git@github.com/probcomp/datasync.git"

Configuring the Local Data Directory (Optional)

The package stores data in BUCKET_DATA_PATH/unity_synthetic_data. By default, BUCKET_DATA_PATH is set to /home/<user>/gcp_assets. To use a different location, you can set the BUCKET_DATA_PATH environment variable:

Unix-based Systems (Linux, macOS)

export BUCKET_DATA_PATH=/path/to/custom/directory

Python and Jupyter

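import os
# Only affects this Python process (and any subprocesses it launches), so set it before using datasync
os.environ['BUCKET_DATA_PATH'] = "/path/to/custom/directory"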

Windows (Command Prompt)

set BUCKET_DATA_PATH=C:\path\to\custom\directory

Windows (PowerShell)

$env:BUCKET_DATA_PATH = "C:\path\to\custom\directory"
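The directory resolution described above can be sketched in Python as follows. `resolve_data_dir` is a hypothetical helper, not datasync's actual implementation. It assumes that an unset or empty BUCKET_DATA_PATH falls back to `~/gcp_assets`, which matches the documented default of /home/<user>/gcp_assets on Linux.

```python
import os
from pathlib import Path


def resolve_data_dir() -> Path:
    """Return BUCKET_DATA_PATH/unity_synthetic_data as described in this README.

    Assumption: an unset or empty BUCKET_DATA_PATH falls back to ~/gcp_assets,
    mirroring the documented default /home/<user>/gcp_assets.
    """
    base = os.environ.get("BUCKET_DATA_PATH") or str(Path.home() / "gcp_assets")
    # expanduser() lets values such as "~/data" work as expected.
    return Path(base).expanduser() / "unity_synthetic_data"
```

For example, with BUCKET_DATA_PATH set to /tmp/custom, `resolve_data_dir()` returns `/tmp/custom/unity_synthetic_data`.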

Download data

The data_pull command downloads data from the GCP bucket into your local data directory.

Download all bucket data (disable overwrite)

By default, data_pull pulls all data in the GCP bucket into BUCKET_DATA_PATH/unity_synthetic_data, creating that local directory if it does not already exist. The dataset exceeds 100 GB, so make sure you have sufficient storage space. Files whose names already exist in your local BUCKET_DATA_PATH/unity_synthetic_data are not overwritten with the cloud version; their download is skipped.

data_pull   # pulls all bucket contents into local,
            # with no overwrite on already-existing filenames
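Because a full pull exceeds 100 GB, it can help to check free space before running it. Here is a minimal sketch using the standard library, with a hypothetical helper name. It treats 100 GiB as a lower bound, since the exact dataset size is not stated. The target directory may not exist yet, so the check walks up to the nearest existing ancestor.

```python
import shutil
from pathlib import Path

# README: the dataset exceeds 100 GB. 100 GiB is used as a (slightly conservative) lower bound.
MIN_FREE_BYTES = 100 * 1024**3


def has_room_for_full_pull(target: Path, min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem that will hold `target` has >= min_free bytes free.

    `target` may not exist yet (data_pull creates it), so query the nearest
    existing ancestor directory instead.
    """
    probe = Path(target).expanduser().resolve()
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free >= min_free
```

Usage: call `has_room_for_full_pull(Path.home() / "gcp_assets" / "unity_synthetic_data")` before running data_pull with the default location.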

Download all bucket data (enable overwrite)

To pull all data in the GCP bucket into your local BUCKET_DATA_PATH/unity_synthetic_data with overwriting enabled, use the -ow flag. Apart from the overwriting enabled by -ow, data_pull never deletes files in your local directory, so delete deprecated local files before running data_push to upload your local contents to the bucket.

data_pull -ow   # pulls all bucket contents into local,
                # with overwrite on already-existing filenames
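The download and deletion behavior described above can be illustrated on plain relative paths. The sketch below uses a hypothetical `plan_pull` function, not part of datasync. It assumes you already have the list of bucket-relative file paths, because listing the bucket is out of scope here. It shows which files a pull would fetch with and without -ow. It also shows which local files have no counterpart in the bucket: data_pull never removes these, so they are candidates to review before data_push.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_pull(remote_paths: Iterable[str], local_root: Path,
              overwrite: bool = False) -> Tuple[List[str], List[str]]:
    """Illustrate the documented data_pull semantics (hypothetical helper).

    Returns (to_download, local_only):
      * to_download: bucket files that would be fetched; files already present
        locally are skipped unless `overwrite` is True (the -ow behavior).
      * local_only: local files absent from the bucket; data_pull never
        deletes these, so review them before pushing with data_push.
    """
    local_root = Path(local_root)
    remote = sorted(set(remote_paths))
    to_download = [p for p in remote if overwrite or not (local_root / p).exists()]
    local_files = {
        f.relative_to(local_root).as_posix()
        for f in local_root.rglob("*") if f.is_file()
    } if local_root.exists() else set()
    local_only = sorted(local_files - set(remote))
    return to_download, local_only
```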

Download select bucket data items

Finally, to pull selected items (files or directories) from the GCP bucket into your local BUCKET_DATA_PATH/unity_synthetic_data, use the -fn flag followed by the item's path relative to the bucket root. Any missing intermediate directories are created automatically in your local BUCKET_DATA_PATH/unity_synthetic_data.

For example, if you would like to pull toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz,

data_pull -fn toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz  
# populates BUCKET_DATA_PATH/unity_synthetic_data/toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz
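Once a file like this has been pulled, you can inspect it with NumPy. This README does not document the array names stored inside the .npz files, so the sketch below lists whatever the archive contains rather than assuming specific keys. `describe_npz` is a hypothetical helper. It loads with `allow_pickle=False`. If a file turns out to contain object arrays, this raises an error; in that case pass `allow_pickle=True`, but only for data you trust.

```python
import os
from pathlib import Path

import numpy as np


def describe_npz(path: Path) -> dict:
    """Map each array name in an .npz archive to its (shape, dtype string)."""
    summary = {}
    with np.load(path, allow_pickle=False) as data:
        for name in data.files:
            arr = data[name]  # each access decompresses the array, so read it once
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


if __name__ == "__main__":
    root = Path(os.environ.get("BUCKET_DATA_PATH") or Path.home() / "gcp_assets")
    npz = root / "unity_synthetic_data/toyroom/ballstriking/feature_track_data/lit_bg_200p.input.npz"
    if npz.exists():
        for name, (shape, dtype) in describe_npz(npz).items():
            print(name, shape, dtype)
    else:
        print(f"{npz} not found; pull it first with data_pull -fn")
```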

For another example, if you would like to pull all items in primitives/bouncingcube,

data_pull -fn primitives/bouncingcube
# recursively populates BUCKET_DATA_PATH/unity_synthetic_data/primitives/bouncingcube

You can combine these flags. For example, if you would like to pull all items in primitives/bouncingcube with overwrite enabled,

data_pull -ow -fn primitives/bouncingcube
# recursively populates BUCKET_DATA_PATH/unity_synthetic_data/primitives/bouncingcube,
# with overwrite on already-existing filenames
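The mapping from an -fn argument to its local destination can be sketched as follows. `local_target` is a hypothetical helper and does not reflect datasync's internals. It keeps the item's bucket-relative layout under the data root and creates any missing intermediate directories. As a defensive assumption, it rejects paths that would escape the data root.

```python
from pathlib import Path, PurePosixPath


def local_target(data_root: Path, item: str) -> Path:
    """Map a bucket-relative item (as passed to -fn) to its local path.

    The item keeps its relative layout under data_root; missing intermediate
    directories are created. Hypothetical helper, not the datasync API.
    """
    rel = PurePosixPath(item.strip("/"))
    if not rel.parts or ".." in rel.parts:
        raise ValueError(f"expected a bucket-relative path, got {item!r}")
    target = Path(data_root).joinpath(*rel.parts)
    # Create parent directories only; the item itself is what data_pull fills in.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```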

Note on videos in the bucket

All *.mp4 teasers in the bucket are rendered at 10 fps. The actual *.npz data contains videos at 30 fps.
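Because the teasers run at 10 fps and the .npz videos at 30 fps, each teaser frame corresponds to every third data frame. The sketch below shows that conversion. It assumes both videos start at t = 0 and cover the same timeline, which this README does not state explicitly. The function names are hypothetical.

```python
TEASER_FPS = 10  # *.mp4 teasers, per this README
DATA_FPS = 30    # videos inside *.npz data, per this README


def teaser_frame_to_data_frame(teaser_frame: int) -> int:
    """Teaser frame i sits at t = i / 10 s, which is data frame 3 * i at 30 fps."""
    return teaser_frame * DATA_FPS // TEASER_FPS


def data_frame_to_teaser_frame(data_frame: int) -> int:
    """Nearest teaser frame at or before the given 30 fps data frame."""
    return data_frame * TEASER_FPS // DATA_FPS
```

For example, teaser frame 5 maps to data frame 15, and data frame 16 maps back to teaser frame 5.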
