Built on Git LFS, Git DRS lets you store file contents outside of the Git repo, such as in a gen3 bucket, while keeping a pointer to the file inside the repo. With Git DRS, data files that are traditionally too large to store in Git can be tracked along with your code in a single Git repo! And the best part: you can still use the same Git commands you know (and possibly love)! Using just a few extra command line tools, Git DRS helps consolidate your data and code into a single location.
Git DRS functions within Git, so you only need a few commands beyond the usual Git ones (git lfs pull, git drs init, etc.). Git DRS primarily plugs into Git in the following ways:
- git add: during each add, Git LFS processes your file and checks in a pointer to Git.
- git commit: before each commit, Git DRS creates a DRS object that stores the details of your file needed to push.
- git push: before each push, Git DRS handles the transfer of each committed file.
- git pull: Git DRS pulls from the DRS server to your working directory if the file doesn't already exist locally.
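To make the pointer idea concrete, here is a minimal sketch of the text that Git LFS checks in to Git in place of a file's contents, following the Git LFS pointer format (a version line, a SHA-256 oid, and the size in bytes). The helper make_pointer and the sample file are hypothetical names used only for illustration; in normal use, git add produces the pointer for you.

```shell
# Sketch: print the pointer text Git LFS would store in Git for a file.
# make_pointer is a hypothetical helper, not a git-lfs or git-drs command.
make_pointer() {
    file="$1"
    # SHA-256 of the file contents (sha256sum on Linux, shasum on macOS)
    if command -v sha256sum >/dev/null 2>&1; then
        oid=$(sha256sum "$file" | cut -d' ' -f1)
    else
        oid=$(shasum -a 256 "$file" | cut -d' ' -f1)
    fi
    # Size of the file in bytes (tr strips the padding some wc versions add)
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Example: a small stand-in for a large data file, created in a temp directory
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/sample.bam"
make_pointer "$demo_dir/sample.bam"
```

The three printed lines are all that Git itself stores; the file contents live on the DRS server.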
- Download Git LFS (brew install git-lfs for Mac users)
- Configure LFS on your machine
    git lfs install --skip-smudge
- Download credentials from your data commons
  - Log in to your data commons
  - Click your email in the top right to go to your profile
  - Click Create API Key -> Download JSON
  - Make note of the path that it downloaded to
- Download Git DRS
    # build git-drs from source w/ custom gen3-client dependency
    git clone --recurse-submodules https://github.com/bmeg/git-drs.git
    cd git-drs
    go build
    # make the executable accessible
    export PATH=$PATH:$(pwd)
- Clone an existing DRS repo. If you don't already have one set up, see "Setting up your repo"
    cd ..
    # clone test repo
    git clone git@source.ohsu.edu:CBDS/git-drs-test-repo.git
    cd git-drs-test-repo
- Configure general access to your data commons
    git drs init --profile <data-commons-name> --apiendpoint https://data-commons-name.com/ --cred /path/to/downloaded/credentials.json
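Before running the steps above end to end, it can help to confirm that each tool is actually on your PATH. The following is a minimal sketch; require_cmd is a hypothetical helper that only looks names up with command -v, and the executable names git-lfs and git-drs are assumed to match what the install and build steps produce.

```shell
# Sketch: check that the tools from the setup steps are on PATH.
# require_cmd is a hypothetical helper, not part of Git, Git LFS, or Git DRS.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the expected executables cannot be found
missing=0
for cmd in git git-lfs git-drs; do
    require_cmd "$cmd" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

If git-drs is reported missing after go build, check that the export PATH line was run in the same shell session.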
When you run git drs init, a few things are set up for you:
- a .drs directory that automatically stores any background files and logs needed during execution
- Git settings that sync Git with gen3 services
- a gen3 profile, so that you can access gen3
In your own repo, all you need to set up is a config file. Once you have created a Git repo, create .drs/config with the following structure:
{
"gen3Profile": "<gen3-profile-here>",
"gen3Project": "<project-id-here>",
"gen3Bucket": "<bucket-name-here>"
}
- gen3Profile stores the name of the profile you specified in git drs init (e.g. the <data-commons-name> above).
- gen3Project is the project ID uniquely describing the data from your project. This will be provided to you by a data commons administrator.
- gen3Bucket is the name of the bucket that you will be using to store all your files. This will also be provided by a data commons administrator.
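As a rough sketch of creating this file from the command line, the snippet below writes .drs/config and confirms that the three keys are present. The values my-commons, my-project-id, and my-bucket are hypothetical placeholders, and a temporary directory stands in for the root of your Git repo.

```shell
# Sketch: write .drs/config with placeholder values and check the three keys.
# All values are hypothetical; use the ones from your data commons administrator.
repo_dir=$(mktemp -d)   # stand-in for the root of your Git repo
mkdir -p "$repo_dir/.drs"
cat > "$repo_dir/.drs/config" <<'EOF'
{
  "gen3Profile": "my-commons",
  "gen3Project": "my-project-id",
  "gen3Bucket": "my-bucket"
}
EOF

# Report each expected key that appears in the file
for key in gen3Profile gen3Project gen3Bucket; do
    grep -q "\"$key\"" "$repo_dir/.drs/config" && echo "ok: $key"
done
```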
Track Specific File Types
Store all bam files as pointers in the Git repo and store the actual contents in the DRS server. This is handled by a configuration line in .gitattributes:
git lfs track "*.bam"
git add .gitattributes
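For reference, git lfs track "*.bam" records the pattern as a single line in .gitattributes. The sketch below writes that line by hand in a throwaway repo (a temporary directory, assumed here so the example runs without git-lfs installed) and uses git check-attr to show which paths the LFS filter would apply to. The file names sample.bam and notes.txt are hypothetical.

```shell
# Sketch: the .gitattributes line for "*.bam", checked with plain Git.
attr_repo=$(mktemp -d)
git init -q "$attr_repo"
# Written by hand here; `git lfs track "*.bam"` adds this same line for you
printf '*.bam filter=lfs diff=lfs merge=lfs -text\n' > "$attr_repo/.gitattributes"

# Ask Git which filter applies to a matching and a non-matching path
git -C "$attr_repo" check-attr filter -- sample.bam notes.txt
# sample.bam: filter: lfs
# notes.txt: filter: unspecified
```

Only paths that report the lfs filter are stored as pointers; everything else is committed to Git as usual.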
Pull Files
Pull a single file
git lfs pull -I /path/to/file
Pull all non-localized files
git lfs pull
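To see which files are still pointers (that is, not yet localized), one option is to check whether a file begins with the Git LFS pointer header. This is a minimal sketch under that assumption; is_lfs_pointer and the two sample files are hypothetical and are created in a temporary directory purely for the demonstration.

```shell
# Sketch: tell a not-yet-pulled pointer apart from a localized file.
# is_lfs_pointer is a hypothetical helper, not a git-lfs or git-drs command.
LFS_HEADER='version https://git-lfs.github.com/spec/v1'

is_lfs_pointer() {
    # Compare only the first bytes of the file, so large binaries are not read in full
    head -c "${#LFS_HEADER}" "$1" 2>/dev/null | grep -qxF "$LFS_HEADER"
}

# Demo files: one pointer (all-zero oid as a placeholder) and one with real contents
pull_dir=$(mktemp -d)
printf '%s\noid sha256:%064d\nsize 6\n' "$LFS_HEADER" 0 > "$pull_dir/pointer.bam"
printf 'real file contents\n' > "$pull_dir/local.bam"

for f in "$pull_dir"/*.bam; do
    if is_lfs_pointer "$f"; then
        echo "pointer: ${f##*/}"
    else
        echo "local: ${f##*/}"
    fi
done
```

Files reported as pointers are the ones that git lfs pull would still need to download.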