| 1 | +# Why? |
| 2 | +The purpose is to separate big-file caching from revision-control. There are several alternatives: |
| 3 | + |
| 4 | + * https://github.com/jedbrown/git-fat |
| 5 | + * https://github.com/schacon/git-media |
| 6 | + * http://git-annex.branchable.com/ |
| 7 | + * https://github.com/github/git-lfs |
| 8 | + |
| 9 | +But all of those impose the cost of checksumming the large files. We assert instead that the large files can be uniquely identified without checksums: by URL, by S3 version, by filename, etc. We store only symlinks in the git repo. |
| 10 | + |
| 11 | +## Installing |
| 12 | +``` |
| 13 | +ln -sf `pwd`/git_sym.py ~/bin/git-sym |
| 14 | +``` |
| 15 | + |
| 16 | +## Running |
| 17 | +You can test it right here. |
| 18 | +``` |
| 19 | +touch ~/foo |
| 20 | +git-sym update links/foo |
| 21 | +# or |
| 22 | +# python git_sym.py update links/foo |
| 23 | +cat links/foo |
| 24 | +``` |
| 25 | +And this should fail: |
| 26 | +``` |
| 27 | +rm -f ~/git_sym_cache/foo ~/foo |
| 28 | +git-sym update |
| 29 | +``` |
| 30 | + |
| 31 | +## Adding your own large files |
| 32 | +``` |
| 33 | +git-sym add large1 large2 large3 |
| 34 | +git commit -m 'adding links' |
| 35 | +git-sym show |
| 36 | +``` |
| 37 | +**git-sym** will choose unique filenames based on checksums. But `git-sym add` is strictly for convenience. |
| 38 | +You are free to use your own filenames. Anything symlinked via `GIT_ROOT/.git_sym` will be update-able. |
| 39 | + |
| 40 | +Next, you might want to make those files available to others. |
| 41 | +You can then move those files out of GIT_SYM_CACHE_DIR and into Amazon S3, or an ftp site, or wherever. |
| 42 | +Just add rules to your `git_sym.makefile`. |
| 43 | + |
| 44 | +## Other useful commands |
| 45 | +``` |
| 46 | +git-sym show -h |
| 47 | +git-sym missing -h |
| 48 | +git-sym -h |
| 49 | +``` |
| 50 | + |
| 51 | +# Details |
| 52 | +## Typical usage |
| 53 | +You will store relative symlinks in your repo. They will point to a unique filename inside `ROOT/.git_sym/`, where ROOT is `../../` etc. |
| 54 | + |
| 55 | +`git-sym update` will search your repo for symlinks (unless you specify them on the command-line). For each, it will execute `ROOT/git_sym.makefile` in your `GIT_SYM_CACHE_DIR` (`~/git_sym_cache` by default). The makefile targets will be the basenames of the symlinks. |
| 56 | + |
| 57 | +If all those files are properly retrieved, then symlinks will be created with the same filenames inside `.git/git_sym`. `ROOT/.git_sym` will point at that directory. And all other symlinks will point *through* `ROOT/.git_sym`. Thus, there are three (3) levels of indirection. |
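
The three levels can be reproduced by hand in a scratch directory. This is only a sketch of the layout described above, not what **git-sym** itself runs: the `work`, `cache`, and `repo` paths are made up, and whether **git-sym** writes the two inner links as absolute or relative paths is an assumption here.

```shell
set -eu

work=$(mktemp -d)
cache="$work/cache"   # stands in for GIT_SYM_CACHE_DIR
repo="$work/repo"     # stands in for ROOT
mkdir -p "$cache" "$repo/.git/git_sym" "$repo/links"
echo "big payload" > "$cache/foo"

# Level 3: .git/git_sym/foo points at the cached file.
ln -s "$cache/foo" "$repo/.git/git_sym/foo"
# Level 2: ROOT/.git_sym points at .git/git_sym.
ln -s .git/git_sym "$repo/.git_sym"
# Level 1: the versioned, relative symlink points through ROOT/.git_sym.
ln -s ../.git_sym/foo "$repo/links/foo"

readlink "$repo/links/foo"   # prints ../.git_sym/foo
cat "$repo/links/foo"        # prints big payload
```

Only the level-1 link would be committed. The other two live outside version control, so they can be re-pointed when the cache moves.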
| 58 | + |
| 59 | +## Makefile |
| 60 | +Someday, we will offer a plugin architecture. But for now, using a makefile is really very simple. Just create a rule for each unique filename. (You *are* using unique filenames, right?) You can run `wget`, `curl`, `ftp`, `rcp`, `rsync`, `aws-s3-get`, or whatever you want. The retrieval mechanism is decoupled from caching. |
| 61 | + |
| 62 | +You should try to ensure that you have a rule for every current symlink. Old rules for symlinks no longer in your repo are fine; they are simply ignored. |
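
As an illustration, here is a minimal sketch of one such rule and of running it inside a cache directory. Everything in it is hypothetical: the `large1.bin` name, the scratch paths, and the local `cp` that stands in for `wget`, `curl`, or an S3 fetch. The `make` invocation only approximates what `git-sym update` is described as doing. It assumes `make` is installed.

```shell
set -eu

work=$(mktemp -d)
mkdir -p "$work/remote" "$work/cache"
echo "big payload" > "$work/remote/large1.bin"   # stand-in for S3, ftp, etc.

# One rule per unique filename. printf is used so that the recipe
# line begins with a real TAB, which make requires.
printf 'large1.bin:\n\tcp %s/remote/large1.bin $@\n' "$work" \
    > "$work/git_sym.makefile"

# Run the makefile inside the cache directory; the target is the
# basename of the symlink.
make -s -C "$work/cache" -f "$work/git_sym.makefile" large1.bin

cat "$work/cache/large1.bin"   # prints big payload
```

Because the rule has no prerequisites, a second `make` run finds the file already in the cache and does not fetch it again.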
| 63 | + |
| 64 | +To test your rules: |
| 65 | +``` |
| 66 | +export GIT_SYM_CACHE_DIR=~/mytest |
| 67 | +git-sym missing # should report something |
| 68 | +git-sym update |
| 69 | +git-sym missing |
| 70 | +``` |
| 71 | + |
| 72 | +## Other notes |
| 73 | +### Cache |
| 74 | +**git-sym** sets the mode to read-only for the cached files. These files should never change. You might want to name them after their own checksums. `git-sym add` can help you with that. |
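
If you prefer to name files by hand, the idea can be sketched as follows. The `large1.<checksum>` scheme and the scratch paths are assumptions made for illustration, not necessarily what `git-sym add` produces. `sha256sum` is the GNU coreutils tool; on macOS, `shasum -a 256` is the usual equivalent.

```shell
set -eu

cache=$(mktemp -d)    # stands in for GIT_SYM_CACHE_DIR
src=$(mktemp)
printf 'example payload\n' > "$src"

# Derive a unique name from the file's own checksum.
sum=$(sha256sum "$src" | cut -d' ' -f1)
name="large1.$sum"

# Copy into the cache and drop write permission for everyone,
# since cached files should never change.
cp "$src" "$cache/$name"
chmod a-w "$cache/$name"

ls -l "$cache"
```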
| 75 | +### Submodules |
| 76 | +If your module can be used as a *submodule*, we cannot point at `.git/git_sym/` directly because for submodules `.git/` is not inside the tree. (The relative symlinks are constant, so they need to work no matter where `.git/` sits.) That is why we have *three* levels of indirection, in case you were wondering. (This is also why **git-annex** *fails* for submodules.) |
| 77 | + |
| 78 | +This is also why we write `ROOT/.git_sym`; it might be a different directory than `.git`. |
| 79 | + |
| 80 | +For submodule support, you will also need this: |
| 81 | +``` |
| 82 | +git config --global alias.gsexec '!exec ' |
| 83 | +``` |
| 84 | +We use that to learn the actual location of the `.git/` directory. If it fails, we try the current directory, and if `.git` is not a directory there, we attempt to find it in `../.git/modules/REPO`, where REPO is the root directory. (This can fail in many ways. The alias never fails.) |
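
The fallback described above can be sketched in plain shell. `find_git_dir` is a hypothetical helper written for this illustration, not a function from **git-sym**, and the `super`/`sub` layout only imitates a submodule checkout.

```shell
set -eu

# Use ROOT/.git if it is a directory; otherwise try
# ../.git/modules/REPO, where REPO is the basename of ROOT.
find_git_dir() {
    root=$1
    if [ -d "$root/.git" ]; then
        printf '%s\n' "$root/.git"
        return 0
    fi
    candidate="$root/../.git/modules/$(basename "$root")"
    if [ -d "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
    fi
    return 1
}

work=$(mktemp -d)
mkdir -p "$work/super/.git/modules/sub" "$work/super/sub"
# In a submodule checkout, .git is a file rather than a directory.
echo "gitdir: ../.git/modules/sub" > "$work/super/sub/.git"

find_git_dir "$work/super"       # ends in super/.git
find_git_dir "$work/super/sub"   # ends in super/sub/../.git/modules/sub
```

Outside of **git-sym**, `git rev-parse --git-dir` reports the git directory of the current repository, which is a handy way to check the result.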
| 85 | + |
| 86 | +We expect you to forget that, so we add that alias to your local repo for you. Believe us: It's a Good Thing. |
| 87 | + |
| 88 | +### .gitignore |
| 89 | +Since the intermediate symlink is also in the repo, but points to a changing target, it needs to be listed in `.gitignore`. (That anticipates both accidental `git add` and `git clean`.) We expect you to forget that important rule, so **git-sym** will detect its absence and add it to `.git/info/exclude` instead. No worries. |
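
To see the effect by hand, the sketch below adds an exclude entry only when it is missing and then asks git whether the link is ignored. The `/.git_sym` pattern and the `ensure_excluded` helper are assumptions made for illustration; the exact entry **git-sym** writes may differ.

```shell
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Recreate the intermediate symlink.
mkdir -p .git/git_sym
ln -s .git/git_sym .git_sym

# Append the pattern to .git/info/exclude unless it is already there.
ensure_excluded() {
    pattern=$1
    mkdir -p .git/info
    grep -qxF "$pattern" .git/info/exclude 2>/dev/null ||
        printf '%s\n' "$pattern" >> .git/info/exclude
}

ensure_excluded '/.git_sym'
ensure_excluded '/.git_sym'   # second call changes nothing

git check-ignore -q .git_sym && echo "ignored"
```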
| 90 | + |
| 91 | +### Complicated symlinks? |
| 92 | +We require a flat directory structure within `.git/git_sym`. If you need more files than your filesystem |
| 93 | +can handle, you're Doing It Wrong. Git will slow down anyway. |
| 94 | + |
| 95 | +However, we support symlinked *directories*, which can then be an entire tree in GIT_SYM_CACHE_DIR. That should |
| 96 | +satisfy all reasonable use-cases. |
| 97 | + |
| 98 | +# TODO |
| 99 | +* git-sym fix -- also fix broken links from moved cache, and missing links in GIT_SYM_DIR |
| 100 | +* Try `.gitattributes` instead of `.gitignore`, to avoid problems with `git clean`. |
| 101 | +* Add `git-submodule` support, to run `git-sym update` automatically. |