If you repeatedly clone a Git repository (for instance, for software you regularly hack on), you may be annoyed by how long the clone operation takes, especially for large remote Git repositories. And if you keep multiple copies of a Git repository on your computer, the duplicated history wastes disk space.
Git has a feature, `--reference` (also known as alternates), which lets a clone borrow objects from another local repository on the same computer instead of downloading and storing them again.
An example use case is:
$ cd ~/test
$ git clone https://example.org/git/repo.git normal-repo
(you wait minutes or hours, and the resulting directory is big)
$ git clone --reference ~/test/normal-repo https://example.org/git/repo.git referenced-repo
(you wait seconds, and the resulting directory is small - only the size of the files, no size used by the history)
Both clones can access the whole history of the Git repository.
Internally, the second clone contains a file .git/objects/info/alternates containing the path of the first repo, which tells git to search the referenced alternates repo for any commits that can't be found in the current repo.
Obviously, if you delete the referenced (first) repo, the second copy is useless and will display tonnes of errors.
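The following is a minimal sketch of the mechanism described above, using throwaway local repositories instead of a real remote. The directory names and the `demo` identity are made up for illustration. The last three commands show one way to make the second clone independent again: `git repack -a -d` copies the borrowed objects into the clone, after which the alternates file can be removed.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# Throwaway "upstream" repository with a single commit.
git init -q upstream
echo hello > upstream/README
git -C upstream add README
git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "initial commit"

# Full clone; file:// forces the normal transport instead of a local copy.
git clone -q "file://$work/upstream" normal-repo

# Second clone borrows objects from the first one.
git clone -q --reference "$work/normal-repo" "file://$work/upstream" referenced-repo

# The alternates file lists the object directory being borrowed from.
cat referenced-repo/.git/objects/info/alternates

# Optional: copy the borrowed objects locally, then drop the dependency.
git -C referenced-repo repack -a -d -q
rm referenced-repo/.git/objects/info/alternates
git -C referenced-repo log --oneline
```

After the last step, deleting normal-repo no longer breaks referenced-repo, at the cost of the disk space that `--reference` was saving.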
git-cache creates a central cache directory on a computer, which contains most of the commits of regularly cloned Git repositories.
If you regularly hack on a piece of software, say MediaWiki, you can cache most of its commits and then only download the most recent ones.
Example use case:
$ sudo git cache init # cache is located in /var/cache/git-cache
$ git cache remote add mediawiki https://git.wikimedia.org/git/mediawiki/core.git
You can now use the cache directory with:
git clone --reference /var/cache/git-cache https://git.wikimedia.org/git/mediawiki/core.git
(1607 seconds, the cache takes 314 MiB)
$ git clone --reference /var/cache/git-cache https://git.wikimedia.org/git/mediawiki/core.git clone-1
(46 seconds, the directory takes 98 MiB, the .git subdirectory takes 17 MiB because there were new commits)
$ git clone --reference /var/cache/git-cache https://git.wikimedia.org/git/mediawiki/core.git clone-2
(58 seconds, the directory takes 98 MiB, the .git subdirectory takes 17 MiB because there were new commits)
Installation:
Copy git-cache into your git commands directory (e.g. Ubuntu: /usr/lib/git-core) and make sure it is executable by all users (mode 0755).
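If you are unsure where the git commands directory is on your system, `git --exec-path` prints it. The sketch below only prints the directory; the copy itself is left commented out because it usually needs root, and it assumes the git-cache script is in the current directory.

```shell
# Directory where this git installation keeps its sub-commands.
GIT_EXEC_DIR="$(git --exec-path)"
echo "git sub-commands live in: $GIT_EXEC_DIR"

# Copy the script there with mode 0755 (uncomment to run; usually needs root):
# sudo install -m 0755 git-cache "$GIT_EXEC_DIR/git-cache"
```

Git also runs any executable named git-cache that it finds on your PATH, so a per-user directory on PATH is an alternative if you cannot write to the system directory.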
Usage: All commands have the format:
git cache ACTION [SUB-ACTION] PARAMETERS
Commands:
# General maintenance commands
git cache init [DIR]               initialise the cache directory
git cache delete --force           delete the cache directory
# Daily commands
git cache remote add NAME URL      add a cached Git repository
git cache remote rm --force NAME   remove a cached Git repository
git cache show [NAME]              show all cached Git repositories, or only NAME
git cache update                   fetch all cached Git repositories
git cache clone NAME [DIR]         clone a cached repository into a dir (as per `git clone --reference ...` above)
(Any other command will be applied to the cache directory,
e.g. `git cache gc` or `git cache remote show`.)
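To make the commands above more concrete, here is a rough sketch of what `init`, `remote add` and `clone` might correspond to in plain git, assuming the cache is a single bare repository in which each cached repository is a remote. This is not the script's actual code: the temporary paths, the remote name `demo` and the throwaway upstream repository are all made up for illustration.

```shell
set -eu
work="$(mktemp -d)"
cache="$work/git-cache"            # stand-in for /var/cache/git-cache

# Throwaway upstream repository to cache.
git init -q "$work/upstream"
echo hello > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "initial commit"
url="file://$work/upstream"

# Roughly `git cache init`: one bare repository holds everything.
git init -q --bare "$cache"

# Roughly `git cache remote add demo URL`: register a remote and fetch it.
git --git-dir="$cache" remote add demo "$url"
git --git-dir="$cache" fetch -q demo

# Roughly `git cache clone demo DIR`: look up the URL, then clone with --reference.
cached_url="$(git --git-dir="$cache" config --get remote.demo.url)"
git clone -q --reference "$cache" "$cached_url" "$work/clone-1"

# The new clone points at the cache's object directory.
cat "$work/clone-1/.git/objects/info/alternates"
```

Under the same assumption, passing other commands through to the cache would amount to running them with `--git-dir` pointing at the cache directory.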
Location of the cache directory:
The default cache directory contains all cached repositories (each Git repository is a remote).
If it is created by the root user, the cache directory is /var/cache/git-cache, otherwise it is ~/.cache/git-cache; it can also be configured to use another directory if you specify this with the init command.
Note: This directory can become large, so make sure you have enough space.
You may want to create a cron job to run git cache update to automatically retrieve new commits.
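As a sketch, a crontab entry for this could look like the one below. The schedule (every day at 03:00) is only an example, and the entry assumes that git and git-cache can be found on the PATH that cron uses on your system. The snippet only prints the entry; the line that would actually install it is left commented out.

```shell
# Hypothetical schedule: every day at 03:00. Adjust to taste.
CRON_LINE='0 3 * * * git cache update >/dev/null 2>&1'

# Print the entry for review.
printf '%s\n' "$CRON_LINE"

# To install it (uncomment to run), append it to the current user's crontab:
# ( crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE" ) | crontab -
```

If the cache was created by root in /var/cache/git-cache, the entry would need to go into root's crontab instead.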
This is an updated version by mexisme:
- fairly substantially refactored/restructured
- moved code into functions
- used case-blocks (instead of if-elseif trees)
- commands rearranged
- a few new commands
  - a `remote` super-command
  - a `clone` command
- some bugs fixed
- README language slightly updated
TODO:
- Rethink some of the sub-commands and names; it would help if they matched common git usages/conventions
- Figure out a faster way to do the initial copy-into-cache with a large cache
- Port away from Bash/Shell; it's a pain to deal effectively with certain types of failures
  - Besides C, it looks like `sh` (not `bash`) and `perl` are directly expected/supported by git. There's a single reference to Ruby - I'd prefer Ruby, Rust or Go, but Perl wouldn't be a problem
- Does it make sense to auto-gc the cache?
- Does it make sense to auto-update the cache contents? How can we make sure to only do this over cheap/fast links?
From the original README:
This is a first version, and feedback is needed to improve its daily usage. Some questions I am wondering about:
- Are the subcommands sufficiently explicit?
- Should we regularly run `git gc` or even `git gc --aggressive`?
- How does git behave when some commits are both local and in the cache? Does it remove local objects to save space?
- etc.
Internally, the cache directory is one big bare repository containing all remote repositories, even if they do not share commits (they behave like orphan branches). Should the cache directory instead be split by repository: /var/cache/git-cache/mediawiki, /var/cache/git-cache/visualeditor, etc.? Particularly in the second case, the name of the remote repository is useless, and I would prefer to use unique names, e.g. md5(URL), and completely hide this from the user (possibly with soft links from the URL to the directory for usability inside the cache directory).
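As an illustration of the per-repository naming idea, the sketch below derives an opaque directory name from a repository URL. It is not part of git-cache. It swaps `git hash-object` in for md5 so that it needs nothing beyond git itself, and the variable names are made up.

```shell
set -eu
cache_root="/var/cache/git-cache"   # default root described in this README
url="https://git.wikimedia.org/git/mediawiki/core.git"

# Derive a stable, opaque directory name from the URL.
# git hash-object stands in for md5(URL) here; any stable hash would do.
key="$(printf '%s' "$url" | git hash-object --stdin)"
repo_dir="$cache_root/$key"

# Only prints the mapping; nothing is created on disk.
echo "$url -> $repo_dir"
```

The same URL always maps to the same directory, so the user never needs to pick or remember a name for the cached repository.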
This program is a generalisation to arbitrary Git repositories of an idea and implementation by Randy Fay specifically for Drupal. Hopefully this generalisation is sufficiently simple to stay useable and practical.
Similar program: git-cached by dvessel