Skip to content

Toy version of Git/Mercurial built using recursion schemes. Implements lazy diffing and merging of merkle dir trees. Solves toy versions of some core problems in https://github.com/facebookexperimental/mononoke

License

Notifications You must be signed in to change notification settings

hermetique/hgit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HGIT

This repo implements a full git-like data structure (commits, dirs, blob trees, blobs) using recursion schemes and uses it to implement a minimal proof-of concept version control tool that supports functionality including lazy diffing of merkle dirs and lazy merging of branches. Many of the techniques (hash pointer based store, lazy deref, etc) used are applicable to any recursive data structure that uses Merkle-type hash pointer based indirection, eg merkle lists (blockchains), merkle trees (git, mercurial), merkle DAGs (IPFS) and merkle AST's (unison, if I understand it correctly).

Demo using the provided dockerfile:

> docker build -t pkinsky/hgit:demo .
> docker run -it pkinsky/hgit:demo bash

let's create an empty repo:

bash:/# mkdir repo
bash:/# cd repo
bash:/repo# hgit init
bash:/repo# tree .hgit
.hgit
|-- state
`-- store
bash:/repo# cat .hgit/state.json
{"branches":{"default":"z"},"currentBranch":"default"}

hgit init creates a .hgit directory with a store subdir that will be used to store various hash-identified objects (commits, dir trees, file blobs, etc) and a state file used to track repo state, which consists of the current branch and a map of branch names to commit pointers.

Repos start with the 'default' branch, which is initialized as pointer to the null commit (z (or zero) is used as a pointer to the null commit)

Ok, cool. Let's add some files to the repo.

bash:/repo# echo "totally a real project" > README.md
bash:/repo# echo "pls no infringe" > LICENSE

bash:/repo# hgit status
current branch: default
diffs:
	EntityCreated at LICENSE
	EntityCreated at README.md
bash:/repo# hgit commit "initial demo project"
bash:/repo# cat .hgit/state.json
{"branches":{"default":"xE9cQhCc0Vvk"},"substantiated":[],"currentBranch":"default"}

We've created a few simple files and commited them, updating the 'default' branch's commit pointer to xE9cQhCc0Vvk. Let's take a look at the contents of our store, starting with that commit.

bash:/repo# tree .hgit
.hgit
|-- state
`-- store
    |-- xE9cQhCc0Vvk
    |-- xfUffAHam2Ob
    |-- xodfMZhuFTpg
    `-- yvdB7xqeqRxi


bash:/repo# cat .hgit/store/xE9cQhCc0Vvk 
{"root":"xfUffAHam2Ob","name":"initial demo project","type":"commit","parents":["z"]}

bash:/repo# cat .hgit/store/xfUffAHam2Ob 
{"children":[ {"path":"LICENSE","pointer":"yvdB7xqeqRxi","type":"file"}
            , {"path":"README.md","pointer":"xodfMZhuFTpg","type":"file"}
            ],
"type":"dir"
}

bash:/repo# cat .hgit/store/yvdB7xqeqRxi 
{"contents":"pls no infringe\n","type":"blob"}

bash:/repo# cat .hgit/store/xodfMZhuFTpg 
{"contents":"totally a real project\n","type":"blob"}

so here we have our commit (xE9cQhCc0Vvk), containing a pointer to the null commit and a root directory with our LICENSE and README.md files.

Now we're going to create a branch to demonstrate diffing branches (you'll have to take my word for it, but this is all done lazily):

bash:/repo# hgit status
current branch: default
diffs:
bash:/repo# hgit branch diff-example
bash:/repo# mkdir foo
bash:/repo# touch foo/bar
bash:/repo# touch baz

bash:/repo# tree
|-- LICENSE
|-- README.md
|-- baz
`-- foo
    `-- bar

bash:/repo# hgit status
current branch: diff-example
diffs:
	EntityCreated at baz
	EntityCreated at foo
    
bash:/repo# hgit commit "diff example branch a"

that's our first branch, 'diff-example'. Now lets hop back over to 'default' and make some changes there, too.

bash:/repo# hgit checkout default
bash:/repo# touch foo
bash:/repo# mkdir baz
bash:/repo# touch baz/bar

bash:/repo# tree
|-- LICENSE
|-- README.md
|-- baz
|   `-- bar
`-- foo

bash:/repo# hgit status
current branch: default
diffs:
	EntityCreated at baz
	EntityCreated at foo
    
bash:/repo# hgit commit "diff example branch b"

bash:/repo# hgit diff default diff-example
	DirReplacedWithFile at baz
	FileReplacedWithDir at foo
bash:/repo# hgit diff diff-example default
	FileReplacedWithDir at baz
	DirReplacedWithFile at foo

Here you can see hgit diff handling both the file-replaced-with-dir and dir-replaced-with-file cases.

Now let's try merging two branches!

bash:/repo# hgit branch merge-example

bash:/repo# hgit status
current branch: merge-example
diffs:

bash:/repo# mkdir branchtest
bash:/repo# touch branchtest/a
bash:/repo# echo "same same" > b

bash:/repo# hgit status
current branch: merge-example
diffs:
	EntityCreated at branchtest
	EntityCreated at b
bash:/repo# tree
.
|-- LICENSE
|-- README.md
|-- b
|-- baz
|   `-- bar
|-- branchtest
|   `-- a
`-- foo

Here's our branch - now let's make some changes in the 'default' branch that, while touching the same files and directories, are non-conflicting.

bash:/repo# hgit checkout default

bash:/repo# echo "same same" > b
bash:/repo# mkdir branchtest
bash:/repo# touch branchtest/b

bash:/repo# hgit status
current branch: default
diffs:
	EntityCreated at branchtest
	EntityCreated at b
bash:/repo# tree
.
|-- LICENSE
|-- README.md
|-- b
|-- baz
|   `-- bar
|-- branchtest
|   `-- b
`-- foo

bash:/repo# hgit commit "merge example branch b"

These changes are non-conflicting - both branches have created a file at 'b' with the same contents, and added files with different names to the directory 'branchtest'. They can be merged.

bash:/repo# hgit status
current branch: default
diffs:

bash:/repo# hgit merge merge-example "merge test"
bash:/repo# tree
.
|-- LICENSE
|-- README.md
|-- b
|-- baz
|   `-- bar
|-- branchtest
|   |-- a
|   `-- b
`-- foo

Here's a counterexample, in which a merge fails due to a non-resolvable change:

bash:/repo# hgit branch merge-fail-example
bash:/repo# echo "mario" > character
bash:/repo# hgit commit "merge fail branch a"

in one branch, we will create a file, 'character', containing the string "mario".

bash:/repo# hgit checkout default
bash:/repo# echo "luigi" > character
bash:/repo# hgit commit "merge fail branch b"
bash:/repo# hgit merge merge-fail-example "failing merge"
hgit: user error (merge nonviable due to: MergeViolation {mergeViolationPath = ["character"]})

As expected, this merge fails - the algorithm has no way of knowing which character to pick.

DISCLAIMERS:

disclaimer: all hashes shown below are the result of some janky non-cryptographic hash function being converted into [a..z,A..Z,0..9]. This project now uses blake2 digests encoded as strings using base16 disclaimer: running this example will likely result in different hashes, as the hashes are derived from the serialized json structure of hgit objects which continues to change.

About

Toy version of Git/Mercurial built using recursion schemes. Implements lazy diffing and merging of merkle dir trees. Solves toy versions of some core problems in https://github.com/facebookexperimental/mononoke

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published