Hi @quark-zju, this is a super cool project. I would like to integrate it into my project at https://github.com/arxanas/git-branchless, which simulates the workflows at companies like Facebook. I had a few questions I was hoping you could help me with.
Data structures?
What data structures are used to implement the DAG? I found this remark:
"See slides/201904-segmented-changelog/segmented-changelog.pdf for pretty graphs about how segments help with ancestry queries."
at https://docs.rs/esl01-dag/0.1.1/esl01_dag/struct.IdDag.html, but I didn't find the associated slides (are they publicly available?). What kind of performance can I expect for various operations?
What kind of correctness guarantees can I expect? If I query the DAG for a node which hasn't yet been observed, what happens? Can I use it in a multi-threaded or multi-process context?
How stable is the DAG API? To what degree can I rely on it?
Performance with reference updates?
The performance for initializing the DAG when running git-revs is quite good on the repository I'm testing with (maybe 30 seconds, compared to minutes when running git commit-graph instead, but I didn't even measure the time because it took so long). But subsequent invocations take two or three seconds at a minimum.
My guess is that this is because of crawling all the references here: https://docs.rs/gitdag/0.1.2/src/gitdag/gitdag.rs.html#82. I think you mentioned somewhere in the documentation that it will be slow if there are a lot of references. In the case of git-branchless, we keep track of the commit graph heads ourselves, and we don't care about remote references, so I should be able to speed it up significantly. However, I can't pass in my own GitDag to this library. Should I change the API, and if so, what changes do you recommend?
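To make the request more concrete, here is a rough sketch of the kind of extension point I have in mind. Every name in it (HeadSource, AllReferences, ExplicitHeads, heads_to_sync) is hypothetical and not part of gitdag or gitrevset, and commit IDs are plain strings only to keep the example self-contained. The idea is that the caller decides which heads get synced, rather than the library always walking every reference.

```rust
use std::collections::BTreeSet;

/// Hypothetical commit id; a real integration would use the library's own id type.
pub type CommitId = String;

/// Hypothetical extension point: where the DAG builder gets its heads from.
pub trait HeadSource {
    fn heads(&self) -> Vec<CommitId>;
}

/// Stand-in for the behavior I am guessing at: every reference contributes a head.
pub struct AllReferences {
    /// (reference name, target commit) pairs.
    pub refs: Vec<(String, CommitId)>,
}

impl HeadSource for AllReferences {
    fn heads(&self) -> Vec<CommitId> {
        dedup(self.refs.iter().map(|(_, id)| id.clone()))
    }
}

/// What git-branchless would supply: only the heads it already tracks itself.
pub struct ExplicitHeads(pub Vec<CommitId>);

impl HeadSource for ExplicitHeads {
    fn heads(&self) -> Vec<CommitId> {
        dedup(self.0.iter().cloned())
    }
}

/// Remove duplicate ids while keeping first-seen order.
fn dedup(ids: impl Iterator<Item = CommitId>) -> Vec<CommitId> {
    let mut seen = BTreeSet::new();
    ids.filter(|id| seen.insert(id.clone())).collect()
}

/// Hypothetical entry point: the heads an incremental sync would have to visit.
pub fn heads_to_sync(source: &dyn HeadSource) -> Vec<CommitId> {
    source.heads()
}
```

With something like this, git-branchless could pass an ExplicitHeads value and skip remote references entirely, while the default could stay as it is today.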
Commit evolution?
git-branchless implements its own commit evolution feature, not based on the reflog (see https://github.com/arxanas/git-branchless/wiki/Architecture). So I don't want the reflog-based commit evolution implementation here: https://github.com/quark-zju/gitrevset/blob/master/src/mutation.rs#L13. Similarly to the above, if I want to swap out the implementation for this behavior with my own, do I need to change the git-revset API, and if so, what changes do you recommend?
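As an illustration of what "swapping out the implementation" could mean, here is a minimal sketch of a pluggable source of commit evolution data. The names (MutationSource, InMemoryMutations, latest) are hypothetical and do not exist in gitrevset; the in-memory map merely stands in for whatever store git-branchless would provide.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical extension point: who knows how commits were rewritten.
pub trait MutationSource {
    /// Direct successors of `commit` (for example after an amend or rebase);
    /// empty if no rewrite was recorded.
    fn successors(&self, commit: &str) -> Vec<String>;
}

/// Stand-in for a caller-provided store, keyed by the old commit id.
pub struct InMemoryMutations(pub HashMap<String, Vec<String>>);

impl MutationSource for InMemoryMutations {
    fn successors(&self, commit: &str) -> Vec<String> {
        self.0.get(commit).cloned().unwrap_or_default()
    }
}

/// Follow successor edges from `start` until reaching commits with no
/// recorded successors. Returns those commits in sorted order.
pub fn latest(source: &dyn MutationSource, start: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut stack = vec![start.to_string()];
    let mut result = Vec::new();
    while let Some(commit) = stack.pop() {
        // Skip commits already visited, so cyclic records cannot loop forever.
        if !seen.insert(commit.clone()) {
            continue;
        }
        let next = source.successors(&commit);
        if next.is_empty() {
            result.push(commit);
        } else {
            stack.extend(next);
        }
    }
    result.sort();
    result
}
```

A reflog-based implementation and a git-branchless one could then both implement the same trait, and the revset evaluation would not need to know which one it is talking to.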
add_heads_and_flush
For this function, DagPersistent::add_heads_and_flush: https://docs.rs/esl01-dag/0.2.1/esl01_dag/namedag/struct.NameDag.html#method.add_heads_and_flush, why does it care about the difference between master names and non-master names? git-branchless relies on a main branch, but I don't see why the DAG itself cares about which branches are "main".