Description
Note that we handle both crates here as they are very much intertwined. git-index
handles the data structure that accelerates operations, and git-worktree
is for actually manipulating the working copy.
Tasks for checkout
- checkout entire index OR checkout from tree (in case the index doesn't exist yet).
- This would allow the index to be created while we are checking out, maybe there is a benefit.
- probing of the file system - needed to set configuration options accordingly
- correctness
- handle case-insensitive file systems (those that apply folding rules) #341
- checkout: Create directories similar to and consider using a cache #343
- filemode
- symlinks (use symlink crate instead (or factor out symlink related code at least)).
- explicit `fclose()` without performance loss due to silent and implicit close on drop.
- better handling of delta-ref bases #344
- basic checkout parallelism #346 (see performance article)
- a way to parallelize with less contention #352
- once checkout gets into the sub-second area, the parallelization costs get higher than doing the actual work - but we couldn't make it faster with different parallelization (tried non-blockwise)
- git-ignore and git-attributes access
- parse git-ignore files #359
- parse git-attribute files #360
- glob/wildcard support
- worktree stack supports excludes #397
- attributes matching #400
- gix-worktree integration (`fs::Cache`) and support for various sources. attributes for worktree-cache #818
- attributes integration via `gix index entries -a` with bare support: `gix index entries` with attributes listing #830
- A test tool to collect gitattribute and maybe gitignore information from real-world git repos along with the replies by `git check-attr` to have baselines against which to test our implementation.
- `gix attributes query` #846
- Filters
- internal format conversion, like line feeds, etc., including `working-tree-encoding`. Viable crate is encoding_rs
- support for smudge and clean filters (which would certainly require access to git-attributes)
- support for delayed filters
- figure out how this affects diffs of changed files - they should probably apply worktree conversions beforehand, right?
- For now it's fine to not run filters when diffing files between diffs, because they are all in-git and thus normalized.
- when reading attributes and ignore files from index, would filters affect them significantly and should these be applied? No, actually, filters aren't applied to the files that control them.
- Other Specialities
- how to handle special filesystem support?
- What about `precomposeUnicode` on macOS? Some path conversion for more compatibility, we should probably do that too. Out of scope if it belongs elsewhere. This means that all paths going into `gitoxide` need to be turned into precomposed forms.
- See if `clap` can be initialized from precomposed unicode OsStrings instead, a feature ultimately to be provided to `gix` users.
- non-exclusive checkout
- there is a lot of logic to be researched to do the right thing when supporting checkouts into a populated working tree.
- path-specs are needed to properly define what to checkout, even though some convenience APIs might exist to allow checkouts of individual files (i.e. more program driven checkouts)
`gix-index` towards 1.0 #293
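The file system probing task listed above (needed to set configuration options accordingly) could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the approach taken by `gix-worktree`. The `Capabilities` struct, the `probe` function and the probe file names are hypothetical and chosen for this example only.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical result of probing a directory; field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities {
    /// True if differently-cased names refer to the same file.
    pub ignore_case: bool,
    /// True if symbolic links can be created and are reported as such.
    pub symlink: bool,
}

/// Probe `dir` (which must exist and be writable) by creating and removing
/// small scratch files.
pub fn probe(dir: &Path) -> io::Result<Capabilities> {
    // Case folding: write a lower-case name, then look up the upper-case one.
    let lower = dir.join("probe-case");
    fs::write(&lower, b"")?;
    let ignore_case = dir.join("PROBE-CASE").symlink_metadata().is_ok();
    fs::remove_file(&lower)?;

    // Symlinks: try to create a (dangling) link and check how it is reported.
    let link = dir.join("probe-link");
    let symlink = make_symlink(Path::new("probe-target"), &link)
        .and_then(|_| fs::symlink_metadata(&link))
        .map(|meta| meta.file_type().is_symlink())
        .unwrap_or(false);
    // Best-effort cleanup; the link may not exist if creation failed.
    let _ = fs::remove_file(&link);

    Ok(Capabilities { ignore_case, symlink })
}

#[cfg(unix)]
fn make_symlink(target: &Path, link: &Path) -> io::Result<()> {
    std::os::unix::fs::symlink(target, link)
}

#[cfg(windows)]
fn make_symlink(target: &Path, link: &Path) -> io::Result<()> {
    std::os::windows::fs::symlink_file(target, link)
}

#[cfg(not(any(unix, windows)))]
fn make_symlink(_target: &Path, _link: &Path) -> io::Result<()> {
    Err(io::Error::new(io::ErrorKind::Other, "symlinks not supported"))
}
```

The result would typically be computed once per checkout and used to decide, for example, whether symlink entries are written as links or as plain files. Probing for executable-bit support (filemode) would follow the same create-inspect-remove pattern.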
Reset
- `gix` reset with `soft/mixed/hard/merge/keep` semantics with pathspecs as well. Submodule support should be possible, too.
- `gix-worktree-state` reset to reset a working tree according to an index, with pathspec support.
- reset index to match tree based on pathspecs.
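To make the difference between the reset modes above concrete, here is a coarse sketch of what each mode touches, following the behavior described in the `git reset` documentation. The `Mode`, `WorktreeUpdate` and `Plan` types and the `plan` function are hypothetical names for this illustration and are not part of any `gix` API. The conflict rules of `merge` and `keep` are simplified to "local changes are preserved".

```rust
/// The reset modes named in the task list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Soft,
    Mixed,
    Hard,
    Merge,
    Keep,
}

/// How the working tree is treated (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorktreeUpdate {
    /// Files on disk are left alone.
    Untouched,
    /// Files on disk are overwritten to match the target.
    Overwrite,
    /// Files are updated, but local modifications are kept or cause an abort.
    PreserveLocalChanges,
}

/// Hypothetical summary of what a reset without pathspecs does.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Plan {
    pub move_head: bool,
    pub reset_index: bool,
    pub worktree: WorktreeUpdate,
}

pub fn plan(mode: Mode) -> Plan {
    let (reset_index, worktree) = match mode {
        Mode::Soft => (false, WorktreeUpdate::Untouched),
        Mode::Mixed => (true, WorktreeUpdate::Untouched),
        Mode::Hard => (true, WorktreeUpdate::Overwrite),
        // Simplification: both modes protect uncommitted work in different ways.
        Mode::Merge | Mode::Keep => (true, WorktreeUpdate::PreserveLocalChanges),
    };
    Plan { move_head: true, reset_index, worktree }
}
```

A pathspec-limited reset in git only rewrites the matching index entries and leaves HEAD in place, so an implementation would need a separate code path for that case rather than this table.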
Out of scope
- hunk support (i.e. `git reset -p`)
Tasks for add
Add files to the index.
- correctness
- a properly implemented `git-pathspec`
- precompose unicode
- special filesystem protections
- index to tree
- See if "Add a user-friendly way to build trees" #924 can be tackled while on the topic of generating trees efficiently.
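The "index to tree" step above amounts to turning a flat list of slash-separated paths into nested directories. A minimal sketch of that grouping follows. It assumes entries are plain `(path, id)` string pairs and uses a hypothetical `Tree` type. Real trees also carry file modes and are hashed bottom-up, which is omitted here.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory tree: names map either to blob ids or to subtrees.
#[derive(Debug, Default, PartialEq)]
pub struct Tree {
    /// File name -> object id (a placeholder string in this sketch).
    pub blobs: BTreeMap<String, String>,
    /// Directory name -> nested tree.
    pub trees: BTreeMap<String, Tree>,
}

/// Group flat index-style entries into a nested tree structure.
pub fn tree_from_entries<'a>(entries: impl IntoIterator<Item = (&'a str, &'a str)>) -> Tree {
    let mut root = Tree::default();
    for (path, id) in entries {
        let mut components = path.split('/').peekable();
        let mut current = &mut root;
        while let Some(name) = components.next() {
            if components.peek().is_some() {
                // Intermediate component: descend, creating the directory on demand.
                current = current.trees.entry(name.to_owned()).or_default();
            } else {
                // Last component: this is the file itself.
                current.blobs.insert(name.to_owned(), id.to_owned());
            }
        }
    }
    root
}
```

Because index entries are sorted by path, an efficient implementation could emit each subtree as soon as the path prefix changes instead of keeping the whole structure in memory. The map-based version above only shows the shape of the result.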
Tasks for commit
- create tree from index
- create commit
- round-trippable reads and writes (write all index extensions so no information is lost)
Tasks for fetch/clone
- create index from tree
- can there be an optimization that keeps what didn't change?
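Creating an index from a tree is the reverse operation: a recursive walk that produces flat paths. The sketch below uses a hypothetical `Node` type with string ids. One detail worth showing is that the flattened result is sorted by full path afterwards, since visiting names in per-directory order does not always yield byte-wise order of the full paths.

```rust
use std::collections::BTreeMap;

/// Hypothetical tree node: either a blob id or a nested directory.
pub enum Node {
    Blob(String),
    Tree(BTreeMap<String, Node>),
}

fn flatten(tree: &BTreeMap<String, Node>, prefix: &str, out: &mut Vec<(String, String)>) {
    for (name, node) in tree {
        let path = if prefix.is_empty() {
            name.clone()
        } else {
            format!("{}/{}", prefix, name)
        };
        match node {
            Node::Blob(id) => out.push((path, id.clone())),
            Node::Tree(children) => flatten(children, &path, out),
        }
    }
}

/// Produce `(path, id)` pairs sorted by path, as an index would store them.
pub fn index_entries_from_tree(root: &BTreeMap<String, Node>) -> Vec<(String, String)> {
    let mut entries = Vec::new();
    flatten(root, "", &mut entries);
    // "a/x" is visited before "a.txt" above, but sorts after it byte-wise.
    entries.sort();
    entries
}
```

For the optimization question above, one option under this model would be to compare subtree ids against those of a previously known tree and reuse the old entries for any subtree whose id is unchanged. Whether that pays off is exactly what the task leaves open.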
Tasks for status
The difference between an index and the work tree. Analysis TBD.
See this blog post for incredible details on how git does things, related to fs-monitor as well.
There is also an alternative implementation which provides a lot of details on how to be better.
@pascalkuthe did a first analysis and concluded that most of the speedup came through congestion-free multi-threading and the usage of something like the untracked-cache. On Linux, it's also possible to speed up syscalls by using more specific versions of them, but that should definitely be left as a last resort for performance improvements.
Stages
- determine unstaged changes (Diff between worktree and index #805)
- changes between worktree and index
- needs one stat call per file one way or another.
- Question: what's faster: `walkdir` or `symlink_metadata` per index entry? Note that `walkdir` doesn't use ``
- rename/copy tracking - should be based on tree-tree rename tracking, can it be generalized?
- assure status works with `file_size >= u32::MAX` - currently it's acknowledged in the documentation but there is no test for that, nor is it clear how this works in git.
- determine staged changes
- compare tree entries with index entries
- Question: is there a way to avoid having to traverse a tree recursively? Yes, use the `TREE` extension to know the dir ids of all entries, which allows us to reproduce the trees and see if they changed, and only if so do we look up the tree itself.
- find untracked files
- can use `untracked-cache` to be faster. Could be coming 'for free' if walkdir would be used
- fast is-dirty checks - and wiring that up to `describe`
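The "one stat call per file" approach and the fast is-dirty check from the stages above can be sketched with `symlink_metadata` per index entry. This is a simplified illustration, not the `gix` implementation: the `Entry` struct, the `Change` enum and both functions are hypothetical, and only size and modification time are compared. A real status would go on to compare content when the stat data differs and would have to deal with racy timestamps.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical subset of the stat data an index entry records.
pub struct Entry {
    /// Path relative to the worktree root.
    pub path: String,
    pub size: u64,
    pub mtime: SystemTime,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Unchanged,
    /// Stat data differs; content would need to be compared to be sure.
    MaybeModified,
    Removed,
}

/// Compare one entry against the file on disk using a single stat call.
pub fn quick_status(root: &Path, entry: &Entry) -> io::Result<Change> {
    match fs::symlink_metadata(root.join(&entry.path)) {
        Ok(meta) => {
            let same = meta.len() == entry.size && meta.modified()? == entry.mtime;
            Ok(if same { Change::Unchanged } else { Change::MaybeModified })
        }
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(Change::Removed),
        Err(err) => Err(err),
    }
}

/// Fast is-dirty check: stop at the first entry that is not clearly unchanged.
pub fn is_dirty(root: &Path, entries: &[Entry]) -> io::Result<bool> {
    for entry in entries {
        if quick_status(root, entry)? != Change::Unchanged {
            return Ok(true);
        }
    }
    Ok(false)
}
```

Since each entry is checked independently, the loop in `is_dirty` is a natural candidate for the multi-threading mentioned in the analysis above, with early exit once any thread finds a change.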
Checkout Research
- what git does to checkout a single entry - it's the foundation for anything being checked out, I think.
Follow Ups
- symlink: wait for 1.0 release with additional fixes (see thread on MR)
- need to use `remove_symlink()` from this crate, but can't use it for relative paths due to the filename check