rough FSCK for git-odb::Store #290

Merged
merged 11 commits into from
Jan 2, 2022
Conversation

Byron
Member

@Byron Byron commented Jan 2, 2022

Verify the integrity of the entire store by calling the `verify_integrity()` implementations of the underlying file types, such as indices and packs, which allows each and every object to be checked deeply.
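
As a rough illustration of the delegation described above, here is a minimal sketch in which a store asks each of its underlying files to verify itself and aggregates the results. The trait `VerifyIntegrity`, the types `Store`, `Outcome`, `IntegrityError` and `FakeFile`, and all of their fields are hypothetical stand-ins made up for this example; they are not the actual git-odb API.

```rust
/// Hypothetical error naming the file that failed verification.
#[derive(Debug, PartialEq)]
pub struct IntegrityError {
    pub file: String,
}

/// Hypothetical stand-in for the per-file-type `verify_integrity()` implementations.
pub trait VerifyIntegrity {
    /// Returns the number of objects that were checked.
    fn verify_integrity(&self) -> Result<usize, IntegrityError>;
}

/// Hypothetical summary of a store-wide check.
#[derive(Debug, Default, PartialEq)]
pub struct Outcome {
    pub files_checked: usize,
    pub objects_checked: usize,
}

/// Hypothetical store holding indices, packs and loose object databases.
pub struct Store {
    pub parts: Vec<Box<dyn VerifyIntegrity>>,
}

impl Store {
    /// Verify every underlying file in turn, stopping at the first failure.
    pub fn verify_integrity(&self) -> Result<Outcome, IntegrityError> {
        let mut out = Outcome::default();
        for part in &self.parts {
            out.objects_checked += part.verify_integrity()?;
            out.files_checked += 1;
        }
        Ok(out)
    }
}

/// Toy file type used only to exercise the sketch.
pub struct FakeFile {
    pub name: String,
    pub num_objects: usize,
    pub corrupt: bool,
}

impl VerifyIntegrity for FakeFile {
    fn verify_integrity(&self) -> Result<usize, IntegrityError> {
        if self.corrupt {
            Err(IntegrityError { file: self.name.clone() })
        } else {
            Ok(self.num_objects)
        }
    }
}
```

A sequential loop is used here for simplicity; whether and how the real implementation parallelizes this work is not stated in this description.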

Note that this doesn't include connectivity checking until it's clear how that should best be implemented so that it runs fast on multi-core machines and avoids recreating something like 'counting objects', which definitely isn't on par even on a single core.

However, having a solid way to handle dependencies or to multi-thread it would certainly help with eventually implementing a conversion from SHA1 to SHA256.

Tasks

  • verify_integrity() for Store
    • docs
  • verify_integrity() for loose ODBs
    • docs
  • top-level fsck CLI command to one day also check connectivity and refs (aka reachability/dangling objects)
  • keep track of ways to use the existing highly parallel pack traversal (along with integration of loose objects) to build an inverse-ref table for quickly traversing objects bottom-up, so that the hash can be changed along with all references while staying fast. This ties into being able to build new packs quickly, ideally even with delta compression. Most deltas then have to be re-created as most objects actually change, so re-using existing deltas is only possible for blobs.
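
The inverse-ref idea from the last task can be sketched as follows: record for every object which objects refer to it, then visit objects bottom-up so that an object is only handled once everything it references has been handled, which is the order a hash conversion would need. This is a single-threaded sketch using a plain topological sort; the `Id` type and the functions `inverse_refs` and `bottom_up_order` are hypothetical and not part of the crate. It assumes each object id is listed exactly once.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for an object id.
pub type Id = u32;

/// Build a table mapping each object to the objects that refer to it.
/// `objects` pairs each object with the ids it references.
pub fn inverse_refs(objects: &[(Id, Vec<Id>)]) -> HashMap<Id, Vec<Id>> {
    let mut inv: HashMap<Id, Vec<Id>> = HashMap::new();
    for (id, refs) in objects {
        inv.entry(*id).or_default();
        for r in refs {
            inv.entry(*r).or_default().push(*id);
        }
    }
    inv
}

/// Return the objects in bottom-up order: each object appears only after
/// all objects it references. Objects referencing unknown ids are skipped.
pub fn bottom_up_order(objects: &[(Id, Vec<Id>)]) -> Vec<Id> {
    let inv = inverse_refs(objects);
    // Number of references of each object that have not been visited yet.
    let mut pending: HashMap<Id, usize> =
        objects.iter().map(|(id, refs)| (*id, refs.len())).collect();
    // Start with objects that reference nothing, like blobs.
    let mut queue: VecDeque<Id> = objects
        .iter()
        .filter(|(_, refs)| refs.is_empty())
        .map(|(id, _)| *id)
        .collect();
    let mut order = Vec::with_capacity(objects.len());
    while let Some(id) = queue.pop_front() {
        order.push(id);
        for parent in &inv[&id] {
            let n = pending.get_mut(parent).expect("parents are known objects");
            *n -= 1;
            if *n == 0 {
                queue.push_back(*parent);
            }
        }
    }
    order
}
```

For example, with two blobs `1` and `2`, a tree `3` referencing both, and a commit `4` referencing the tree, the blobs come first, then the tree, then the commit.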

Byron added 11 commits January 2, 2022 14:10
Previously they used many different ways of handling their parameters
despite all boiling down to calling the same `index::File::traverse()`
method.

This allows for more reuse of `Options` structs and generally makes
clearer how these options are used.
That way, even empty slots can be identified as 'changed' compared to
some older index.

This makes it consistent, as indices refer to occupied slots only and
protect themselves from changes using the generation flag, which usually
comes first.
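
One way to picture the generation check mentioned above is a slot table in which every slot carries a generation counter that is bumped on each change, including when the slot is emptied. The types `Slot`, `Handle` and `SlotMap` below are hypothetical and only illustrate the general technique; the store's actual slot layout is not shown in this pull request.

```rust
/// Hypothetical slot: the generation is kept even while the slot is empty.
#[derive(Default)]
pub struct Slot {
    pub generation: u32,
    pub value: Option<String>,
}

/// Hypothetical handle remembering the generation it was created with.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Handle {
    pub slot: usize,
    pub generation: u32,
}

pub struct SlotMap {
    pub slots: Vec<Slot>,
}

impl SlotMap {
    pub fn with_slots(n: usize) -> Self {
        SlotMap { slots: (0..n).map(|_| Slot::default()).collect() }
    }

    /// Occupy a slot and return a handle tied to its new generation.
    pub fn put(&mut self, slot: usize, value: &str) -> Handle {
        let s = &mut self.slots[slot];
        s.generation += 1;
        s.value = Some(value.to_owned());
        Handle { slot, generation: s.generation }
    }

    /// Empty a slot; the generation is bumped so the change stays visible.
    pub fn clear(&mut self, slot: usize) {
        let s = &mut self.slots[slot];
        s.generation += 1;
        s.value = None;
    }

    /// The generation is compared first, so stale handles never see new data.
    pub fn get(&self, h: Handle) -> Option<&str> {
        let s = self.slots.get(h.slot)?;
        if s.generation != h.generation {
            return None;
        }
        s.value.as_deref()
    }

    /// Report whether a slot changed since `generation`, even if it is empty now.
    pub fn changed_since(&self, slot: usize, generation: u32) -> bool {
        self.slots[slot].generation != generation
    }
}
```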
Re-use the index::integrity::Options for multi_index options as it's
exactly the same.
This opportunity was missed the previous time we simplified these signatures.
@Byron Byron changed the title FSCK for git-odb::Store rough FSCK for git-odb::Store Jan 2, 2022
@Byron Byron mentioned this pull request Jan 2, 2022
4 tasks
@Byron Byron merged commit 80a4a7a into main Jan 2, 2022
@Byron Byron deleted the store-fsck branch January 10, 2022 08:44

1 participant