Commit b3e18dc
committed
extensions: add refFormat extension
Git's reference storage is critical to its function. Creating new
storage formats for references requires adding an extension. This
prevents third-party tools that do not understand that format from
operating incorrectly on the repository. This makes updating ref formats
more difficult than other optional indexes, such as the commit-graph or
multi-pack-index.
However, there are a number of potential ref storage enhancements that
are underway or could be created. Git needs an established mechanism for
coordinating between these different options.
The first obvious format update is the reftable format as documented in
Documentation/technical/reftable.txt. This format has much of its
implementation already in Git, but its connection as a ref backend is
not complete. This change is similar to some changes within one of the
patches intended for the reftable effort [1].
[1] https://lore.kernel.org/git/pull.1215.git.git.1644351400761.gitgitgadget@gmail.com/
However, this change makes a distinct strategy change from the one
recommended by reftable. Here, the extensions.refFormat extension is
provided as a multi-valued list. In the reftable RFC, the extension has
a single value, "files" or "reftable" and explicitly states that this
should not change after 'git init' or 'git clone'.
The single-valued approach has some major drawbacks, including the idea
that the "files" backend cannot coexist with the "reftable" backend at
the same time. In this way, it would not be possible to create a
repository that can write loose references and combine them into a
reftable in the background. With the multi-valued approach, we could
integrate reftable as a drop-in replacement for the packed-refs file and
allow that to be a faster way to do the integration since the test suite
would only need updates when the test is explicitly testing packed-refs.
When upgrading a repository from the "files" backend to the "reftable"
backend, it can help to have a transition period where both are present,
then finally removing the "files" backend after all loose refs are
collected into the reftable.
But the reftable is not the only approach available.
One obvious improvement could be a new file format version for the
packed-refs file. Its current plaintext-based format is inefficient due
to storing object IDs as hexadecimal representations instead of in
their raw format. This extra cost will get worse with SHA-256. In
addition, binary searches need to guess a position and scan to find
newlines for a refname entry. A structured binary format could allow for
more compact representation and faster access. Adding such a format
could be seen as "files-v2", but it is really "packed-v2".
The reftable approach has a concept of a "stack" of reftable files. This
idea would also work for a stack of packed-refs files (in v1 or v2
format). It would be helpful to describe that the refs could be stored
in a stack of packed-ref files independently of whether that is in file
format v1 or v2.
Even in these two options, it might be helpful to indicate whether or
not loose ref files are present. That is one reason to not make them
appear as "files-v2" or "files-v3" options in a single-valued extension.
Even as "packed-v2" or "packed-v3" options, this approach would require
third-party tools to understand the "v2" version if they want to support
the "v3" options. Instead, by splitting the format from the layout, we
can allow third-party tools to integrate only with the most-desired
format options.
For these reasons, this change is defining the extensions.refFormat
extension as well as how the two existing values interact. By default,
Git will assume "files" and "packed" in the list. If any other value
is provided, then the extension is marked as unrecognized.
Add tests that check the behavior of extensions.refFormat, both in that
it requires core.repositoryFormatVersion=1, and Git will refuse to work
with an unknown value of the extension.
There is a gap in the current implementation, though. What happens if
exactly one of "files" or "packed" is provided? The presence of only one
would imply that the other is not available. A later change can
communicate the list contents to the repository struct and then the
reference backend could ignore one of these two layers.
Specifically, having only "files" would mean that Git should not read or
write the packed-refs file and instead only read and write loose ref
files. By contrast, having only "packed" would mean that Git should not
read or write loose ref files and instead always update the packed-refs
file on every ref update.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>1 parent ca339b0 commit b3e18dc
File tree
3 files changed
+57
-0
lines changed- Documentation/config
- t
3 files changed
+57
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
10 | 35 | | |
11 | 36 | | |
12 | 37 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
577 | 577 | | |
578 | 578 | | |
579 | 579 | | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
580 | 585 | | |
581 | 586 | | |
582 | 587 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
0 commit comments