Commit 3b269f1

Merge branch 'ds/packed-refs-v2' into seen
* ds/packed-refs-v2: (30 commits)
  refs: skip hashing when writing packed-refs v2
  p1401: create performance test for ref operations
  ci: run GIT_TEST_PACKED_REFS_VERSION=2 in some builds
  t*: skip packed-refs v2 over http tests
  t3210: require packed-refs v1 for some tests
  t5502: add PACKED_REFS_V1 prerequisite
  t5312: allow packed-refs v2 format
  t1409: test with packed-refs v2
  packed-backend: create GIT_TEST_PACKED_REFS_VERSION
  packed-refs: write prefix chunks
  packed-refs: read optional prefix chunks
  packed-refs: read file format v2
  packed-refs: write file format version 2
  packed-backend: create shell of v2 writes
  config: add config values for packed-refs v2
  packed-backend: create abstraction for writing refs
  packed-backend: extract iterator/updates merge
  packed-backend: extract add_write_error()
  refs: extract packfile format to new file
  chunk-format: parse trailing table of contents
  ...
2 parents b532c4c + 1199470 commit 3b269f1

37 files changed

+2157
-695
lines changed

Documentation/config.txt

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -493,6 +493,8 @@ include::config/rebase.txt[]
493493

494494
include::config/receive.txt[]
495495

496+
include::config/refs.txt[]
497+
496498
include::config/remote.txt[]
497499

498500
include::config/remotes.txt[]

Documentation/config/extensions.txt

Lines changed: 72 additions & 4 deletions
@@ -7,6 +7,69 @@ Note that this setting should only be set by linkgit:git-init[1] or
77
linkgit:git-clone[1]. Trying to change it after initialization will not
88
work and will produce hard-to-diagnose issues.
99

10+
extensions.refFormat::
11+
Specify the reference storage mechanisms used by the repository as a
12+
multi-valued list. The acceptable values are `files` and `packed`.
13+
If not specified, the list of `files` and `packed` is assumed. It
14+
is an error to specify this key unless `core.repositoryFormatVersion`
15+
is 1.
16+
+
17+
As new ref formats are added, Git commands may modify this list before and
18+
after upgrading the on-disk reference storage files. The specific values
19+
indicate the existence of different layers:
20+
+
21+
--
22+
`files`;;
23+
When present, references may be stored as "loose" reference files
24+
in the `$GIT_DIR/refs/` directory. The name of the reference
25+
corresponds to the filename after `$GIT_DIR` and the file contains
26+
an object ID as a hexadecimal string. If a loose reference file
27+
exists, then its value takes precedence over all other formats.
28+
29+
`packed`;;
30+
When present, references may be stored as a group in a
31+
`packed-refs` file in its version 1 format. When grouped with
32+
`"files"` or provided on its own, this file is located at
33+
`$GIT_DIR/packed-refs`. This file contains a list of distinct
34+
reference names, paired with their object IDs. When combined with
35+
`files`, the `packed` format will only be used to group multiple
36+
loose reference files upon request via the `git pack-refs` command or
37+
via the `pack-refs` maintenance task.
38+
39+
`packed-v2`;;
40+
When present, references may be stored as a group in a
41+
`packed-refs` file in its version 2 format. This file is in the
42+
same position and interacts with loose refs the same as when the
43+
`packed` value exists. Both `packed` and `packed-v2` must exist to
44+
upgrade an existing `packed-refs` file from version 1 to version 2
45+
or to downgrade from version 2 to version 1. When both are
46+
present, the `refs.packedRefsVersion` config value indicates which
47+
file format version is used during writes, but both versions are
48+
understood when reading the file.
49+
--
50+
+
51+
The following combinations are supported by this version of Git:
52+
+
53+
--
54+
`files` and (`packed` and/or `packed-v2`);;
55+
This set of values indicates that references are stored both as
56+
loose reference files and in the `packed-refs` file. Loose
57+
references are preferred, and the `packed-refs` file is updated
58+
only when deleting a reference that is stored in the `packed-refs`
59+
file or during a `git pack-refs` command.
60+
+
61+
The presence of `packed` and `packed-v2` specifies whether the `packed-refs`
62+
file is allowed to be in its v1 or v2 formats, respectively. When only one
63+
is present, Git will refuse to read a `packed-refs` file that does not
64+
match the expected format. When both are present, the `refs.packedRefsVersion`
65+
config option indicates which file format is used during writes.
66+
67+
`files`;;
68+
When only this value is present, Git will ignore the `packed-refs`
69+
file and refuse to write one during `git pack-refs`. All references
70+
will be read from and written to loose reference files.
71+
--
72+
1073
extensions.worktreeConfig::
1174
If enabled, then worktrees will load config settings from the
1275
`$GIT_DIR/config.worktree` file in addition to the
@@ -21,10 +84,15 @@ When enabling `extensions.worktreeConfig`, you must be careful to move
2184
certain values from the common config file to the main working tree's
2285
`config.worktree` file, if present:
2386
+
24-
* `core.worktree` must be moved from `$GIT_COMMON_DIR/config` to
25-
`$GIT_COMMON_DIR/config.worktree`.
26-
* If `core.bare` is true, then it must be moved from `$GIT_COMMON_DIR/config`
27-
to `$GIT_COMMON_DIR/config.worktree`.
87+
--
88+
`core.worktree`;;
89+
This config value must be moved from `$GIT_COMMON_DIR/config` to
90+
`$GIT_COMMON_DIR/config.worktree`.
91+
92+
`core.bare`;;
93+
If true, then this value must be moved from
94+
`$GIT_COMMON_DIR/config` to `$GIT_COMMON_DIR/config.worktree`.
95+
--
2896
+
2997
It may also be beneficial to adjust the locations of `core.sparseCheckout`
3098
and `core.sparseCheckoutCone` depending on your desire for customizable
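
The `extensions.refFormat` rules documented above can be sketched in a few lines of C. This is only an illustration of the documented behavior, not code from this commit: the `toy_` names and flag values are hypothetical, since the members of `enum ref_format_flags` are not shown in this diff.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags; Git's real enum ref_format_flags may differ. */
enum toy_ref_format {
	TOY_REF_FILES     = 1 << 0,
	TOY_REF_PACKED    = 1 << 1,
	TOY_REF_PACKED_V2 = 1 << 2,
};

/* Map one extensions.refFormat value to a flag, or 0 if unknown. */
static unsigned toy_ref_format_flag(const char *value)
{
	assert(value);
	if (!strcmp(value, "files"))
		return TOY_REF_FILES;
	if (!strcmp(value, "packed"))
		return TOY_REF_PACKED;
	if (!strcmp(value, "packed-v2"))
		return TOY_REF_PACKED_V2;
	return 0;
}

/*
 * The documented supported combinations are `files` alone, or
 * `files` together with `packed` and/or `packed-v2`. Both cases
 * reduce to "`files` is present".
 */
static int toy_ref_format_supported(unsigned flags)
{
	return (flags & TOY_REF_FILES) != 0;
}

/*
 * May a packed-refs file of the given version be read? Version 1
 * needs `packed`, version 2 needs `packed-v2`.
 */
static int toy_may_read_packed_refs(unsigned flags, int version)
{
	if (version == 1)
		return (flags & TOY_REF_PACKED) != 0;
	if (version == 2)
		return (flags & TOY_REF_PACKED_V2) != 0;
	return 0;
}
```

With only `files` and `packed-v2` set, `toy_may_read_packed_refs(flags, 1)` returns 0, which matches the documented refusal to read a file that does not match the expected format.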

Documentation/config/index.txt

Lines changed: 8 additions & 0 deletions
@@ -30,3 +30,11 @@ index.version::
3030
Specify the version with which new index files should be
3131
initialized. This does not affect existing repositories.
3232
If `feature.manyFiles` is enabled, then the default is 4.
33+
34+
index.computeHash::
35+
When enabled, compute the hash of the index file as it is written
36+
and store the hash at the end of the content. This is enabled by
37+
default.
38+
+
39+
If you disable `index.computeHash`, then older Git clients may report that
40+
your index is corrupt during `git fsck`.

Documentation/config/refs.txt

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
refs.packedRefsVersion::
2+
Specifies the file format version to use when writing a `packed-refs`
3+
file. Defaults to `1`.
4+
+
5+
The only other value currently allowed is `2`, which uses a structured file
6+
format to result in a smaller `packed-refs` file. In order to write this
7+
file format version, the repository must also have the `packed-v2` extension
8+
enabled. The most typical setup will include the
9+
`core.repositoryFormatVersion=1` config value and the `extensions.refFormat`
10+
key will have three values: `files`, `packed`, and `packed-v2`.
11+
+
12+
If `extensions.refFormat` has the value `packed-v2` and not `packed`, then
13+
`refs.packedRefsVersion` defaults to `2`.
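
The defaulting rule for `refs.packedRefsVersion` can be illustrated with a small C sketch. All names here are hypothetical and this is not the implementation from this commit. It also assumes that writing version 1 requires the `packed` extension, by symmetry with the documented requirement that version 2 needs `packed-v2`.

```c
#include <assert.h>

/* Hypothetical flags; Git's real enum ref_format_flags may differ. */
enum toy_ref_format {
	TOY_REF_FILES     = 1 << 0,
	TOY_REF_PACKED    = 1 << 1,
	TOY_REF_PACKED_V2 = 1 << 2,
};

/*
 * Pick the packed-refs version to write. `configured` is the value of
 * refs.packedRefsVersion, or 0 when the key is unset. Returns 1 or 2,
 * or -1 when no packed-refs file may be written with these settings.
 */
static int toy_packed_refs_write_version(unsigned flags, int configured)
{
	int has_v1 = !!(flags & TOY_REF_PACKED);
	int has_v2 = !!(flags & TOY_REF_PACKED_V2);
	int version = configured;

	if (!has_v1 && !has_v2)
		return -1; /* `files` only: no packed-refs file is written */

	/* Default to 1, unless only `packed-v2` is enabled. */
	if (!version)
		version = (has_v2 && !has_v1) ? 2 : 1;

	if (version == 1 && has_v1)
		return 1; /* assumption: v1 writes need `packed` */
	if (version == 2 && has_v2)
		return 2;
	return -1; /* requested version lacks its extension, or is unknown */
}
```

For the "most typical setup" with all three extension values, an unset key yields version 1 and `refs.packedRefsVersion=2` switches writes to version 2.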

Documentation/gitformat-chunk.txt

Lines changed: 23 additions & 3 deletions
@@ -24,8 +24,9 @@ how they use the chunks to describe structured data.
2424

2525
A chunk-based file format begins with some header information custom to
2626
that format. That header should include enough information to identify
27-
the file type, format version, and number of chunks in the file. From this
28-
information, that file can determine the start of the chunk-based region.
27+
the file type, format version, and (optionally) the number of chunks in
28+
the file. From this information, that file can determine the start of the
29+
chunk-based region.
2930

3031
The chunk-based region starts with a table of contents describing where
3132
each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
@@ -51,8 +52,27 @@ The final entry in the table of contents must be four zero bytes. This
5152
confirms that the table of contents is ending and provides the offset for
5253
the end of the chunk-based data.
5354

55+
The default chunk format assumes the table of contents appears at the
56+
beginning of the file (after the header information) and the chunks are
57+
ordered by increasing offset. Alternatively, the chunk format allows a
58+
table of contents that is placed at the end of the file (before the
59+
trailing hash) and the offsets are in descending order. In this trailing
60+
table of contents case, the data in order looks instead like the following
61+
table:
62+
63+
| Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
64+
|--------------------|------------------------|
65+
| 0x0000 | OFFSET[C+1] |
66+
| ID[C] | OFFSET[C] |
67+
| ... | ... |
68+
| ID[0] | OFFSET[0] |
69+
70+
The concrete file format that uses the chunk format will mention that it
71+
uses a trailing table of contents if it uses it. By default, the table of
72+
contents is in ascending order before all chunk data.
73+
5474
Note: The chunk-based format expects that the file contains _at least_ a
55-
trailing hash after `OFFSET[C+1]`.
75+
trailing hash after either `OFFSET[C+1]` or the trailing table of contents.
5676

5777
Functions for working with chunk-based file formats are declared in
5878
`chunk-format.h`. Using these methods provides extra checks that assist
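
To make the trailing layout concrete, here is a minimal, self-contained C sketch that serializes a trailing table of contents into a byte buffer: the terminating row first, then the chunks from last to first. The `toy_` names are hypothetical and the code does not use Git's `hashfile` API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ENTRY_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

struct toy_chunk {
	uint32_t id;
	uint64_t offset; /* where the chunk data starts in the file */
};

static void toy_put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static void toy_put_be64(unsigned char *p, uint64_t v)
{
	toy_put_be32(p, (uint32_t)(v >> 32));
	toy_put_be32(p + 4, (uint32_t)v);
}

/*
 * Write (nr + 1) rows into `out`: the zero-ID row carrying the end of
 * the chunk data, followed by the chunks in descending offset order.
 * Returns the number of bytes written.
 */
static size_t toy_write_trailing_toc(unsigned char *out, size_t out_len,
				     const struct toy_chunk *chunks,
				     size_t nr, uint64_t end_of_chunks)
{
	size_t pos = 0;
	size_t i;

	assert(out_len >= (nr + 1) * TOC_ENTRY_SIZE);

	toy_put_be32(out + pos, 0);
	toy_put_be64(out + pos + 4, end_of_chunks);
	pos += TOC_ENTRY_SIZE;

	for (i = nr; i > 0; i--) {
		toy_put_be32(out + pos, chunks[i - 1].id);
		toy_put_be64(out + pos + 4, chunks[i - 1].offset);
		pos += TOC_ENTRY_SIZE;
	}
	return pos;
}
```

Because the zero-ID row comes first, a reader that starts at the end of the file and walks toward the front sees the chunks in ascending offset order and meets the terminator last.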

Makefile

Lines changed: 2 additions & 0 deletions
@@ -1121,6 +1121,8 @@ LIB_OBJS += refs/debug.o
11211121
LIB_OBJS += refs/files-backend.o
11221122
LIB_OBJS += refs/iterator.o
11231123
LIB_OBJS += refs/packed-backend.o
1124+
LIB_OBJS += refs/packed-format-v1.o
1125+
LIB_OBJS += refs/packed-format-v2.o
11241126
LIB_OBJS += refs/ref-cache.o
11251127
LIB_OBJS += refspec.o
11261128
LIB_OBJS += remote.o

cache.h

Lines changed: 2 additions & 0 deletions
@@ -1131,6 +1131,8 @@ struct repository_format {
11311131
int hash_algo;
11321132
int sparse_index;
11331133
char *work_tree;
1134+
int ref_format_count;
1135+
enum ref_format_flags ref_format;
11341136
struct string_list unknown_extensions;
11351137
struct string_list v1_only_extensions;
11361138
};

chunk-format.c

Lines changed: 92 additions & 17 deletions
@@ -13,6 +13,7 @@ struct chunk_info {
1313
chunk_write_fn write_fn;
1414

1515
const void *start;
16+
off_t offset;
1617
};
1718

1819
struct chunkfile {
@@ -56,38 +57,59 @@ void add_chunk(struct chunkfile *cf,
5657
cf->chunks_nr++;
5758
}
5859

59-
int write_chunkfile(struct chunkfile *cf, void *data)
60+
int write_chunkfile(struct chunkfile *cf,
61+
enum chunkfile_flags flags,
62+
void *data)
6063
{
6164
int i, result = 0;
62-
uint64_t cur_offset = hashfile_total(cf->f);
6365

6466
trace2_region_enter("chunkfile", "write", the_repository);
6567

66-
/* Add the table of contents to the current offset */
67-
cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE;
68+
if (!(flags & CHUNKFILE_TRAILING_TOC)) {
69+
uint64_t cur_offset = hashfile_total(cf->f);
6870

69-
for (i = 0; i < cf->chunks_nr; i++) {
70-
hashwrite_be32(cf->f, cf->chunks[i].id);
71-
hashwrite_be64(cf->f, cur_offset);
71+
/* Add the table of contents to the current offset */
72+
cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE;
7273

73-
cur_offset += cf->chunks[i].size;
74-
}
74+
for (i = 0; i < cf->chunks_nr; i++) {
75+
hashwrite_be32(cf->f, cf->chunks[i].id);
76+
hashwrite_be64(cf->f, cur_offset);
77+
78+
cur_offset += cf->chunks[i].size;
79+
}
7580

76-
/* Trailing entry marks the end of the chunks */
77-
hashwrite_be32(cf->f, 0);
78-
hashwrite_be64(cf->f, cur_offset);
81+
/* Trailing entry marks the end of the chunks */
82+
hashwrite_be32(cf->f, 0);
83+
hashwrite_be64(cf->f, cur_offset);
84+
}
7985

8086
for (i = 0; i < cf->chunks_nr; i++) {
81-
off_t start_offset = hashfile_total(cf->f);
87+
cf->chunks[i].offset = hashfile_total(cf->f);
8288
result = cf->chunks[i].write_fn(cf->f, data);
8389

8490
if (result)
8591
goto cleanup;
8692

87-
if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size)
88-
BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
89-
cf->chunks[i].size, cf->chunks[i].id,
90-
hashfile_total(cf->f) - start_offset);
93+
if (!(flags & CHUNKFILE_TRAILING_TOC)) {
94+
if (hashfile_total(cf->f) - cf->chunks[i].offset != cf->chunks[i].size)
95+
BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
96+
cf->chunks[i].size, cf->chunks[i].id,
97+
hashfile_total(cf->f) - cf->chunks[i].offset);
98+
}
99+
100+
cf->chunks[i].size = hashfile_total(cf->f) - cf->chunks[i].offset;
101+
}
102+
103+
if (flags & CHUNKFILE_TRAILING_TOC) {
104+
size_t last_chunk_tail = hashfile_total(cf->f);
105+
/* First entry marks the end of the chunks */
106+
hashwrite_be32(cf->f, 0);
107+
hashwrite_be64(cf->f, last_chunk_tail);
108+
109+
for (i = cf->chunks_nr - 1; i >= 0; i--) {
110+
hashwrite_be32(cf->f, cf->chunks[i].id);
111+
hashwrite_be64(cf->f, cf->chunks[i].offset);
112+
}
91113
}
92114

93115
cleanup:
@@ -151,6 +173,59 @@ int read_table_of_contents(struct chunkfile *cf,
151173
return 0;
152174
}
153175

176+
int read_trailing_table_of_contents(struct chunkfile *cf,
177+
const unsigned char *mfile,
178+
size_t mfile_size)
179+
{
180+
int i;
181+
uint32_t chunk_id;
182+
const unsigned char *table_of_contents = mfile + mfile_size - the_hash_algo->rawsz;
183+
184+
while (1) {
185+
uint64_t chunk_offset;
186+
187+
table_of_contents -= CHUNK_TOC_ENTRY_SIZE;
188+
189+
chunk_id = get_be32(table_of_contents);
190+
chunk_offset = get_be64(table_of_contents + 4);
191+
192+
/* Calculate the previous chunk size, if it exists. */
193+
if (cf->chunks_nr) {
194+
off_t previous_offset = cf->chunks[cf->chunks_nr - 1].offset;
195+
196+
if (chunk_offset < previous_offset ||
197+
chunk_offset > table_of_contents - mfile) {
198+
error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""),
199+
previous_offset, chunk_offset);
200+
return -1;
201+
}
202+
203+
cf->chunks[cf->chunks_nr - 1].size = chunk_offset - previous_offset;
204+
}
205+
206+
/* Stop at the null chunk. We only need it for the last size. */
207+
if (!chunk_id)
208+
break;
209+
210+
for (i = 0; i < cf->chunks_nr; i++) {
211+
if (cf->chunks[i].id == chunk_id) {
212+
error(_("duplicate chunk ID %"PRIx32" found"),
213+
chunk_id);
214+
return -1;
215+
}
216+
}
217+
218+
ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc);
219+
220+
cf->chunks[cf->chunks_nr].id = chunk_id;
221+
cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
222+
cf->chunks[cf->chunks_nr].offset = chunk_offset;
223+
cf->chunks_nr++;
224+
}
225+
226+
return 0;
227+
}
228+
154229
static int pair_chunk_fn(const unsigned char *chunk_start,
155230
size_t chunk_size,
156231
void *data)
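
The backward walk in `read_trailing_table_of_contents()` can be tried outside Git with the standalone sketch below. It follows the same idea (start just before the trailing hash, step back one 12-byte row at a time, stop at the zero chunk ID) but is not the function from this commit. The `toy_` names, the fixed-size output array, and the `hash_len` parameter (standing in for `the_hash_algo->rawsz`) are assumptions made for illustration. It also adds a check that the walk does not run past the start of the buffer, and it omits the duplicate-ID check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ENTRY_SIZE 12
#define TOY_MAX_CHUNKS 16

struct toy_chunk {
	uint32_t id;
	uint64_t offset;
	uint64_t size;
};

static uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t toy_get_be64(const unsigned char *p)
{
	return ((uint64_t)toy_get_be32(p) << 32) | toy_get_be32(p + 4);
}

/*
 * Fill `chunks` (room for TOY_MAX_CHUNKS entries) from a trailing
 * table of contents. Returns the chunk count, or -1 if malformed.
 */
static int toy_read_trailing_toc(const unsigned char *buf, size_t len,
				 size_t hash_len, struct toy_chunk *chunks)
{
	size_t toc_pos;
	int nr = 0;

	if (len < hash_len)
		return -1;
	toc_pos = len - hash_len;

	for (;;) {
		uint32_t id;
		uint64_t offset;

		if (toc_pos < TOC_ENTRY_SIZE)
			return -1; /* no terminating row found */
		toc_pos -= TOC_ENTRY_SIZE;

		id = toy_get_be32(buf + toc_pos);
		offset = toy_get_be64(buf + toc_pos + 4);

		/* Chunk data must end before the row being read. */
		if (offset > toc_pos)
			return -1;

		/* The current offset closes the previous chunk. */
		if (nr) {
			if (offset < chunks[nr - 1].offset)
				return -1;
			chunks[nr - 1].size = offset - chunks[nr - 1].offset;
		}

		if (!id)
			break; /* zero ID: only needed for the last size */
		if (nr == TOY_MAX_CHUNKS)
			return -1;

		chunks[nr].id = id;
		chunks[nr].offset = offset;
		chunks[nr].size = 0;
		nr++;
	}
	return nr;
}
```

Each chunk's size is only known once the next row has been read, which is why the zero-ID row must carry the offset where the chunk data ends.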

chunk-format.h

Lines changed: 17 additions & 1 deletion
@@ -31,14 +31,30 @@ void add_chunk(struct chunkfile *cf,
3131
uint32_t id,
3232
size_t size,
3333
chunk_write_fn fn);
34-
int write_chunkfile(struct chunkfile *cf, void *data);
34+
35+
enum chunkfile_flags {
36+
CHUNKFILE_TRAILING_TOC = (1 << 0),
37+
};
38+
39+
int write_chunkfile(struct chunkfile *cf,
40+
enum chunkfile_flags flags,
41+
void *data);
3542

3643
int read_table_of_contents(struct chunkfile *cf,
3744
const unsigned char *mfile,
3845
size_t mfile_size,
3946
uint64_t toc_offset,
4047
int toc_length);
4148

49+
/**
50+
* Read the given chunkfile, but read the table of contents from the
51+
* end of the given mfile. The file is expected to be a hashfile with
52+
* the_hash_algo->rawsz bytes at the end storing the hash.
53+
*/
54+
int read_trailing_table_of_contents(struct chunkfile *cf,
55+
const unsigned char *mfile,
56+
size_t mfile_size);
57+
4258
#define CHUNK_NOT_FOUND (-2)
4359

4460
/*

ci/run-build-and-tests.sh

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ linux-TEST-vars)
3030
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
3131
export GIT_TEST_WRITE_REV_INDEX=1
3232
export GIT_TEST_CHECKOUT_WORKERS=2
33+
export GIT_TEST_PACKED_REFS_VERSION=2
3334
;;
3435
linux-clang)
3536
export GIT_TEST_DEFAULT_HASH=sha1
