Skip to content

Reftable support git-core #539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions Documentation/config/extensions.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,12 @@ extensions.objectFormat::
Note that this setting should only be set by linkgit:git-init[1] or
linkgit:git-clone[1]. Trying to change it after initialization will not
work and will produce hard-to-diagnose issues.
+
extensions.refStorage::
Specify the ref storage mechanism to use. The acceptable values are `files` and
`reftable`. If not specified, `files` is assumed. It is an error to specify
this key unless `core.repositoryFormatVersion` is 1.
+
Note that this setting should only be set by linkgit:git-init[1] or
linkgit:git-clone[1]. Trying to change it after initialization will not
work and will produce hard-to-diagnose issues.
7 changes: 7 additions & 0 deletions Documentation/technical/repository-version.txt
Original file line number Diff line number Diff line change
Expand Up @@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
multiple working directory mode, "config" file is shared while
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
> new file mode 100755
> index 00000000000..d35211c9db2
> --- /dev/null
> +++ b/t/t0031-reftable.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2020 Google LLC
> +#
> +
> +test_description='reftable basics'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'basic operation of reftable storage' '
> +        git init --ref-storage=reftable repo &&

Test script bodies are supposed to be indented with HT, not 8 SPs.

> +        cd repo &&

You are not supposed to chdir around inside tests, unless you do so
in a subshell.  i.e.

	test_expect_success 'basic operation' '
		git init --ref-storage=reftable repo &&
		(
			cd repo &&
			...
		)
	'

This is because the next test somebody else will add after this one
*will* start from inside that repo (but if and only if this step
gets run---and there are ways to skip steps).

> +        echo "hello" > world.txt &&

A shell redirection operator should immediately be followed by the
filename without SP in between, i.e.

	echo hello >world.txt &&

> +        git add world.txt &&
> +        git commit -m "first post" &&
> +        printf "HEAD\nrefs/heads/master\n" > want &&

There is an easier-to-read helper.  Also, by convention, the file
that holds the expected state is called "expect", while the file
that records the actual state observed is called "actual".

	test_write_lines HEAD refs/heads/master >expect &&

> +        git show-ref | cut -f2 -d" " > got &&

Is the order of "show-ref" output guaranteed in some way?  Otherwise
we may need to make sure that we sort it to keep it stable.

Without --show-head, HEAD should not appear in "git show-ref"
output, but the expected output has HEAD, which is puzzling.

We will not notice if a "git" command that placed on the left hand
side of a pipeline segfaults after showing its output.  

People often split a pipeline like this into separate command for
that reason (but there is probably a better way in this case, as we
see soon).

> +        test_cmp got want &&

Taking the above comments together, perhaps

	test_write_lines refs/heads/master >expect &&
	git for-each-ref --sort=refname >actual &&
	test_cmp expect actual &&

> +        for count in $(test_seq 1 10); do
> +                echo "hello" >> world.txt
> +                git commit -m "number ${count}" world.txt
> +        done &&

Style: write "do" on its own line, aligned with "for" on the
previous line.  Same for "if/then/else/fi" "while/do/done" etc.  

    An easy way to remember is that you never use any semicolon in
    your shell script, except for the double-semicolon you use to
    delimit your case arms.

When any of these steps fail, you do not notice it, unless the
failure happens to occur to "git commit" during the last round.  I
think you can use "return 1" to signal the failure to the test
framework, like so.

	for count in $(test_seq 1 10)
	do
		echo hello >>world.txt &&
		git commit -m "number $count} world.txt ||
		return 1
	done &&

> +        git gc &&
> +        nfiles=$(ls -1 .git/reftable | wc -l | tr -d "[ \t]" ) &&
> +        test "${nfiles}" = "2" &&

No need to pipe "wc -l" output to "tr" to strip spaces.  Just stop
double-quoting it on the using side, and even better is to use
test_line_count.  Perhaps

	ls -1 .git/reftable >files &&
	test_line_count = 2 files

> +        git reflog refs/heads/master > output &&
> +        grep "commit (initial): first post" output &&
> +        grep "commit: number 10" output

Do we give guarantees that we keep all the reflog entries?  Would it
help to count the number of lines in the output file here?

> +'
> +
> +test_done

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Wed, Feb 26, 2020 at 7:13 PM Junio C Hamano <gitster@pobox.com> wrote:
> > diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
> > new file mode 100755
> > index 00000000000..d35211c9db2
> > --- /dev/null
> > +++ b/t/t0031-reftable.sh

Thanks for your careful review. Should we not start with the C code,
though? I've written the reftable library to my best ability, but it
hasn't been peer reviewed yet.

> > +test_expect_success 'basic operation of reftable storage' '
> > +        git init --ref-storage=reftable repo &&
>
> Test script bodies are supposed to be indented with HT, not 8 SPs.

Done

> > +        cd repo &&
>
> You are not supposed to chdir around inside tests, unless you do so
> in a subshell.  i.e.
>
>         test_expect_success 'basic operation' '
>                 git init --ref-storage=reftable repo &&
>                 (
>                         cd repo &&

Done.

> > +        echo "hello" > world.txt &&
>
> A shell redirection operator should immediately be followed by the
> filename without SP in between, i.e.

Done.

>         echo hello >world.txt &&
>
> > +        git add world.txt &&
> > +        git commit -m "first post" &&
> > +        printf "HEAD\nrefs/heads/master\n" > want &&
>
> There is an easier-to-read helper.  Also, by convention, the file
> that holds the expected state is called "expect", while the file
> that records the actual state observed is called "actual".

Done

>         test_write_lines HEAD refs/heads/master >expect &&
>
> > +        git show-ref | cut -f2 -d" " > got &&
>
> Is the order of "show-ref" output guaranteed in some way?  Otherwise
> we may need to make sure that we sort it to keep it stable.

The ref iteration result is lexicographically ordered.

> Without --show-head, HEAD should not appear in "git show-ref"
> output, but the expected output has HEAD, which is puzzling.

The HEAD symref is stored in reftable too, and to the reftable code
it's just another ref. How should we refactor this? Shouldn't the
files ref backend produce "HEAD" and then it's up to the show-ref
builtin to filter it out?

> We will not notice if a "git" command that placed on the left hand
> side of a pipeline segfaults after showing its output.
>
> People often split a pipeline like this into separate command for
> that reason (but there is probably a better way in this case, as we
> see soon).
>
> > +        test_cmp got want &&
>
> Taking the above comments together, perhaps
>
>         test_write_lines refs/heads/master >expect &&
>         git for-each-ref --sort=refname >actual &&
>         test_cmp expect actual &&
>
> > +        for count in $(test_seq 1 10); do
> > +                echo "hello" >> world.txt
> > +                git commit -m "number ${count}" world.txt
> > +        done &&
>
> Style: write "do" on its own line, aligned with "for" on the
> previous line.  Same for "if/then/else/fi" "while/do/done" etc.

Done.

>     An easy way to remember is that you never use any semicolon in
>     your shell script, except for the double-semicolon you use to
>     delimit your case arms.
>
> When any of these steps fail, you do not notice it, unless the
> failure happens to occur to "git commit" during the last round.  I
> think you can use "return 1" to signal the failure to the test
> framework, like so.
>
>         for count in $(test_seq 1 10)
>         do
>                 echo hello >>world.txt &&
>                 git commit -m "number $count} world.txt ||
>                 return 1
>         done &&

Done.

>         ls -1 .git/reftable >files &&
>         test_line_count = 2 files
>

Done.

> > +        git reflog refs/heads/master > output &&
> > +        grep "commit (initial): first post" output &&
> > +        grep "commit: number 10" output
>
> Do we give guarantees that we keep all the reflog entries?  Would it
> help to count the number of lines in the output file here?

The reflog entries are only removed if someone calls into the ref
backend to expire them. Does that happen automatically?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Wed, Feb 26, 2020 at 7:13 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
>> > new file mode 100755
>> > index 00000000000..d35211c9db2
>> > --- /dev/null
>> > +++ b/t/t0031-reftable.sh
>
> Thanks for your careful review. Should we not start with the C code,
> though? I've written the reftable library to my best ability, but it
> hasn't been peer reviewed yet.

My intention was to let others, especially those from outside
Google, to comment before I do ;-)

>>         test_write_lines HEAD refs/heads/master >expect &&
>>
>> > +        git show-ref | cut -f2 -d" " > got &&
>>
>> Is the order of "show-ref" output guaranteed in some way?  Otherwise
>> we may need to make sure that we sort it to keep it stable.
>
> The ref iteration result is lexicographically ordered.
>
>> Without --show-head, HEAD should not appear in "git show-ref"
>> output, but the expected output has HEAD, which is puzzling.
>
> The HEAD symref is stored in reftable too, and to the reftable code
> it's just another ref. How should we refactor this? Shouldn't the
> files ref backend produce "HEAD" and then it's up to the show-ref
> builtin to filter it out?

After looking at show-ref.c, the answer is obvious, I would think.
for_each_ref() MUST NOT include HEAD in its enumeration, and that is
why the caller of for_each_ref() makes a separate call to head_ref()
when it wants to talk about HEAD.  As a backend-agnostic abstraction
layer, behaviour of functions like for_each_ref() should not change
depending on what ref backend is in use.

> The reflog entries are only removed if someone calls into the ref
> backend to expire them. Does that happen automatically?

"gc auto" may kick in in theory but not likely in these tests, so it
would be safe and sensible to know and check the expected number of
reflog entries in this test, I guess.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Wed, Feb 26, 2020 at 8:59 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> Without --show-head, HEAD should not appear in "git show-ref"
> >> output, but the expected output has HEAD, which is puzzling.
> >
> > The HEAD symref is stored in reftable too, and to the reftable code
> > it's just another ref. How should we refactor this? Shouldn't the
> > files ref backend produce "HEAD" and then it's up to the show-ref
> > builtin to filter it out?
>
> After looking at show-ref.c, the answer is obvious, I would think.
> for_each_ref() MUST NOT include HEAD in its enumeration, and that is
> why the caller of for_each_ref() makes a separate call to head_ref()
> when it wants to talk about HEAD.  As a backend-agnostic abstraction
> layer, behaviour of functions like for_each_ref() should not change
> depending on what ref backend is in use.

So, the ref backend should manage the HEAD ref, but iteration should
not produce the HEAD ref.

Are there other refs that behave like this, or is this the only special case?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <hanwen@google.com> writes:

> So, the ref backend should manage the HEAD ref, but iteration should
> not produce the HEAD ref.

Yeah, as there is a dedicated API function head_ref().

Things like ORIG_HEAD and MERGE_HEAD have always been curiosity
outside the official API, but IIRC read_ref() and friends are
prepared to read them [*1*], so the vocabulary may be unbounded [*2*].

These won't be listed in for_each_ref() iteration, either (think of
for_each_ref() as a filtered "ls -R .git/refs/" output).

Thanks.


[Footnotes]

*1* I do not offhand know if these *_HEAD pseudo-refs are written
via the refs API---it probably is the safest to arrange the ref
backends in such a way that they always go to the files backend,
even if HEAD and refs/* are managed by a backend different from the
files one.

*2* I vaguely recall that back when ref-backend infrastructure was
introduced, we had discussion on declaring that "^[A-Z_]*HEAD$"
directly under $GIT_DIR/ are these special refs (and nothing else
directly under $GIT_DIR/ is).  I do not recall what happend to the
discussion, though other folks may.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Thu, Feb 27, 2020 at 5:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > So, the ref backend should manage the HEAD ref, but iteration should
> > not produce the HEAD ref.
>
> Yeah, as there is a dedicated API function head_ref().
>
> Things like ORIG_HEAD and MERGE_HEAD have always been curiosity
> outside the official API, but IIRC read_ref() and friends are
> prepared to read them [*1*], so the vocabulary may be unbounded [*2*].
>
> These won't be listed in for_each_ref() iteration, either (think of
> for_each_ref() as a filtered "ls -R .git/refs/" output).


currently the code says

 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
   return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
 }

but it looks like this should do

   return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);

instead.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

"config.worktree" is per-working directory (i.e., it's in
GIT_COMMON_DIR/worktrees/<id>/config.worktree)

==== `refStorage`

Specifies the file format for the ref database. Values are `files`
(for the traditional packed + loose ref format) and `reftable` for the
binary reftable format. See https://github.com/google/reftable for
more information.
46 changes: 43 additions & 3 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -728,6 +728,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
TEST_BUILTINS_OBJS += test-read-graph.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> The unittests are under reftable/*_test.c, so all of the reftable code stays in
> one directory. They are called from t/helpers/test-reftable.c in t0031-reftable.sh
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---

Hmph,

make: *** No rule to make target 't/helper/test-reftable.c', needed by 't/helper/test-reftable.o'.  Stop.
make: *** Waiting for unfinished jobs....

>  Makefile             | 17 ++++++++++++++++-
>  t/helper/test-tool.c |  1 +
>  t/helper/test-tool.h |  1 +
>  t/t0031-reftable.sh  |  5 +++++
>  4 files changed, 23 insertions(+), 1 deletion(-)

Forgot to add a source file?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, hanwen@google.com wrote (reply to this):

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/helper/test-reftable.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 t/helper/test-reftable.c

diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
new file mode 100644
index 0000000000..95d18ba1fa
--- /dev/null
+++ b/t/helper/test-reftable.c
@@ -0,0 +1,15 @@
+#include "reftable/reftable-tests.h"
+#include "test-tool.h"
+
+int cmd__reftable(int argc, const char **argv)
+{
+	block_test_main(argc, argv);
+	merged_test_main(argc, argv);
+	record_test_main(argc, argv);
+	refname_test_main(argc, argv);
+	reftable_test_main(argc, argv);
+	slice_test_main(argc, argv);
+	stack_test_main(argc, argv);
+	tree_test_main(argc, argv);
+	return 0;
+}
-- 
2.27.0.278.ge193c7cf3a9-goog

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

hanwen@google.com writes:

> Subject: Re: [PATCH] Fixup! Add t/helper/test-reftable.c

I do not see a step with "Add t/helper/test-reftable.c" as its
subject; is this supposed to be squashed into "Hookup unittests for
the reftable library." or somewhere else?

I also notice, with or without this squashed in, 775c540f (Hookup
unittests for the reftable library., 2020-06-05) does not compile
cleanly.  Since you are relatively new to this project (even though
you are very experienced and capable), I probably should let you
know that in this project each patch in a series is expected to
cleanly compile and pass the testsuite, not just the endstate.

Thanks.

TEST_BUILTINS_OBJS += test-read-midx.o
TEST_BUILTINS_OBJS += test-ref-store.o
TEST_BUILTINS_OBJS += test-reftable.o
TEST_BUILTINS_OBJS += test-regex.o
TEST_BUILTINS_OBJS += test-repository.o
TEST_BUILTINS_OBJS += test-revision-walking.o
Expand Down Expand Up @@ -804,6 +805,8 @@ TEST_SHELL_PATH = $(SHELL_PATH)

LIB_FILE = libgit.a
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:

> [...]

I'll try to add my 2 cents to all of the XXX spots you flagged.

> @@ -1839,10 +1839,22 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
>  static struct ref_store *ref_store_init(const char *gitdir,
>  					unsigned int flags)
>  {
> -	const char *be_name = "files";
> -	struct ref_storage_be *be = find_ref_storage_backend(be_name);
> +	struct strbuf refs_path = STRBUF_INIT;
> +
> +        /* XXX this should probably come from a git config setting and not
> +           default to reftable. */
> +	const char *be_name = "reftable";

I think the scheme here needs to be something like:

  - "struct repository" gets a new "ref_format" field

  - setup.c:check_repo_format() learns about an extensions.refFormat
    config key, which we use to set repo->ref_format

  - init/clone should take a command-line option for the ref format of
    the new repository. Anybody choosing reftables would want to set
    core.repositoryformatversion to "1" and set the extensions.refFormat
    key.

  - likewise, there should be a config setting to choose the default ref
    format. It would obviously stay at "files" for now, but we'd
    eventually perhaps flip the default to "reftable".

Some thoughts on compatibility:

The idea of the config changes is that older versions of Git will either
complain about repositoryformatversion (if very old), or will complain
that they don't know about extensions.refFormat. But the changes you
made in patch 1 indicate that existing versions of Git won't consider it
to be a Git repository at all!

I think this is slightly non-ideal, in that we'd continue walking up the
tree looking for a Git repo. And either:

  - we'd find one, which would be confusing and possibly wrong

  - we wouldn't, in which case the error would be something like "no git
    repository", and not "your git repository is too new"

So it would be really nice if we continued to have a HEAD file (just
with some sentinel value in it) and a refs/ directory, so that existing
versions of Git realize "there's a repository here, but it's too new for
me".

There's a slight downside in that tools which _aren't_ careful about
repositoryformatversion might try to operate on the repository, writing
into refs/ or whatever. But such tools are broken, and I'm not sure we
should be catering to them (besides which, the packed-refs ship sailed
long ago, so they're already potentially dangerous).

> +/* XXX which ordering are these? Newest or oldest first? */
>  int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
>  int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);

The top one is chronological order (i.e., reading the file from start to
finish), and the latter is reverse-chronological (seeking backwards in
the file).

> +static struct ref_iterator *
> +reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
> +			    unsigned int flags)
> +{
> +	struct reftable_ref_store *refs =
> +		(struct reftable_ref_store *)ref_store;
> +	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
> +	struct merged_table *mt = NULL;
> +	int err = 0;
> +	if (refs->err) {
> +		/* how to propagate errors? */
> +		return NULL;
> +	}
> +
> +	mt = stack_merged_table(refs->stack);
> +
> +	/* XXX something with flags? */

I think the flags here are DO_FOR_EACH_*. You might need to consider
INCLUDE_BROKEN here (or it might be dealt with at a layer above).

There's also PER_WORKTREE_ONLY; I think you'd need to check ref_type()
there, like the files backend does.

> +	err = merged_table_seek_ref(mt, &ri->iter, prefix);
> +	/* XXX what to do with err? */

Hmm, I don't think you can return an error from iterator_begin(). It
would probably be OK to record the error in your local struct, and then
later return ITER_ERROR from iterator_advance().

I notice that the more-recent dir_iterator_begin() returns NULL in case
of an error, but looking at the callers of the ref iterators, they don't
seem to be ready to handle NULL.

> +static int write_transaction_table(struct writer *writer, void *arg)
> +{
> +	struct ref_transaction *transaction = (struct ref_transaction *)arg;
> +	struct reftable_ref_store *refs =
> +		(struct reftable_ref_store *)transaction->ref_store;
> +	uint64_t ts = stack_next_update_index(refs->stack);
> +	int err = 0;
> +	/* XXX - are we allowed to mutate the input data? */
> +	qsort(transaction->updates, transaction->nr,
> +	      sizeof(struct ref_update *), ref_update_cmp);

I don't offhand know of anything that would break, but it's probably a
bad idea to do so. If you need a sorted view, can you make an auxiliary
array-of-pointers? Also, the QSORT() macro is a little shorter and has
some extra checks.

> +	for (int i = 0; i < transaction->nr; i++) {
> +		struct ref_update *u = transaction->updates[i];
> +		if (u->flags & REF_HAVE_NEW) {
> +			struct object_id out_oid = {};
> +			int out_flags = 0;
> +			/* XXX who owns the memory here? */
> +			const char *resolved = refs_resolve_ref_unsafe(
> +				transaction->ref_store, u->refname, 0, &out_oid,
> +				&out_flags);

In the "unsafe" version, the memory belongs to a static buffer inside
the function. You shouldn't need to free it.

> +static int
> +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
> [...]
> +		/* XXX const? */
> +		memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);

You'd want probably want to use hashcpy() here (or better yet, use
"struct object_id" in ri->log, too, and then use oidcpy()).

But that raises a question: how ready are reftables to handle non-sha1
object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
format actually has binary sha1s, right? In theory if those all become
the_hash_algo->rawsz, then it might "Just Work" to read and write
slightly larger entries.

> +reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
> +					  const char *refname,
> +					  each_reflog_ent_fn fn, void *cb_data)
> [...]
> +			/* XXX committer = email? name? */
> +			if (fn(&old_oid, &new_oid, log.name, log.time,
> +			       log.tz_offset, log.message, cb_data)) {
> +				err = -1;
> +				break;
> +			}

The committer entry we pass back to each_reflog_ent_fn should be the
full "Human Name <email@host>".

> +static int reftable_reflog_expire(struct ref_store *ref_store,
> +				  const char *refname,
> +				  const struct object_id *oid,
> +				  unsigned int flags,
> +				  reflog_expiry_prepare_fn prepare_fn,
> +				  reflog_expiry_should_prune_fn should_prune_fn,
> +				  reflog_expiry_cleanup_fn cleanup_fn,
> +				  void *policy_cb_data)
> +{
> +	/*
> +	  XXX
> +
> +	  This doesn't fit with the reftable API. If we are expiring for space
> +	  reasons, the expiry should be combined with a compaction, and there
> +	  should be a policy that can be called for all refnames, not for a
> +	  single ref name.
> +
> +	  If this is for cleaning up individual entries, we'll have to write
> +	  extra data to create tombstones.
> +	 */
> +	return 0;
> +}

I agree that we'd generally want to expire and compact all entries at
once. Unfortunately I think this per-ref interface is exposed by
git-reflog. I.e., you can do "git reflog expire refs/heads/foo".

Could we add an extra API call for "expire everything", and then:

  - have refs/files-backend.c implement that by just iterating over all
    of the refs and pruning them one by one

  - have builtin/reflog.c trigger the "expire everything" API when it
    sees "--all"

  - teach refs/reftables-backend.c a crappy version of the per-ref
    expiration, where it just inserts tombstones but doesn't do any
    compaction. It doesn't have to be fast or produce a good output;
    it's just for compatibility.

    Alternatively, I'd be OK with die("sorry, reftables doesn't support
    per-ref reflog expiration") as a first step.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Fick wrote (reply to this):

On Tuesday, January 28, 2020 2:31:00 AM MST Jeff King wrote:
> Some thoughts on compatibility:
> 
> The idea of the config changes is that older versions of Git will either
> complain about repositoryformatversion (if very old), or will complain
> that they don't know about extensions.refFormat. But the changes you
> made in patch 1 indicate that existing versions of Git won't consider it
> to be a Git repository at all!
> 
> I think this is slightly non-ideal, in that we'd continue walking up the
> tree looking for a Git repo. And either:
> 
>   - we'd find one, which would be confusing and possibly wrong
> 
>   - we wouldn't, in which case the error would be something like "no git
>     repository", and not "your git repository is too new"
> 
> So it would be really nice if we continued to have a HEAD file (just
> with some sentinel value in it) and a refs/ directory, so that existing
> versions of Git realize "there's a repository here, but it's too new for
> me".
> 
> There's a slight downside in that tools which _aren't_ careful about
> repositoryformatversion might try to operate on the repository, writing
> into refs/ or whatever. But such tools are broken, and I'm not sure we
> should be catering to them (besides which, the packed-refs ship sailed
> long ago, so they're already potentially dangerous).

Could you elaborate on this a bit because it seems on the surface that these 
tools aren't very dangerous, and therefore likely many of them exist?

What are the dangers today of tools that understand/operate on loose and 
packed refs without reading repositoryformatversion?

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
>
> > [...]
>
> I'll try to add my 2 cents to all of the XXX spots you flagged.
>

thanks! that was really helpful.

>
> Some thoughts on compatibility:
>
> The idea of the config changes is that older versions of Git will either
> complain about repositoryformatversion (if very old), or will complain
> that they don't know about extensions.refFormat. But the changes you
> made in patch 1 indicate that existing versions of Git won't consider it
> to be a Git repository at all!
>
> I think this is slightly non-ideal, in that we'd continue walking up the
> tree looking for a Git repo. And either:
>
>   - we'd find one, which would be confusing and possibly wrong
>
>   - we wouldn't, in which case the error would be something like "no git
>     repository", and not "your git repository is too new"
>
> So it would be really nice if we continued to have a HEAD file (just
> with some sentinel value in it) and a refs/ directory, so that existing
> versions of Git realize "there's a repository here, but it's too new for
> me".

You have good points.

JGit currently implements what we have here, as this is what's spelled
out in the spec that Shawn posted  back in the day. It's probably
acceptable to this, though, as the reftable support has only landed in
JGit very recently and will probably appear very experimental to
folks.

How would the layout be then? We'll have

  HEAD - dummy file
  reftable/ - the tables
  refs/ - dummy dir

where shall we store the reftable list? maybe in a file called

  reftable-list

If we have both HEAD/refs + (refable/reftable-list), what should we
put there to ensure that no git version actually manages to use the
repository? (what happens if someone deletes the version setting from
the .git/config file)

> > +/* XXX which ordering are these? Newest or oldest first? */
> >  int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
> >  int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
>
> The top one is chronological order (i.e., reading the file from start to
> finish), and the latter is reverse-chronological (seeking backwards in
> the file).

thanks, I put in a commit to clarify this.

> > +     err = merged_table_seek_ref(mt, &ri->iter, prefix);
> > +     /* XXX what to do with err? */
>
> Hmm, I don't think you can return an error from iterator_begin(). It
> would probably be OK to record the error in your local struct, and then
> later return ITER_ERROR from iterator_advance().

good point. I did that.

> > +     /* XXX - are we allowed to mutate the input data? */
> > +     qsort(transaction->updates, transaction->nr,
> > +           sizeof(struct ref_update *), ref_update_cmp);
>
> I don't offhand know of anything that would break, but it's probably a
> bad idea to do so. If you need a sorted view, can you make an auxiliary
> array-of-pointers? Also, the QSORT() macro is a little shorter and has
> some extra checks.

Done.

> In the "unsafe" version, the memory belongs to a static buffer inside
> the function. You shouldn't need to free it.


OK.

> > +static int
> > +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > [...]
> > +             /* XXX const? */
> > +             memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
>
> You'd want probably want to use hashcpy() here (or better yet, use
> "struct object_id" in ri->log, too, and then use oidcpy()).

It's more practical in the library to use pointers rather than fixed
size arrays, so it'll be hashcpy.

> But that raises a question: how ready are reftables to handle non-sha1
> object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> format actually has binary sha1s, right? In theory if those all become
> the_hash_algo->rawsz, then it might "Just Work" to read and write
> slightly larger entries.

The format fixes the reftable at 20 bytes, and there is not enough
framing information to just write more data. We'll have to encode the
hash size in the version number somehow, eg. we could use the  higher
order bit of the version byte to encode it, for example.

But it needs a new version of the spec. I think it's premature to do
this while v1 of reftable isn't in git-core yet.


> The committer entry we pass back to each_reflog_ent_fn should be the
> full "Human Name <email@host>".

thx, done.

> I agree that we'd generally want to expire and compact all entries at
> once. Unfortunately I think this per-ref interface is exposed by
> git-reflog. I.e., you can do "git reflog expire refs/heads/foo".
>
> Could we add an extra API call for "expire everything", and then:

I took note of your remarks, and added a BUG() for now.

I'll focus on getting the CI on gitgitgadget to build this code first.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Jan 28, 2020 at 08:36:53AM -0700, Martin Fick wrote:

> > There's a slight downside in that tools which _aren't_ careful about
> > repositoryformatversion might try to operate on the repository, writing
> > into refs/ or whatever. But such tools are broken, and I'm not sure we
> > should be catering to them (besides which, the packed-refs ship sailed
> > long ago, so they're already potentially dangerous).
> 
> Could you elaborate on this a bit because it seems on the surface that these 
> tools aren't very dangerous, and therefore likely many of them exist?
> 
> What are the dangers today of tools that understand/operate on loose and 
> packed refs without reading repositoryformatversion?

I was mostly thinking of hacky scripts that tried to touch .git/refs
directly. And there are a few levels of dangerous there:

  - if you're doing "echo $sha1 >.git/refs/heads/master", then you're
    not locking properly. But it would probably work most of the time.

  - if you're properly taking a lock on ".git/refs/heads/master.lock"
    and renaming into place but not looking at packed-refs, then you
    might overwrite somebody else's update which is in the packed file

  - if you're trying to read refs and not reading packed-refs, obviously
    you might miss some values

If you're actually doing the correct locking and packed-refs read (which
"real" implementations like libgit2 do) then no, I don't think that's
dangerous. And I think libgit2 properly complains when it sees a
repositoryformatversion above 0. I don't know offhand about JGit, or any
of the lesser-used ones like dulwich or go-git.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Jan 28, 2020 at 04:56:26PM +0100, Han-Wen Nienhuys wrote:

> JGit currently implements what we have here, as this is what's spelled
> out in the spec that Shawn posted  back in the day. It's probably
> acceptable to this, though, as the reftable support has only landed in
> JGit very recently and will probably appear very experimental to
> folks.
> 
> How would the layout be then? We'll have
> 
>   HEAD - dummy file
>   reftable/ - the tables
>   refs/ - dummy dir
> 
> where shall we store the reftable list? maybe in a file called
> 
>   reftable-list
> 
> If we have both HEAD/refs + (refable/reftable-list), what should we
> put there to ensure that no git version actually manages to use the
> repository? (what happens if someone deletes the version setting from
> the .git/config file)

Yeah, it would be nice to have something that an older version of Git
would totally choke on, but I'm not sure we have a lot of leeway. What
we put in HEAD has to be syntactically legitimate enough to appease
validate_headref(), so our options are either "ref:
refs/something/bogus" or an object hash that we don't have (e.g.,
0{40}). The former would be preferable because it would (in theory)
prevent us from writing to HEAD, as well.

I wondered what would happen if you put in a syntactically invalid ref,
like "ref: refs/.not/.valid" (leading dots are not allowed in path
components of refnames). It does cause _some_ parts of Git to choke, but
sadly "git update-ref HEAD $sha1" actually writes to .git/refs/.not/.valid.

Even "refs/../../dangerous" doesn't give it pause. Yikes. It seems we're
pretty willing to accept symref destinations without further checking.

Making "refs" a file instead of a directory does work nicely, as any
attempts to read or write would get ENOTDIR. And we can fool
is_git_directory() as long as it's marked executable. That's OK on POSIX
systems, but I'm not sure how it would work on Windows (or maybe it
would work just fine, since we presumably just say "yep, everything is
executable").

So perhaps that's enough, and what we put in HEAD won't matter (since
nobody will be able to write into refs/ anyway).

> > But that raises a question: how ready are reftables to handle non-sha1
> > object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> > format actually has binary sha1s, right? In theory if those all become
> > the_hash_algo->rawsz, then it might "Just Work" to read and write
> > slightly larger entries.
> 
> The format fixes the reftable at 20 bytes, and there is not enough
> framing information to just write more data. We'll have to encode the
> hash size in the version number somehow, eg. we could use the  higher
> order bit of the version byte to encode it, for example.
> 
> But it needs a new version of the spec. I think it's premature to do
> this while v1 of reftable isn't in git-core yet.

I don't know that we technically need the reftables file to say how long
the hashes are. The git config will tell us which hash we're using, and
everything else is supposed to follow. So I think it would work OK as
long as you're able to be told by the rest of Git that hashes are N
bytes, and just use that to compute the fixed-size records.

That said, it might make for easier debugging if the reftables file
declares the size it assumes.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Feb 04, 2020 at 08:06:57PM +0100, Han-Wen Nienhuys wrote:

> On Wed, Jan 29, 2020 at 11:47 AM Jeff King <peff@peff.net> wrote:
> 
> > That said, it might make for easier debugging if the reftables file
> > declares the size it assumes.
> 
> If there is a mismatch in size, the reftable will look completely
> corrupted data.  I think this will be a bad experience.

Yes, but I think it would take a bunch of other failures to get there.
And it's true of all of the other parts of Git, too; the master switch
is a config setting that says "I'm a sha-256 repository".

That said, I'm not at all opposed to a header in the reftables file
saying "I'm a sha-256 reftable". We shouldn't ever see a mismatch there,
but that's what I meant by "easier debugging".

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote:

> > PS I don't know if it's set in stone yet, but if we're changing things
> >    around, is it an option to put reftable-list inside the reftable
> >    directory, just to keep the top-level uncluttered?
> 
> I put up https://git.eclipse.org/r/c/157167/ to update the spec inside
> JGit. Please review.

That looks quite reasonable. The one change I'd make is to put
"refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my
experiments, existing Git is happy to write into an invalid refname via
"git update-ref HEAD". But by putting it past the non-directory "heads",
such a write would always fail.

It's also supposed to be a requirement that HEAD only puts to
refs/heads/, though our validate_headref() does not enforce that. I
don't know if any other implementations might, though.

I did compare earlier your "make refs/heads a file" versus "make refs/ a
file and just give it an executable bit". And I think the "refs/heads"
solution does behave slightly better in a few cases. But if I understand
correctly, just giving "refs/" an executable bit would mean we retain
compatibility with the existing JGit implementation.

It does end up playing games with permissions, which I warned against,
but I think it might be tolerable:

  1. Unlike read vs write, there's not generally a reason users would
     need to separate read and execute.

  2. Portability-wise, POSIX permissions give us what we need, and
     non-POSIX systems tend to just say everything is executable.

So I could also live with that direction, if switching the JGit behavior
at this point is too much of a pain.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote:
>
> >  {
> > -     const char *be_name = "files";
> > -     struct ref_storage_be *be = find_ref_storage_backend(be_name);
> > +     struct strbuf refs_path = STRBUF_INIT;
> > +
> > +        /* XXX this should probably come from a git config setting and not
> > +           default to reftable. */
> > +     const char *be_name = "reftable";
>
> I think the scheme here needs to be something like:
>
>   - "struct repository" gets a new "ref_format" field
>
>   - setup.c:check_repo_format() learns about an extensions.refFormat
>     config key, which we use to set repo->ref_format
>
>   - init/clone should take a command-line option for the ref format of
>     the new repository. Anybody choosing reftables would want to set
>     core.repositoryformatversion to "1" and set the extensions.refFormat
>     key.

I did this, but are you sure this works? Where would the
repo->ref_storage_format get set? I tried adding it to repo_init(),
but this doesn't get called in a normal startup sequence.

Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git",
be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841
warning: Source file is more recent than executable.
1841 {
(gdb) up
#1  0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40
<the_repo>) at refs.c:1862
(gdb) p r->ref_storage_format
$1 = 0x0
}


I'm sending the revised patch series now.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Feb 4, 2020 at 9:06 PM Jeff King <peff@peff.net> wrote:
>
> On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote:
>
> > > PS I don't know if it's set in stone yet, but if we're changing things
> > >    around, is it an option to put reftable-list inside the reftable
> > >    directory, just to keep the top-level uncluttered?
> >
> > I put up https://git.eclipse.org/r/c/157167/ to update the spec inside
> > JGit. Please review.
>
> That looks quite reasonable. The one change I'd make is to put
> "refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my
> experiments, existing Git is happy to write into an invalid refname via
> "git update-ref HEAD". But by putting it past the non-directory "heads",
> such a write would always fail.

thanks, incorporated.

> It's also supposed to be a requirement that HEAD only puts to
> refs/heads/, though our validate_headref() does not enforce that. I
> don't know if any other implementations might, though.
>
> I did compare earlier your "make refs/heads a file" versus "make refs/ a
> file and just give it an executable bit". And I think the "refs/heads"
> solution does behave slightly better in a few cases. But if I understand
> correctly, just giving "refs/" an executable bit would mean we retain
> compatibility with the existing JGit implementation.

I hadn't looked at the JGit discovery in detail, but it looks like the
currently proposed approach works with JGit. (I haven't adapted JGit
for the new path setup yet.)

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Feb 04, 2020 at 09:22:36PM +0100, Han-Wen Nienhuys wrote:

> >   - init/clone should take a command-line option for the ref format of
> >     the new repository. Anybody choosing reftables would want to set
> >     core.repositoryformatversion to "1" and set the extensions.refFormat
> >     key.
> 
> I did this, but are you sure this works? Where would the
> repo->ref_storage_format get set? I tried adding it to repo_init(),
> but this doesn't get called in a normal startup sequence.
> 
> Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git",
> be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841
> warning: Source file is more recent than executable.
> 1841 {
> (gdb) up
> #1  0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40
> <the_repo>) at refs.c:1862
> (gdb) p r->ref_storage_format
> $1 = 0x0
> }

Hmm. Looks like repo_init() is only used for submodules. Which is pretty
horrid, because that setup sequence is basically duplicated between
there and setup_git_directory(), which is the traditional mechanism.
That's something that ought to be cleaned up, but it's definitely
outside the scope of your series.

The patch below was enough to make something like this work:

  git init --reftable repo
  cd repo
  git commit --allow-empty -m foo
  git for-each-ref

I had to make some tweaks to the init code, too:

 - setting the_repository->ref_storage_format based on the --reftable
   flag

 - we can still recognize a reinit by the presence of a HEAD file
   (whether it's a real one or a dummy). And we should call
   create_symref() in either case, either to create HEAD or to create
   the HEAD symlink inside reftables

I also fixed a bug in your first patch, where the creation of refs/
moves into files_init_db(). The strbuf needs reset there, or we just
append to it, and end up with a doubled path (you could see it by
running the same commands above without --reftable, but of course not
with your patch series as-is because it will _always_ choose reftable).

diff --git a/builtin/init-db.c b/builtin/init-db.c
index dcea74610c..ea2a333c4a 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -213,6 +213,8 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	if (flags & INIT_DB_REFTABLE)
+		the_repository->ref_storage_format = xstrdup("reftable");
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -222,6 +224,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -234,18 +245,14 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	if (flags & INIT_DB_REFTABLE) {
-		reinit = 0; /* XXX - how do we recognize a reinit,
-			     * and what should we do? */
-	} else {
-		path = git_path_buf(&buf, "HEAD");
-		reinit = (!access(path, R_OK) ||
-			  readlink(path, junk, sizeof(junk) - 1) != -1);
-	}
-
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
 	/* This forces creation of new config file */
diff --git a/refs.c b/refs.c
index 493d3fb673..d6ddbbcc67 100644
--- a/refs.c
+++ b/refs.c
@@ -1859,10 +1859,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 		BUG("attempting to get main_ref_store outside of repository");
 
 	r->refs = ref_store_init(r->gitdir,
-				 /* XXX r->ref_storage_format == NULL. Where
-				  * should the config file be parsed out? */
-				 r->ref_storage_format ? r->ref_storage_format :
-							 "reftable",
+				 r->ref_storage_format ?
+					r->ref_storage_format :
+					"files",
 				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0c53b246e8..6f9efde9ca 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3166,6 +3166,7 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
diff --git a/repository.c b/repository.c
index aff47302c9..ff0988dac8 100644
--- a/repository.c
+++ b/repository.c
@@ -174,9 +174,7 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
-	repo->ref_storage_format = format.ref_storage != NULL ?
-					   xstrdup(format.ref_storage) :
-					   "files"; /* XXX */
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
 
 	clear_repository_format(&format);
 	return 0;
diff --git a/setup.c b/setup.c
index a339186d83..58c5cd3bf0 100644
--- a/setup.c
+++ b/setup.c
@@ -450,7 +450,7 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			data->partial_clone = xstrdup(value);
 		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		} else if (!strcmp(ext, "refStorage")) {
+		} else if (!strcmp(ext, "refstorage")) {
 			data->ref_storage = xstrdup(value);
 		} else
 			string_list_append(&data->unknown_extensions, ext);
@@ -1184,8 +1184,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "brian m. carlson" wrote (reply to this):


--ILuaRSyQpoVaJ1HG
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2020-02-06 at 22:55:56, Han-Wen Nienhuys via GitGitGadget wrote:
> @@ -403,6 +417,10 @@ int init_db(const char *git_dir, const char *real_gi=
t_dir,
>  		git_config_set("receive.denyNonFastforwards", "true");
>  	}
> =20
> +	if (flags & INIT_DB_REFTABLE) {
> +		git_config_set("extensions.refStorage", "reftable");
> +	}

This does seem like the best way to do this.  Can we get an addition to
Documentation/technical/repository-version.txt that documents this
extension and the values it takes?  I presume "reftable" and "files" are
options, but it would be nice it folks didn't have to dig through the
code to find that out.

One side note: we typically omit the braces for a single-line if.

> +
>  	if (!(flags & INIT_DB_QUIET)) {
>  		int len =3D strlen(git_dir);
> =20
> @@ -481,15 +499,18 @@ int cmd_init_db(int argc, const char **argv, const =
char *prefix)
>  	const char *template_dir =3D NULL;
>  	unsigned int flags =3D 0;
>  	const struct option init_db_options[] =3D {
> -		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
> -				N_("directory from which templates will be used")),
> +		OPT_STRING(0, "template", &template_dir,
> +			   N_("template-directory"),
> +			   N_("directory from which templates will be used")),
>  		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
> -				N_("create a bare repository"), 1),
> +			    N_("create a bare repository"), 1),
>  		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
> -			N_("permissions"),
> -			N_("specify that the git repository is to be shared amongst several u=
sers"),
> -			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
> +		  N_("permissions"),
> +		  N_("specify that the git repository is to be shared amongst several =
users"),
> +		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
>  		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
> +		OPT_BIT(0, "reftable", &flags, N_("use reftable"),
> +			INIT_DB_REFTABLE),

I wonder if this might be better as --refs-backend=3D{files|reftable}.  If
reftable becomes the default at some point in the future, it would be
easier to let people explicitly specify that they want a non-reftable
version.  If we learn support for a hypothetical reftable v2 in the
future, maybe we'd want to let folks specify that instead.

>  		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
>  			   N_("separate git dir from working tree")),
>  		OPT_END()
> diff --git a/cache.h b/cache.h
> index cbfaead23a..929a61e861 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -625,6 +625,7 @@ int path_inside_repo(const char *prefix, const char *=
path);
> =20
>  #define INIT_DB_QUIET 0x0001
>  #define INIT_DB_EXIST_OK 0x0002
> +#define INIT_DB_REFTABLE 0x0004
> =20
>  int init_db(const char *git_dir, const char *real_git_dir,
>  	    const char *template_dir, unsigned int flags);
> @@ -1041,6 +1042,7 @@ struct repository_format {
>  	int is_bare;
>  	int hash_algo;
>  	char *work_tree;
> +	char *ref_storage;
>  	struct string_list unknown_extensions;
>  };
> =20
> diff --git a/refs.c b/refs.c
> index 1ab0bb54d3..e4e5af8ea0 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -20,7 +20,7 @@
>  /*
>   * List of all available backends
>   */
> -static struct ref_storage_be *refs_backends =3D &refs_be_files;
> +static struct ref_storage_be *refs_backends =3D &refs_be_reftable;

I'm not sure why we're making this change.  It doesn't look like the
default is changing.

> @@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const cha=
r *submodule)
>  		goto done;
> =20
>  	/* assume that add_submodule_odb() has been called */
> -	refs =3D ref_store_init(submodule_sb.buf,
> +	refs =3D ref_store_init(submodule_sb.buf, "files", /* XXX */
>  			      REF_STORE_READ | REF_STORE_ODB);
>  	register_ref_store_map(&submodule_ref_stores, "submodule",
>  			       refs, submodule);
> @@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const cha=
r *submodule)
> =20
>  struct ref_store *get_worktree_ref_store(const struct worktree *wt)
>  {
> +	const char *format =3D "files"; /* XXX */

If the question is whether we still want to default to "files" as the
default, then yes, I think we do.  Suddenly changing things would be a
problem if for some reason someone needed to downgrade Git versions.

Since we have two instances of "files" above, it might be better to
create a #define like DEFAULT_REF_BACKEND or some such.
--=20
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

--ILuaRSyQpoVaJ1HG
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.2.19 (GNU/Linux)

iQIzBAABCgAdFiEEX8OngXdrJt+H9ww3v1NdgR9S9osFAl48pgEACgkQv1NdgR9S
9oviTg/+O2H4ZwdThwGK5hajU0rVaRFzsStyFODwc9zydj65CD0GTYq4Ax16emwn
aE0O/Kq/63pf6fms+06Mf539C0RsPYf3Qjz/mgu+OqYiouciDlLg4nyRw6BHWuyF
lOngCoqk9FpOMi6gBo1zZ7ImgmqI1dDuWuenv4iyjpZj9mW03k55ySGO2axkdCzk
TrMDnGAqqL8u7I25s2kbuENZVwla4sp52FCHz4T2KRIMPXnGsjWoEpUerPaXf0Fp
pQi1nzSIJ7IzIT/sNdvkT1XHTGt5fIwC3dC+Gey749FU6H3TPlHNaKOVuh+AD776
E5Zj6Jv5wpfUrkf3SlSRmwMAZej4+WcHxnAa9zvBDgjXIaF3KUMdb4m/e5h7J/Yj
pmAw26ANZiEXh3vttHBuAWPW3RO8gIjk8Fb7MPAUF5OINobVPQXf46cf7GMd7r68
s9Ej+DJz4IHIZddSECBXcyJVeOuMx4Q6BqI/laFTQ/l3RYz37BjNTrCwG86qQqU6
4wzbTBwTwi6Zm1dO6XOvRtxT/dLnum1lVgUreBsWRAYFesRft84zsf0VCsWSOcht
2VswJz5Xb7AdmnKiQlFUr7Ri8od5I67CSX9gIcVp8Hd1ZWUaqhGHd94RFplPUU5P
HC+9FUYgZku4xotAA/s516cHCn4wx7pDaEVnL43YIbuAHYEJc6E=
=d1jO
-----END PGP SIGNATURE-----

--ILuaRSyQpoVaJ1HG--

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Fri, Feb 7, 2020 at 12:49 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-02-06 at 22:55:56, Han-Wen Nienhuys via GitGitGadget wrote:
> > @@ -403,6 +417,10 @@ int init_db(const char *git_dir, const char *real_git_dir,
> >               git_config_set("receive.denyNonFastforwards", "true");
> >       }
> >
> > +     if (flags & INIT_DB_REFTABLE) {
> > +             git_config_set("extensions.refStorage", "reftable");
> > +     }
>
> This does seem like the best way to do this.  Can we get an addition to
> Documentation/technical/repository-version.txt that documents this
> extension and the values it takes?  I presume "reftable" and "files" are
> options, but it would be nice it folks didn't have to dig through the
> code to find that out.

done.

> I wonder if this might be better as --refs-backend={files|reftable}.  If
> reftable becomes the default at some point in the future, it would be
> easier to let people explicitly specify that they want a non-reftable
> version.  If we learn support for a hypothetical reftable v2 in the
> future, maybe we'd want to let folks specify that instead.

done.

> > -static struct ref_storage_be *refs_backends = &refs_be_files;
> > +static struct ref_storage_be *refs_backends = &refs_be_reftable;
>
> I'm not sure why we're making this change.  It doesn't look like the
> default is changing.

This is adding the reftable backend to the ref_storage_be linked list.

>
> > @@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
> >               goto done;
> >
> >       /* assume that add_submodule_odb() has been called */
> > -     refs = ref_store_init(submodule_sb.buf,
> > +     refs = ref_store_init(submodule_sb.buf, "files", /* XXX */
> >                             REF_STORE_READ | REF_STORE_ODB);
> >       register_ref_store_map(&submodule_ref_stores, "submodule",
> >                              refs, submodule);
> > @@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
> >
> >  struct ref_store *get_worktree_ref_store(const struct worktree *wt)
> >  {
> > +     const char *format = "files"; /* XXX */
>
> If the question is whether we still want to default to "files" as the
> default, then yes, I think we do.  Suddenly changing things would be a
> problem if for some reason someone needed to downgrade Git versions.
>
> Since we have two instances of "files" above, it might be better to
> create a #define like DEFAULT_REF_BACKEND or some such.

I did this. I stuck it in refs.h which is probably the wrong place.
Suggestions welcome.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

XDIFF_LIB = xdiff/lib.a
REFTABLE_LIB = reftable/libreftable.a
REFTABLE_TEST_LIB = reftable/libreftable_test.a

GENERATED_H += command-list.h
GENERATED_H += config-list.h
Expand Down Expand Up @@ -964,6 +967,7 @@ LIB_OBJS += reflog-walk.o
LIB_OBJS += refs.o
LIB_OBJS += refs/debug.o
LIB_OBJS += refs/files-backend.o
LIB_OBJS += refs/reftable-backend.o
LIB_OBJS += refs/iterator.o
LIB_OBJS += refs/packed-backend.o
LIB_OBJS += refs/ref-cache.o
Expand Down Expand Up @@ -1172,7 +1176,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
THIRD_PARTY_SOURCES += sha1collisiondetection/%
THIRD_PARTY_SOURCES += sha1dc/%

GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
EXTLIBS =

GIT_USER_AGENT = git/$(GIT_VERSION)
Expand Down Expand Up @@ -2379,10 +2383,40 @@ XDIFF_OBJS += xdiff/xpatience.o
XDIFF_OBJS += xdiff/xprepare.o
XDIFF_OBJS += xdiff/xutils.o

REFTABLE_OBJS += reftable/basics.o
REFTABLE_OBJS += reftable/error.o
REFTABLE_OBJS += reftable/block.o
REFTABLE_OBJS += reftable/blocksource.o
REFTABLE_OBJS += reftable/iter.o
REFTABLE_OBJS += reftable/merged.o
REFTABLE_OBJS += reftable/pq.o
REFTABLE_OBJS += reftable/publicbasics.o
REFTABLE_OBJS += reftable/reader.o
REFTABLE_OBJS += reftable/record.o
REFTABLE_OBJS += reftable/refname.o
REFTABLE_OBJS += reftable/reftable.o
REFTABLE_OBJS += reftable/stack.o
REFTABLE_OBJS += reftable/tree.o
REFTABLE_OBJS += reftable/writer.o
REFTABLE_OBJS += reftable/zlib-compat.o

REFTABLE_TEST_OBJS += reftable/basics_test.o
REFTABLE_TEST_OBJS += reftable/block_test.o
REFTABLE_TEST_OBJS += reftable/dump.o
REFTABLE_TEST_OBJS += reftable/merged_test.o
REFTABLE_TEST_OBJS += reftable/record_test.o
REFTABLE_TEST_OBJS += reftable/refname_test.o
REFTABLE_TEST_OBJS += reftable/reftable_test.o
REFTABLE_TEST_OBJS += reftable/stack_test.o
REFTABLE_TEST_OBJS += reftable/test_framework.o
REFTABLE_TEST_OBJS += reftable/tree_test.o

TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
$(XDIFF_OBJS) \
$(FUZZ_OBJS) \
$(REFTABLE_OBJS) \
$(REFTABLE_TEST_OBJS) \
common-main.o \
git.o
ifndef NO_CURL
Expand Down Expand Up @@ -2534,6 +2568,12 @@ $(LIB_FILE): $(LIB_OBJS)
$(XDIFF_LIB): $(XDIFF_OBJS)
$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^

$(REFTABLE_LIB): $(REFTABLE_OBJS)
$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^

$(REFTABLE_TEST_LIB): $(REFTABLE_TEST_OBJS)
$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^

export DEFAULT_EDITOR DEFAULT_PAGER

Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
Expand Down Expand Up @@ -2812,7 +2852,7 @@ perf: all

t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))

t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB)
$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)

check-sha1:: t/helper/test-tool$X
Expand Down Expand Up @@ -3142,7 +3182,7 @@ cocciclean:
clean: profile-clean coverage-clean cocciclean
$(RM) *.res
$(RM) $(OBJECTS)
$(RM) $(LIB_FILE) $(XDIFF_LIB)
$(RM) $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB) $(REFTABLE_TEST_LIB)
$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
$(RM) $(TEST_PROGRAMS)
$(RM) $(FUZZ_PROGRAMS)
Expand Down
5 changes: 3 additions & 2 deletions builtin/clone.c
Original file line number Diff line number Diff line change
Expand Up @@ -1139,7 +1139,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
}

init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, NULL,
INIT_DB_QUIET);
default_ref_storage(), INIT_DB_QUIET);

if (real_git_dir)
git_dir = real_git_dir;
Expand Down Expand Up @@ -1277,7 +1277,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
* Now that we know what algorithm the remote side is using,
* let's set ours to the same thing.
*/
initialize_repository_version(hash_algo, 1);
initialize_repository_version(hash_algo, 1,
default_ref_storage());
repo_set_hash_algo(the_repository, hash_algo);

mapped_refs = wanted_peer_refs(refs, &remote->fetch);
Expand Down
76 changes: 62 additions & 14 deletions builtin/init-db.c
Original file line number Diff line number Diff line change
Expand Up @@ -179,12 +179,14 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
return 1;
}

void initialize_repository_version(int hash_algo, int reinit)
void initialize_repository_version(int hash_algo, int reinit,
const char *ref_storage_format)
{
char repo_version_string[10];
int repo_version = GIT_REPO_VERSION;

if (hash_algo != GIT_HASH_SHA1)
if (hash_algo != GIT_HASH_SHA1 ||
!strcmp(ref_storage_format, "reftable"))
repo_version = GIT_REPO_VERSION_READ;

/* This forces creation of new config file */
Expand Down Expand Up @@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
is_bare_repository_cfg = init_is_bare_repository;
if (init_shared_repository != -1)
set_shared_repository(init_shared_repository);
the_repository->ref_storage_format = xstrdup(fmt->ref_storage);

/*
* We would have created the above under user's umask -- under
Expand All @@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
adjust_shared_perm(get_git_dir());
}

/*
* Check to see if .git/HEAD exists; this must happen before
* initializing the ref db, because we want to see if there is an
* existing HEAD.
*/
path = git_path_buf(&buf, "HEAD");
reinit = (!access(path, R_OK) ||
readlink(path, junk, sizeof(junk) - 1) != -1);

/*
* refs/heads is a file when using reftable. We can't reinitialize with
* a reftable because it will overwrite HEAD
*/
if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
is_directory(git_path_buf(&buf, "refs/heads"))) {
die("cannot switch ref storage format.");
}

/*
* We need to create a "refs" dir in any case so that older
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
> Subject: Re: [PATCH v2 2/5] create .git/refs in files-backend.c
>
> This prepares for supporting the reftable format, which creates a file
> in that place.

The idea is sound, I think.  We want to let each backend to be
responsible for creating and maintaining what is at .git/refs on the
filesystem.

> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>


> Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92

Do we need to keep this, which is pretty much private name for the
patch that is not valid for most of the people on the list?

> ---
>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 4 ++++
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 944ec77fe1..45bdea0589 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));
>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0ea66a28b6..f49b6f2ab6 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");
> +	safe_create_dir(sb.buf, 1);
> + 	// XXX adjust_shared_perm ?

I am not sure what's there to wonder about with the question mark.

If this step is meant to be a preparation before we actually allow
a different backend to be used, shouldn't the updated and prepared
code behave identically in externally visible ways?

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Mon, Jan 27, 2020 at 11:28 PM Junio C Hamano <gitster@pobox.com> wrote:

> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
>
>
> > Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92
>
> Do we need to keep this, which is pretty much private name for the
> patch that is not valid for most of the people on the list?

No, I can make do with like gitgitgadget.

> > -     safe_create_dir(git_path("refs"), 1);
> > -     adjust_shared_perm(git_path("refs"));
> >
> >       if (refs_init_db(&err))
> >               die("failed to set up refs db: %s", err.buf);
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 0ea66a28b6..f49b6f2ab6 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
> >               files_downcast(ref_store, REF_STORE_WRITE, "init_db");
> >       struct strbuf sb = STRBUF_INIT;
> >
> > +     files_ref_path(refs, &sb, "refs");
> > +     safe_create_dir(sb.buf, 1);
> > +     // XXX adjust_shared_perm ?
>
> I am not sure what's there to wonder about with the question mark.

I forgot why I put the XXX, but note that safe_create_dirs runs
adjust_shared_perms implicitly.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <hanwen@google.com> writes:

>> > +     files_ref_path(refs, &sb, "refs");
>> > +     safe_create_dir(sb.buf, 1);
>> > +     // XXX adjust_shared_perm ?
>
> I forgot why I put the XXX, but note that safe_create_dirs runs
> adjust_shared_perms implicitly.

That piece of information would make an excellent part of the log
message, which is to describe why "questionable" changes are made.

"git init" is designed to be runnable in an existing repository, but
safe_create_dir() would not run adjust_shared_perm() when mkdir()
says EEXIST, and I think that is why the original makes a call to
adjust_shared_perm() after safe_create_dir() returns.

Thanks.


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> This prepares for supporting the reftable format, which will want
> create its own file system layout in .git
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 5 +++++
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 944ec77fe1..45bdea0589 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));
>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0ea66a28b6..0c53b246e8 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3158,6 +3158,11 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");

Have you run "git diff --check" before committing?

> +	safe_create_dir(sb.buf, 1);
> +        /* adjust permissions even if directory already exists. */

whitespace error here, but this comment is really appreciated.  It
wasn't immediately clear why we do this again in the original.

> +	adjust_shared_perm(sb.buf);
> +
>  	/*
>  	 * Create .git/refs/{heads,tags}
>  	 */

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Feb 4, 2020 at 10:29 PM Junio C Hamano <gitster@pobox.com> wrote:
> > +     files_ref_path(refs, &sb, "refs");
>
> Have you run "git diff --check" before committing?

let me install the clang-format pre-commit hook.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Tue, Feb 04, 2020 at 08:27:37PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> From: Han-Wen Nienhuys <hanwen@google.com>
> 
> This prepares for supporting the reftable format, which will want
> create its own file system layout in .git

This breaks 'git init', and, consequently, the whole test suite:

  $ ./git init /tmp/foo
  /tmp/foo/.git/refs/tmp/foo/.git/refs/heads: No such file or directory


>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 5 +++++
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 944ec77fe1..45bdea0589 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));
>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0ea66a28b6..0c53b246e8 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3158,6 +3158,11 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");
> +	safe_create_dir(sb.buf, 1);
> +        /* adjust permissions even if directory already exists. */
> +	adjust_shared_perm(sb.buf);
> +
>  	/*
>  	 * Create .git/refs/{heads,tags}
>  	 */
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Wed, Feb 05, 2020 at 12:42:10PM +0100, SZEDER Gábor wrote:

> On Tue, Feb 04, 2020 at 08:27:37PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> > From: Han-Wen Nienhuys <hanwen@google.com>
> > 
> > This prepares for supporting the reftable format, which will want
> > create its own file system layout in .git
> 
> This breaks 'git init', and, consequently, the whole test suite:
> 
>   $ ./git init /tmp/foo
>   /tmp/foo/.git/refs/tmp/foo/.git/refs/heads: No such file or directory

Yeah, this is one of the fixes in the patch I sent earlier in the
thread. The issue is here:

> > @@ -3158,6 +3158,11 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
> >  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
> >  	struct strbuf sb = STRBUF_INIT;
> >  
> > +	files_ref_path(refs, &sb, "refs");
> > +	safe_create_dir(sb.buf, 1);
> > +        /* adjust permissions even if directory already exists. */
> > +	adjust_shared_perm(sb.buf);
> > +
> >  	/*
> >  	 * Create .git/refs/{heads,tags}
> >  	 */

Right after this context, we call files_ref_path() with "sb" again.
There needs to be a strbuf_reset() beforehand.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Emily Shaffer wrote (reply to this):

On Mon, Apr 27, 2020 at 08:13:29PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> 
> 
> This prepares for supporting the reftable format, which will want
> create its own file system layout in .git
> 
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 0b7222e7188..3b50b1aa0e5 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -251,8 +251,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));

Is the reftable completely replacing the refs/ dir? Or is the idea that
the refs/ dir is only used by the files backend? The commit message
makes it sound like it's an additional format to support, so I'm a
little confused. Why does the other currently-existing backend not need
the refs/ dir at this stage?

>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 561c33ac8a9..ab7899a9c77 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3157,9 +3157,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");
> +	safe_create_dir(sb.buf, 1);
> +	/* adjust permissions even if directory already exists. */
> +	adjust_shared_perm(sb.buf);
> +
>  	/*
>  	 * Create .git/refs/{heads,tags}
>  	 */
> +	strbuf_reset(&sb);
>  	files_ref_path(refs, &sb, "refs/heads");
>  	safe_create_dir(sb.buf, 1);
>  
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Emily Shaffer <emilyshaffer@google.com> writes:

>> -	safe_create_dir(git_path("refs"), 1);
>> -	adjust_shared_perm(git_path("refs"));
>
> Is the reftable completely replacing the refs/ dir? Or is the idea that
> the refs/ dir is only used by the files backend? The commit message
> makes it sound like it's an additional format to support, so I'm a
> little confused. Why does the other currently-existing backend not need
> the refs/ dir at this stage?

Other current backend being the files-backend, which creates it
below, isn't this a no-op at this stage?

What's more interesting is what would happen when a repo is
initialized with the reftable backend as its sole backend.  

Even then, the repository discovery code of existing versions of Git
needs to see a directory with a ".git" subdirectory, in the latter
of which there must be HEAD and objects and refs subdirectories, so
we'd need to create refs/ directory even if reftable does not use it.

This step, from that point of view, may not be necessary.  But as
long as we make sure any repository creates refs/ and objects/
directories, regardless of the ref backend, we'd be OK.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Thu, Apr 30, 2020 at 11:24 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Mon, Apr 27, 2020 at 08:13:29PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> >
> >
> > This prepares for supporting the reftable format, which will want
> > create its own file system layout in .git
> >
> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> > ---
> >  builtin/init-db.c    | 2 --
> >  refs/files-backend.c | 6 ++++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/builtin/init-db.c b/builtin/init-db.c
> > index 0b7222e7188..3b50b1aa0e5 100644
> > --- a/builtin/init-db.c
> > +++ b/builtin/init-db.c
> > @@ -251,8 +251,6 @@ static int create_default_files(const char *template_path,
> >        * We need to create a "refs" dir in any case so that older
> >        * versions of git can tell that this is a repository.
> >        */
> > -     safe_create_dir(git_path("refs"), 1);
> > -     adjust_shared_perm(git_path("refs"));
>
> Is the reftable completely replacing the refs/ dir? Or is the idea that
> the refs/ dir is only used by the files backend? The commit message
> makes it sound like it's an additional format to support, so I'm a
> little confused. Why does the other currently-existing backend not need
> the refs/ dir at this stage?

I've dropped this patch from the series; it probably was from before
the compat hack that is described in

  https://git.eclipse.org/r/#/c/157167/

for further clarification: reftable doesn't used the refs/ dir at all,
but we still want to not confuse older git versions.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

* versions of git can tell that this is a repository.
Expand All @@ -261,9 +282,6 @@ static int create_default_files(const char *template_path,
* Point the HEAD symref to the initial branch with if HEAD does
* not yet exist.
*/
path = git_path_buf(&buf, "HEAD");
reinit = (!access(path, R_OK)
|| readlink(path, junk, sizeof(junk)-1) != -1);
if (!reinit) {
char *ref;

Expand All @@ -280,7 +298,7 @@ static int create_default_files(const char *template_path,
free(ref);
}

initialize_repository_version(fmt->hash_algo, 0);
initialize_repository_version(fmt->hash_algo, 0, fmt->ref_storage);

/* Check filemode trustability */
path = git_path_buf(&buf, "config");
Expand Down Expand Up @@ -396,7 +414,7 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash

int init_db(const char *git_dir, const char *real_git_dir,
const char *template_dir, int hash, const char *initial_branch,
unsigned int flags)
const char *ref_storage_format, unsigned int flags)
{
int reinit;
int exist_ok = flags & INIT_DB_EXIST_OK;
Expand Down Expand Up @@ -435,9 +453,31 @@ int init_db(const char *git_dir, const char *real_git_dir,
* is an attempt to reinitialize new repository with an old tool.
*/
check_repository_format(&repo_fmt);
repo_fmt.ref_storage = xstrdup(ref_storage_format);

validate_hash_algorithm(&repo_fmt, hash);

/*
* At this point, the_repository we have in-core does not look
* anything like one that we would see initialized in an already
* working repository after calling setup_git_directory().
*
* Calling repository.c::initialize_the_repository() may have
* prepared the .index .objects and .parsed_objects members, but
* other members like .gitdir, .commondir, etc. have not been
* initialized.
*
* Many API functions assume they are working with the_repository
* that has sensibly been initialized, but because we haven't
* really read from an existing repository, we need to hand-craft
* the necessary members of the structure to get out of this
* chicken-and-egg situation.
*
* For now, we update the hash algorithm member to what the
* validate_hash_algorithm() call decided for us.
*/
repo_set_hash_algo(the_repository, repo_fmt.hash_algo);

reinit = create_default_files(template_dir, original_git_dir,
initial_branch, &repo_fmt,
flags & INIT_DB_QUIET);
Expand Down Expand Up @@ -468,6 +508,9 @@ int init_db(const char *git_dir, const char *real_git_dir,
git_config_set("receive.denyNonFastforwards", "true");
}

if (!strcmp(ref_storage_format, "reftable"))
git_config_set("extensions.refStorage", ref_storage_format);

if (!(flags & INIT_DB_QUIET)) {
int len = strlen(git_dir);

Expand Down Expand Up @@ -541,6 +584,7 @@ static const char *const init_db_usage[] = {
int cmd_init_db(int argc, const char **argv, const char *prefix)
{
const char *git_dir;
const char *ref_storage_format = default_ref_storage();
const char *real_git_dir = NULL;
const char *work_tree;
const char *template_dir = NULL;
Expand All @@ -549,15 +593,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
const char *initial_branch = NULL;
int hash_algo = GIT_HASH_UNKNOWN;
const struct option init_db_options[] = {
OPT_STRING(0, "template", &template_dir, N_("template-directory"),
N_("directory from which templates will be used")),
OPT_STRING(0, "template", &template_dir,
N_("template-directory"),
N_("directory from which templates will be used")),
OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
N_("create a bare repository"), 1),
N_("create a bare repository"), 1),
{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
N_("permissions"),
N_("specify that the git repository is to be shared amongst several users"),
PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
N_("permissions"),
N_("specify that the git repository is to be shared amongst several users"),
PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
N_("the ref storage format to use")),
OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
N_("separate git dir from working tree")),
OPT_STRING('b', "initial-branch", &initial_branch, N_("name"),
Expand Down Expand Up @@ -698,10 +745,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
}

UNLEAK(real_git_dir);
UNLEAK(ref_storage_format);
UNLEAK(git_dir);
UNLEAK(work_tree);

flags |= INIT_DB_EXIST_OK;
return init_db(git_dir, real_git_dir, template_dir, hash_algo,
initial_branch, flags);
initial_branch, ref_storage_format, flags);
}
27 changes: 24 additions & 3 deletions builtin/worktree.c
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include "utf8.h"
#include "worktree.h"
#include "quote.h"
#include "../refs/refs-internal.h"

static const char * const worktree_usage[] = {
N_("git worktree add [<options>] <path> [<commit-ish>]"),
Expand Down Expand Up @@ -330,9 +331,29 @@ static int add_worktree(const char *path, const char *refname,
* worktree.
*/
strbuf_reset(&sb);
strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
write_file(sb.buf, "%s", oid_to_hex(&null_oid));
strbuf_reset(&sb);
if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
/* XXX this is cut & paste from reftable_init_db. */
strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
strbuf_reset(&sb);

strbuf_addf(&sb, "%s/refs", sb_repo.buf);
safe_create_dir(sb.buf, 1);
strbuf_reset(&sb);

strbuf_addf(&sb, "%s/refs/heads", sb_repo.buf);
write_file(sb.buf, "this repository uses the reftable format");
strbuf_reset(&sb);

strbuf_addf(&sb, "%s/reftable", sb_repo.buf);
safe_create_dir(sb.buf, 1);
strbuf_reset(&sb);
} else {
strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
write_file(sb.buf, "%s", oid_to_hex(&null_oid));
strbuf_reset(&sb);
}

strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
write_file(sb.buf, "../..");

Expand Down
8 changes: 5 additions & 3 deletions cache.h
Original file line number Diff line number Diff line change
Expand Up @@ -629,9 +629,10 @@ int path_inside_repo(const char *prefix, const char *path);
#define INIT_DB_EXIST_OK 0x0002

int init_db(const char *git_dir, const char *real_git_dir,
const char *template_dir, int hash_algo,
const char *initial_branch, unsigned int flags);
void initialize_repository_version(int hash_algo, int reinit);
const char *template_dir, int hash_algo, const char *initial_branch,
const char *ref_storage_format, unsigned int flags);
void initialize_repository_version(int hash_algo, int reinit,
const char *ref_storage_format);

void sanitize_stdfds(void);
int daemonize(void);
Expand Down Expand Up @@ -1045,6 +1046,7 @@ struct repository_format {
int is_bare;
int hash_algo;
char *work_tree;
char *ref_storage;
struct string_list unknown_extensions;
struct string_list v1_only_extensions;
};
Expand Down
2 changes: 1 addition & 1 deletion config.mak.uname
Original file line number Diff line number Diff line change
Expand Up @@ -699,7 +699,7 @@ vcxproj:
# Make .vcxproj files and add them
unset QUIET_GEN QUIET_BUILT_IN; \
perl contrib/buildsystems/generate -g Vcxproj
git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj

# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
Expand Down
Loading