-
Notifications
You must be signed in to change notification settings - Fork 143
cat-file: add %(objectmode) and submodule message to batch commands #1929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -307,6 +307,11 @@ newline. The available atoms are: | |
`objecttype`:: | ||
The type of the object (the same as `cat-file -t` reports). | ||
|
||
`objectmode`:: | ||
If the specified object has mode information (such as a tree or | ||
index entry), the mode expressed as an octal integer. Otherwise, | ||
empty string. | ||
|
||
`objectsize`:: | ||
The size, in bytes, of the object (the same as `cat-file -s` | ||
reports). | ||
|
@@ -368,6 +373,14 @@ If a name is specified that might refer to more than one object (an ambiguous sh | |
<object> SP ambiguous LF | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Jeff King wrote (reply to this): On Mon, Jun 02, 2025 at 06:55:55PM +0000, Victoria Dye via GitGitGadget wrote:
> To disambiguate without needing to invoke a separate Git process (e.g.
> 'ls-tree'), print the message "<oid> submodule" for such objects instead of
> "<object> missing". In addition to the change from "missing" to "submodule",
> the new message differs from the old in that it always prints the resolved
> tree entry's OID, rather than the input object specification.
OK. I read over the discussion from last year, which I think mostly
centered around this patch. I do still think in the long run it would be
nice for cat-file to produce what output it _can_ for a missing object
(e.g., the oid and mode).
But I think it is OK to punt on that for now. Because "<oid> missing"
lines already exist, we'd probably need to put such behavior behind a
new command-line option. So while "<oid> submodule" lines would be
unnecessary in that hypothetical future world, we are not digging the
hole any deeper, from a backwards-compatibility standpoint.
Although speaking of backwards compatibility, I guess older readers may
be surprised that the old "missing" message becomes a "submodule" one.
They may need to be updated if they were written carefully to bail on
unknown input (and were happy seeing "missing" messages for submodules).
So there may be some fallout, but it's not like the existing messages
were particularly useful in the first place.
> Note that this implementation maintains a distinction between submodules
> where the commit OID is not present in the repo, and submodules where the
> commit OID *is* present; the former will now print "<object> submodule", but
> the latter will still print the full object content.
Hmm, that is an interesting point. It feels kind of arbitrary, but I'm
having trouble making a strong argument for one direction or the other.
The way you've written it means that readers need to be prepared to
parse _both_ the mode and "<oid> submodule" lines to find submodules.
But maybe there's some value in finding out more information about
submodule commits you do have in-repo.
The implementations are similar. Replacing this hunk:
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index b11576756bcc..4b23fcecbd8e 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -496,7 +496,10 @@ static void batch_object_write(const char *obj_name,
> &data->oid, &data->info,
> OBJECT_INFO_LOOKUP_REPLACE);
> if (ret < 0) {
> - report_object_status(opt, obj_name, &data->oid, "missing");
> + if (data->mode == S_IFGITLINK)
> + report_object_status(opt, oid_to_hex(&data->oid), &data->oid, "submodule");
> + else
> + report_object_status(opt, obj_name, &data->oid, "missing");
> return;
> }
>
with:
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4b23fcecbd..1b200e1607 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -488,6 +488,11 @@ static void batch_object_write(const char *obj_name,
if (opt->objects_filter.choice == LOFC_BLOB_LIMIT)
data->info.sizep = &data->size;
+ if (data->mode == S_IFGITLINK) {
+ report_object_status(opt, oid_to_hex(&data->oid), &data->oid, "submodule");
+ return;
+ }
+
if (pack)
ret = packed_object_info(the_repository, pack, offset,
&data->info);
so I think the decision is really about what people will find most
useful. So I dunno. It is mostly a coin-flip, leading me to say that
what you picked just came up "heads" and is good enough. ;)
-Peff There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Victoria Dye wrote (reply to this): Jeff King wrote:
> On Mon, Jun 02, 2025 at 06:55:55PM +0000, Victoria Dye via GitGitGadget wrote:
>
>> To disambiguate without needing to invoke a separate Git process (e.g.
>> 'ls-tree'), print the message "<oid> submodule" for such objects instead of
>> "<object> missing". In addition to the change from "missing" to "submodule",
>> the new message differs from the old in that it always prints the resolved
>> tree entry's OID, rather than the input object specification.
>
> OK. I read over the discussion from last year, which I think mostly
> centered around this patch. I do still think in the long run it would be
> nice for cat-file to produce what output it _can_ for a missing object
> (e.g., the oid and mode).
One way to handle that could be changing the message to something like:
submodule SP <mode> SP <oid>
but...
>
> But I think it is OK to punt on that for now. Because "<oid> missing"
> lines already exist, we'd probably need to put such behavior behind a
> new command-line option. So while "<oid> submodule" lines would be
> unnecessary in that hypothetical future world, we are not digging the
> hole any deeper, from a backwards-compatibility standpoint.
>
> Although speaking of backwards compatibility, I guess older readers may
> be surprised that the old "missing" message becomes a "submodule" one.
> They may need to be updated if they were written carefully to bail on
> unknown input (and were happy seeing "missing" messages for submodules).
> So there may be some fallout, but it's not like the existing messages
> were particularly useful in the first place.
...I suspect that'd be even less compatible with existing automation around
'cat-file' than just swapping out "submodule" for "missing", and users can
theoretically infer that the mode is 160000 (S_IFGITLINK). That said, if at
some point in the future we support submodules with a different mode, then
an explicit value would be fairly useful.
Happy to change it or keep it the same, I have no strong preference either
way.
>
>> Note that this implementation maintains a distinction between submodules
>> where the commit OID is not present in the repo, and submodules where the
>> commit OID *is* present; the former will now print "<object> submodule", but
>> the latter will still print the full object content.
>
> Hmm, that is an interesting point. It feels kind of arbitrary, but I'm
> having trouble making a strong argument for one direction or the other.
> The way you've written it means that readers need to be prepared to
> parse _both_ the mode and "<oid> submodule" lines to find submodules.
> But maybe there's some value in finding out more information about
> submodule commits you do have in-repo.
This was pretty much my thought process on it. It was a somewhat arbitrary
choice, but what tipped me towards distinguishing the cases is that I'd
rather have information like size, content, etc. about a commit and not need
to use it, than need it but not have it available. That, and it does
maintain the existing treatment of self-referential submodules. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Jeff King wrote (reply to this): On Wed, Jun 04, 2025 at 05:12:54PM -0700, Victoria Dye wrote:
> > OK. I read over the discussion from last year, which I think mostly
> > centered around this patch. I do still think in the long run it would be
> > nice for cat-file to produce what output it _can_ for a missing object
> > (e.g., the oid and mode).
>
> One way to handle that could be changing the message to something like:
>
> submodule SP <mode> SP <oid>
Hmm, yeah. That seemed weird to me at first because it doesn't
necessarily match what the caller asked for via batch-check. But really,
mode and oid are the only reasonable things we could report anyway[1].
And the mode is implicit in the word "submodule", so really there is
only the oid to report.
[1] For now, at least. If we ever finally unify all of the various
formatting code, then one might in theory be able to feed a refname
to cat-file and print information about the ref, or perhaps other
meta-information. But let's not worry about that hypothetical for
now.
> ...I suspect that'd be even less compatible with existing automation around
> 'cat-file' than just swapping out "submodule" for "missing", and users can
> theoretically infer that the mode is 160000 (S_IFGITLINK). That said, if at
> some point in the future we support submodules with a different mode, then
> an explicit value would be fairly useful.
>
> Happy to change it or keep it the same, I have no strong preference either
> way.
Right, that makes sense. I do wonder if:
<oid> missing submodule
might be friendlier to readers who are matching on /^[0-9a-f]+ missing/,
but now I am just guessing at a hypothetical program. So it may not be
worth going down that rabbit hole, and we can just go with what you
posted.
We can always worry about extending it later with an option to say "turn
placeholders for missing objects into empty strings" or similar.
I did come across one other interesting case while thinking about this,
though. When running:
git cat-file --batch-check='%(objectname) %(objectmode)'
we do not need to access the object at all! So why does a submodule
entry cause us to complain? The answer is that cat-file will (mostly for
historical reasons) confirm the existence of the object name that is fed
to it by calling oid_object_info(). The only exception is when we are
doing --batch-all-objects, since there we know we have the object,
because we found it by iterating the odb. And we optimize out the extra
call for that case (which makes a big difference if you're just printing
the object names).
But since we don't expect submodule entries to exist in the first place,
it might be reasonable to loosen that check. Something like this, though
I think it could benefit from some refactoring:
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4b23fcecbd..bb52d9b673 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -304,8 +304,20 @@ struct expand_data {
* This flag will be true if the requested batch format and options
* don't require us to call oid_object_info, which can then be
* optimized out.
+ *
+ * The "submodule" variant is true if the format doesn't require it,
+ * but other options mean we'd usually continue to do so to check
+ * object existence. We can still omit the call for submodules in that
+ * case.
+ *
+ * This might be less confusing if we break skip_object_info down into
+ * two parts:
+ * - does the format require oid_object_info?
+ * - do the other options require checking existence?
*/
unsigned skip_object_info : 1;
+ unsigned skip_submodule_info : 1;
+
};
#define EXPAND_DATA_INIT { .mode = S_IFINVALID }
@@ -477,7 +489,8 @@ static void batch_object_write(const char *obj_name,
struct packed_git *pack,
off_t offset)
{
- if (!data->skip_object_info) {
+ if (!(data->skip_object_info ||
+ (data->skip_submodule_info && data->mode == S_IFGITLINK))) {
int ret;
if (use_mailmap ||
@@ -939,6 +952,12 @@ static int batch_objects(struct batch_options *opt)
strbuf_release(&output);
return 0;
+ } else {
+ struct object_info empty = OBJECT_INFO_INIT;
+
+ if (!memcmp(&data.info, &empty, sizeof(empty)) &&
+ opt->objects_filter.choice == LOFC_DISABLED)
+ data.skip_submodule_info = 1;
}
/*
I don't think that needs to be part of your series, though. We'd still
potentially need to handle the missing-submodule case for format
requests that actually look at the object, which would hit the "<oid>
submodule" case you're adding. So it could come later (or not at all),
and it's probably only worth pursuing if it would make life easier for
your intended caller.
-Peff |
||
------------ | ||
|
||
If a name is specified that refers to a submodule entry in a tree and the | ||
target object does not exist in the repository, then `cat-file` will ignore | ||
any custom format and print (with the object ID of the submodule): | ||
|
||
------------ | ||
<oid> SP submodule LF | ||
------------ | ||
|
||
If `--follow-symlinks` is used, and a symlink in the repository points | ||
outside the repository, then `cat-file` will ignore any custom format | ||
and print: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):