Skip to content

Rebase to 2.51.0-rc2 #783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 304 commits into from
Aug 21, 2025
Merged

Rebase to 2.51.0-rc2 #783

merged 304 commits into from
Aug 21, 2025

Conversation

dscho
Copy link
Member

@dscho dscho commented Aug 13, 2025

This rebases #781 on top of Git for Windows v2.51.0-rc2.

jamill and others added 30 commits August 14, 2025 00:25
Teach status serialization to take an optional pathname on
the command line to direct that cache data be written there
rather than to stdout.  When used this way, normal status
results will still be written to stdout.

When no path is given, only binary serialization data is
written to stdout.

Usage:
    git status --serialize[=<path>]

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach status deserialize code to reject status cache
when printing in porcelain V2 and there are unresolved
conflicts in the cache file.  A follow-on task might
extend the cache format to include this additiona data.

See code for longer explanation.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When sparse-checkout is enabled, add the sparse-checkout percentage to
the Trace2 data stream.  This number was already computed and printed
on the console in the "You are in a sparse checkout..." message.  It
would be helpful to log it too for performance monitoring.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Changes to the global or repo-local excludes files can change the
results returned by "git status" for untracked files.  Therefore,
it is important that the exclude-file values used during serialization
are still current at the time of deserialization.

Teach "git status --serialize" to report metadata on the user's global
exclude file (which defaults to "$XDG_HOME/git/ignore") and for the
repo-local excludes file (which is in ".git/info/excludes").  Serialize
will record the pathnames and mtimes for these files in the serialization
header (next to the mtime data for the .git/index file).

Teach "git status --deserialize" to validate this new metadata.  If either
exclude file has changed since the serialization-cache-file was written,
then deserialize will reject the cache file and force a full/normal status
run.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add VFS checkout hydration percentage information to the default `git
status` output.  When VFS is enable, users will now see a "You are in
a partially-hydrated checkout with <percentage> of tracked files
present." message.

Upstream `git status` normally prints a "You are in a sparse checkout
with <percentage> of tracked files present."  This message was hidden
in `microsoft/git` when `core_virtualfilesystem` is set (because GVFS
users are always (and secretly) in a sparse checkout) and it was
thought that it would annoy users.

However, we now believe that it may be helpful for users to always see
the percentage and know when they are over-hyrdated, since
over-hyrdation can occur by accident and may greatly impact their Git
performance.  Knowing this value may help with GVFS support.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Teach `git status --deserialize` to either wait indefintely
or immediately fail if the status serialization cache file
is stale.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
With the "--untracked-files=complete" option status computes a
superset of the untracked files.  We use this when writing the
status cache.  If subsequent deserialize commands ask for either
the complete set or one of the "no", "normal", or "all" subsets,
it can still use the cache file because of filtering in the
deserialize parser.

When running status with the "-uno" option, the long format
status would print a "(use -u to show untracked files)" hint.

When deserializing with the "-uno" option and using a cache computed
with "-ucomplete", the "nothing to commit, working tree clean" message
would be printed instead of the hint.

It was easy to miss because the correct hint message was printed
if the cache was rejected for any reason (and status did the full
fallback).

The "struct wt_status des" structure was initialized with the
content of the status cache (and thus defaulted to "complete").
This change sets "des.show_untracked_files" to the requested
subset from the command-line or config.  This allows the long
format to print the hint.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When using fsmonitor the CE_FSMONITOR_VALID flag should be checked when
wanting to know if the entry has been updated. If the flag is set the
entry should be considered up to date and the same as if the CE_UPTODATE
is set.

In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to
be refreshed when the fsmonitor bitmap is applied to the index in
tweak_fsmonitor. Since the fsmonitor data is kept up to date for every
command, some tests needed to be updated to take that into account.

istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the
fsmonitor bitmap data was loaded and is now in refresh_fsmonitor since
that is being called in tweak_fsmonitor. refresh_fsmonitor will only be
called once and any other callers should be setting it when refreshing
the fsmonitor data so that code can use the fsmonitor data when checking
untracked files.

When writing the index, fsmonitor_last_update is used to determine if
the fsmonitor bitmap should be created and the extension data written to
the index. When running through unpack-trees this is not copied to the
result index. This makes the next time a git command is ran do all the
work of lstating all files to determine what is clean since all entries
in the index are marked as dirty since there wasn't any fsmonitor data
saved in the index extension.

Copying the fsmonitor_last_update to the result index will cause the
extension data for fsmonitor to be in the index for the next git command
to use.

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
The fsmonitor script that can be used for running all the git tests
using watchman was causing some of the tests to fail because it wrote
to stderr and created some files for debugging purposes.

Add a new debug script to use with debugging and modify the other script
to remove the code that would cause tests to fail.

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Disable deserialization when verbose output requested.

Verbose mode causes Git to print diffs for modified files.
This requires the index to be loaded to have the currently
staged OID values.  Without loading the index, verbose output
make it look like everything was deleted.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2 region around read_object_process to collect
time spent waiting for missing objects to be dynamically
fetched.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Verify that `git status --deserialize=x -v` does not crash and
generates the same output as a normal (scanning) status command.

These issues are described in the previous 2 commits.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2 region and data events describing attempts to deserialize
status data using a status cache.

A category:status, label:deserialize region is pushed around the
deserialize code.

Deserialization results when reading from a file are:
    category:status, path   = <path>
    category:status, polled = <number_of_attempts>
    category:status, result = "ok" | "reject"

When reading from STDIN are:
    category:status, path   = "STDIN"
    category:status, result = "ok" | "reject"

Status will fallback and run a normal status scan when a "reject"
is reported (unless "--deserialize-wait=fail").

If "ok" is reported, status was able to use the status cache and
avoid scanning the workdir.

Additionally, a cmd_mode is emitted for each step: collection,
deserialization, and serialization.  For example, if deserialization
is attempted and fails and status falls back to actually computing
the status, a cmd_mode message containing "deserialize" is issued
and then a cmd_mode for "collect" is issued.

Also, if deserialization fails, a data message containing the
rejection reason is emitted.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach Git to not throw a fatal error when an explicitly-specified
status-cache file (`git status --deserialize=<foo>`) could not be
found or opened for reading and silently fallback to a traditional
scan.

This matches the behavior when the status-cache file is implicitly
given via a config setting.

Note: the current version causes a test to start failing. Mark this as
an expected result for now.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Add trace information around status serialization.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Report virtual filesystem summary data.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Create trace2_initialize_clock() and call from main() to capture
process start time in isolation and before other sub-systems are
ready.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2_thread_start() and trace2_thread_exit() events to the worker
threads used to read the index.  This gives per-thread perf data.

These workers were introduced in:
abb4bb8 read-cache: load cache extensions on a worker thread
77ff112 read-cache: load cache entries on worker threads

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
…ension

Add regions around code to read and write the cache-tree extension
when the index is read or written.

This is an experiment and may be dropped in future releases if
we don't need it anymore.

This experiment demonstrates that it takes more time to parse and
deserialize the cache-tree extension than it does to read the
cache-entries.

Commits [1] and [2] spreads cache-entry reading across N-1 cores
and dedicates a single core to simultaneously read the index extensions.

Local testing (on my machine) shows that reading the cache-tree extension
takes ~0.28 seconds.  The 11 cache-entry threads take ~0.08 seconds.
The main thread is blocked for 0.15 to 0.20 seconds waiting for the
extension thread to finish.

Let's use this commit to gather some telemetry and confirm this.

My point is that improvements, such as index V5 which makes the
cache entries smaller, may improve performance, but the gains may
be limited because of this extension.  And that we may need to
look inside the cache-tree extension to truly improve do_read_index()
performance.

[1] abb4bb8 read-cache: load cache extensions on a worker thread
[2] 77ff112 read-cache: load cache entries on worker threads

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
…and report_tracking()

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach subprocess_start() to use a copy of the passed `cmd` string
rather than borrowing the buffer from the caller.

Some callers of subprocess_start() pass the value returned from
find_hook() which points to a static buffer and therefore is only
good until the next call to find_hook().  This could cause problems
for the long-running background processes managed by sub-process.c
where later calls to subprocess_find_entry() to get an existing
process will fail.  This could cause more than 1 long-running
process to be created.

TODO Need to confirm, but if only read_object_hook() uses
TODO subprocess_start() in this manner, we could drop this
TODO commit when we drop support for read_object_hook().

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add function to start a subprocess with an argv.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
dscho and others added 17 commits August 14, 2025 00:26
As pointed out by CodeQL, `lookup_commit()` can return NULL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The code is a bit too hard to reason about for CodeQL to figure out
whether the `fill_commit_graph_info()`  function is at all called after
`write_commit_graph()` returns (and hence whether `topo_levels` goes out
of context before it is used again).

The Git project insists that this is correct (and does not want to make
the code more obviously correct), so let's silence CodeQL's complaints
in this instance.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…ray past end

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…oes NUL-terminate correctly

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Some fixes in `clar`, pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Let's exclude GitWeb from being scanned; It is not distributed by us.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
A few places where CodeQL thinks that variables might be uninitialized.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These patches implement some defensive programming to address complaints
some static analyzers might have.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL pointed out a couple of issues, which are addressed in this patch
series.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch series has been long in the making, ever since Johannes
Nicolai and myself spiked this in November/December 2020.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch series has been long in the making, ever since Johannes
Nicolai and myself spiked this in November/December 2020.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Linux, the following command would cause the terminal to be stuck
waiting:

```
  git fetch origin foobar
```

The issue would be that the fetch would fail with the error

```
  fatal: couldn't find remote ref foobar
```

but the underlying `git-gvfs-helper` process wouldn't die. The
`subprocess_exit_handler()` method would close its stdin and stdout, but
that wouldn't be enough to cause the process to end.

This PR addresses that by skipping the `finish_command()` call of the
`clean_on_exit_handler` and instead lets `cleanup_children()` send a
SIGTERM to terminate those spawned child processes.
This patch series has been long in the making, ever since Johannes
Nicolai and myself spiked this in November/December 2020.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
As of 9e59b38 (object-file: emit corruption errors when detected,
2022-12-14), Git will loudly complain about corrupt objects.

That is fine, as long as the idea isn't to re-download locally-corrupted
objects. But that's exactly what we want to do in VFS for Git via the
`read-object` hook, as per the `GitCorruptObjectTests` code
added in microsoft/VFSForGit@2db0c030eb25 (New
features: [...] -  GVFS can now recover from corrupted git object files
[...] , 2018-02-16).

So let's support precisely that, and add a regression test that ensures
that re-downloading corrupt objects via the `read-object` hook works.

While at it, avoid the XOR operator to flip the bits, when we actually
want to make sure that they are turned off: Use the AND-NOT operator for
that purpose.

Helped-by: Matthew John Cheetham <mjcheetham@outlook.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
As of 9e59b38, Git will loudly complain about corrupt objects.

That is fine, as long as the idea isn't to re-download locally-corrupted
objects. But that's exactly what we want to do in VFS for Git. This is
even tested for in the functional tests of VFS for Git, which has been
identified as a regression in
microsoft/VFSForGit#1853.
@dscho dscho requested a review from mjcheetham August 13, 2025 22:46
@dscho dscho self-assigned this Aug 13, 2025
@dscho
Copy link
Member Author

dscho commented Aug 13, 2025

Range-diff relative to -rc1

@dscho dscho marked this pull request as ready for review August 14, 2025 11:00
@mjcheetham
Copy link
Member

  • 34: 78eeb63 ! 34: 0133ef0 Hydrate missing loose objects in check_and_freshen()
@@ odb.c: int odb_has_alternates(struct object_database *odb)
 +	const char *cmd = find_hook(r, "read-object");
 +	uint64_t start;
 +
++	if (!cmd)
++		die(_("could not find the `read-object` hook"));
++
 +	start = getnanotime();
 +
 +	if (!subprocess_map_initialized) {

Was this accidentally dropped in rc1 or rc0?

Copy link
Member

@mjcheetham mjcheetham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, other than the one question above.

@dscho
Copy link
Member Author

dscho commented Aug 14, 2025

Was this accidentally dropped in rc1 or rc0?

@mjcheetham no, this was a bug (that has been in the code base since the read-object hook was added) which I ran into during my attempts to add a test case for #782 (which is where that change entered the repository as a fixup!).

@dscho dscho mentioned this pull request Aug 21, 2025
@dscho dscho enabled auto-merge August 21, 2025 11:04
@dscho dscho disabled auto-merge August 21, 2025 11:04
@dscho dscho merged commit a0c5e8c into vfs-2.51.0-rc2 Aug 21, 2025
237 of 239 checks passed
@dscho dscho deleted the tentative/vfs-2.51.0-rc2 branch August 21, 2025 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants