Rebase onto Git for Windows v2.53.0.windows.1 #857
Merged
dscho merged 295 commits into vfs-2.53.0 on Feb 3, 2026
Conversation
String formatting can be a performance issue when there are hundreds of thousands of trees. Change the code to stop using strbuf_addf() and instead add the strings or characters individually. There are a limited number of modes, so add a switch for the known ones and a default case for anything that comes through that is not one Git knows about. In one scenario involving a huge worktree, this reduces the time required for a `git checkout <branch>` from 44 seconds to 38 seconds, i.e. it is a non-negligible performance improvement. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
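As a rough illustration of the technique (not the exact patch; the function name and the set of modes here are simplified assumptions), the formatted append can be replaced with direct appends plus a switch over the known modes:

```c
/* Sketch only: append "<mode> <name>\0" without strbuf_addf(). */
static void append_tree_entry(struct strbuf *buf, unsigned int mode,
			      const char *name)
{
	switch (mode) {
	case 0100644:
		strbuf_addstr(buf, "100644 ");
		break;
	case 0100755:
		strbuf_addstr(buf, "100755 ");
		break;
	case 040000:
		strbuf_addstr(buf, "40000 ");
		break;
	default:
		/* unknown mode: fall back to formatting */
		strbuf_addf(buf, "%o ", mode);
		break;
	}
	strbuf_addstr(buf, name);
	strbuf_addch(buf, '\0');
}
```

The win comes from skipping the format-string parsing in the common cases while keeping strbuf_addf() only as the rarely-taken fallback.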
The following commands and options are not currently supported when working in a GVFS repo. Add code to detect and block these commands from executing. 1) fsck 2) gc 4) prune 5) repack 6) submodule 8) update-index --split-index 9) update-index --index-version (other than 4) 10) update-index --[no-]skip-worktree 11) worktree Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In earlier versions of `microsoft/git`, we found a user who had set `core.gvfs = false` in their global config. This should not have been necessary, but it also should not have caused a problem. However, it did. The reason was that `gvfs_load_config_value()` was called from `config.c` when reading config key/value pairs from all the config files. The local config should override the global config, and this is done by `config.c` reading the global config first and then reading the local config. However, our logic only allowed writing the `core_gvfs` variable once. In v2.51.0, we had to adapt to upstream changes that changed the way the `core.gvfs` config value is read, and the special handling is no longer necessary, yet we still keep the test case that ensures this bug does not regress. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Add the ability to block built-in commands based on if the `core.gvfs` setting has the `GVFS_USE_VIRTUAL_FILESYSTEM` bit set. This allows us to selectively block commands that use the GVFS protocol, but don't use VFS for Git (for example repos cloned via `scalar clone` against Azure DevOps). Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `repack` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `repack` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `fsck` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `fsck` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `prune` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `prune` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Replace the special casing of the `worktree` command being blocked on VFS-enabled repos with the new `BLOCK_ON_VFS_ENABLED` flag. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
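A minimal sketch of the gating idea described in the preceding commits (the helper name is hypothetical; `core_gvfs` and `GVFS_USE_VIRTUAL_FILESYSTEM` are the fork's existing settings named above):

```c
/* Sketch: refuse to run a builtin when the VFS bit of core.gvfs is set. */
static void block_if_vfs_enabled(const char *cmd)
{
	if (core_gvfs & GVFS_USE_VIRTUAL_FILESYSTEM)
		die(_("'%s' is not supported on a VFS for Git repo"), cmd);
}
```

Commands that must be blocked on any `core.gvfs` repo keep the broader check, while the `BLOCK_ON_VFS_ENABLED` flag routes the loosened commands through a check like the one above.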
Emit a warning message when the `gvfs.sharedCache` option is set that the `repack` command will not perform repacking on the shared cache. In the future we can teach `repack` to operate on the shared cache, at which point we can drop this commit. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
On index load, clear/set the skip worktree bits based on the virtual file system data. Use virtual file system data to update skip-worktree bit in unpack-trees. Use virtual file system data to exclude files and folders not explicitly requested. Update 2022-04-05: disable the "present-despite-SKIP_WORKTREE" file removal behavior when 'core.virtualfilesystem' is enabled. Signed-off-by: Ben Peart <benpeart@microsoft.com>
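Roughly, applying the virtual file system data to the index looks like the following sketch (the helper name `is_included_in_virtualfilesystem()` mirrors the fork's virtualfilesystem code, but treat the exact signature as an assumption):

```c
/* Sketch: set/clear skip-worktree bits from the virtual file system list. */
static void apply_virtualfilesystem_bits(struct index_state *istate)
{
	int i;

	for (i = 0; i < istate->cache_nr; i++) {
		struct cache_entry *ce = istate->cache[i];

		if (is_included_in_virtualfilesystem(ce->name, ce_namelen(ce)) > 0)
			ce->ce_flags &= ~CE_SKIP_WORKTREE;
		else
			ce->ce_flags |= CE_SKIP_WORKTREE;
	}
}
```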
…x has been redirected Fixes #13 Some git commands spawn helpers and redirect the index to a different location. These include "difftool -d" and the sequencer (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) and others. In those instances we don't want to update their temporary index with our virtualization data. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Add check to see if a directory is included in the virtualfilesystem before checking the directory hashmap. This allows a directory entry like foo/ to find all untracked files in subdirectories.
When our patches to support that hook were upstreamed, the hook's name was eliciting some reviewer suggestions, and it was renamed to `post-index-change`. These patches (with the new name) made it into v2.22.0. However, VFSforGit users may very well have checkouts with that hook installed under the original name. To support this, let's just introduce a hack where we look a bit more closely when we just failed to find the `post-index-change` hook, and allow any `post-indexchanged` hook to run instead (if it exists).
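The fallback amounts to something like this sketch (the function is invented for illustration, and the `hook_exists()` signature is shown as in recent Git, where it takes the repository; treat the details as approximate):

```c
/*
 * Sketch: pick the hook name to run, preferring the upstream name and
 * falling back to the pre-rename VFSforGit name if only that exists.
 */
static const char *index_change_hook_name(struct repository *r)
{
	if (hook_exists(r, "post-index-change"))
		return "post-index-change";
	if (hook_exists(r, "post-indexchanged"))
		return "post-indexchanged";
	return "post-index-change"; /* default; running it will be a no-op */
}
```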
Teach STATUS to optionally serialize the results of a status computation to a file. Teach STATUS to optionally read an existing serialization file and simply print the results, rather than actually scanning. This is intended for immediate status results on extremely large repos and assumes the use of a service/daemon to maintain a fresh current status snapshot. 2021-10-30: packet_read() changed its prototype in ec9a37d (pkt-line.[ch]: remove unused packet_read_line_buf(), 2021-10-14). 2021-10-30: sscanf() now does an extra check that "%d" goes into an "int" and complains about "uint32_t". Replacing with "%u" fixes the compile-time error. 2021-10-30: string_list_init() was removed by abf897b (string-list.[ch]: remove string_list_init() compatibility function, 2021-09-28), so we need to initialize manually. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Teach status serialization to take an optional pathname on
the command line to direct that cache data be written there
rather than to stdout. When used this way, normal status
results will still be written to stdout.
When no path is given, only binary serialization data is
written to stdout.
Usage:
git status --serialize[=<path>]
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When using a virtual file system layer, the FSMonitor does not make sense. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach status deserialize code to reject the status cache when printing in porcelain V2 and there are unresolved conflicts in the cache file. A follow-on task might extend the cache format to include this additional data. See the code for a longer explanation. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When sparse-checkout is enabled, add the sparse-checkout percentage to the Trace2 data stream. This number was already computed and printed on the console in the "You are in a sparse checkout..." message. It would be helpful to log it too for performance monitoring. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Changes to the global or repo-local excludes files can change the results returned by "git status" for untracked files. Therefore, it is important that the exclude-file values used during serialization are still current at the time of deserialization. Teach "git status --serialize" to report metadata on the user's global exclude file (which defaults to "$XDG_CONFIG_HOME/git/ignore") and on the repo-local excludes file (".git/info/exclude"). Serialize will record the pathnames and mtimes for these files in the serialization header (next to the mtime data for the .git/index file). Teach "git status --deserialize" to validate this new metadata. If either exclude file has changed since the serialization-cache-file was written, then deserialize will reject the cache file and force a full/normal status run. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
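A simplified sketch of the validation step (the function name and exact bookkeeping are illustrative, not the fork's actual code):

```c
/* Sketch: has an excludes file changed since the status cache was written? */
static int excludes_file_changed(const char *path, timestamp_t cached_mtime)
{
	struct stat st;

	if (lstat(path, &st))
		return cached_mtime != 0; /* file vanished since serialization */
	return (timestamp_t)st.st_mtime != cached_mtime;
}
```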
Teach `git status --deserialize` to either wait indefinitely or immediately fail if the status serialization cache file is stale. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
With the "--untracked-files=complete" option status computes a superset of the untracked files. We use this when writing the status cache. If subsequent deserialize commands ask for either the complete set or one of the "no", "normal", or "all" subsets, it can still use the cache file because of filtering in the deserialize parser. When running status with the "-uno" option, the long format status would print a "(use -u to show untracked files)" hint. When deserializing with the "-uno" option and using a cache computed with "-ucomplete", the "nothing to commit, working tree clean" message would be printed instead of the hint. It was easy to miss because the correct hint message was printed if the cache was rejected for any reason (and status did the full fallback). The "struct wt_status des" structure was initialized with the content of the status cache (and thus defaulted to "complete"). This change sets "des.show_untracked_files" to the requested subset from the command-line or config. This allows the long format to print the hint. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When using fsmonitor, the CE_FSMONITOR_VALID flag should be checked when determining whether an entry has been updated. If the flag is set, the entry should be considered up to date, just as if CE_UPTODATE were set. In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to be refreshed when the fsmonitor bitmap is applied to the index in tweak_fsmonitor. Since the fsmonitor data is now kept up to date for every command, some tests needed to be updated to take that into account. istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the fsmonitor bitmap data was loaded and is now set in refresh_fsmonitor, since that is called from tweak_fsmonitor. refresh_fsmonitor will only be called once, and any other callers should set it when refreshing the fsmonitor data so that code can use the fsmonitor data when checking untracked files. When writing the index, fsmonitor_last_update is used to determine whether the fsmonitor bitmap should be created and the extension data written to the index. When running through unpack-trees, this was not copied to the result index, which forced the next git command to do all the work of lstat()ing every file to determine what is clean, because every index entry was marked dirty when no fsmonitor data had been saved in the index extension. Copying fsmonitor_last_update to the result index causes the fsmonitor extension data to be written to the index for the next git command to use. Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
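The core idea, as a sketch (the helper name is invented for illustration; the flags are Git's existing cache-entry flags):

```c
/* Sketch: fsmonitor-valid entries count as up to date. */
static int ce_is_up_to_date(const struct cache_entry *ce)
{
	return (ce->ce_flags & CE_UPTODATE) ||
	       (ce->ce_flags & CE_FSMONITOR_VALID);
}
```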
The fsmonitor script that can be used for running all the git tests using watchman was causing some of the tests to fail because it wrote to stderr and created some files for debugging purposes. Add a new debug script to use with debugging and modify the other script to remove the code that would cause tests to fail. Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Disable deserialization when verbose output is requested. Verbose mode causes Git to print diffs for modified files. This requires the index to be loaded so it has the currently staged OID values. Without loading the index, verbose output makes it look like everything was deleted. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Verify that `git status --deserialize=x -v` does not crash and generates the same output as a normal (scanning) status command. These issues are described in the previous 2 commits. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2 region around read_object_process to collect time spent waiting for missing objects to be dynamically fetched. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach Git to not throw a fatal error when an explicitly-specified status-cache file (`git status --deserialize=<foo>`) cannot be found or opened for reading, and to silently fall back to a traditional scan. This matches the behavior when the status-cache file is implicitly given via a config setting. Note: the current version causes a test to start failing. Mark this as an expected result for now. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Add trace2 region and data events describing attempts to deserialize
status data using a status cache.
A category:status, label:deserialize region is pushed around the
deserialize code.
Deserialization results when reading from a file are:
category:status, path = <path>
category:status, polled = <number_of_attempts>
category:status, result = "ok" | "reject"
When reading from STDIN, the results are:
category:status, path = "STDIN"
category:status, result = "ok" | "reject"
Status will fall back and run a normal status scan when a "reject"
is reported (unless "--deserialize-wait=fail").
If "ok" is reported, status was able to use the status cache and
avoid scanning the workdir.
Additionally, a cmd_mode is emitted for each step: collection,
deserialization, and serialization. For example, if deserialization
is attempted and fails and status falls back to actually computing
the status, a cmd_mode message containing "deserialize" is issued
and then a cmd_mode for "collect" is issued.
Also, if deserialization fails, a data message containing the
rejection reason is emitted.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
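The shape of the instrumentation looks roughly like this sketch (the wrapper function is invented for illustration; the real patch threads the region and data events through the deserialize code):

```c
/* Sketch: wrap a deserialize attempt in a trace2 region and report data. */
static int deserialize_with_trace2(struct repository *r, const char *path)
{
	int ok = 0; /* placeholder for the real deserialize attempt on `path` */

	trace2_region_enter("status", "deserialize", r);
	trace2_data_string("status", r, "path", path);
	trace2_data_string("status", r, "result", ok ? "ok" : "reject");
	trace2_region_leave("status", "deserialize", r);
	return ok;
}
```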
Add the `--ref-format` option to the `scalar clone` command. This will allow users to opt-in to creating a Scalar repository using alternative ref storage backends, such as reftable. Example: scalar clone --ref-format reftable $URL Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
When we are installing a loose object, finalize_object_file() first checks whether the contents match what already exists in a loose object file of the target name. However, this doesn't check that the existing target is valid; it assumes it is. In the case of a power outage or something like that, the file may be corrupt (for example: all NUL bytes). That is a common occurrence when we need to install a loose object _again_: we don't think we have it already, so any copy that exists is probably bogus. Use the flagged version with FOF_SKIP_COLLISION_CHECK to avoid these types of errors, as seen in GitHub issue #837. Signed-off-by: Derrick Stolee <stolee@gmail.com>
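In upstream terms that means calling the `_flags` variant, roughly like this sketch (the path variables and error handling are simplified assumptions):

```c
/* Sketch: skip the content-collision check against a possibly corrupt file. */
if (finalize_object_file_flags(tmp_path, loose_path,
			       FOF_SKIP_COLLISION_CHECK))
	return error(_("could not install loose object '%s'"), loose_path);
```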
Users sometimes see transient network errors, but they are actually due to some other problem within the installation of a packfile. Observed resolutions include freeing up space on a full disk or deleting the shared object cache because something was broken due to a file corruption or power outage. This change only provides the advice to suggest those workarounds to help users help themselves. This is our first advice custom to the microsoft/git fork, so I have partitioned the key away from the others to avoid adjacent change conflicts (at least until upstream adds a new change at the end of the alphabetical list). We could consider providing a tool that does a more robust check of the shared object cache, but since 'git fsck' isn't safe to run as it may download missing objects, we do not have that ability at the moment. The good news is that it is safe to delete and rebuild the shared object cache as long as all local branches are pushed. The branches must be pushed because the local .git/objects/ directory is moved to the shared object cache in the 'cache-local-objects' maintenance task. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Similar to a recent change to avoid the collision check for loose objects, do the same for prefetch packfiles. This should be rarer, but the same prefetch packfile could be downloaded again from the same cache server, so it isn't out of the realm of possibility. Signed-off-by: Derrick Stolee <stolee@gmail.com>
…ed object cache (#840)

There have been a number of customer-reported problems with errors of the form

```
error: inflate: data stream error (unknown compression method)
error: unable to unpack a163b1302d4729ebdb0a12d3876ca5bca4e1a8c3 header
error: files 'D:/.scalarCache/id_49b0c9f4-555f-4624-8157-a57e6df513b3/pack/tempPacks/t-20260106-014520-049919-0001.temp' and 'D:/.scalarCache/id_49b0c9f4-555f-4624-8157-a57e6df513b3/a1/63b1302d4729ebdb0a12d3876ca5bca4e1a8c3' differ in contents
error: gvfs-helper error: 'could not install loose object 'D:/.scalarCache/id_49b0c9f4-555f-4624-8157-a57e6df513b3/a1/63b1302d4729ebdb0a12d3876ca5bca4e1a8c3': from GET a163b1302d4729ebdb0a12d3876ca5bca4e1a8c3'
```

or

```
Receiving packfile 1/1 with 1 objects (bytes received): 17367934, done.
Receiving packfile 1/1 with 1 objects [retry 1/6] (bytes received): 17367934, done.
Waiting to retry after network error (sec): 100% (8/8), done.
Receiving packfile 1/1 with 1 objects [retry 2/6] (bytes received): 17367934, done.
Waiting to retry after network error (sec): 100% (16/16), done.
Receiving packfile 1/1 with 1 objects [retry 3/6] (bytes received): 17367934, done.
```

These are not actually due to network issues, but they look like it based on the stack that is doing retries. Instead, these are problems when installing the loose object or packfile into the shared object cache. The loose-object installs hit issues when the target loose object already on disk is corrupt in some way, such as being all NUL bytes because the disk wasn't flushed when the machine shut down. The error results because we are doing a collision check without confirming that the existing contents are valid. The packfiles may hit similar comparison cases, though that is less likely. We update these packfile installations to also skip the collision check.

In both cases, if we have a transient network error, we add a new advice message that helps users with the two most common workarounds:

1. Your disk may be full. Make room.
2. Your shared object cache may be corrupt. Push all branches, delete it, and fetch to refill it.

I make special note of when the shared object cache doesn't exist and point out that it probably should, since the whole repo is suspect at that point.

* [X] This change only applies to interactions with Azure DevOps and the GVFS Protocol.

Resolves #837.
In anticipation of tests for multiple cache-servers, update the existing logic that sets up and tears down cache-servers to allow multiple instances on different ports. Signed-off-by: Derrick Stolee <stolee@gmail.com>
This extension of the gvfs.cache-server config now allows a new key, gvfs.prefetch.cache-server, to override the cache-server URL for only the prefetch endpoint. The purpose of this config is to allow for incremental testing and deployment of new cache-server infrastructure. Hypothetically, we could have special-purpose cache-servers that are glorified bundle servers and other servers that focus on the object and size endpoints. More realistically, this will allow us to test cache servers that have only the prefetch endpoint ready to go. This allows some incremental rollout that is more controlled than a flag day replacing the entire infrastructure. Signed-off-by: Derrick Stolee <stolee@gmail.com>
This extension of the gvfs.cache-server config now allows a new key, gvfs.get.cache-server, to override the cache-server URL for only the single-object GET endpoint. The purpose of this config is to allow for incremental testing and deployment of new cache-server infrastructure. Signed-off-by: Derrick Stolee <stolee@gmail.com>
This extension of the gvfs.cache-server config now allows a new key, gvfs.post.cache-server, to override the cache-server URL for only the batched-objects POST endpoint. The purpose of this config is to allow for incremental testing and deployment of new cache-server infrastructure. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Test that all three verb-specific cache-server configs can be used simultaneously, each directing requests to a different server. This verifies that prefetch, get, and post verbs each respect their own override and don't interfere with each other.
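A sketch of how such a per-verb lookup with fallback could be wired (the function name is hypothetical; the real logic lives in the fork's gvfs-helper code):

```c
/* Sketch: resolve gvfs.<verb>.cache-server, falling back to the base key. */
static char *cache_server_url_for_verb(struct repository *r, const char *verb)
{
	struct strbuf key = STRBUF_INIT;
	char *url = NULL;

	strbuf_addf(&key, "gvfs.%s.cache-server", verb);
	if (repo_config_get_string(r, key.buf, &url))
		repo_config_get_string(r, "gvfs.cache-server", &url);
	strbuf_release(&key);
	return url;
}
```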
The t5799-gvfs-helper.sh script is long and takes forever. This slows down PR merges and the local development inner loop is a pain. Before distributing the tests into a set of new test scripts by topic, extract important helper methods that can be imported by the new scripts. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Move the tests from t5799-gvfs-helper.sh into multiple scripts that can run in parallel. To ensure that the ports do not overlap, add a large multiplier on the instance when needing multiple ports within the same test (currently limited to the verb-specific cache servers). Signed-off-by: Derrick Stolee <stolee@gmail.com>
Starting with upstream commit 4580bcd ("osxkeychain: avoid incorrectly skipping store operation", 2025-11-14), the osxkeychain credential helper includes git-compat-util.h and links against libgit.a. The osxkeychain Makefile has:

    CFLAGS ?= -g -O2 -Wall -I../../.. $(BASIC_CFLAGS)

The ?= operator means "set only if not already set". When building via a parent Makefile that passes CFLAGS on the command line, this entire assignment is ignored, including the -I../../.. needed to find git-compat-util.h and the $(BASIC_CFLAGS) needed for -DNO_OPENSSL:

    git-credential-osxkeychain.c:5:10: fatal error: 'git-compat-util.h' file not found

Using += instead of ?= is not sufficient either, because command-line variables in Make override even += assignments. We need to use "override CFLAGS +=" to force these flags to be appended regardless of how CFLAGS was set.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
I'm exploring ideas around a new cache-server infrastructure. One of the
trickiest parts of deploying new infrastructure is the need to have all
endpoints ready to go at launch without incremental learning.
This change adds new `gvfs.<verb>.cache-server` config keys that allow
for verb-by-verb overrides of the typical `gvfs.cache-server` config
key. These are loaded on a per-verb basis and then reset after the
request. Further, if there is a failure then the request is retried with
the base cache-server URL.
This would allow us to, for example, deploy a service that serves only
the `gvfs/prefetch` endpoint and see if that is improving on latency and
throughput expectations before moving on to the GET and POST verbs for
single and batched object downloads.
As I was adding tests, I realized that we should split this test script
into distinct parts so we can have a faster inner loop when testing
specific areas. I know that this script is frequently the longest script
running in our PR and CI builds, so the parallel split should help
significantly.
Use commit-by-commit review. I tried to keep the last two commits as
obviously "copy-and-paste only" except for a small change to the port
calculation to avoid overlap when using multiple ports in parallel
tests.
* [X] This change only applies to interactions with Azure DevOps and the
GVFS Protocol.
This includes the fixes I had to make to get https://github.com/microsoft/git/releases/tag/v2.53.0-rc0.vfs.0.0 to build, plus a fix for an overlooked stale `GIT-VERSION-GEN`.
mjcheetham approved these changes on Feb 3, 2026
This PR rebases Microsoft Git patches onto Git for Windows v2.53.0.windows.1.
Previous base: microsoft/vfs-2.53.0-rc2
Range-diff vs microsoft/vfs-2.53.0-rc2
    @@ builtin/survey.c: struct survey_report_ref_summary {
     +	uint32_t cnt_cached;   /* see oi.whence */
     +	uint32_t cnt_loose;    /* see oi.whence */
     +	uint32_t cnt_packed;   /* see oi.whence */
    -+	uint32_t cnt_dbcached; /* see oi.whence */
     +
     +	uint64_t sum_size;      /* sum(object_size) */
     +	uint64_t sum_disk_size; /* sum(disk_size) */
    @@ builtin/survey.c: static void increment_totals(struct survey_context *ctx,
     +			ctx->report.reachable_objects.commits.parent_cnt_pbin[k]++;
     +			base = &ctx->report.reachable_objects.commits.base;
     +			break;
    -	}
    ++		}
     +		case OBJ_TREE: {
     +			struct tree *tree = lookup_tree(ctx->repo, &oids->oid[i]);
     +			if (tree) {
    @@ builtin/survey.c: static void increment_totals(struct survey_context *ctx,
     +				uint64_t nr_entries;
     +				int qb;
     +
    -+				parse_tree(tree);
    ++				repo_parse_tree(the_repository, tree);
     +				init_tree_desc(&desc, &oids->oid[i], tree->buffer, tree->size);
     +				nr_entries = 0;
     +				while (tree_entry(&desc, &entry))
    @@ builtin/survey.c: static void increment_totals(struct survey_context *ctx,
     +			case OI_PACKED:
     +				base->cnt_packed++;
     +				break;
    -+			case OI_DBCACHED:
    -+				base->cnt_dbcached++;
    -+				break;
     +			default:
     +				break;
    -+			}
     +			}
     +
     +			base->sum_size += object_length;
     +			base->sum_disk_size += disk_sizep;

(remainder of range-diff omitted)