-
Notifications
You must be signed in to change notification settings - Fork 102
Rebase to v2.51.0 #785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: vfs-2.51.0
Are you sure you want to change the base?
Rebase to v2.51.0 #785
Conversation
Teach status serialization to take an optional pathname on the command line to direct that cache data be written there rather than to stdout. When used this way, normal status results will still be written to stdout. When no path is given, only binary serialization data is written to stdout. Usage: git status --serialize[=<path>] Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach status deserialize code to reject status cache when printing in porcelain V2 and there are unresolved conflicts in the cache file. A follow-on task might extend the cache format to include this additiona data. See code for longer explanation. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When using a virtual file system layer, the FSMonitor does not make sense. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Changes to the global or repo-local excludes files can change the results returned by "git status" for untracked files. Therefore, it is important that the exclude-file values used during serialization are still current at the time of deserialization. Teach "git status --serialize" to report metadata on the user's global exclude file (which defaults to "$XDG_HOME/git/ignore") and for the repo-local excludes file (which is in ".git/info/excludes"). Serialize will record the pathnames and mtimes for these files in the serialization header (next to the mtime data for the .git/index file). Teach "git status --deserialize" to validate this new metadata. If either exclude file has changed since the serialization-cache-file was written, then deserialize will reject the cache file and force a full/normal status run. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When sparse-checkout is enabled, add the sparse-checkout percentage to the Trace2 data stream. This number was already computed and printed on the console in the "You are in a sparse checkout..." message. It would be helpful to log it too for performance monitoring. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Teach `git status --deserialize` to either wait indefintely or immediately fail if the status serialization cache file is stale. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add VFS checkout hydration percentage information to the default `git status` output. When VFS is enable, users will now see a "You are in a partially-hydrated checkout with <percentage> of tracked files present." message. Upstream `git status` normally prints a "You are in a sparse checkout with <percentage> of tracked files present." This message was hidden in `microsoft/git` when `core_virtualfilesystem` is set (because GVFS users are always (and secretly) in a sparse checkout) and it was thought that it would annoy users. However, we now believe that it may be helpful for users to always see the percentage and know when they are over-hyrdated, since over-hyrdation can occur by accident and may greatly impact their Git performance. Knowing this value may help with GVFS support. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Add trace2 region around read_object_process to collect time spent waiting for missing objects to be dynamically fetched. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
With the "--untracked-files=complete" option status computes a superset of the untracked files. We use this when writing the status cache. If subsequent deserialize commands ask for either the complete set or one of the "no", "normal", or "all" subsets, it can still use the cache file because of filtering in the deserialize parser. When running status with the "-uno" option, the long format status would print a "(use -u to show untracked files)" hint. When deserializing with the "-uno" option and using a cache computed with "-ucomplete", the "nothing to commit, working tree clean" message would be printed instead of the hint. It was easy to miss because the correct hint message was printed if the cache was rejected for any reason (and status did the full fallback). The "struct wt_status des" structure was initialized with the content of the status cache (and thus defaulted to "complete"). This change sets "des.show_untracked_files" to the requested subset from the command-line or config. This allows the long format to print the hint. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2 region and data events describing attempts to deserialize status data using a status cache. A category:status, label:deserialize region is pushed around the deserialize code. Deserialization results when reading from a file are: category:status, path = <path> category:status, polled = <number_of_attempts> category:status, result = "ok" | "reject" When reading from STDIN are: category:status, path = "STDIN" category:status, result = "ok" | "reject" Status will fallback and run a normal status scan when a "reject" is reported (unless "--deserialize-wait=fail"). If "ok" is reported, status was able to use the status cache and avoid scanning the workdir. Additionally, a cmd_mode is emitted for each step: collection, deserialization, and serialization. For example, if deserialization is attempted and fails and status falls back to actually computing the status, a cmd_mode message containing "deserialize" is issued and then a cmd_mode for "collect" is issued. Also, if deserialization fails, a data message containing the rejection reason is emitted. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When using fsmonitor the CE_FSMONITOR_VALID flag should be checked when wanting to know if the entry has been updated. If the flag is set the entry should be considered up to date and the same as if the CE_UPTODATE is set. In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to be refreshed when the fsmonitor bitmap is applied to the index in tweak_fsmonitor. Since the fsmonitor data is kept up to date for every command, some tests needed to be updated to take that into account. istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the fsmonitor bitmap data was loaded and is now in refresh_fsmonitor since that is being called in tweak_fsmonitor. refresh_fsmonitor will only be called once and any other callers should be setting it when refreshing the fsmonitor data so that code can use the fsmonitor data when checking untracked files. When writing the index, fsmonitor_last_update is used to determine if the fsmonitor bitmap should be created and the extension data written to the index. When running through unpack-trees this is not copied to the result index. This makes the next time a git command is ran do all the work of lstating all files to determine what is clean since all entries in the index are marked as dirty since there wasn't any fsmonitor data saved in the index extension. Copying the fsmonitor_last_update to the result index will cause the extension data for fsmonitor to be in the index for the next git command to use. Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Add trace information around status serialization. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
The fsmonitor script that can be used for running all the git tests using watchman was causing some of the tests to fail because it wrote to stderr and created some files for debugging purposes. Add a new debug script to use with debugging and modify the other script to remove the code that would cause tests to fail. Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Report virtual filesystem summary data. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Disable deserialization when verbose output requested. Verbose mode causes Git to print diffs for modified files. This requires the index to be loaded to have the currently staged OID values. Without loading the index, verbose output make it look like everything was deleted. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Create trace2_initialize_clock() and call from main() to capture process start time in isolation and before other sub-systems are ready. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Verify that `git status --deserialize=x -v` does not crash and generates the same output as a normal (scanning) status command. These issues are described in the previous 2 commits. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach Git to not throw a fatal error when an explicitly-specified status-cache file (`git status --deserialize=<foo>`) could not be found or opened for reading and silently fallback to a traditional scan. This matches the behavior when the status-cache file is implicitly given via a config setting. Note: the current version causes a test to start failing. Mark this as an expected result for now. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add trace2_thread_start() and trace2_thread_exit() events to the worker threads used to read the index. This gives per-thread perf data. These workers were introduced in: abb4bb8 read-cache: load cache extensions on a worker thread 77ff112 read-cache: load cache entries on worker threads Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
…ension Add regions around code to read and write the cache-tree extension when the index is read or written. This is an experiment and may be dropped in future releases if we don't need it anymore. This experiment demonstrates that it takes more time to parse and deserialize the cache-tree extension than it does to read the cache-entries. Commits [1] and [2] spreads cache-entry reading across N-1 cores and dedicates a single core to simultaneously read the index extensions. Local testing (on my machine) shows that reading the cache-tree extension takes ~0.28 seconds. The 11 cache-entry threads take ~0.08 seconds. The main thread is blocked for 0.15 to 0.20 seconds waiting for the extension thread to finish. Let's use this commit to gather some telemetry and confirm this. My point is that improvements, such as index V5 which makes the cache entries smaller, may improve performance, but the gains may be limited because of this extension. And that we may need to look inside the cache-tree extension to truly improve do_read_index() performance. [1] abb4bb8 read-cache: load cache extensions on a worker thread [2] 77ff112 read-cache: load cache entries on worker threads Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
…and report_tracking() Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach subprocess_start() to use a copy of the passed `cmd` string rather than borrowing the buffer from the caller. Some callers of subprocess_start() pass the value returned from find_hook() which points to a static buffer and therefore is only good until the next call to find_hook(). This could cause problems for the long-running background processes managed by sub-process.c where later calls to subprocess_find_entry() to get an existing process will fail. This could cause more than 1 long-running process to be created. TODO Need to confirm, but if only read_object_hook() uses TODO subprocess_start() in this manner, we could drop this TODO commit when we drop support for read_object_hook(). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add data for the number of files created/overwritten and deleted during the checkout. Give proper category name to all events in unpack-trees.c and eliminate "exp". This is modified slightly from the original version due to interactions with 26f924d (unpack-trees: exit check_updates() early if updates are not wanted, 2020-01-07). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
In some instances, CodeQL's web UI on github.com leaves questions unanswered. For example, in some alerts it is really necessary to follow the entire "taint flow" to understand why something might be an issue. The alerts for the `cpp/uncontrolled-allocation-size` rule, for example, are all false positives, and only when inspecting the exact flow does it become obvious that one alert wants to point out that the size of a binary patch hunk, which is specified in the patch, is then used to determine how much memory to allocate, which may potentially run out of memory (and is hence just Git doing what it is asked to, and does not need to be changed). To help with those issues, publish the `.sarif` file as part of every workflow run; This allows downloading that file and inspecting it e.g. with the SARIF viewer extension in VS Code (for details, see https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
As pointed out by CodeQL, `lookup_commit()` can return NULL. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
A couple of CodeQL's queries are opinionated in a way that is obviously not shared by Git's source code's state, and apparently intentionally so. For example, the "For loop variable changed in body" query as well as the "No trivial switch statements" one result in too many results that are apparently intentional in Git's source code. Let's not worry about those, then. Also, Git has plenty of instances where variables shadow other variables. Other valid yet not quite critical issues identified by CodeQL include complex conditionals and nested switch statements spanning several pages. We probably want to address these issues at some stage, but they are not as critical as other problems pointed out by CodeQL, so let's silence those queries for now and take care of them at a later stage. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The code is a bit too hard to reason about for CodeQL to figure out whether the `fill_commit_graph_info()` function is at all called after `write_commit_graph()` returns (and hence whether `topo_levels` goes out of context before it is used again). The Git project insists that this is correct (and does not want to make the code more obviously correct), so let's silence CodeQL's complaints in this instance. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…ray past end Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…oes NUL-terminate correctly Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Let's exclude GitWeb from being scanned; It is not distributed by us. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Some fixes in `clar`, pointed out by CodeQL. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
A few places where CodeQL thinks that variables might be uninitialized. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These patches implement some defensive programming to address complaints some static analyzers might have. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL pointed out a couple of issues, which are addressed in this patch series. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Linux, the following command would cause the terminal to be stuck waiting: ``` git fetch origin foobar ``` The issue would be that the fetch would fail with the error ``` fatal: couldn't find remote ref foobar ``` but the underlying `git-gvfs-helper` process wouldn't die. The `subprocess_exit_handler()` method would close its stdin and stdout, but that wouldn't be enough to cause the process to end. This PR addresses that by skipping the `finish_command()` call of the `clean_on_exit_handler` and instead lets `cleanup_children()` send a SIGTERM to terminate those spawned child processes.
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Range-diff relative to v2.51.0-rc2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super-clean range-diff. Almost suspiciously clean.
codeql: run the CodeQL workflow every week Run the CodeQL GitHub workflow every week, on Mondays at 03:00 UTC. It's good practice to run this regularly as queries are updated. [This was accidentally dropped from vfs-2.50.1] Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We need that test case to verify the functionality... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
# move the loose objects into the shared objects as if they had been | ||
# fetched via the `gvfs-helper` | ||
mv gvfs-worktree/.git/objects/?? gvfs-shared/objects/ && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@derrickstolee I know this is a blast from the past, but I do have a question: When you wrote 2864f7c#diff-a9031966d3e59230a29bf2a48c0902e7f74dfd1debf7f813b3880ca458af63a7L1026-R1031, the idea was to care about the loose objects downloaded by the gvfs-helper
, right?
The problem I have with this (and which I only realized today, when I wrote this here regression test case in order to validate my rebased variant of your original 2864f7c): There are two places in Scalar enlistments where loose objects can exist, the shared objects and the repository-local objects/
. The latter houses e.g. local commits. And those latter seem not to ever be repacked with 2864f7c (which remains true for my rebased variant).
Is this problematic in real-world enlistements? I guess it it a lot less important to take care of locally-generated Git objects (which are bandwidth-limited by the fact that they are generated by an individual human being) than to take care of potentially millions of hydrated Git objects from the remote repository. Is that the reasoning we should adopt here?
The usual rebase to upstream.