Commit d9fcb0b

jeffhostetler authored and dscho committed
status: add status serialization mechanism
Teach STATUS to optionally serialize the results of a status computation to a file.

Teach STATUS to optionally read an existing serialization file and simply print the results, rather than actually scanning.

This is intended for immediate status results on extremely large repos and assumes the use of a service/daemon to maintain a fresh current status snapshot.

2021-10-30: packet_read() changed its prototype in ec9a37d (pkt-line.[ch]: remove unused packet_read_line_buf(), 2021-10-14).

2021-10-30: sscanf() now does an extra check that "%d" goes into an "int" and complains about "uint32_t". Replacing with "%u" fixes the compile-time error.

2021-10-30: string_list_init() was removed by abf897b (string-list.[ch]: remove string_list_init() compatibility function, 2021-09-28), so we need to initialize manually.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
1 parent b2d8392 commit d9fcb0b

16 files changed, +1356 -4 lines changed

Documentation/config/status.adoc

Lines changed: 6 additions & 0 deletions
@@ -77,3 +77,9 @@ status.submoduleSummary::
 	the --ignore-submodules=dirty command-line option or the 'git
 	submodule summary' command, which shows a similar output but does
 	not honor these settings.
+
+status.deserializePath::
+	EXPERIMENTAL, Pathname to a file containing cached status results
+	generated by `--serialize`. This will be overridden by
+	`--deserialize=<path>` on the command line. If the cache file is
+	invalid or stale, git will fall-back and compute status normally.

Documentation/git-status.adoc

Lines changed: 33 additions & 0 deletions
@@ -151,6 +151,19 @@ ignored, then the directory is not shown, but all contents are shown.
 	threshold.
 	See also linkgit:git-diff[1] `--find-renames`.

+--serialize[=<version>]::
+	(EXPERIMENTAL) Serialize raw status results to stdout in a
+	format suitable for use by `--deserialize`. Valid values for
+	`<version>` are "1" and "v1".
+
+--deserialize[=<path>]::
+	(EXPERIMENTAL) Deserialize raw status results from a file or
+	stdin rather than scanning the worktree. If `<path>` is omitted
+	and `status.deserializePath` is unset, input is read from stdin.
+--no-deserialize::
+	(EXPERIMENTAL) Disable implicit deserialization of status results
+	from the value of `status.deserializePath`.
+
 <pathspec>...::
 	See the 'pathspec' entry in linkgit:gitglossary[7].

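
The way `--deserialize[=<path>]`, `--no-deserialize`, and `status.deserializePath` interact can be modelled as a small decision function. This is a simplified sketch and not the code from this commit: `resolve_status_input`, `enum status_input`, and `struct status_opts` are hypothetical names, the order in which options appear on the command line is ignored, and no file-existence check is made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of where `git status` gets its results from. */
enum status_input {
	INPUT_SCAN,  /* run the normal worktree scan */
	INPUT_STDIN, /* read serialized results from stdin */
	INPUT_FILE   /* read serialized results from a file */
};

struct status_opts {
	int deserialize;         /* --deserialize[=<path>] was given */
	int no_deserialize;      /* --no-deserialize was given */
	const char *cli_path;    /* <path> from the command line, or NULL */
	const char *config_path; /* status.deserializePath, or NULL */
};

static int is_set(const char *s)
{
	return s && *s;
}

/* Pick the input source; *path_out is set only for INPUT_FILE. */
static enum status_input resolve_status_input(const struct status_opts *o,
					      const char **path_out)
{
	assert(o && path_out);
	*path_out = NULL;

	if (o->no_deserialize)
		return INPUT_SCAN;

	/* A path on the command line overrides the config value. */
	if (o->deserialize && is_set(o->cli_path)) {
		*path_out = o->cli_path;
		return INPUT_FILE;
	}
	/* Otherwise the config value is used, explicitly or implicitly. */
	if (is_set(o->config_path)) {
		*path_out = o->config_path;
		return INPUT_FILE;
	}
	/* No path anywhere: an explicit --deserialize reads stdin. */
	return o->deserialize ? INPUT_STDIN : INPUT_SCAN;
}
```

In this model a bare `--deserialize` with no configured path yields `INPUT_STDIN`, matching the option description above.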

@@ -424,6 +437,26 @@ quoted as explained for the configuration variable `core.quotePath`
 (see linkgit:git-config[1]).


+SERIALIZATION and DESERIALIZATION (EXPERIMENTAL)
+------------------------------------------------
+
+The `--serialize` option allows git to cache the result of a
+possibly time-consuming status scan to a binary file. A local
+service/daemon watching file system events could use this to
+periodically pre-compute a fresh status result.
+
+Interactive users could then use `--deserialize` to simply
+(and immediately) print the last-known-good result without
+waiting for the status scan.
+
+The binary serialization file format includes some worktree state
+information allowing `--deserialize` to reject the cached data
+and force a normal status scan if, for example, the commit, branch,
+or status modes/options change. The format cannot, however, indicate
+when the cached data is otherwise stale -- that coordination belongs
+to the task driving the serializations.
+
+
 CONFIGURATION
 -------------

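
The rejection rule described in the SERIALIZATION section amounts to comparing recorded worktree state against the current invocation before trusting the cache. The following is a minimal sketch under stated assumptions: `struct snapshot_header` and `snapshot_is_usable` are hypothetical, only a subset of the header fields is modelled, and the commit's actual checks are not shown in this excerpt.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of the state recorded in a serialized snapshot. */
struct snapshot_header {
	const char *branch;     /* may be NULL, e.g. on a detached HEAD */
	const char *commit_hex; /* hex object id of the commit */
	int show_untracked_files;
	int show_ignored_files;
};

/* Compare two possibly-NULL strings; NULL only equals NULL. */
static int same_str(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;
	return strcmp(a, b) == 0;
}

/* Return 1 if the cached result may be printed, 0 to force a scan. */
static int snapshot_is_usable(const struct snapshot_header *cached,
			      const struct snapshot_header *current)
{
	assert(cached && current);
	return same_str(cached->branch, current->branch) &&
	       same_str(cached->commit_hex, current->commit_hex) &&
	       cached->show_untracked_files == current->show_untracked_files &&
	       cached->show_ignored_files == current->show_ignored_files;
}
```

A check of this kind cannot detect edits to the worktree made after the snapshot was written, which is why freshness is left to whatever task drives the serializations.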
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+Git status serialization format
+===============================
+
+Git status serialization enables git to dump the results of a status scan
+to a binary file. This file can then be loaded by later status invocations
+to print the cached status results.
+
+The file contains the essential fields from:
+() the index
+() the "struct wt_status" for the overall results
+() the contents of "struct wt_status_change_data" for tracked changed files
+() the list of untracked and ignored files
+
+Version 1 Format:
+=================
+
+The V1 file begins with a required header section followed by optional
+sections for each type of item (changed, untracked, ignored). Individual
+item sections are only present if necessary. Each item section begins
+with an item-type header with the number of items in the section.
+
+Each "line" in the format is encoded using pkt-line with a final LF.
+Flush packets are used to terminate sections.
+
+-----------------
+PKT-LINE("version" SP "1")
+<v1-header-section>
+[<v1-changed-item-section>]
+[<v1-untracked-item-section>]
+[<v1-ignored-item-section>]
+-----------------
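
For readers unfamiliar with pkt-line framing: each packet starts with four lowercase hex digits giving the total length, including those four bytes, and a flush packet is the literal `0000`. The sketch below illustrates that framing only. `pkt_line_encode` and `pkt_flush_encode` are hypothetical helpers, not functions from pkt-line.c, and the example assumes the version line carries the final LF as the text above states.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PKT_HEADER_LEN 4
#define PKT_DATA_MAX 65516 /* largest payload that fits in one packet */

/*
 * Frame `len` bytes of payload as one pkt-line in `out`.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t pkt_line_encode(const char *payload, size_t len,
			      char *out, size_t out_size)
{
	char hdr[PKT_HEADER_LEN + 1];
	size_t total = len + PKT_HEADER_LEN;

	assert(payload && out);
	if (len > PKT_DATA_MAX || total > out_size)
		return 0;

	/* The length prefix counts itself plus the payload. */
	snprintf(hdr, sizeof(hdr), "%04x", (unsigned int)total);
	memcpy(out, hdr, PKT_HEADER_LEN);
	memcpy(out + PKT_HEADER_LEN, payload, len);
	return total;
}

/* Write a flush packet, which terminates a section. */
static size_t pkt_flush_encode(char *out, size_t out_size)
{
	assert(out);
	if (out_size < PKT_HEADER_LEN)
		return 0;
	memcpy(out, "0000", PKT_HEADER_LEN);
	return PKT_HEADER_LEN;
}
```

Encoding the 10-byte payload `"version 1\n"` this way produces the 14 bytes `000eversion 1\n`.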
+
+
+V1 Header
+---------
+
+The v1-header-section fields are taken directly from "struct wt_status".
+Each field is printed on a separate pkt-line. Lines for NULL string
+values are omitted. All integers are printed with "%d". OIDs are
+printed in hex.
+
+v1-header-section    = <v1-index-headers>
+                       <v1-wt-status-headers>
+                       PKT-LINE(<flush>)
+
+v1-index-headers     = PKT-LINE("index_mtime" SP <sec> SP <nsec> LF)
+
+v1-wt-status-headers = PKT-LINE("is_initial" SP <integer> LF)
+                       [ PKT-LINE("branch" SP <branch-name> LF) ]
+                       [ PKT-LINE("reference" SP <reference-name> LF) ]
+                       PKT-LINE("show_ignored_files" SP <integer> LF)
+                       PKT-LINE("show_untracked_files" SP <integer> LF)
+                       PKT-LINE("show_ignored_directory" SP <integer> LF)
+                       [ PKT-LINE("ignore_submodule_arg" SP <string> LF) ]
+                       PKT-LINE("detect_rename" SP <integer> LF)
+                       PKT-LINE("rename_score" SP <integer> LF)
+                       PKT-LINE("rename_limit" SP <integer> LF)
+                       PKT-LINE("detect_break" SP <integer> LF)
+                       PKT-LINE("sha1_commit" SP <oid> LF)
+                       PKT-LINE("committable" SP <integer> LF)
+                       PKT-LINE("workdir_dirty" SP <integer> LF)
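
Each integer header above is a key, one space, a decimal value, and a final LF. A reader could match one such payload as sketched below. `parse_int_header` is a hypothetical helper written for this illustration and is not the parser added by the commit.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * If `line` is exactly "<key> <integer>\n", store the integer in
 * *value and return 1. Otherwise return 0 and leave *value alone.
 */
static int parse_int_header(const char *line, const char *key, int *value)
{
	size_t klen;
	const char *num;
	char *end;
	long v;

	assert(line && key && value);
	klen = strlen(key);
	if (strncmp(line, key, klen) != 0 || line[klen] != ' ')
		return 0;

	num = line + klen + 1;
	v = strtol(num, &end, 10);
	/* Require at least one digit and nothing but the final LF after it. */
	if (end == num || end[0] != '\n' || end[1] != '\0')
		return 0;
	if (v < INT_MIN || v > INT_MAX)
		return 0;

	*value = (int)v;
	return 1;
}
```

Calling it with the payload `"rename_score 50\n"` and the key `"rename_score"` stores 50, while a payload with a different key is rejected so the caller can try the next field.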
+
+
+V1 Changed Items
+----------------
+
+The v1-changed-item-section lists all of the changed items with one
+item per pkt-line. Each pkt-line contains: a binary block of data
+from "struct wt_status_serialize_data_fixed" in a fixed header where
+integers are in network byte order and OIDs are in raw (non-hex) form.
+This is followed by one or two raw pathnames (not c-quoted) with NUL
+terminators (both NULs are always present even if there is no rename).
+
+v1-changed-item-section = PKT-LINE("changed" SP <count> LF)
+                          [ PKT-LINE(<changed_item> LF) ]+
+                          PKT-LINE(<flush>)
+
+changed_item = <byte[4] worktree_status>
+               <byte[4] index_status>
+               <byte[4] stagemask>
+               <byte[4] score>
+               <byte[4] mode_head>
+               <byte[4] mode_index>
+               <byte[4] mode_worktree>
+               <byte[4] dirty_submodule>
+               <byte[4] new_submodule_commits>
+               <byte[20] oid_head>
+               <byte[20] oid_index>
+               <byte[*] path>
+               NUL
+               [ <byte[*] src_path> ]
+               NUL
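
The `changed_item` layout above has a 76-byte fixed part (nine 4-byte big-endian integers plus two 20-byte raw OIDs) followed by two NUL-terminated paths. A decoder for one payload could look like the sketch below. `struct changed_item` and `decode_changed_item` are hypothetical stand-ins, not the commit's `struct wt_status_serialize_data_fixed` or its reader, and any bytes after the second NUL (such as the trailing LF) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OID_RAWSZ 20
#define FIXED_LEN (9 * 4 + 2 * OID_RAWSZ) /* 76 bytes */

/* Hypothetical in-memory form of one changed item. */
struct changed_item {
	uint32_t worktree_status, index_status, stagemask, score;
	uint32_t mode_head, mode_index, mode_worktree;
	uint32_t dirty_submodule, new_submodule_commits;
	unsigned char oid_head[OID_RAWSZ], oid_index[OID_RAWSZ];
	const char *path;     /* points into the input buffer */
	const char *src_path; /* empty string when there is no rename */
};

/* Read a 32-bit integer stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one payload; returns 1 on success, 0 if it is malformed. */
static int decode_changed_item(const unsigned char *buf, size_t len,
			       struct changed_item *out)
{
	const unsigned char *p = buf, *end = buf + len, *nul;

	assert(buf && out);
	if (len < FIXED_LEN + 2) /* fixed part plus the two NULs */
		return 0;

	out->worktree_status = get_be32(p); p += 4;
	out->index_status = get_be32(p); p += 4;
	out->stagemask = get_be32(p); p += 4;
	out->score = get_be32(p); p += 4;
	out->mode_head = get_be32(p); p += 4;
	out->mode_index = get_be32(p); p += 4;
	out->mode_worktree = get_be32(p); p += 4;
	out->dirty_submodule = get_be32(p); p += 4;
	out->new_submodule_commits = get_be32(p); p += 4;
	memcpy(out->oid_head, p, OID_RAWSZ); p += OID_RAWSZ;
	memcpy(out->oid_index, p, OID_RAWSZ); p += OID_RAWSZ;

	/* First path, then the (possibly empty) rename source. */
	nul = memchr(p, 0, (size_t)(end - p));
	if (!nul)
		return 0;
	out->path = (const char *)p;
	p = nul + 1;

	nul = memchr(p, 0, (size_t)(end - p));
	if (!nul)
		return 0;
	out->src_path = (const char *)p;
	return 1;
}
```

Decoding the integers byte by byte avoids depending on the host's endianness or on struct padding, which is one reason to prefer it over casting the buffer to a struct.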
+
+
+V1 Untracked and Ignored Items
+------------------------------
+
+These sections are simple lists of pathnames. They ARE NOT
+c-quoted.
+
+v1-untracked-item-section = PKT-LINE("untracked" SP <count> LF)
+                            [ PKT-LINE(<pathname> LF) ]+
+                            PKT-LINE(<flush>)
+
+v1-ignored-item-section   = PKT-LINE("ignored" SP <count> LF)
+                            [ PKT-LINE(<pathname> LF) ]+
+                            PKT-LINE(<flush>)
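
All three item sections open with a header of the form `<type> SP <count> LF`. A reader could recognise that header as sketched below. `parse_section_header` and `enum item_kind` are hypothetical names introduced for this illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum item_kind { ITEM_NONE, ITEM_CHANGED, ITEM_UNTRACKED, ITEM_IGNORED };

/*
 * Recognise "changed <n>\n", "untracked <n>\n" or "ignored <n>\n".
 * On a match, store <n> in *count and return the section kind.
 * Return ITEM_NONE for anything else.
 */
static enum item_kind parse_section_header(const char *line,
					   unsigned long *count)
{
	static const struct {
		const char *name;
		enum item_kind kind;
	} kinds[] = {
		{ "changed", ITEM_CHANGED },
		{ "untracked", ITEM_UNTRACKED },
		{ "ignored", ITEM_IGNORED },
	};
	size_t i;

	assert(line && count);
	for (i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
		size_t n = strlen(kinds[i].name);
		const char *num;
		char *end;
		unsigned long v;

		if (strncmp(line, kinds[i].name, n) != 0 || line[n] != ' ')
			continue;
		num = line + n + 1;
		/* Reject signs and whitespace that strtoul would accept. */
		if (!isdigit((unsigned char)*num))
			return ITEM_NONE;
		v = strtoul(num, &end, 10);
		if (*end != '\n')
			return ITEM_NONE;
		*count = v;
		return kinds[i].kind;
	}
	return ITEM_NONE;
}
```

After a successful match the caller would read `count` pathname packets, strip the final LF from each, and expect a flush packet to close the section.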

Makefile

Lines changed: 2 additions & 0 deletions
@@ -1353,6 +1353,8 @@ LIB_OBJS += wrapper.o
 LIB_OBJS += write-or-die.o
 LIB_OBJS += ws.o
 LIB_OBJS += wt-status.o
+LIB_OBJS += wt-status-deserialize.o
+LIB_OBJS += wt-status-serialize.o
 LIB_OBJS += xdiff-interface.o
 LIB_OBJS += xdiff/xdiffi.o
 LIB_OBJS += xdiff/xemit.o

builtin/commit.c

Lines changed: 122 additions & 1 deletion
@@ -166,6 +166,70 @@ static int opt_parse_porcelain(const struct option *opt, const char *arg, int un
 	return 0;
 }

+static int do_serialize = 0;
+static int do_implicit_deserialize = 0;
+static int do_explicit_deserialize = 0;
+static char *deserialize_path = NULL;
+
+/*
+ * --serialize | --serialize=1 | --serialize=v1
+ *
+ * Request that we serialize our output rather than printing in
+ * any of the established formats. Optionally specify serialization
+ * version.
+ */
+static int opt_parse_serialize(const struct option *opt, const char *arg, int unset)
+{
+	enum wt_status_format *value = (enum wt_status_format *)opt->value;
+	if (unset || !arg)
+		*value = STATUS_FORMAT_SERIALIZE_V1;
+	else if (!strcmp(arg, "v1") || !strcmp(arg, "1"))
+		*value = STATUS_FORMAT_SERIALIZE_V1;
+	else
+		die("unsupported serialize version '%s'", arg);
+
+	if (do_explicit_deserialize)
+		die("cannot mix --serialize and --deserialize");
+	do_implicit_deserialize = 0;
+
+	do_serialize = 1;
+	return 0;
+}
+
+/*
+ * --deserialize | --deserialize=<path> |
+ * --no-deserialize
+ *
+ * Request that we deserialize status data from some existing resource
+ * rather than performing a status scan.
+ *
+ * The input source can come from stdin or a path given here -- or be
+ * inherited from the config settings.
+ */
+static int opt_parse_deserialize(const struct option *opt UNUSED, const char *arg, int unset)
+{
+	if (unset) {
+		do_implicit_deserialize = 0;
+		do_explicit_deserialize = 0;
+	} else {
+		if (do_serialize)
+			die("cannot mix --serialize and --deserialize");
+		if (arg) {
+			/* override config or stdin */
+			free(deserialize_path);
+			deserialize_path = xstrdup(arg);
+		}
+		if (deserialize_path && *deserialize_path
+		    && (access(deserialize_path, R_OK) != 0))
+			die("cannot find serialization file '%s'",
+			    deserialize_path);
+
+		do_explicit_deserialize = 1;
+	}
+
+	return 0;
+}
+
 static int opt_parse_m(const struct option *opt, const char *arg, int unset)
 {
 	struct strbuf *buf = opt->value;
@@ -1208,6 +1272,8 @@ static enum untracked_status_type parse_untracked_setting_name(const char *u)
 		return SHOW_NORMAL_UNTRACKED_FILES;
 	else if (!strcmp(u, "all"))
 		return SHOW_ALL_UNTRACKED_FILES;
+	else if (!strcmp(u,"complete"))
+		return SHOW_COMPLETE_UNTRACKED_FILES;
 	else
 		return SHOW_UNTRACKED_FILES_ERROR;
 }
@@ -1503,6 +1569,19 @@ static int git_status_config(const char *k, const char *v,
 		s->relative_paths = git_config_bool(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "status.deserializepath")) {
+		/*
+		 * Automatically assume deserialization if this is
+		 * set in the config and the file exists. Do not
+		 * complain if the file does not exist, because we
+		 * silently fall back to normal mode.
+		 */
+		if (v && *v && access(v, R_OK) == 0) {
+			do_implicit_deserialize = 1;
+			deserialize_path = xstrdup(v);
+		}
+		return 0;
+	}
 	if (!strcmp(k, "status.showuntrackedfiles")) {
 		enum untracked_status_type u;

@@ -1542,7 +1621,8 @@ struct repository *repo UNUSED)
 	static const char *rename_score_arg = (const char *)-1;
 	static struct wt_status s;
 	unsigned int progress_flag = 0;
-	int fd;
+	int try_deserialize;
+	int fd = -1;
 	struct object_id oid;
 	static struct option builtin_status_options[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
@@ -1557,6 +1637,12 @@ struct repository *repo UNUSED)
 		OPT_CALLBACK_F(0, "porcelain", &status_format,
 		  N_("version"), N_("machine-readable output"),
 		  PARSE_OPT_OPTARG, opt_parse_porcelain),
+		OPT_CALLBACK_F(0, "serialize", &status_format,
+		  N_("version"), N_("serialize raw status data to stdout"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, opt_parse_serialize),
+		OPT_CALLBACK_F(0, "deserialize", NULL,
+		  N_("path"), N_("deserialize raw status data from file"),
+		  PARSE_OPT_OPTARG, opt_parse_deserialize),
 		OPT_SET_INT(0, "long", &status_format,
 			    N_("show status in long format (default)"),
 			    STATUS_FORMAT_LONG),
@@ -1618,10 +1704,26 @@ struct repository *repo UNUSED)
 	    s.show_untracked_files == SHOW_NO_UNTRACKED_FILES)
 		die(_("Unsupported combination of ignored and untracked-files arguments"));

+	if (s.show_untracked_files == SHOW_COMPLETE_UNTRACKED_FILES &&
+	    s.show_ignored_mode == SHOW_NO_IGNORED)
+		die(_("Complete Untracked only supported with ignored files"));
+
 	parse_pathspec(&s.pathspec, 0,
 		       PATHSPEC_PREFER_FULL,
 		       prefix, argv);

+	/*
+	 * If we want to try to deserialize status data from a cache file,
+	 * we need to re-order the initialization code. The problem is that
+	 * this makes for a very nasty diff and causes merge conflicts as we
+	 * carry it forward. And it easy to mess up the merge, so we
+	 * duplicate some code here to hopefully reduce conflicts.
+	 */
+	try_deserialize = (!do_serialize &&
+			   (do_implicit_deserialize || do_explicit_deserialize));
+	if (try_deserialize)
+		goto skip_init;
+
 	enable_fscache(0);
 	if (status_format != STATUS_FORMAT_PORCELAIN &&
 	    status_format != STATUS_FORMAT_PORCELAIN_V2)
@@ -1636,6 +1738,7 @@ struct repository *repo UNUSED)
 	else
 		fd = -1;

+skip_init:
 	s.is_initial = repo_get_oid(the_repository, s.reference, &oid) ? 1 : 0;
 	if (!s.is_initial)
 		oidcpy(&s.oid_commit, &oid);
@@ -1652,6 +1755,24 @@ struct repository *repo UNUSED)
 		s.rename_score = parse_rename_score(&rename_score_arg);
 	}

+	if (try_deserialize) {
+		if (s.relative_paths)
+			s.prefix = prefix;
+
+		if (wt_status_deserialize(&s, deserialize_path) == DESERIALIZE_OK)
+			return 0;
+
+		/* deserialize failed, so force the initialization we skipped above. */
+		enable_fscache(1);
+		repo_read_index_preload(the_repository, &s.pathspec, 0);
+		refresh_index(the_repository->index, REFRESH_QUIET|REFRESH_UNMERGED, &s.pathspec, NULL, NULL);
+
+		if (use_optional_locks())
+			fd = repo_hold_locked_index(the_repository, &index_lock, 0);
+		else
+			fd = -1;
+	}
+
 	wt_status_collect(&s);

 	if (0 <= fd)

contrib/completion/git-completion.bash

Lines changed: 1 addition & 1 deletion
@@ -1837,7 +1837,7 @@ _git_clone ()
 	esac
 }

-__git_untracked_file_modes="all no normal"
+__git_untracked_file_modes="all no normal complete"

 __git_trailer_tokens ()
 {

meson.build

Lines changed: 2 additions & 0 deletions
@@ -540,6 +540,8 @@ libgit_sources = [
   'write-or-die.c',
   'ws.c',
   'wt-status.c',
+  'wt-status-deserialize.c',
+  'wt-status-serialize.c',
   'xdiff-interface.c',
   'xdiff/xdiffi.c',
   'xdiff/xemit.c',

pkt-line.c

Lines changed: 1 addition & 1 deletion
@@ -230,7 +230,7 @@ static int do_packet_write(const int fd_out, const char *buf, size_t size,
 	return 0;
 }

-static int packet_write_gently(const int fd_out, const char *buf, size_t size)
+int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
 	struct strbuf err = STRBUF_INIT;
 	if (do_packet_write(fd_out, buf, size, &err)) {

pkt-line.h

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_gently(const int fd_out, const char *buf, size_t size);
 int write_packetized_from_fd_no_flush(int fd_in, int fd_out);
 int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
 					     int fd_out, int *packet_counter);
