
Commit f559d6d

pks-t authored and gitster committed
revision: avoid hitting packfiles when commits are in commit-graph

When queueing references in git-rev-list(1), we try to optimize parsing
of commits via the commit-graph. To do so, we first look up the object's
type, and if it is a commit we call `repo_parse_commit()` instead of
`parse_object()`. This is quite inefficient though given that we're
always uncompressing the object header in order to determine the type.
Instead, we can opportunistically search the commit-graph for the object
ID: in case it's found, we know it's a commit and can directly fill in
the commit object without having to uncompress the object header.

Expose a new function `lookup_commit_in_graph()`, which tries to find a
commit in the commit-graph by ID, and convert `get_reference()` to use
this function. This provides a big performance win in cases where we
load references in a repository with lots of references pointing to
commits. The following has been executed in a real-world repository with
about 2.2 million refs:

    Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.458 s ±  0.044 s    [User: 4.115 s, System: 0.342 s]
      Range (min … max):    4.409 s …  4.534 s    10 runs

    Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      3.089 s ±  0.015 s    [User: 2.768 s, System: 0.321 s]
      Range (min … max):    3.061 s …  3.105 s    10 runs

    Summary
      'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
        1.44 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

1 parent 809ea28 commit f559d6d

3 files changed: 40 additions and 10 deletions


commit-graph.c

Lines changed: 24 additions & 0 deletions
@@ -891,6 +891,30 @@ static int find_commit_pos_in_graph(struct commit *item, struct commit_graph *g,
 	}
 }
 
+struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
+{
+	struct commit *commit;
+	uint32_t pos;
+
+	if (!repo->objects->commit_graph)
+		return NULL;
+	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
+		return NULL;
+	if (!repo_has_object_file(repo, id))
+		return NULL;
+
+	commit = lookup_commit(repo, id);
+	if (!commit)
+		return NULL;
+	if (commit->object.parsed)
+		return commit;
+
+	if (!fill_commit_in_graph(repo, commit, repo->objects->commit_graph, pos))
+		return NULL;
+
+	return commit;
+}
+
 static int parse_commit_in_graph_one(struct repository *r,
 				     struct commit_graph *g,
 				     struct commit *item)
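
The new function relies on `search_commit_pos_in_graph()` to map an object ID to a position in the graph. As a rough illustration of that kind of lookup, here is a minimal sketch of a binary search over a sorted table of object IDs. All `toy_*` names and the 4-byte ID length are assumptions made for this example; Git's actual implementation lives in commit-graph.c and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_OID_LEN 4 /* shortened for the example; real object IDs are longer */

struct toy_oid {
	unsigned char hash[TOY_OID_LEN];
};

/* Hypothetical stand-in for a sorted table of object IDs. */
struct toy_graph {
	const struct toy_oid *oids; /* sorted ascending in memcmp() order */
	uint32_t nr;
};

/*
 * Binary search for `id`. On success, store its position in `*pos` and
 * return 1; otherwise return 0 and leave `*pos` untouched.
 */
static int toy_search_pos(const struct toy_graph *g,
			  const struct toy_oid *id, uint32_t *pos)
{
	uint32_t lo = 0, hi = g->nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(id->hash, g->oids[mid].hash, TOY_OID_LEN);

		if (!cmp) {
			*pos = mid;
			return 1;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}
```

A hit tells the caller two things at once: the object is a commit, and where its data sits in the graph. That is why no separate type lookup is needed on this path.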

commit-graph.h

Lines changed: 8 additions & 0 deletions
@@ -40,6 +40,14 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
  */
 int parse_commit_in_graph(struct repository *r, struct commit *item);
 
+/*
+ * Look up the given commit ID in the commit-graph. This will only return a
+ * commit if the ID exists both in the graph and in the object database such
+ * that we don't return commits whose object has been pruned. Otherwise, this
+ * function returns `NULL`.
+ */
+struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id);
+
 /*
  * It is possible that we loaded commit contents from the commit buffer,
  * but we also want to ensure the commit-graph content is correctly
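
The contract described in the header comment (return a commit only if it is in both the graph and the object database) can be sketched with a toy model. The `toy_store` type and both functions below are hypothetical and exist only for this illustration; they model the graph and the object database as plain lists of hex strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type; nothing here exists in Git. */
struct toy_store {
	const char **ids; /* object IDs as hex strings */
	size_t nr;
};

/* Linear membership test, good enough for a small example. */
static int toy_contains(const struct toy_store *s, const char *id)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return 1;
	return 0;
}

/*
 * Return `id` only if it is listed in the graph *and* still present in
 * the object database; otherwise return NULL.
 */
static const char *toy_lookup_in_graph(const struct toy_store *graph,
				       const struct toy_store *odb,
				       const char *id)
{
	if (!graph)
		return NULL; /* no graph loaded at all */
	if (!toy_contains(graph, id))
		return NULL; /* not known to the graph */
	if (!toy_contains(odb, id))
		return NULL; /* stale graph entry: object was pruned */
	return id;
}
```

The third check matters because a graph file can outlive the objects it describes. Without it, a caller could be handed a commit whose underlying object no longer exists.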

revision.c

Lines changed: 8 additions & 10 deletions
@@ -360,20 +360,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 				    unsigned int flags)
 {
 	struct object *object;
+	struct commit *commit;
 
 	/*
-	 * If the repository has commit graphs, repo_parse_commit() avoids
-	 * reading the object buffer, so use it whenever possible.
+	 * If the repository has commit graphs, we try to opportunistically
+	 * look up the object ID in those graphs. Like this, we can avoid
+	 * parsing commit data from disk.
 	 */
-	if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
-		struct commit *c = lookup_commit(revs->repo, oid);
-		if (!repo_parse_commit(revs->repo, c))
-			object = (struct object *) c;
-		else
-			object = NULL;
-	} else {
+	commit = lookup_commit_in_graph(revs->repo, oid);
+	if (commit)
+		object = &commit->object;
+	else
 		object = parse_object(revs->repo, oid);
-	}
 
 	if (!object) {
 		if (revs->ignore_missing)
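
To make the effect of this change concrete, the following toy model counts how often an object header would have to be read under the old and the new control flow. Everything prefixed `toy_` is an assumption made for this sketch, and the counter only simulates the cost; it is not a measurement of Git itself.

```c
#include <stddef.h>

/* Hypothetical toy model; names do not correspond to Git's API. */
enum toy_type { TOY_COMMIT, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	int in_graph; /* nonzero if the graph lists this object */
};

static unsigned toy_header_reads; /* counts simulated header reads */

/* Simulates inflating the object header to learn the type. */
static enum toy_type toy_read_type(const struct toy_object *o)
{
	toy_header_reads++;
	return o->type;
}

/* Old flow: always read the header first, then pick a parser. */
static const struct toy_object *toy_get_reference_old(const struct toy_object *o)
{
	(void)toy_read_type(o);
	return o;
}

/* New flow: try the graph first; read the header only on a miss. */
static const struct toy_object *toy_get_reference_new(const struct toy_object *o)
{
	if (o->in_graph)
		return o; /* known commit, no header read needed */
	(void)toy_read_type(o);
	return o;
}
```

In this model, references that point at commits covered by the graph cost no header read in the new flow, while every other object still takes the slow path. This matches the commit message's claim that the gain shows up in repositories with lots of references pointing to commits.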
