revset: improve perf of revset intersections #1591

claytonrcarter · 2025-08-26T12:03:40Z

This implements a simple optimization for revset functions intersection() and difference() that significantly improves the eval time of revsets like stack() & paths.changed(foo).

When revsets are evaluated, their arguments are eagerly evaluated – totally independently – before each particular dag/revset operation is applied to them. The issue I have found is that expressions like stack() & paths.changed(foo) have to evaluate both stack() and paths.changed(foo) before the intersection can be calculated, but the "matching" revsets (like paths.changed() and author.email()) operate only on all visible commits. This means that, in cases like stack() & paths.changed(foo), it examines way more commits that it really needs to and often feels very slow. See #1586, for example.

The solution here accomplishes this by using the result of the left-hand-side during the evaluation of the right-hand-side. This was already sort of happening for the branches() revset (from when we added the ability to filter the matched branch by name), so this just extends this "which commits to match on" plumbing into the rest of the revsets, and then updates the intersection() and difference() functions to use that plumbing. (Both of these functions will always limit the results to – at most – the result of the lhs.)

The other 2-arg revsets (union(), only(), and range()) are not affected, though I wonder if a related optimization could be applied to only() and range(), both of which should only care about the descendants of of the lhs. (I think.) In any event, I have not updated those because I haven't experienced slow performance when I use them.

Original description

When we evaluate binary revset expressions (for example stack() & branches()) we currently evaluate both revsets on their own and then evaluate the intersection of the two. This change passes the left hand side revset result into the right hand side function as a "hint", so that the rhs can reduce the scope of it's evaluation to just those commits from the lhs result. This is implemented to only affect intersection() and difference() (because they are the only binary functions that reduce one set by another) and has the most dramatic effect on revsets that have historically been evaluated against all public commits like paths.changed().

The main caveat of this approach is that – because the hints are passed left-to-right only – 2 equivalent revsets may evaluate in 2 very different amounts of time: stack() & paths.changed(foo) and paths.changed(foo) & stack() will (should!) produce identical results, but – with this change – the former will be very fast while the latter will be possibly slow.

I have not done any systematic benchmarks, but my experience of using a revset like stack() & paths.changed(foo) is that the evaluation time goes from "hmm, that's taking longer than I expected" to "oh, that was so fast that I should check that it's correct". So I'm satisfied.

I'm certain that there are vast libraries full of rigorous academic research and battle-tested, real-world reports about how to optimize order-of-execution for things like this. I've looked into none of it. This solution was fairly simple, easy to implement, and provides a significant performance boost in many cases. I don't believe that this will prevent us from making further improvements in the future, either.

Alternate approaches
That said, one alternate approach may be to record hints (different kinds of hints) at compile time, and then to use those to decide in which order to evaluate binary expressions. For example, if something like stack() as tagged or hinted as Hint::ProbablySmall and paths.changed() was hinted as Hint::PossiblyLarge, we could use that info to dynamically determine the order in which to evaluate revsets, evaluating the ProbablySmall first and then using that to evaluate the PossiblyLarge function.. That being said, it seems fairly common for arguments to be evaluated left to right, so it's not clear if this would be beneficial or just possibly confusing or counter-intuitive.

claytonrcarter · 2025-08-27T20:54:47Z

~~Oops. I "refactored" this last night and must have broken it; it's no longer fast. And apparently not as simple as I thought.~~ Fixed

`make_pattern_matcher_for_set()` was added in 8ca6b55 to operate on a given set (instead of all visible commits), so this just adds plumbing so that *all* revset functions are able to receive an `Option<CommitSet>` which they may use to optimize (or not) their own evaluation. Having done that, though, no such hints are actually passed around yet.

Now that we can pass revsets around to limit how many commits a rhs examines during it's evaluation, this change actually starts passing such hints, but only for `intersection()` and `difference()`. This is a crude approach, of course, but it gets the job done and it significantly improves performance for `intersection()` and `difference()`.

claytonrcarter force-pushed the revset-perf-hints branch 3 times, most recently from 9a67137 to 57011ca Compare August 27, 2025 01:57

claytonrcarter marked this pull request as ready for review August 27, 2025 02:39

claytonrcarter force-pushed the revset-perf-hints branch from 57011ca to 2547cc0 Compare August 27, 2025 02:41

claytonrcarter marked this pull request as draft August 27, 2025 20:54

claytonrcarter force-pushed the revset-perf-hints branch from 2547cc0 to 84bfc80 Compare August 27, 2025 22:23

claytonrcarter marked this pull request as ready for review August 27, 2025 22:23

claytonrcarter marked this pull request as draft August 28, 2025 00:47

claytonrcarter force-pushed the revset-perf-hints branch from 84bfc80 to da54821 Compare August 28, 2025 01:26

claytonrcarter force-pushed the revset-perf-hints branch from da54821 to 6dae2a3 Compare September 7, 2025 11:56

claytonrcarter added 2 commits September 7, 2025 08:40

claytonrcarter force-pushed the revset-perf-hints branch from 6dae2a3 to d6a4f3c Compare September 7, 2025 12:41

claytonrcarter marked this pull request as ready for review September 7, 2025 12:45

claytonrcarter merged commit 49447a7 into arxanas:master Sep 7, 2025
13 checks passed

claytonrcarter deleted the revset-perf-hints branch September 7, 2025 12:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

revset: improve perf of revset intersections #1591

revset: improve perf of revset intersections #1591

Uh oh!

claytonrcarter commented Aug 26, 2025 •

edited

Loading

Uh oh!

claytonrcarter commented Aug 27, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

revset: improve perf of revset intersections #1591

revset: improve perf of revset intersections #1591

Uh oh!

Conversation

claytonrcarter commented Aug 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claytonrcarter commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claytonrcarter commented Aug 26, 2025 •

edited

Loading

claytonrcarter commented Aug 27, 2025 •

edited

Loading