Conversation

@rodaine
Member

@rodaine rodaine commented Nov 18, 2025

This patch makes our benchmarks more focused ("micro"), which should make it easier to identify and address specific performance bottlenecks. Only a few benchmarks have been added so far, covering singular scalar fields, repeated scalar and message fields, and repeated fields with a unique rule. The benchmarks can be run consistently with make bench, which accepts a few customizable arguments (see the Makefile).


This patch also includes a handful of performance improvements focused on heap usage (though CPU time also improved by roughly 5-7%):

  • We use cel-go's ReduceResiduals to minimize/optimize the CEL programs. This means the rule & rules global variables used by standard and predefined CEL expressions can be eliminated from the final program (since the values we use from them are injected as constant literals in the reduced AST). However, these globals were persisted in the cel.Env, which caused cel-go to allocate a composite Activation to make them accessible alongside the this variable. Instead of using CEL globals, this patch treats them as normal variables prior to computing residuals and elides them during actual execution of the CEL program, avoiding the allocation.

  • To keep repeated.unique O(n), during validation we build up a map[T]struct{} to check for uniqueness in the list. This makes the rule particularly expensive, since the map is allocated and thrown away on every validation. While this rule could avoid allocations altogether by making the comparison O(n^2) (effectively the CEL expression this.all(x, this.exists_one(y, x == y))), I instead opted to have the unique maps pull from a sync.Pool. Since the O(n^2) cost is only an issue for large lists, in the future we could use a heuristic to switch between the CEL expression above and the map-based solution.

  • errors.As ends up allocating when you take the double-pointer to the target error, even when the source error is nil. (Escape analysis can't see that far, unfortunately.) Since validation succeeds the majority of the time, err is almost always nil. Performing a nil check before calls to errors.As eliminates this (albeit small) allocation.

  • For every call to Validate, we construct a config struct that drives the behavior of that single validation (things like fail-fast mode, filtering, and the CEL now function). Typically, these are set globally on the Validator instance itself, but they can be overridden at validation time. However, even if they weren't set at validation time, we were still computing a new config object for every call, causing an extra allocation. Now, the config is constructed only once with the Validator and is only copied and overwritten if validation-time options are provided.
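
As a rough illustration of the pooled-map approach to repeated.unique described in the list above, here is a minimal sketch. The names uniquePool and isUnique are hypothetical and not taken from the patch, and the element type is fixed to string for brevity. It assumes Go 1.21+ for the clear builtin.

```go
package main

import (
	"fmt"
	"sync"
)

// uniquePool hands out reusable sets so a fresh map is not allocated on
// every validation. Hypothetical name; not the patch's actual code.
var uniquePool = sync.Pool{
	New: func() any { return make(map[string]struct{}) },
}

// isUnique reports whether vals contains no duplicates, in O(n) time.
func isUnique(vals []string) bool {
	seen := uniquePool.Get().(map[string]struct{})
	defer func() {
		clear(seen) // empty the map before returning it to the pool
		uniquePool.Put(seen)
	}()
	for _, v := range vals {
		if _, dup := seen[v]; dup {
			return false
		}
		seen[v] = struct{}{}
	}
	return true
}

func main() {
	fmt.Println(isUnique([]string{"a", "b", "c"})) // true
	fmt.Println(isUnique([]string{"a", "b", "a"})) // false
}
```

Clearing the map before Put matters: a map returned to the pool with stale keys would produce false duplicates on the next validation.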
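
The copy-on-override handling of the config described in the list above can be sketched as follows. All names here (config, option, withFailFast, validator, effectiveConfig) are hypothetical and simplified; the real option types and fields may differ.

```go
package main

import "fmt"

// config is a simplified, hypothetical stand-in for the validation config.
type config struct {
	failFast bool
}

// option mutates a config; hypothetical functional-option type.
type option func(*config)

func withFailFast(v bool) option { return func(c *config) { c.failFast = v } }

// validator builds its config once, at construction time.
type validator struct{ cfg *config }

func newValidator(opts ...option) *validator {
	cfg := &config{}
	for _, o := range opts {
		o(cfg)
	}
	return &validator{cfg: cfg}
}

// effectiveConfig reuses the shared config when no per-call options are
// given, and only copies it when there is something to override.
func (v *validator) effectiveConfig(opts ...option) *config {
	if len(opts) == 0 {
		return v.cfg // no copy, no allocation
	}
	cp := *v.cfg
	for _, o := range opts {
		o(&cp)
	}
	return &cp
}

func main() {
	v := newValidator(withFailFast(true))
	fmt.Println(v.effectiveConfig() == v.cfg)                    // true
	fmt.Println(v.effectiveConfig(withFailFast(false)).failFast) // false
	fmt.Println(v.cfg.failFast)                                  // true
}
```

Because the shared config is returned by pointer on the fast path, this pattern only stays safe if validation treats the config as read-only.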

These changes resulted in the following improvements on the (admittedly limited) set of benchmarks added in d716bad:

→ benchstat .tmp/bench/2025-11-18:12:58:39.bench.txt .tmp/bench/2025-11-18:13:01:39.bench.txt 
goos: darwin
goarch: arm64
pkg: buf.build/go/protovalidate
cpu: Apple M1 Max
                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt │
                          │                  sec/op                  │      sec/op        vs base               │
Scalar-10                                                421.8n ± 1%         396.0n ± 3%  -6.12% (p=0.000 n=10)
Repeated/Scalar-10                                       480.5n ± 1%         455.0n ± 2%  -5.30% (p=0.001 n=10)
Repeated/Message-10                                      607.0n ± 1%         561.2n ± 1%  -7.55% (p=0.000 n=10)
Repeated/Unique/Scalar-10                                735.4n ± 3%         686.2n ± 2%  -6.68% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                 987.1n ± 4%         933.9n ± 3%  -5.39% (p=0.000 n=10)
geomean                                                  616.8n              578.5n       -6.21%

                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt  │
                          │                   B/op                   │     B/op      vs base                     │
Scalar-10                                                 72.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
Repeated/Scalar-10                                        192.0 ± 0%     120.0 ± 0%   -37.50% (p=0.000 n=10)
Repeated/Message-10                                       256.0 ± 0%     120.0 ± 0%   -53.12% (p=0.000 n=10)
Repeated/Unique/Scalar-10                                1064.0 ± 0%     536.0 ± 0%   -49.62% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                2.398Ki ± 0%   1.743Ki ± 0%   -27.32% (p=0.000 n=10)
geomean                                                   391.9                      ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt │
                          │                allocs/op                 │  allocs/op   vs base                     │
Scalar-10                                                 3.000 ± 0%    0.000 ± 0%  -100.00% (p=0.000 n=10)
Repeated/Scalar-10                                        6.000 ± 0%    3.000 ± 0%   -50.00% (p=0.000 n=10)
Repeated/Message-10                                       8.000 ± 0%    3.000 ± 0%   -62.50% (p=0.000 n=10)
Repeated/Unique/Scalar-10                                 40.00 ± 0%    34.00 ± 0%   -15.00% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                  88.00 ± 0%    73.00 ± 0%   -17.05% (p=0.000 n=10)
geomean                                                   13.84                     ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

@rodaine rodaine requested review from Alfus, jhump and pkwarren November 18, 2025 18:47
@rodaine
Member Author

rodaine commented Nov 18, 2025

There's still a lot more to tackle; here's the allocation profile after these changes. Many of the allocations are within the cel-go library, which may require upstream changes to mitigate. In particular, there are allocations for every list (or map) field, which should be avoidable, or at least poolable.

[Image: pprof002]

@rodaine rodaine mentioned this pull request Nov 18, 2025
@github-actions

The latest Buf updates on your PR. Results from workflow Buf / validate-protos (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ✅ passed | ✅ passed | ⏩ skipped | Nov 18, 2025, 7:04 PM |

@rodaine rodaine merged commit 15821df into main Nov 18, 2025
8 checks passed
@rodaine rodaine deleted the rodaine/bench-perf-improvements branch November 18, 2025 19:17
4 participants