Benchmark and performance improvements #289

rodaine · 2025-11-18T18:47:41Z

This patch updates our benchmarks to be more focused and "micro", which should make it easier to identify and address particular perf bottlenecks. Only a few benchmarks have been added so far, covering singular scalar fields, repeated scalar and message fields, and repeated fields with a unique rule. The benchmarks can be run consistently with make bench, which accepts some customizable args (see the Makefile).

This patch also includes a handful of performance improvements, focused on heap usage (though CPU time also improved by about 6% geomean across the benchmarks below):

- We use cel-go's ReduceResiduals to minimize/optimize the CEL programs. This means the rule & rules global variables used by standard and predefined CEL expressions can be eliminated from the final program (since the values we use from them are injected as constant literals in the reduced AST). However, these globals were persisted in the cel.Env, which caused cel-go to allocate a composite Activation to make them accessible alongside the this variable. Instead of using CEL globals, this patch treats them as normal variables prior to computing residuals and elides them during actual execution of the CEL program, avoiding the allocation.
- In order to keep repeated.unique O(n), during validation we build up a map[T]struct{} to check for uniqueness in the list. This rule is particularly expensive because the map was allocated and thrown away on every validation. While this rule could avoid allocations altogether by making the comparison O(n^2) (effectively the CEL expression this.all(x, this.exists_one(y, x == y))), I instead opted to have the unique maps pull from a sync.Pool. Since the O(n^2) cost is only an issue for large lists, in the future we could use a heuristic to switch between the CEL expression above and the map-based solution.
- errors.As ends up allocating when you take the double-pointer to the target error, even when the source error is nil. (The escape analysis can't see that far, unfortunately.) Since validation succeeds the majority of the time, err is almost always nil. Performing a nil check before calls to errors.As eliminates this (albeit small) allocation.
- For every call to Validate, we construct a config struct that drives the behavior of that single validation (things like fail-fast mode, filtering, and the now CEL function). Typically, these are set globally on the Validator instance itself, but can be overridden at validation time. However, even if they weren't set at validation time, we were still computing a new config object for every call, causing an extra allocation. Now, the config is constructed only once with the Validator and is only copied and overwritten if validation-time options are provided.
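As a rough illustration of the pooled-map idea for repeated.unique, here is a minimal sketch. It is not the code from this patch: seenPool and isUnique are hypothetical names, the element type is fixed to string for brevity, and the clear builtin assumes Go 1.21 or newer.

```go
package main

import (
	"fmt"
	"sync"
)

// seenPool hands out reusable sets so a uniqueness check does not have to
// allocate a fresh map on every call. The name is illustrative only.
var seenPool = sync.Pool{
	New: func() any { return make(map[string]struct{}) },
}

// isUnique reports whether items contains no duplicates, in O(n) time.
func isUnique(items []string) bool {
	seen := seenPool.Get().(map[string]struct{})
	defer func() {
		clear(seen) // empty the map before returning it to the pool
		seenPool.Put(seen)
	}()
	for _, item := range items {
		if _, dup := seen[item]; dup {
			return false
		}
		seen[item] = struct{}{}
	}
	return true
}

func main() {
	fmt.Println(isUnique([]string{"a", "b", "c"}))
	fmt.Println(isUnique([]string{"a", "b", "a"}))
}
```

The map must be cleared before it goes back into the pool; otherwise entries from one validation would leak into the next and produce false duplicates.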
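The nil guard around errors.As can be sketched as follows. ValidationError, errOther, and asValidationError are hypothetical stand-ins rather than the library's actual types, and whether the guard saves an allocation in a given build depends on the compiler's escape analysis.

```go
package main

import (
	"errors"
	"fmt"
)

// ValidationError is a hypothetical error type used only for this sketch.
type ValidationError struct{ Msg string }

func (e *ValidationError) Error() string { return e.Msg }

// errOther is an unrelated error used to show the non-matching path.
var errOther = errors.New("some other failure")

// asValidationError returns early for a nil error, so the target variable
// (whose address is handed to errors.As) is only declared on the error path.
func asValidationError(err error) (*ValidationError, bool) {
	if err == nil {
		return nil, false // common case: validation succeeded
	}
	var target *ValidationError
	if errors.As(err, &target) {
		return target, true
	}
	return nil, false
}

func main() {
	wrapped := fmt.Errorf("validate: %w", &ValidationError{Msg: "bad field"})
	if ve, ok := asValidationError(wrapped); ok {
		fmt.Println("validation error:", ve.Msg)
	}
	_, ok := asValidationError(nil)
	fmt.Println("nil matched:", ok)
}
```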
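The config change follows a copy-on-override pattern, sketched below under simplified assumptions. The config, Option, WithFailFast, New, and effectiveConfig names are illustrative and do not mirror the library's real API.

```go
package main

import "fmt"

// config is a simplified stand-in for the per-validation settings.
type config struct {
	failFast bool
}

// Option mutates a config; hypothetical, modeled on functional options.
type Option func(*config)

// WithFailFast is an illustrative option constructor.
func WithFailFast(v bool) Option {
	return func(c *config) { c.failFast = v }
}

// Validator holds a config that is built once at construction time.
type Validator struct {
	cfg *config
}

// New applies the construction-time options to a single shared config.
func New(opts ...Option) *Validator {
	cfg := &config{}
	for _, opt := range opts {
		opt(cfg)
	}
	return &Validator{cfg: cfg}
}

// effectiveConfig reuses the shared config when no per-call options are
// given, and only copies it when there is something to override.
func (v *Validator) effectiveConfig(opts ...Option) *config {
	if len(opts) == 0 {
		return v.cfg // no copy, no extra allocation
	}
	cp := *v.cfg
	for _, opt := range opts {
		opt(&cp)
	}
	return &cp
}

func main() {
	v := New()
	fmt.Println(v.effectiveConfig() == v.cfg)
	fmt.Println(v.effectiveConfig(WithFailFast(true)).failFast)
}
```

Because the shared config is returned by pointer on the fast path, it has to be treated as read-only during validation.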

These changes resulted in the following improvements on the (admittedly limited) set of benchmarks added in d716bad:

→ benchstat .tmp/bench/2025-11-18:12:58:39.bench.txt .tmp/bench/2025-11-18:13:01:39.bench.txt 
goos: darwin
goarch: arm64
pkg: buf.build/go/protovalidate
cpu: Apple M1 Max
                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt │
                          │                  sec/op                  │      sec/op        vs base               │
Scalar-10                                                421.8n ± 1%         396.0n ± 3%  -6.12% (p=0.000 n=10)
Repeated/Scalar-10                                       480.5n ± 1%         455.0n ± 2%  -5.30% (p=0.001 n=10)
Repeated/Message-10                                      607.0n ± 1%         561.2n ± 1%  -7.55% (p=0.000 n=10)
Repeated/Unique/Scalar-10                                735.4n ± 3%         686.2n ± 2%  -6.68% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                 987.1n ± 4%         933.9n ± 3%  -5.39% (p=0.000 n=10)
geomean                                                  616.8n              578.5n       -6.21%

                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt  │
                          │                   B/op                   │     B/op      vs base                     │
Scalar-10                                                 72.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
Repeated/Scalar-10                                        192.0 ± 0%     120.0 ± 0%   -37.50% (p=0.000 n=10)
Repeated/Message-10                                       256.0 ± 0%     120.0 ± 0%   -53.12% (p=0.000 n=10)
Repeated/Unique/Scalar-10                                1064.0 ± 0%     536.0 ± 0%   -49.62% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                2.398Ki ± 0%   1.743Ki ± 0%   -27.32% (p=0.000 n=10)
geomean                                                   391.9                      ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt │
                          │                allocs/op                 │  allocs/op   vs base                     │
Scalar-10                                                 3.000 ± 0%    0.000 ± 0%  -100.00% (p=0.000 n=10)
Repeated/Scalar-10                                        6.000 ± 0%    3.000 ± 0%   -50.00% (p=0.000 n=10)
Repeated/Message-10                                       8.000 ± 0%    3.000 ± 0%   -62.50% (p=0.000 n=10)
Repeated/Unique/Scalar-10                                 40.00 ± 0%    34.00 ± 0%   -15.00% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                  88.00 ± 0%    73.00 ± 0%   -17.05% (p=0.000 n=10)
geomean                                                   13.84                     ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

rodaine · 2025-11-18T18:56:44Z

There's still a lot more to tackle; here's the allocations profile after these changes. Many of the allocations are within the cel-go library and may require upstream changes to mitigate. In particular, there are allocations for every list (or map) field that should be avoidable, or at least poolable.

github-actions · 2025-11-18T19:04:21Z

The latest Buf updates on your PR. Results from workflow Buf / validate-protos (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`⏩ skipped`	Nov 18, 2025, 7:04 PM

rodaine added 2 commits November 18, 2025 12:58

benchmark utility improvements

d716bad

performance improvements

6ee13b6

rodaine requested review from Alfus, jhump and pkwarren November 18, 2025 18:47

rodaine mentioned this pull request Nov 18, 2025

Memory consumption #286

Open

Alfus approved these changes Nov 18, 2025

pkwarren approved these changes Nov 18, 2025

rodaine merged commit 15821df into main Nov 18, 2025
8 checks passed

rodaine deleted the rodaine/bench-perf-improvements branch November 18, 2025 19:17
