Benchmark and performance improvements #289
Merged
This patch updates our benchmarks to be more focused and "micro", which should make it easier to identify and address particular perf bottlenecks. Only a couple of benchmarks have been added so far, covering singular scalar fields, repeated scalar and message fields, and repeated fields including a unique rule. The benchmarks can be run consistently with `make bench`, which has some args that may be customized (see the Makefile).

This patch also includes a handful of performance improvements, focused on heap usage (though there was a ~5% CPU time improvement):
We use cel-go's `ReduceResiduals` to minimize/optimize the CEL programs. This means the `rule` and `rules` global variables used by standard and predefined CEL expressions can be eliminated from the final program (since the values we use from them are injected as constant literals into the reduced AST). However, these globals were persisted in the `cel.Env`, which caused cel-go to allocate a composite Activation to make them accessible alongside the `this` variable. Instead of using CEL globals, this patch uses them as normal variables prior to computing residuals, and elides them during actual execution of the CEL program, avoiding the allocation.
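To illustrate the idea (this is not the validator code; the expression, variable names, and values are made up, and it uses cel-go's general partial-evaluation API rather than our exact helper), here's a minimal sketch where a concrete `rules` value is folded into the AST as a constant so only `this` survives to evaluation time:

```go
package main

import (
	"fmt"

	"github.com/google/cel-go/cel"
)

func main() {
	// Declare both the runtime variable and the "rules" value we want folded away.
	env, err := cel.NewEnv(
		cel.Variable("this", cel.IntType),
		cel.Variable("rules", cel.DynType),
	)
	if err != nil {
		panic(err)
	}

	ast, iss := env.Compile(`this > rules.gt`)
	if iss != nil && iss.Err() != nil {
		panic(iss.Err())
	}

	// Partially evaluate with `rules` known and `this` left unknown.
	prg, err := env.Program(ast, cel.EvalOptions(cel.OptTrackState, cel.OptPartialEval))
	if err != nil {
		panic(err)
	}
	vars, err := cel.PartialVars(
		map[string]any{"rules": map[string]any{"gt": 5}},
		cel.AttributePattern("this"),
	)
	if err != nil {
		panic(err)
	}

	// With `this` unknown, evaluation yields an unknown value; we only need the details.
	_, details, _ := prg.Eval(vars)

	// The residual AST no longer references `rules`; its value is now a constant,
	// so the final program only needs `this` at execution time.
	residual, err := env.ResidualAst(ast, details)
	if err != nil {
		panic(err)
	}
	out, _ := cel.AstToString(residual)
	fmt.Println(out) // this > 5
}
```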
In order to keep `repeated.unique` O(n), during validation we build up a `map[T]struct{}{}` to check for uniqueness in the list. This rule is particularly expensive, resulting in this map being allocated and thrown away on every validation. While this rule could avoid allocations altogether by making the comparison O(n^2) (effectively the CEL expression `this.all(x, this.exists_one(y, x == y))`), I instead opted to have the unique maps pull from a `sync.Pool` (see the sketch below). Since the O(n^2) cost is only an issue for large lists, in the future we could use a heuristic to swap between the CEL expression above and the map-based solution.

`errors.As` ends up allocating when you take the double-pointer to the target error, even when the source error is nil (the escape analysis can't see that far, unfortunately). Since the majority of the time validation is successful, `err` is almost always nil. Performing a nil check before calls to `errors.As` eliminates this allocation (albeit small), as sketched below.
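Roughly, the pooled uniqueness check looks like this (an illustrative sketch only; the `uniquePool`/`allUnique` names and the `string` element type are placeholders, not the real implementation, which has to work across field types):

```go
package unique

import "sync"

// uniquePool recycles the scratch maps used by the repeated.unique check so
// the per-validation allocation is amortized away. (Illustrative sketch.)
var uniquePool = sync.Pool{
	New: func() any { return make(map[string]struct{}) },
}

// allUnique reports whether every element of list is distinct in O(n),
// borrowing a map from the pool instead of allocating a fresh one each time.
func allUnique(list []string) bool {
	seen := uniquePool.Get().(map[string]struct{})
	defer func() {
		clear(seen) // reset before handing it back to the pool
		uniquePool.Put(seen)
	}()
	for _, v := range list {
		if _, ok := seen[v]; ok {
			return false
		}
		seen[v] = struct{}{}
	}
	return true
}
```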
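And the `errors.As` change amounts to a guard like the following (sketch; `ValidationError` and `handle` are stand-ins, not the actual types or call sites):

```go
package example

import "errors"

// ValidationError is a placeholder for the concrete error type being unwrapped.
type ValidationError struct{ msg string }

func (e *ValidationError) Error() string { return e.msg }

// handle demonstrates the nil check before errors.As: on the common success
// path (err == nil) we never take the &target double-pointer, so the call
// that would otherwise force an allocation is skipped entirely.
func handle(err error) {
	if err == nil {
		return // fast path taken by the vast majority of validations
	}
	var target *ValidationError
	if errors.As(err, &target) {
		_ = target // inspect the structured validation error
	}
}
```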
For every call to `Validate`, we construct a config struct that drives the behavior of that single validation (things like fail-fast mode, filtering, and the `now` CEL function). Typically, these are set globally on the Validator instance itself, but they can be overridden at validation time. However, even if they weren't set at validation time, we were still computing a new config object for every call, causing an extra allocation. Now, the config is constructed only once with the Validator and is only copied and overwritten if validation-time options are provided.

These changes resulted in the following improvements on the (admittedly limited) set of benchmarks added in d716bad: