Benchmark and performance improvements (#289)

rodaine · web-flow · commit 15821df5881d · 2025-11-18T11:16:58.000-08:00
This patch updates our benchmarks to be more focused and "micro", which should make it easier to identify and address particular perf bottlenecks. Only a couple of benchmarks have been added so far, covering singular scalar fields, repeated scalar and message fields, and repeated fields including a unique rule. The benchmarks can be run consistently with `make bench`, which has some args that may be customized (see the Makefile).

---

This patch also includes a handful of performance improvements, focused on heap usage (though there was a ~5% CPU time improvement):

- We use cel-go's `ReduceResiduals` to minimize/optimize the CEL programs. This means the `rule` & `rules` global variables used by standard and predefined CEL expressions can be eliminated from the final program (since the values we use from them are injected as constant literals in the reduced AST). However, these globals were persisted in the `cel.Env`, which caused cel-go to allocate a composite Activation to make them accessible alongside the `this` variable. Instead of using CEL globals, this patch passes them as normal variables when computing residuals and omits them during actual execution of the CEL program, avoiding the allocation.

- In order to keep `repeated.unique` `O(n)`, during validation we build up a `map[T]struct{}` to check for uniqueness in the list. This rule is particularly expensive, since the map is allocated and thrown away on every validation. While this rule could avoid allocations altogether by making the comparison O(n^2) (effectively the CEL expression `this.all(x, this.exists_one(y, x == y))`), I instead opted to have the unique maps pull from a `sync.Pool`. Since O(n^2) is only an issue for large lists, in the future we could use a heuristic to switch between the CEL expression above and the map-based solution.

- `errors.As` ends up allocating when you take the double-pointer to the target error, even when the source error is nil. (The escape analysis can't see that far, unfortunately.) Since the majority of the time validation is successful, `err` is almost always nil. Performing a nil check before calls to `errors.As` eliminates this allocation (albeit small).

- For every call to `Validate`, we construct a config struct that drives the behavior of that single validation (things like fail-fast mode, filtering, and the `now` CEL function). Typically, these are set globally on the Validator instance itself, but can be overridden at validation time. However, even if they weren't set at validation time, we were still computing a new config object for every call, causing an extra allocation. Now, the config is constructed only once with the Validator and only copied and overwritten if validation-time options are provided.

These changes resulted in the following improvements on the (admittedly limited) set of benchmarks added in d716bad:

```
→ benchstat .tmp/bench/2025-11-18:12:58:39.bench.txt .tmp/bench/2025-11-18:13:01:39.bench.txt
goos: darwin
goarch: arm64
pkg: buf.build/go/protovalidate
cpu: Apple M1 Max
                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt │
                          │                  sec/op                  │        sec/op          vs base           │
Scalar-10                                               421.8n ± 1%              396.0n ± 3%    -6.12% (p=0.000 n=10)
Repeated/Scalar-10                                      480.5n ± 1%              455.0n ± 2%    -5.30% (p=0.001 n=10)
Repeated/Message-10                                     607.0n ± 1%              561.2n ± 1%    -7.55% (p=0.000 n=10)
Repeated/Unique/Scalar-10                               735.4n ± 3%              686.2n ± 2%    -6.68% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                987.1n ± 4%              933.9n ± 3%    -5.39% (p=0.000 n=10)
geomean                                                 616.8n                   578.5n         -6.21%

                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt │
                          │                   B/op                   │         B/op           vs base           │
Scalar-10                                                72.00 ± 0%                0.00 ± 0%  -100.00% (p=0.000 n=10)
Repeated/Scalar-10                                       192.0 ± 0%               120.0 ± 0%   -37.50% (p=0.000 n=10)
Repeated/Message-10                                      256.0 ± 0%               120.0 ± 0%   -53.12% (p=0.000 n=10)
Repeated/Unique/Scalar-10                               1064.0 ± 0%               536.0 ± 0%   -49.62% (p=0.000 n=10)
Repeated/Unique/Bytes-10                               2.398Ki ± 0%             1.743Ki ± 0%   -27.32% (p=0.000 n=10)
geomean                                                  391.9                                 ?                      ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                          │ .tmp/bench/2025-11-18:12:58:39.bench.txt │ .tmp/bench/2025-11-18:13:01:39.bench.txt │
                          │                allocs/op                 │       allocs/op        vs base           │
Scalar-10                                                3.000 ± 0%               0.000 ± 0%  -100.00% (p=0.000 n=10)
Repeated/Scalar-10                                       6.000 ± 0%               3.000 ± 0%   -50.00% (p=0.000 n=10)
Repeated/Message-10                                      8.000 ± 0%               3.000 ± 0%   -62.50% (p=0.000 n=10)
Repeated/Unique/Scalar-10                                40.00 ± 0%               34.00 ± 0%   -15.00% (p=0.000 n=10)
Repeated/Unique/Bytes-10                                 88.00 ± 0%               73.00 ± 0%   -17.05% (p=0.000 n=10)
geomean                                                  13.84                                 ?                      ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean
```
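
The config change described in the commit message (build the config once with the Validator, copy it only when validation-time options are passed) might look roughly like the sketch below. This is an illustration of the pattern only, not the code in `validator.go`; `config`, `option`, `withFailFast`, `newValidator`, and `effectiveConfig` are all hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// config holds per-validation settings (hypothetical shape).
type config struct {
	failFast bool
	nowFn    func() time.Time
}

// option mutates a config; hypothetical stand-in for validation options.
type option func(*config)

func withFailFast() option { return func(c *config) { c.failFast = true } }

type validator struct {
	cfg *config // built once, shared by every call without overrides
}

func newValidator(opts ...option) *validator {
	cfg := &config{nowFn: time.Now}
	for _, opt := range opts {
		opt(cfg)
	}
	return &validator{cfg: cfg}
}

// effectiveConfig returns the shared config when no per-call options are
// given, and a modified copy otherwise, so the common path does not allocate.
func (v *validator) effectiveConfig(opts ...option) *config {
	if len(opts) == 0 {
		return v.cfg
	}
	cp := *v.cfg // shallow copy; the shared config is never mutated
	for _, opt := range opts {
		opt(&cp)
	}
	return &cp
}

func main() {
	v := newValidator()
	shared := v.effectiveConfig()
	overridden := v.effectiveConfig(withFailFast())
	fmt.Println(shared == v.cfg)                     // true: no copy was made
	fmt.Println(overridden == v.cfg)                 // false: options forced a copy
	fmt.Println(v.cfg.failFast, overridden.failFast) // false true
}
```

The trade-off in this sketch is that the shared config must be treated as read-only after construction, since every call without overrides sees the same pointer.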
diff --git a/.gitignore b/.gitignore
@@ -2,3 +2,4 @@
 *.pprof
 *.svg
 cover.out
+*.test
diff --git a/.golangci.yml b/.golangci.yml
@@ -21,6 +21,7 @@ linters:
     - noinlineerr       # inline is fine
     - nonamedreturns    # usage of named returns should be selective
     - testpackage       # internal tests are fine
+    - thelper           # overzealous breaking of stack traces
     - wrapcheck         # don't _always_ need to wrap errors
     - wsl               # over-generous whitespace violates house style
     - wsl_v5            # over-generous whitespace violates house style
diff --git a/Makefile b/Makefile
@@ -6,7 +6,9 @@ SHELL := bash
 MAKEFLAGS += --warn-undefined-variables
 MAKEFLAGS += --no-builtin-rules
 MAKEFLAGS += --no-print-directory
-BIN := .tmp/bin
+TMP := .tmp
+BIN := $(TMP)/bin
+BENCH_TMP := $(TMP)/bench
 COPYRIGHT_YEARS := 2023-2025
 LICENSE_IGNORE := -e internal/testdata/
 # Set to use a different compiler. For example, `GO=go1.18rc1 make test`.
@@ -94,10 +96,26 @@ checkgenerate: generate
 	@# Used in CI to verify that `make generate` doesn't produce a diff.
 	test -z "$$(git status --porcelain | tee /dev/stderr)"
 
+
+BENCH ?= .
+BENCH_COUNT ?= 10
+BENCH_NAME ?= $(shell date +%F:%T)
+.PHONY: bench
+bench: $(BENCH_TMP)
+	go test -bench="$(BENCH)" -benchmem \
+		-memprofile "$(BENCH_TMP)/$(BENCH_NAME).mem.profile" \
+		-cpuprofile "$(BENCH_TMP)/$(BENCH_NAME).cpu.profile" \
+		-count $(BENCH_COUNT) \
+		| tee "$(BENCH_TMP)/$(BENCH_NAME).bench.txt"
+
+
 .PHONY: upgrade-go
 upgrade-go:
 	$(GO) get -u -t ./... && $(GO) mod tidy -v
 
+$(BENCH_TMP):
+	@mkdir -p $(BENCH_TMP)
+
 $(BIN):
 	@mkdir -p $(BIN)
 
diff --git a/ast.go b/ast.go
@@ -21,7 +21,6 @@ import (
 	"buf.build/gen/go/bufbuild/protovalidate/protocolbuffers/go/buf/validate"
 	pvcel "buf.build/go/protovalidate/cel"
 	"github.com/google/cel-go/cel"
-	"github.com/google/cel-go/interpreter"
 	"google.golang.org/protobuf/reflect/protoreflect"
 )
 
@@ -41,7 +40,7 @@ func (set astSet) Merge(other astSet) astSet {
 // either a true or empty string constant result, no compiledProgram is
 // generated for it. The main usage of this is to elide tautological expressions
 // from the final result.
-func (set astSet) ReduceResiduals(opts ...cel.ProgramOption) (programSet, error) {
+func (set astSet) ReduceResiduals(rules protoreflect.Message, opts ...cel.ProgramOption) (programSet, error) {
 	residuals := make(astSet, 0, len(set))
 	options := append([]cel.ProgramOption{
 		cel.EvalOptions(
@@ -52,17 +51,26 @@ func (set astSet) ReduceResiduals(opts ...cel.ProgramOption) (programSet, error)
 		),
 	}, opts...)
 
+	baseActivation := &variable{
+		Name: "rules",
+		Val:  rules.Interface(),
+	}
+
 	for _, ast := range set {
-		options := slices.Clone(options)
+		activation := baseActivation
 		if ast.Value.IsValid() {
-			options = append(options, cel.Globals(&variable{Name: "rule", Val: ast.Value.Interface()}))
+			activation = &variable{
+				Name: "rule",
+				Val:  ast.Value.Interface(),
+				Next: activation,
+			}
 		}
 		program, err := ast.toProgram(ast.Env, options...)
 		if err != nil {
 			residuals = append(residuals, ast)
 			continue
 		}
-		val, details, _ := program.Program.Eval(interpreter.EmptyActivation())
+		val, details, _ := program.Program.Eval(activation)
 		if val != nil {
 			switch value := val.Value().(type) {
 			case bool:
diff --git a/ast_test.go b/ast_test.go
@@ -91,7 +91,10 @@ func TestASTSet_ReduceResiduals(t *testing.T) {
 	)
 	require.NoError(t, err)
 	assert.Len(t, asts, 1)
-	set, err := asts.ReduceResiduals(cel.Globals(&variable{Name: "foo", Val: true}))
+	set, err := asts.ReduceResiduals(
+		(&validate.StringRules{}).ProtoReflect(),
+		cel.Globals(&variable{Name: "foo", Val: true}),
+	)
 	require.NoError(t, err)
 	assert.Empty(t, set)
 }
diff --git a/buf.gen.yaml b/buf.gen.yaml
@@ -4,6 +4,8 @@ managed:
   disable:
     - file_option: go_package
       module: buf.build/bufbuild/protovalidate
+    - file_option: go_package
+      module: buf.build/rodaine/protogofakeit
   override:
     - file_option: go_package_prefix
       value: buf.build/go/protovalidate/internal/gen
diff --git a/buf.lock b/buf.lock
@@ -4,3 +4,6 @@ deps:
   - name: buf.build/bufbuild/protovalidate
     commit: 52f32327d4b045a79293a6ad4e7e1236
     digest: b5:cbabc98d4b7b7b0447c9b15f68eeb8a7a44ef8516cb386ac5f66e7fd4062cd6723ed3f452ad8c384b851f79e33d26e7f8a94e2b807282b3def1cd966c7eace97
+  - name: buf.build/rodaine/protogofakeit
+    commit: 9caf0fc578d3413590962a1764b81b94
+    digest: b5:eeead7373f2f598ebc8f91aa3a68d6b50630076341d875b22dc6760126bc56c82cf1e98f5a2eff9815ba55fa48ab81745c93a5aeefd5e4697bf43c9ea4694735
diff --git a/buf.yaml b/buf.yaml
@@ -3,6 +3,7 @@ modules:
   - path: proto
 deps:
   - buf.build/bufbuild/protovalidate:v1.0.0
+  - buf.build/rodaine/protogofakeit
 lint:
   use:
     - STANDARD
diff --git a/cache.go b/cache.go
@@ -100,8 +100,7 @@ func (c *cache) Build(
 		return nil, err
 	}
 
-	rulesGlobal := cel.Globals(&variable{Name: "rules", Val: rules.Interface()})
-	set, err = asts.ReduceResiduals(rulesGlobal)
+	set, err = asts.ReduceResiduals(rules)
 	return set, err
 }
 
diff --git a/cel/library.go b/cel/library.go
@@ -22,6 +22,7 @@ import (
 	"slices"
 	"strconv"
 	"strings"
+	"sync"
 	"unicode/utf8"
 
 	"github.com/google/cel-go/cel"
@@ -49,7 +50,14 @@ var (
 // Using this function, you can create a CEL environment that is identical to
 // the one used to evaluate protovalidate CEL expressions.
 func NewLibrary() cel.Library {
-	return library{}
+	return &library{
+		uniqueScalarPool: sync.Pool{New: func() any {
+			return map[ref.Val]struct{}{}
+		}},
+		uniqueBytesPool: sync.Pool{New: func() any {
+			return map[string]struct{}{}
+		}},
+	}
 }
 
 // library is the collection of functions and settings required by protovalidate
@@ -59,9 +67,12 @@ func NewLibrary() cel.Library {
 //
 // All implementations of protovalidate MUST implement these functions and
 // should avoid exposing additional functions as they will not be portable.
-type library struct{}
+type library struct {
+	uniqueScalarPool sync.Pool
+	uniqueBytesPool  sync.Pool
+}
 
-func (l library) CompileOptions() []cel.EnvOption { //nolint:funlen,gocyclo
+func (l *library) CompileOptions() []cel.EnvOption { //nolint:funlen,gocyclo
 	return []cel.EnvOption{
 		cel.TypeDescs(protoregistry.GlobalFiles),
 		cel.DefaultUTCTimeZone(true),
@@ -375,15 +386,15 @@ func (l library) CompileOptions() []cel.EnvOption { //nolint:funlen,gocyclo
 	}
 }
 
-func (l library) ProgramOptions() []cel.ProgramOption {
+func (l *library) ProgramOptions() []cel.ProgramOption {
 	return []cel.ProgramOption{
 		cel.EvalOptions(
 			cel.OptOptimize,
 		),
 	}
 }
 
-func (l library) uniqueMemberOverload(itemType *cel.Type, overload func(lister traits.Lister) ref.Val) cel.FunctionOpt {
+func (l *library) uniqueMemberOverload(itemType *cel.Type, overload func(lister traits.Lister) ref.Val) cel.FunctionOpt {
 	return cel.MemberOverload(
 		itemType.String()+"_unique_bool",
 		[]*cel.Type{cel.ListType(itemType)},
@@ -398,15 +409,19 @@ func (l library) uniqueMemberOverload(itemType *cel.Type, overload func(lister t
 	)
 }
 
-func (l library) uniqueScalar(list traits.Lister) ref.Val {
+func (l *library) uniqueScalar(list traits.Lister) ref.Val {
 	size, ok := list.Size().Value().(int64)
 	if !ok {
 		return types.UnsupportedRefValConversionErr(list.Size().Value())
 	}
 	if size <= 1 {
 		return types.Bool(true)
 	}
-	exist := make(map[ref.Val]struct{}, size)
+	exist := l.uniqueScalarPool.Get().(map[ref.Val]struct{}) //nolint:errcheck // guaranteed to match
+	defer func() {
+		clear(exist)
+		l.uniqueScalarPool.Put(exist)
+	}()
 	for i := range size {
 		val := list.Get(types.Int(i))
 		if _, ok := exist[val]; ok {
@@ -421,24 +436,30 @@ func (l library) uniqueScalar(list traits.Lister) ref.Val {
 // compares bytes type CEL values. This function is used instead of uniqueScalar
 // as the bytes ([]uint8) type is not hashable in Go; we cheat this by converting
 // the value to a string.
-func (l library) uniqueBytes(list traits.Lister) ref.Val {
+func (l *library) uniqueBytes(list traits.Lister) ref.Val {
 	size, ok := list.Size().Value().(int64)
 	if !ok {
 		return types.UnsupportedRefValConversionErr(list.Size().Value())
 	}
 	if size <= 1 {
 		return types.Bool(true)
 	}
-	exist := make(map[any]struct{}, size)
+	exist := l.uniqueBytesPool.Get().(map[string]struct{}) //nolint:errcheck // guaranteed to match
+	defer func() {
+		clear(exist)
+		l.uniqueBytesPool.Put(exist)
+	}()
 	for i := range size {
 		val := list.Get(types.Int(i)).Value()
-		if b, ok := val.([]uint8); ok {
-			val = string(b)
+		b, ok := val.([]byte)
+		if !ok {
+			return types.NewErr("expected bytes, got %v", val)
 		}
-		if _, ok := exist[val]; ok {
+		str := string(b)
+		if _, ok := exist[str]; ok {
 			return types.Bool(false)
 		}
-		exist[val] = struct{}{}
+		exist[str] = struct{}{}
 	}
 	return types.Bool(true)
 }
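
The pooled-map approach in `uniqueScalar` and `uniqueBytes` can be shown in isolation. The sketch below is a minimal, standalone version working on plain `[]string` values rather than CEL lists; `seenPool` and `isUnique` are hypothetical names, and the built-in `clear` assumes Go 1.21 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// seenPool hands out reusable sets so that each uniqueness check does not
// have to allocate (and later discard) its own map.
var seenPool = sync.Pool{
	New: func() any { return map[string]struct{}{} },
}

// isUnique reports whether items contains no duplicates, in O(n) time.
func isUnique(items []string) bool {
	if len(items) <= 1 {
		return true
	}
	seen, _ := seenPool.Get().(map[string]struct{})
	defer func() {
		clear(seen) // empty the set so stale keys never reach the next caller
		seenPool.Put(seen)
	}()
	for _, item := range items {
		if _, ok := seen[item]; ok {
			return false
		}
		seen[item] = struct{}{}
	}
	return true
}

func main() {
	fmt.Println(isUnique([]string{"a", "b", "c"})) // true
	fmt.Println(isUnique([]string{"a", "b", "a"})) // false
}
```

Clearing in the deferred function (rather than right after `Get`) means the early `return false` path also returns an empty map to the pool.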
diff --git a/error_utils.go b/error_utils.go
@@ -91,7 +91,7 @@ func fieldPath(field protoreflect.FieldDescriptor) *validate.FieldPath {
 // path is reversed. Rule paths are generally static, so this optimization isn't
 // applied for rule paths.
 func updateViolationPaths(err error, fieldSuffix *validate.FieldPathElement, rulePrefix []*validate.FieldPathElement) {
-	if fieldSuffix == nil && len(rulePrefix) == 0 {
+	if err == nil || (fieldSuffix == nil && len(rulePrefix) == 0) {
 		return
 	}
 	var valErr *ValidationError
@@ -117,6 +117,9 @@ func updateViolationPaths(err error, fieldSuffix *validate.FieldPathElement, rul
 // finalizeViolationPaths reverses all field paths in the error and populates
 // the deprecated string-based field path.
 func finalizeViolationPaths(err error) {
+	if err == nil {
+		return
+	}
 	var valErr *ValidationError
 	if errors.As(err, &valErr) {
 		for _, violation := range valErr.Violations {
@@ -161,6 +164,9 @@ func FieldPathString(path *validate.FieldPath) string {
 // markViolationForKey marks the provided error as being for a map key, by
 // setting the `for_key` flag on each violation within the validation error.
 func markViolationForKey(err error) {
+	if err == nil {
+		return
+	}
 	var valErr *ValidationError
 	if errors.As(err, &valErr) {
 		for _, violation := range valErr.Violations {
diff --git a/go.mod b/go.mod
@@ -5,7 +5,9 @@ go 1.24.0
 require (
 	buf.build/gen/go/bufbuild/protovalidate/protocolbuffers/go v1.36.10-20250912141014-52f32327d4b0.1
 	buf.build/go/hyperpb v0.1.3
+	github.com/brianvoe/gofakeit/v6 v6.28.0
 	github.com/google/cel-go v0.26.1
+	github.com/rodaine/protogofakeit v0.1.1
 	github.com/stretchr/testify v1.11.1
 	google.golang.org/protobuf v1.36.10
 )
diff --git a/go.sum b/go.sum
@@ -8,6 +8,8 @@ cel.dev/expr v0.24.0 h1:56OvJKSH3hDGL0ml5uSxZmz3/3Pq4tJ+fb1unVLAFcY=
 cel.dev/expr v0.24.0/go.mod h1:hLPLo1W4QUmuYdA72RBX06QTs6MXw941piREPl3Yfiw=
 github.com/antlr4-go/antlr/v4 v4.13.1 h1:SqQKkuVZ+zWkMMNkjy5FZe5mr5WURWnlpmOuzYWrPrQ=
 github.com/antlr4-go/antlr/v4 v4.13.1/go.mod h1:GKmUxMtwp6ZgGwZSva4eWPC5mS6vUAmOABFgjdkM7Nw=
+github.com/brianvoe/gofakeit/v6 v6.28.0 h1:Xib46XXuQfmlLS2EXRuJpqcw8St6qSZz75OUo0tgAW4=
+github.com/brianvoe/gofakeit/v6 v6.28.0/go.mod h1:Xj58BMSnFqcn/fAQeSK+/PLtC5kSb7FJIq4JyGa8vEs=
 github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
 github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
@@ -30,6 +32,8 @@ github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZb
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/protocolbuffers/protoscope v0.0.0-20221109213918-8e7a6aafa2c9 h1:arwj11zP0yJIxIRiDn22E0H8PxfF7TsTrc2wIPFIsf4=
 github.com/protocolbuffers/protoscope v0.0.0-20221109213918-8e7a6aafa2c9/go.mod h1:SKZx6stCn03JN3BOWTwvVIO2ajMkb/zQdTceXYhKw/4=
+github.com/rodaine/protogofakeit v0.1.1 h1:ZKouljuRM3A+TArppfBqnH8tGZHOwM/pjvtXe9DaXH8=
+github.com/rodaine/protogofakeit v0.1.1/go.mod h1:pXn/AstBYMaSfc1/RqH3N82pBuxtWgejz1AlYpY1mI0=
 github.com/stoewer/go-strcase v1.3.1 h1:iS0MdW+kVTxgMoE1LAZyMiYJFKlOzLooE4MxjirtkAs=
 github.com/stoewer/go-strcase v1.3.1/go.mod h1:fAH5hQ5pehh+j3nZfvwdk2RgEgQjAoM8wodgtPmh1xo=
 github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
diff --git a/internal/gen/tests/example/v1/bench.pb.go b/internal/gen/tests/example/v1/bench.pb.go
diff --git a/internal/gen/tests/example/v1/bench_protoopaque.pb.go b/internal/gen/tests/example/v1/bench_protoopaque.pb.go
diff --git a/proto/tests/example/v1/bench.proto b/proto/tests/example/v1/bench.proto
diff --git a/validator.go b/validator.go
diff --git a/validator_bench_test.go b/validator_bench_test.go
