eval: resolveIndex speedup #201

matheusd · 2022-12-08T20:09:03Z

Depends on #200

This improves the performance of resolveIndex calls by bringing back the structure cache access that was removed in commit be876de and by modifying the function's signature to receive either a reflect.Value or a string argument, depending on what the call sites already have. This brings down allocations for the most common execution paths back to zero.

Benchmarks shows that the hot path for executing templates involve allocations made when executing a {{ .Field }} action when the current context is a structure. This is one of the most fundamental building blocks for templates, and thus deserves a closer inspection on how to make performant.

Going over the history of this package, up to version 3.0.0, benchmarks showed this code path to have zero allocations, and further digging shows that commit be876de rewrote the resolveIndex function in such a way that it dropped support for a package-level cache of parsed structures, which avoided these allocations for commonly used structures.

This commit brings back access to that cache for the common case, while retaining the previous behavior of using the FieldByName() call (which causes one allocation).

Additionally, after analyzing the call sites for resolveIndex, it's clear that in many of them, the representation of the index to be resolved is already stored in a string variable and for the common case of structure fields, it ends up doing a roundtrip to a reflect.Value before being extracted again as a string, causing another unnecessary allocation.

Thus this commit also modifies the signature of the resolveIndex function to take either a reflect.Value or a string, depending on what is available on each call site without having to perform a conversion for the common cases, and handling the uncommon cases (when conversion is actually necessary) in a special way.

The net result is that execution of templates which consist only of field accesses become allocation-free, offering an alternative to write very memory efficient templates, even when they contain large numbers of accesses or items.

Following is a representative sample of benchmarks before and after the change:

name                      old time/op    new time/op    delta
RangeSimple-4               21.9µs ± 1%    15.1µs ± 2%   -31.07% (p=0.000 n=10+10)
RangeSimpleSet-4            31.6µs ± 1%    24.2µs ± 2%   -23.49% (p=0.000 n=10+10)
CustomRanger-4               346ns ± 1%     210ns ± 2%   -39.36% (p=0.000 n=10+10)
FieldAccess-4                414ns ±11%     290ns ±10%   -29.91% (p=0.000 n=10+10)

name                      old alloc/op   new alloc/op   delta
RangeSimple-4               1.21kB ± 0%    0.06kB ± 0%   -95.37% (p=0.000 n=9+10)
RangeSimpleSet-4            1.63kB ± 0%    0.48kB ± 0%   -70.61% (p=0.000 n=10+10)
CustomRanger-4               24.0B ± 0%      0.0B       -100.00% (p=0.000 n=10+10)
FieldAccess-4                24.0B ± 0%      0.0B       -100.00% (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
RangeSimple-4                 98.0 ± 0%       2.0 ± 0%   -97.96% (p=0.000 n=10+10)
RangeSimpleSet-4               101 ± 0%         5 ± 0%   -95.05% (p=0.000 n=10+10)
CustomRanger-4                2.00 ± 0%      0.00       -100.00% (p=0.000 n=10+10)
FieldAccess-4                 2.00 ± 0%      0.00       -100.00% (p=0.000 n=10+10)

This fixes a bug when looping over an unindex ranger (i.e. a ranger that returns false for its ProvidesIndex() method). Previously, the context for the range was not properly set, so it made it impossible to use the context in such rangers. This also fixes a panic when attempting to use an unindexed ranger without a corresponding 'set' action.

This adds test cases to assert using a custom ranger interface works as expected. It also modifies the previous slice tests to make it clear it's using the index, unifying it with the other test case names.

This improves the performance of resolveIndex calls by bringing back the structure cache access that was removed in commit be876de and by modifying the function's signature to receive either a reflect.Value or a string argument, depending on what the call sites already have. This brings down allocations for the most common execution paths back to zero. Benchmarks shows that the hot path for executing templates involve allocations made when executing a {{ .Field }} action when the current context is a structure. This is one of the most fundamental building blocks for templates, and thus deserves a closer inspection on how to make performant. Going over the history of this package, up to version 3.0.0, benchmarks showed this code path to have zero allocations, and further digging shows that commit be876de rewrote the resolveIndex function in such a way that it dropped support for a package-level cache of parsed structures, which avoided these allocations for commonly used structures. This commit brings back access to that cache for the common case, while retaining the previous behavior of using the FieldByName() call (which causes one allocation). Additionally, after analyzing the call sites for resolveIndex, it's clear that in many of them, the representation of the index to be resolved is already stored in a string variable and for the common case of structure fields, it ends up doing a roundtrip to a reflect.Value before being extracted again as a string, causing another unnecessary allocation. Thus this commit also modifies the signature of the resolveIndex function to take either a reflect.Value or a string, depending on what is available on each call site without having to perform a conversion for the common cases, and handling the uncommon cases (when conversion is actually necessary) in a special way. The net result is that execution of templates which consist only of field accesses become allocation-free, offering an alternative to write very memory efficient templates, even when they contain large numbers of accesses or items. Following is a representative sample of benchmarks before and after the change: name old time/op new time/op delta RangeSimple-4 21.9µs ± 1% 15.1µs ± 2% -31.07% (p=0.000 n=10+10) RangeSimpleSet-4 31.6µs ± 1% 24.2µs ± 2% -23.49% (p=0.000 n=10+10) CustomRanger-4 346ns ± 1% 210ns ± 2% -39.36% (p=0.000 n=10+10) FieldAccess-4 414ns ±11% 290ns ±10% -29.91% (p=0.000 n=10+10) name old alloc/op new alloc/op delta RangeSimple-4 1.21kB ± 0% 0.06kB ± 0% -95.37% (p=0.000 n=9+10) RangeSimpleSet-4 1.63kB ± 0% 0.48kB ± 0% -70.61% (p=0.000 n=10+10) CustomRanger-4 24.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) FieldAccess-4 24.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta RangeSimple-4 98.0 ± 0% 2.0 ± 0% -97.96% (p=0.000 n=10+10) RangeSimpleSet-4 101 ± 0% 5 ± 0% -95.05% (p=0.000 n=10+10) CustomRanger-4 2.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) FieldAccess-4 2.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)

sauerbraten · 2022-12-11T15:50:29Z

Very impressive, thank you! In my defense: when I wrote the linked commit, I was going for correctness first. Thanks for taking the time to benchmark and improve the code, I'm sure lots of jet users will love the result.

matheusd · 2022-12-12T10:42:24Z

Very impressive, thank you! In my defense: when I wrote the linked commit, I was going for correctness first. Thanks for taking the time to benchmark and improve the code, I'm sure lots of jet users will love the result.

No worries, I know how that goes! Unless we add specific test failures to lock down on performance behavior, it's not super unusual to have performance degrade once in a while during refactorings.

matheusd added 5 commits December 8, 2022 11:38

doc: Expand documentation for Ranger()

611cef9

eval/test: Add test cases for custom ranger

c3dfa5a

This adds test cases to assert using a custom ranger interface works as expected. It also modifies the previous slice tests to make it clear it's using the index, unifying it with the other test case names.

eval/test: Add benchmark for field access

03009eb

sauerbraten merged commit 3838f8b into CloudyKit:master Dec 11, 2022

matheusd deleted the speedup branch December 12, 2022 10:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

eval: resolveIndex speedup #201

eval: resolveIndex speedup #201

Uh oh!

matheusd commented Dec 8, 2022

Uh oh!

sauerbraten commented Dec 11, 2022

Uh oh!

matheusd commented Dec 12, 2022

Uh oh!

Uh oh!

eval: resolveIndex speedup #201

eval: resolveIndex speedup #201

Uh oh!

Conversation

matheusd commented Dec 8, 2022

Uh oh!

sauerbraten commented Dec 11, 2022

Uh oh!

matheusd commented Dec 12, 2022

Uh oh!

Uh oh!