Skip to content

Conversation

matheusd
Copy link
Contributor

@matheusd matheusd commented Dec 8, 2022

Depends on #200

This improves the performance of resolveIndex calls by bringing back the structure cache access that was removed in commit be876de and by modifying the function's signature to receive either a reflect.Value or a string argument, depending on what the call sites already have. This brings down allocations for the most common execution paths back to zero.

Benchmarks shows that the hot path for executing templates involve allocations made when executing a {{ .Field }} action when the current context is a structure. This is one of the most fundamental building blocks for templates, and thus deserves a closer inspection on how to make performant.

Going over the history of this package, up to version 3.0.0, benchmarks showed this code path to have zero allocations, and further digging shows that commit be876de rewrote the resolveIndex function in such a way that it dropped support for a package-level cache of parsed structures, which avoided these allocations for commonly used structures.

This commit brings back access to that cache for the common case, while retaining the previous behavior of using the FieldByName() call (which causes one allocation).

Additionally, after analyzing the call sites for resolveIndex, it's clear that in many of them, the representation of the index to be resolved is already stored in a string variable and for the common case of structure fields, it ends up doing a roundtrip to a reflect.Value before being extracted again as a string, causing another unnecessary allocation.

Thus this commit also modifies the signature of the resolveIndex function to take either a reflect.Value or a string, depending on what is available on each call site without having to perform a conversion for the common cases, and handling the uncommon cases (when conversion is actually necessary) in a special way.

The net result is that execution of templates which consist only of field accesses become allocation-free, offering an alternative to write very memory efficient templates, even when they contain large numbers of accesses or items.

Following is a representative sample of benchmarks before and after the change:

name                      old time/op    new time/op    delta
RangeSimple-4               21.9µs ± 1%    15.1µs ± 2%   -31.07% (p=0.000 n=10+10)
RangeSimpleSet-4            31.6µs ± 1%    24.2µs ± 2%   -23.49% (p=0.000 n=10+10)
CustomRanger-4               346ns ± 1%     210ns ± 2%   -39.36% (p=0.000 n=10+10)
FieldAccess-4                414ns ±11%     290ns ±10%   -29.91% (p=0.000 n=10+10)

name                      old alloc/op   new alloc/op   delta
RangeSimple-4               1.21kB ± 0%    0.06kB ± 0%   -95.37% (p=0.000 n=9+10)
RangeSimpleSet-4            1.63kB ± 0%    0.48kB ± 0%   -70.61% (p=0.000 n=10+10)
CustomRanger-4               24.0B ± 0%      0.0B       -100.00% (p=0.000 n=10+10)
FieldAccess-4                24.0B ± 0%      0.0B       -100.00% (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
RangeSimple-4                 98.0 ± 0%       2.0 ± 0%   -97.96% (p=0.000 n=10+10)
RangeSimpleSet-4               101 ± 0%         5 ± 0%   -95.05% (p=0.000 n=10+10)
CustomRanger-4                2.00 ± 0%      0.00       -100.00% (p=0.000 n=10+10)
FieldAccess-4                 2.00 ± 0%      0.00       -100.00% (p=0.000 n=10+10)

This fixes a bug when looping over an unindex ranger (i.e. a ranger that
returns false for its ProvidesIndex() method).

Previously, the context for the range was not properly set, so it made
it impossible to use the context in such rangers.

This also fixes a panic when attempting to use an unindexed ranger
without a corresponding 'set' action.
This adds test cases to assert using a custom ranger interface works as
expected.

It also modifies the previous slice tests to make it clear it's using
the index, unifying it with the other test case names.
This improves the performance of resolveIndex calls by bringing back the
structure cache access that was removed in commit
be876de and by modifying the function's
signature to receive either a reflect.Value or a string argument,
depending on what the call sites already have. This brings down
allocations for the most common execution paths back to zero.

Benchmarks shows that the hot path for executing templates involve
allocations made when executing a {{ .Field }} action when the current
context is a structure. This is one of the most fundamental building
blocks for templates, and thus deserves a closer inspection on how to
make performant.

Going over the history of this package, up to version 3.0.0, benchmarks
showed this code path to have zero allocations, and further digging
shows that commit be876de rewrote the resolveIndex function in such a
way that it dropped support for a package-level cache of parsed
structures, which avoided these allocations for commonly used
structures.

This commit brings back access to that cache for the common case, while
retaining the previous behavior of using the FieldByName() call (which
causes one allocation).

Additionally, after analyzing the call sites for resolveIndex, it's
clear that in many of them, the representation of the index to be
resolved is already stored in a string variable and for the common case
of structure fields, it ends up doing a roundtrip to a reflect.Value
before being extracted again as a string, causing another unnecessary
allocation.

Thus this commit also modifies the signature of the resolveIndex
function to take either a reflect.Value or a string, depending on what
is available on each call site without having to perform a conversion
for the common cases, and handling the uncommon cases (when conversion
is actually necessary) in a special way.

The net result is that execution of templates which consist only of
field accesses become allocation-free, offering an alternative to write
very memory efficient templates, even when they contain large numbers of
accesses or items.

Following is a representative sample of benchmarks before and after the
change:

name                      old time/op    new time/op    delta
RangeSimple-4               21.9µs ± 1%    15.1µs ± 2%   -31.07% (p=0.000 n=10+10)
RangeSimpleSet-4            31.6µs ± 1%    24.2µs ± 2%   -23.49% (p=0.000 n=10+10)
CustomRanger-4               346ns ± 1%     210ns ± 2%   -39.36% (p=0.000 n=10+10)
FieldAccess-4                414ns ±11%     290ns ±10%   -29.91% (p=0.000 n=10+10)

name                      old alloc/op   new alloc/op   delta
RangeSimple-4               1.21kB ± 0%    0.06kB ± 0%   -95.37% (p=0.000 n=9+10)
RangeSimpleSet-4            1.63kB ± 0%    0.48kB ± 0%   -70.61% (p=0.000 n=10+10)
CustomRanger-4               24.0B ± 0%      0.0B       -100.00% (p=0.000 n=10+10)
FieldAccess-4                24.0B ± 0%      0.0B       -100.00% (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
RangeSimple-4                 98.0 ± 0%       2.0 ± 0%   -97.96% (p=0.000 n=10+10)
RangeSimpleSet-4               101 ± 0%         5 ± 0%   -95.05% (p=0.000 n=10+10)
CustomRanger-4                2.00 ± 0%      0.00       -100.00% (p=0.000 n=10+10)
FieldAccess-4                 2.00 ± 0%      0.00       -100.00% (p=0.000 n=10+10)
@sauerbraten
Copy link
Collaborator

Very impressive, thank you! In my defense: when I wrote the linked commit, I was going for correctness first. Thanks for taking the time to benchmark and improve the code, I'm sure lots of jet users will love the result.

@sauerbraten sauerbraten merged commit 3838f8b into CloudyKit:master Dec 11, 2022
@matheusd matheusd deleted the speedup branch December 12, 2022 10:41
@matheusd
Copy link
Contributor Author

Very impressive, thank you! In my defense: when I wrote the linked commit, I was going for correctness first. Thanks for taking the time to benchmark and improve the code, I'm sure lots of jet users will love the result.

No worries, I know how that goes! Unless we add specific test failures to lock down on performance behavior, it's not super unusual to have performance degrade once in a while during refactorings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants