-
Notifications
You must be signed in to change notification settings - Fork 116
eval: resolveIndex speedup #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This fixes a bug when looping over an unindex ranger (i.e. a ranger that returns false for its ProvidesIndex() method). Previously, the context for the range was not properly set, so it made it impossible to use the context in such rangers. This also fixes a panic when attempting to use an unindexed ranger without a corresponding 'set' action.
This adds test cases to assert using a custom ranger interface works as expected. It also modifies the previous slice tests to make it clear it's using the index, unifying it with the other test case names.
This improves the performance of resolveIndex calls by bringing back the structure cache access that was removed in commit be876de and by modifying the function's signature to receive either a reflect.Value or a string argument, depending on what the call sites already have. This brings down allocations for the most common execution paths back to zero. Benchmarks shows that the hot path for executing templates involve allocations made when executing a {{ .Field }} action when the current context is a structure. This is one of the most fundamental building blocks for templates, and thus deserves a closer inspection on how to make performant. Going over the history of this package, up to version 3.0.0, benchmarks showed this code path to have zero allocations, and further digging shows that commit be876de rewrote the resolveIndex function in such a way that it dropped support for a package-level cache of parsed structures, which avoided these allocations for commonly used structures. This commit brings back access to that cache for the common case, while retaining the previous behavior of using the FieldByName() call (which causes one allocation). Additionally, after analyzing the call sites for resolveIndex, it's clear that in many of them, the representation of the index to be resolved is already stored in a string variable and for the common case of structure fields, it ends up doing a roundtrip to a reflect.Value before being extracted again as a string, causing another unnecessary allocation. Thus this commit also modifies the signature of the resolveIndex function to take either a reflect.Value or a string, depending on what is available on each call site without having to perform a conversion for the common cases, and handling the uncommon cases (when conversion is actually necessary) in a special way. The net result is that execution of templates which consist only of field accesses become allocation-free, offering an alternative to write very memory efficient templates, even when they contain large numbers of accesses or items. Following is a representative sample of benchmarks before and after the change: name old time/op new time/op delta RangeSimple-4 21.9µs ± 1% 15.1µs ± 2% -31.07% (p=0.000 n=10+10) RangeSimpleSet-4 31.6µs ± 1% 24.2µs ± 2% -23.49% (p=0.000 n=10+10) CustomRanger-4 346ns ± 1% 210ns ± 2% -39.36% (p=0.000 n=10+10) FieldAccess-4 414ns ±11% 290ns ±10% -29.91% (p=0.000 n=10+10) name old alloc/op new alloc/op delta RangeSimple-4 1.21kB ± 0% 0.06kB ± 0% -95.37% (p=0.000 n=9+10) RangeSimpleSet-4 1.63kB ± 0% 0.48kB ± 0% -70.61% (p=0.000 n=10+10) CustomRanger-4 24.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) FieldAccess-4 24.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta RangeSimple-4 98.0 ± 0% 2.0 ± 0% -97.96% (p=0.000 n=10+10) RangeSimpleSet-4 101 ± 0% 5 ± 0% -95.05% (p=0.000 n=10+10) CustomRanger-4 2.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) FieldAccess-4 2.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
Very impressive, thank you! In my defense: when I wrote the linked commit, I was going for correctness first. Thanks for taking the time to benchmark and improve the code, I'm sure lots of jet users will love the result. |
No worries, I know how that goes! Unless we add specific test failures to lock down on performance behavior, it's not super unusual to have performance degrade once in a while during refactorings. |
Depends on #200
This improves the performance of resolveIndex calls by bringing back the structure cache access that was removed in commit be876de and by modifying the function's signature to receive either a reflect.Value or a string argument, depending on what the call sites already have. This brings down allocations for the most common execution paths back to zero.
Benchmarks shows that the hot path for executing templates involve allocations made when executing a {{ .Field }} action when the current context is a structure. This is one of the most fundamental building blocks for templates, and thus deserves a closer inspection on how to make performant.
Going over the history of this package, up to version 3.0.0, benchmarks showed this code path to have zero allocations, and further digging shows that commit be876de rewrote the resolveIndex function in such a way that it dropped support for a package-level cache of parsed structures, which avoided these allocations for commonly used structures.
This commit brings back access to that cache for the common case, while retaining the previous behavior of using the FieldByName() call (which causes one allocation).
Additionally, after analyzing the call sites for resolveIndex, it's clear that in many of them, the representation of the index to be resolved is already stored in a string variable and for the common case of structure fields, it ends up doing a roundtrip to a reflect.Value before being extracted again as a string, causing another unnecessary allocation.
Thus this commit also modifies the signature of the resolveIndex function to take either a reflect.Value or a string, depending on what is available on each call site without having to perform a conversion for the common cases, and handling the uncommon cases (when conversion is actually necessary) in a special way.
The net result is that execution of templates which consist only of field accesses become allocation-free, offering an alternative to write very memory efficient templates, even when they contain large numbers of accesses or items.
Following is a representative sample of benchmarks before and after the change: