-
Notifications
You must be signed in to change notification settings - Fork 82
Improve performance of functions with dynamic arguments #345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Somehow I can't add reviewers to the PR. @mkustermann @eyebrowsoffire could you have a look? |
Maybe it's time to make a breaking release soon instead? Then I would remove all the dynamic arguments from all the classes, and we would also follow semver for #270 |
This change doesn't need to be a breaking change. We can add a deprecation to |
Yeah, it wouldn't be a breaking change right now. But if following semver, marking it as deprecated is just the first step towards a breaking change, when removing the deprecated method major should be bumped (https://semver.org/#how-should-i-handle-deprecating-functionality). I'm not too fond of all the |
Let's get a changelog in here and bump the minor (.X.) version |
This comment was marked as resolved.
This comment was marked as resolved.
@osa1 – by all means revert if you don't like the last commit |
f4bf7e3
to
cdbed22
Compare
I dropped your commit as it changed benchmark results and made them worse, and then I had to debug this because I wasn't able to reproduce the benchmark results in my PR description. (The "cleanup" change calls less efficient function from more efficient one and makes |
@mraleph @rakudrama @mkustermann I think I addressed all of the feedback. Could you PTAL? Start with the PR description, I just applied the same changes to other functions that take @kevmoo as far as I know still no one from the backend teams can approve reviews, can we give at least one person from each backend review rights? They already reviewed this PR, they just officially can't approve. (Same as SDK, it makes sense for changes to libraries to be reviewed by backend teams to catch any performance or binary size issues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM with the following comments
- move stores up as far as possible
- access
[15]
first - have single implementation that all others delegate to
- make functions only have a single return & mark as
@pragma('dart2js:prefer-inline')
If the last point improves performance in dart2js similar to dart2wasm & vm, then do we get meaningful perf benefits from having the {translate,scale,...}By{Double,...}
operations?
Done. I haven't benchmarked this change in isolation.
Done. Doesn't affect performance when I inline everything.
Done. Improves dart2js performance.
NOT done: doesn't make any difference in dart2wasm and AOT, but makes dart2js perform worse.
|
} | ||
} | ||
} | ||
|
||
void main() { | ||
MatrixMultiplyBenchmark.main(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This structure of running each benchmark in turn is susceptible to JIT-effects where the code is initially specialized to the first benchmark, and then de-optimized and recompiled against less constrained inputs.
When this happens, the reported performance is dependent on the order of the benchmarks,
To counter this, I usually create a list of benchmarks, warm them all up, then do the actual measurements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it separately. It's going to be a big change to update all of the benchmarks, and it can be done separately.
I'm worried about moving stores. There are constructors that take 'storage' as an input, so there can be aliasing via typed data views.
Can we make the statement check
|
lib/src/vector_math/matrix4.dart
Outdated
} | ||
throw ArgumentError(arg); | ||
return value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is worthwhile making the dynamic
versions faster like this.
The user has to do a dynamic call or cast to use the result.
I'd rather they were deprecated with proper versioning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's free performance, why not make it faster?
(It's free because I assume all calls sites will directly call it, so it'll always be inlined, and the type tests will always disappear as the argument types will also known at the call sites. So no binary size increase.)
I added deprecations.
Which function(s) do you mean that can have aliasing bugs? The code with stores moved up is this: void translateByDouble(double tx, double ty, double tz, double tw) {
final t1 = _m4storage[0] * tx +
_m4storage[4] * ty +
_m4storage[8] * tz +
_m4storage[12] * tw;
_m4storage[12] = t1;
final t2 = _m4storage[1] * tx +
_m4storage[5] * ty +
_m4storage[9] * tz +
_m4storage[13] * tw;
_m4storage[13] = t2;
final t3 = _m4storage[2] * tx +
_m4storage[6] * ty +
_m4storage[10] * tz +
_m4storage[14] * tw;
_m4storage[14] = t3;
final t4 = _m4storage[3] * tx +
_m4storage[7] * ty +
_m4storage[11] * tz +
_m4storage[15] * tw;
_m4storage[15] = t4;
} There's only one array being used here, so aliasing can't be an issue.
I reported benchmarks with this here: #345 (comment). It doesn't make any difference in any of: AOT, dart2js, dart2wasm. It's difficult to make it work with dart2wasm because wasm-opt doesn't track accessed locations and omit bounds checks based on that. As mentioned in the thread, I reported this use case to the wasm-opt team and but it's in the backlog with low priority. |
I think all feedback is addressed now. @kevmoo we can merge this. Make sure to squash and copy/paste PR description as the commit message. GitHub's UI messes with formatting when it automatically inserts PR description as the commit message when you click on "squash and merge". |
Bumped major version instead as this is a breaking change. (new membrers added to non-final class) |
This reverts commit 01daccc.
Per Martin's feedback I reverted the major version bump. Flutter uses this package, and when it migrates to the new major version other packages that are commonly used with Flutter that also use vector_math won't compile. To avoid this we bump minor version. The change is technically breaking (new members added to non-final class), but you shouldn't extend these classes anyway (kills performance). So hopefully no one's done it. |
Revisions updated by `dart tools/rev_sdk_deps.dart`. dartdoc (https://github.com/dart-lang/dartdoc/compare/95105e9..e4f9451): e4f9451a 2025-05-05 Jonas Finnemann Jensen Fix duplicate entries of elements in a category when re-exported. (dart-lang/dartdoc#4043) 876180bd 2025-05-01 dependabot[bot] Bump github/codeql-action from 3.28.13 to 3.28.16 in the github-actions group (dart-lang/dartdoc#4045) http (https://github.com/dart-lang/http/compare/63c477b..78d6114): 78d6114 2025-05-02 Brian Quinlan Add a new exception type `NSErrorClientException` (dart-lang/http#1763) 7a2e7d5 2025-05-01 Brian Quinlan Add a useful stringificiation to `WebSocketConnectionClosed` (dart-lang/http#1764) ccb6533 2025-05-01 dependabot[bot] Bump the github-actions group across 1 directory with 4 updates (dart-lang/http#1761) 7568b5c 2025-05-02 Alex Li [web_socket] Adds `WebSocketException.toString()` (dart-lang/http#1756) 3e4cceb 2025-05-01 Brian Quinlan Make response headers tests pass on firefox (dart-lang/http#1762) 5704b0c 2025-05-01 Brian Quinlan [cronet_http/cupertino_http]: Fixes bugs where cancelling `StreamedResponse.stream` did not sever the connection (dart-lang/http#1760) test (https://github.com/dart-lang/test/compare/c3755d8..55d1f9e): 55d1f9ed 2025-05-05 Fichtelcoder Fix typos in json_reporter documentation (dart-lang/test#2493) vector_math (https://github.com/google/vector_math.dart/compare/39cafd4..0279cb8): 0279cb8 2025-05-01 Ömer Sinan Ağacan Improve performance of functions with dynamic arguments (google/vector_math.dart#345) Change-Id: I9a67b997ebcf7ebe29162f8f524628013d53af5a Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/426581 Reviewed-by: Konstantin Shcheglov <scheglov@google.com> Commit-Queue: Devon Carew <devoncarew@google.com>
dart-lang/sdk@3b444c5 landed and I'm guessing this rolled to Google3. Should we do a release and get this rolled to Flutter? Should we do a verification pass on Flutter first? |
I think ideally for a package this small and simple we shouldn't need rolls for testing, we should be able to test it properly with just I don't know how comprehensive testing in this package is as I did just one commit to the package, but I see tests for all the changed functions: Given that the changes are simple, and we have unit tests covering each of the changed functions, I would just publish. |
/// Multiply this by a translation from the left. | ||
@pragma('wasm:prefer-inline') | ||
@pragma('vm:prefer-inline') | ||
@pragma('dart2js:prefer-inline') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get inlining the redirecting functions. But these are BIG functions. Should we really be inlining them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@osa1 – wondering if we should hold off on the 2nd level of inlining until we know it's a net win...maybe?
CC @rakudrama
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be counterproductive for JavaScript. I'd rather we didn't prefer-inline for dart2js.
JavaScript has its own JIT, and that can do inlining itself.
If this function is not inlined by dart2js, and super hot, it will be JIT-optimized.
Then all the call sites benefit, even if most of them are not hot.
Over-inlining can cause the host function less likely to be JIT-ed, even if it is hot.
Either is is simply too big to to attempt JIT optimization, or there are more opportunities for de-optimization, and eventually the JIT-optimizer declines to make further attempts.
Any function in JavaScript with more than a few operations is at risk of these counter-productive behaviors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #347
Matrix4.translate
is used by Flutter and it often appears as an overhead whenprofiling Flutter apps compiled to Wasm.
The function currently takes a
dynamic
argument and takes different codepaths based on the type.
This is inefficient when the call site already knows the argument's type.
In general, when we have a performance-critical generic function that checks
types of the arguments, it makes sense to introduce specialized versions of the
function based on precise argument types that it handles, so that callers can
call the more efficient specialized versions and avoid the type test overheads.
For each function with
dynamic
argument, this PR introduces specializedfunctions based on precise argument types that the functions handle, call those
specialized functions from the
dynamic
functions, and then inline thedynamic
functions.This allows compilers to eliminate the type tests when the types are known in a
call site, and call the efficient functions directly.
For example, for
Vector4.translate
, we introduce:Vector4.translateByDouble(double x, double y, double z)
Vector4.translateByVector3(Vector3 v3)
Vector4.translateByVector4(Vector4 v4)
Call sites that know the argument type can directly call these for performance.
Existing call sites will be automatically improved when updating the library,
as the type-testing functions will be inlined and type tests will be eliminated
in most (probably all cases).
dart2wasm benchmarks: (
-O2
)dart2js benchmarks: (
-O4
)AOT benchmarks:
JIT benchmarks:
Performance or other improved functions
scale
,scaled
,operator *
, andleftTranslate
should be improved similarly.