[mono] Basic SIMD support for System.Numerics.Vector2 on arm64 #91659
This reverts commit 27d244b.
/azp run runtime-extra-platforms

Azure Pipelines successfully started running 1 pipeline(s).
Perf_Vector2 microbenchmarks on osx arm64 JIT-mini:

These are some impressive speedups, nice! As for …
/azp run runtime-extra-platforms

Azure Pipelines successfully started running 1 pipeline(s).
@LoopedBard3, if the aot-llvm arm64 local testing script is ready, please add a link to the documentation. @matouskozak, if possible, you should also try to get numbers for aot-llvm arm64 via that script.

The test failures are tracked and unrelated to this PR.
```c
const int t = get_type_size_macro (ins->inst_c1);
arm_neon_fdup_e (code, VREG_FULL, t, dreg, sreg1, 0);
if (ins->opcode == OP_EXPAND_R8)
	arm_neon_fdup_e (code, VREG_FULL, t, dreg, sreg1, 0);
```
OP_EXPAND_R8 can be simplified to a mov dreg, sreg1 or nothing if dreg == sreg1.
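For readers unfamiliar with the broadcast under discussion: assuming `arm_neon_fdup_e (code, VREG_FULL, t, dreg, sreg1, 0)` emits the AArch64 `DUP Vd.<T>, Vn.<Ts>[0]` (element) form, it copies lane 0 of `sreg1` into every lane of the full 128-bit `dreg`. The C sketch below emulates only that lane behavior, so it runs on any host. `vreg128`, `dup_element_full`, `expand_r8_lane`, and `expand_r4_lane` are hypothetical names, not Mono APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit NEON register: 16 raw bytes, lane 0 first. */
typedef struct { uint8_t b[16]; } vreg128;

/* Emulates DUP Vd.<T>, Vn.<Ts>[index] on a full (128-bit) register: every
   lane of the result receives a copy of lane `index` of the source.
   `esize` is the element size in bytes: 4 for float (R4), 8 for double (R8). */
static vreg128 dup_element_full (vreg128 src, int esize, int index)
{
	assert (esize == 4 || esize == 8);
	assert (index >= 0 && index < 16 / esize);

	vreg128 dst;
	for (int lane = 0; lane < 16 / esize; lane++)
		memcpy (&dst.b[lane * esize], &src.b[index * esize], (size_t) esize);
	return dst;
}

/* Demo (assumption: models an R8 expand as a DUP of lane 0): put a scalar
   double in lane 0, zero the rest, broadcast it, return the requested lane. */
static double expand_r8_lane (double x, int lane)
{
	vreg128 src, res;
	double out[2];
	memset (&src, 0, sizeof src);
	memcpy (&src.b[0], &x, sizeof x);
	res = dup_element_full (src, 8, 0);
	memcpy (out, res.b, sizeof out);
	return out[lane];
}

/* Same idea for a scalar float broadcast into four 32-bit lanes. */
static float expand_r4_lane (float x, int lane)
{
	vreg128 src, res;
	float out[4];
	memset (&src, 0, sizeof src);
	memcpy (&src.b[0], &x, sizeof x);
	res = dup_element_full (src, 4, 0);
	memcpy (out, res.b, sizeof out);
	return out[lane];
}
```

With an 8-byte element size, the 16-byte register holds two doubles, so the broadcast also writes lane 1 (the upper 64 bits). The sketch does not model register allocation, so it does not by itself show when the `mov`-or-nothing simplification is safe.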
fanyang-mono left a comment:
LGTM!
Re-created PR that adds basic SIMD support for System.Numerics.Vector2 on arm64, matching the current support for System.Numerics.Vector4. It also renames the vector2_methods table to vector_2_3_4_methods to better reflect its usage.

Current SIMD support for Vector2 with mini/llvm:
… Vector2/float scenario, will enable in the next PR)

Future work on the missing intrinsics is tracked in #91394.
Contributes to: #73462
P.S. These getters currently use 128-bit code paths for emitting constant values (emit_xconst_v128), even for Vector2 (a 64-bit vector).

Comment from @jandupej on the original PR:
You can use an fmov to fill the lower two floats with 1.0f. This gives you the fastest SN_get_One possible (there is a 64-bit variant of this instruction, with q=0). To make SN_get_UnitX/Y, you can shift the vector left or right by 32, treating it as a 64-bit double. Zeros are shifted in, so this gives you (0.0f, 1.0f) or (1.0f, 0.0f), respectively. This destroys the upper 64 bits of the register, but that shouldn't be a problem, since only the lower 64 bits matter.
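The bit-level effect of this suggestion can be checked with plain integer arithmetic. The sketch below assumes the fill is done with `fmov v.2s, #1.0` and the shift with the scalar 64-bit `shl`/`ushr` forms, and that lane 0 (X) occupies the least significant 32 bits, as AArch64 lanes do. It models only the low 64 bits of the register. The function names are hypothetical illustrations, not Mono code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 single-precision bit pattern of 1.0f. */
#define ONE_F32_BITS 0x3F800000u

/* Models `fmov v.2s, #1.0` (64-bit form, q=0): both 32-bit lanes hold 1.0f.
   Lane 0 (X) is the low 32 bits, lane 1 (Y) the high 32 bits. */
static uint64_t vector2_one_bits (void)
{
	return ((uint64_t) ONE_F32_BITS << 32) | ONE_F32_BITS;
}

/* Treat the 64 bits as one element and shift by 32; zeros shift in. */
static uint64_t vector2_unit_x_bits (void) { return vector2_one_bits () >> 32; } /* (1.0f, 0.0f) */
static uint64_t vector2_unit_y_bits (void) { return vector2_one_bits () << 32; } /* (0.0f, 1.0f) */

/* Decode lane i (0 = X, 1 = Y) of a 64-bit Vector2 bit pattern as a float. */
static float vector2_lane (uint64_t bits, int i)
{
	assert (i == 0 || i == 1);
	uint32_t raw = (uint32_t) (bits >> (32 * i));
	float f;
	memcpy (&f, &raw, sizeof f);
	return f;
}
```

A right shift moves the lane-1 copy of 1.0f down into lane 0 and clears lane 1, giving UnitX. A left shift moves lane 0 up into lane 1 and clears lane 0, giving UnitY.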