[mono][jit] Adding compare all/any intrinsics. #83515
src/mono/mono/mini/simd-intrinsics.c
Outdated
if (!(cfg->compile_aot && cfg->full_aot && !cfg->interp))
	return NULL;
#endif
//#ifdef TARGET_ARM64
It would be nice to add some kind of option for this into utils/options-def.h:
DEFINE_BOOL(experimental_jit_simd, "experimental-jit-simd", FALSE,"")
I like the idea. This is now tracked in #83587
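For illustration, a boolean option declared through a `DEFINE_BOOL`-style macro is typically consumed with the X-macro pattern: the same list is expanded once to declare the flags and again to parse the command line. The sketch below is a self-contained model only. `OPTION_LIST`, `DECLARE_FLAG`, `MATCH_FLAG`, `opt_experimental_jit_simd` and `parse_option` are hypothetical names, not the real Mono declarations.

```c
#include <stdbool.h>
#include <string.h>

#define FALSE 0

/* Hypothetical option list; in Mono the entries live in utils/options-def.h. */
#define OPTION_LIST(X) \
	X(experimental_jit_simd, "experimental-jit-simd", FALSE, "")

/* First expansion: one flag variable per option, initialised to its default. */
#define DECLARE_FLAG(var, name, def, comment) static bool opt_##var = def;
OPTION_LIST(DECLARE_FLAG)
#undef DECLARE_FLAG

/* Second expansion: match "--<name>" and switch the corresponding flag on.
 * Returns true if the argument was recognised. */
static bool parse_option(const char *arg)
{
	if (strncmp(arg, "--", 2) != 0)
		return false;
#define MATCH_FLAG(var, name, def, comment) \
	if (strcmp(arg + 2, name) == 0) { opt_##var = true; return true; }
	OPTION_LIST(MATCH_FLAG)
#undef MATCH_FLAG
	return false;
}
```

With a flag like this, the JIT SIMD path could be guarded by a check on `opt_experimental_jit_simd` rather than by commented-out `#ifdef` blocks.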
src/mono/mono/mini/mini-arm64.c
Outdated
{
	switch (mode) {
	case SIMD_EXTR_MAX8:
		arm_neon_umaxv (code, VREG_FULL, TYPE_I8, FP_TEMP_REG, sreg1);
Is it possible not to hardcode the vector length to VREG_FULL? I am thinking of reusing this for Vector64 in the future.
Yes, there is a 64-bit variant of this.
OP_XEXTRACT now carries vector width (8 or 16) in ins->inst_c1.
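As a rough model of what a width-parameterized reduction means, the sketch below computes a horizontal unsigned byte maximum over either the low 8 bytes (the 64-bit case) or all 16 bytes (the 128-bit case). `simd_reg` and `extract_max8` are hypothetical names for illustration. This is plain scalar C, not the arm64 emitter code from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit SIMD register viewed as bytes. */
typedef struct { uint8_t b[16]; } simd_reg;

/* Horizontal unsigned maximum over the low `width` bytes.
 * `width` plays the role of the value carried in ins->inst_c1 (8 or 16). */
static uint8_t extract_max8(simd_reg r, int width)
{
	assert(width == 8 || width == 16);
	uint8_t m = 0;
	for (int i = 0; i < width; i++) {
		if (r.b[i] > m)
			m = r.b[i];
	}
	return m;
}
```

With a width of 8 the upper half of the register is ignored, which is the behaviour a Vector64 caller would need.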
src/mono/mono/mini/simd-intrinsics.c
Outdated
- if (!strcmp (m_class_get_name (cfg->method->klass), "Vector256"))
+ if (!strcmp (m_class_get_name (cfg->method->klass), "Vector256") || !strcmp (m_class_get_name (cfg->method->klass), "Vector512"))
  	return NULL; // TODO: Fix Vector256.WithUpper/WithLower
The TODO comment is no longer accurate here. Everything there needs to be tested when Vector256 support is added. Do you mind removing the comment in this PR?
Failures are related. Strangely, the results in |
…r names. Equality/Inequality are now also intrinsics.
…l are implemented.
This adds the following intrinsics:
Equals{All,Any}, GreaterThan{All,Any}, GreaterThanOrEqual{All,Any}, LessThan{All,Any}, LessThanOrEqual{All,Any}.
This performs the vector compare operation, which sets each element of the result to all ones or all zeros depending on the outcome for that element. A horizontal byte maximum or minimum is then taken to determine whether any (or all) of the elements are nonzero.
umaxv/uminv should be fast on M1, with a latency of 3 and a throughput of 4 per clock (https://dougallj.github.io/applecpu/firestorm-simd.html).
Necessary arm64 codegen macros are also fixed and converted to the parametrized form.
Contributes to #80566.
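The compare-then-reduce idea from the description can be modelled in portable scalar C. This is a minimal sketch under stated assumptions, not the JIT codegen itself: `vec128_u8`, `splat_u8`, `cmp_gt_u8`, `hmax_u8`, `hmin_u8`, `greater_than_any` and `greater_than_all` are illustrative names, and the loops stand in for the vector compare and for the byte reductions that umaxv/uminv perform.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 16 /* bytes in a 128-bit vector */

typedef struct { uint8_t b[LANES]; } vec128_u8;

/* Build a vector with every lane set to the same value. */
static vec128_u8 splat_u8(uint8_t v)
{
	vec128_u8 r;
	for (int i = 0; i < LANES; i++)
		r.b[i] = v;
	return r;
}

/* Lane-wise compare: each result byte is 0xFF (true) or 0x00 (false). */
static vec128_u8 cmp_gt_u8(vec128_u8 x, vec128_u8 y)
{
	vec128_u8 r;
	for (int i = 0; i < LANES; i++)
		r.b[i] = (x.b[i] > y.b[i]) ? 0xFF : 0x00;
	return r;
}

/* Horizontal unsigned maximum across all bytes. */
static uint8_t hmax_u8(vec128_u8 v)
{
	uint8_t m = 0;
	for (int i = 0; i < LANES; i++)
		if (v.b[i] > m)
			m = v.b[i];
	return m;
}

/* Horizontal unsigned minimum across all bytes. */
static uint8_t hmin_u8(vec128_u8 v)
{
	uint8_t m = 0xFF;
	for (int i = 0; i < LANES; i++)
		if (v.b[i] < m)
			m = v.b[i];
	return m;
}

/* "Any": the maximum mask byte is nonzero iff at least one lane compared true. */
static bool greater_than_any(vec128_u8 x, vec128_u8 y)
{
	return hmax_u8(cmp_gt_u8(x, y)) != 0;
}

/* "All": the minimum mask byte is nonzero iff every lane compared true. */
static bool greater_than_all(vec128_u8 x, vec128_u8 y)
{
	return hmin_u8(cmp_gt_u8(x, y)) != 0;
}
```

Because the compare mask is all ones or all zeros per element, reducing over bytes gives the same answer regardless of the element width, so one byte reduction can serve every element type.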