-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for the generated assembly of mask related simd instructions. #121953
Add tests for the generated assembly of mask related simd instructions. #121953
Conversation
rustbot has assigned @Mark-Simulacrum. Use r? to explicitly pick a reviewer |
This comment has been minimized.
This comment has been minimized.
I don't think Mark reviews compiler stuff @rustbot label +T-compiler +A-simd |
r? @workingjubilee halp, lots of SIMD stuff |
Oh cool. Will try to review this tomorrow. |
Do all the pre-EVEX examples need AVX2? I thought vmaskmov was an AVX instruction? |
I'm thinking of rather adding tests for SSE2, since that is unfortunately still the baseline. Thank you for the review! I'll push an update shortly. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Things are much clearer now. Unfortunately that means I noticed things I didn't notice the first time! Apologies. Yes, CHECK-DAG is quite excellent for this sort of thing!
@jhorstmann I'm pretty happy with the state of the avx2/avx512 tests assembled here, so if you would rather, we could cut the aarch64 material, land the rest, and then reintroduce it in a later PR that also adds the sse2 tests you are thinking of. |
Also feel free to rebase this. I personally basically completely reread PRs on rereview which is why I often notice things on the second pass, rather than assuming the history is informative, though it does mean reviewing a huge PR takes forever. |
// aarch64-NEXT: ext | ||
// aarch64-NEXT: zip1 | ||
// aarch64-NEXT: addv | ||
// aarch64-NEXT: fmov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok. Full listing including the constants:
5: .LCPI0_0:
6: .byte 1
7: .byte 2
8: .byte 4
9: .byte 8
10: .byte 16
11: .byte 32
12: .byte 64
13: .byte 128
14: .byte 1
15: .byte 2
16: .byte 4
17: .byte 8
18: .byte 16
19: .byte 32
20: .byte 64
21: .byte 128
22: .section .text.bitmask_m8x16,"ax",@progbits
23: .globl bitmask_m8x16
24: .p2align 2
25: .type bitmask_m8x16,@function
26: bitmask_m8x16:
27: .cfi_startproc
28: adrp x8, .LCPI0_0
29: cmlt v0.16b, v0.16b, #0
30: ldr q1, [x8, :lo12:.LCPI0_0]
31: and v0.16b, v0.16b, v1.16b
32: ext v1.16b, v0.16b, v0.16b, #8
33: zip1 v0.16b, v0.16b, v1.16b
34: addv h0, v0.8h
35: fmov w0, s0
36: ret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool.
r=me with the fixups squashed out. |
✌️ @jhorstmann, you can now approve this pull request! If @workingjubilee told you to " |
fe08d2d
to
ce470fa
Compare
The tests show that the code generation currently uses the least significant bits of <iX x N> vector masks when converting to <i1 xN>. This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit. On aarch64 the left shift is followed by a comparison against zero, which repeats the sign bit across the whole lane. The exception, which does not introduce an unneeded shift, is simd_bitmask, because the code generation already shifts before truncating. By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks. This additional instruction would be removed by the fix in rust-lang#104693, which uses the most significant bit for all mask operations.
ce470fa
to
e91f937
Compare
@bors rollup=always |
…kingjubilee Rollup of 12 pull requests Successful merges: - rust-lang#121754 ([bootstrap] Move the `split-debuginfo` setting to the per-target section) - rust-lang#121953 (Add tests for the generated assembly of mask related simd instructions.) - rust-lang#122081 (validate `builder::PATH_REMAP`) - rust-lang#122245 (Detect truncated DepGraph files) - rust-lang#122354 (bootstrap: Don't eagerly format verbose messages) - rust-lang#122355 (rustdoc: fix up old test) - rust-lang#122363 (tests: Add ui/attributes/unix_sigpipe/unix_sigpipe-str-list.rs) - rust-lang#122366 (Fix stack overflow with recursive associated types) - rust-lang#122377 (Fix discriminant_kind copy paste from the pointee trait case) - rust-lang#122378 (Properly rebuild rustbooks) - rust-lang#122380 (Fix typo in lib.rs of proc_macro) - rust-lang#122381 (llvm-wrapper: adapt for LLVM API changes) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#121953 - jhorstmann:assembly-tests-for-masked-simd-instructions, r=workingjubilee Add tests for the generated assembly of mask related simd instructions. The tests show that the code generation currently uses the least significant bits of <iX x N> vector masks when converting to <i1 xN>. This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit. The exception is simd_bitmask, which already uses the most-significant bit. This additional instruction would be removed by the changes in rust-lang#104693, which makes all mask operations consistently use the most significant bits. By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks.
// CHECK-LABEL: select_f64x8 | ||
#[no_mangle] | ||
pub unsafe extern "C" fn select_f64x8(mask: m64x8, a: f64x8, b: f64x8) -> f64x8 { | ||
// The parameter is a 256 bit vector which in the C abi is only valid for avx512 targets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo? The comment says "256 bit vector" but the arguments are actually 512 bits large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good catch, that looks to be copy-paste mistake.
The tests show that the code generation currently uses the least significant bits of vector masks when converting to . This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit.
The exception is simd_bitmask, which already uses the most-significant bit.
This additional instruction would be removed by the changes in #104693, which makes all mask operations consistently use the most significant bits.
By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks.