-
Notifications
You must be signed in to change notification settings - Fork 299
intrinsic-test: test intrinsics with patched core_arch
#1923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e5d81ff
to
a48dbcc
Compare
d904148
to
a457d9b
Compare
a457d9b
to
205e7b4
Compare
The problem analysis is correct here, I think: that test should be using the local build of stdarch. But we did all agree that #1919 makes a correct change according to the documentation. I'll see if I can actually compare the generated assembly for clang and rustc for some of the failing functions. |
I checked a simplified version of generated assembly of __attribute__((noinline)) void run_vreinterpret_u16_u8(uint8_t *a_vals,
uint16_t *b_vals) {
vst1_u16(b_vals, vreinterpret_u16_u8(vld1_u8(a_vals)));
} compiles to
with gcc and
with clang. According to https://developer.arm.com/documentation/102159/0400/Load-and-store---data-structures,
Reversing the byte order is reasonable. Moreover, GCC indeed did not generate any instructions for |
Right, I think we just missed some? e.g. |
Ah, no, I was using an old branch, and got confused. I actually think it might be an ABI difference that bites us here, consider: use core::arch::aarch64::{int32x4_t, uint8x16_t, vreinterpretq_u8_s32};
#[no_mangle]
#[target_feature(enable = "neon")]
pub unsafe extern "C" fn c(x: int32x4_t) -> uint8x16_t {
vreinterpretq_u8_s32(x)
}
#[no_mangle]
#[target_feature(enable = "neon")]
pub unsafe fn rust(x: int32x4_t) -> uint8x16_t {
vreinterpretq_u8_s32(x)
} ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define <16 x i8> @c(<4 x i32> %x) unnamed_addr #0 !dbg !6 {
start:
#dbg_value(<4 x i32> %x, !32, !DIExpression(), !33)
#dbg_value(<4 x i32> %x, !34, !DIExpression(), !39)
%_0 = bitcast <4 x i32> %x to <16 x i8>, !dbg !41
ret <16 x i8> %_0, !dbg !42
}
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
define void @rust(ptr dead_on_unwind noalias noundef writable writeonly sret([16 x i8]) align 16 captures(none) dereferenceable(16) initializes((0, 16)) %_0, ptr dead_on_return noalias noundef readonly align 16 captures(none) dereferenceable(16) %x) unnamed_addr #1 !dbg !43 {
start:
#dbg_declare(ptr %x, !45, !DIExpression(), !46)
#dbg_declare(ptr %x, !47, !DIExpression(), !50)
%0 = load <4 x i32>, ptr %x, align 16, !dbg !52
store <4 x i32> %0, ptr %_0, align 16, !dbg !52
ret void, !dbg !53
} with the rust ABI a load and store are performed, while with |
It should be emphasized that although the assembly generated by GCC does not contain |
I guess Edit: see also https://llvm.org/docs/BigEndianNEON.html |
But in C, edit: the c++ code, using edit: I believe the error is in the generated rust files, we print the output of rust using edit: it seems that |
#[unsafe(no_mangle)]
extern "C" fn run_vreinterpret_u16_u8(a_vals: *mut u8, b_vals: *mut u16) {
use core::arch::aarch64::*;
unsafe {
vst1_u16(b_vals, vreinterpret_u16_u8(vld1_u8(a_vals)));
}
}
It's asm generated by |
Ok, so the problem seems to be in edit: Just realized, if that is the case, why has been our test suite passing at all?? |
it is probably cleanest to revert #1919, merge this PR, and then properly fix the intrinsics? unless the fix is simple? |
This reverts commit 24f89ca.
I'm surprised to find that
intrinsic-test
is testingcore::arch
, notcore_arch::arch
. This caused the failure that should have been detected in #1919 to go unnoticed until rust-lang/rust#146709.I guess the compiler's previous behavior was most likely correct, so we probably need to revert #1919. However, I really can't find any specification that describes the behavior of ARM NEON in big-endian mode.
cc @sayantn @adamgemmell @folkertdev @Jamesbarford