-
-
Notifications
You must be signed in to change notification settings - Fork 124
Fix NEON optimization on big-endian aarch64 #188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Due to how the NeonMoveMask is constructed, the byteswap on BE does not appear to be needed. In fact, on my aarch64_be test system, with the byteswap many tests fail, but without it, all tests pass.
|
Thank you! How did you test it? Can we add it to CI? |
|
I tested it manually on a machine that runs an unofficial port of Alpine Linux for aarch64_be. More information can be found at https://alpine.sakamoto.pl/jn/. aarch64_be Linux runs on regular aarch64 hardware, either bare-metal or in a VM. Usermode emulation in QEMU also supports aarch64_be, but the compiler toolchain situation isn't great at the moment. The documented way to build a aarch64_be toolchain requires building a C/C++ toolchain first, because the Alpine port is unofficial. |
|
I am investigating |
|
Thanks for investigating. I'm hesitant to block this on getting CI working, especially if this is a correctness fix. But I don't have the bandwidth to dig into testing aarch64 big-endian myself right now. In lieu of that, would it be possible for you to share the |
|
Yes, sure! cargo test on aarch64_be, memchr 2.7.5cargo test on aarch64_be, memchr with this PRThe doctests cannot run at the moment, because I don't have rustdoc installed on the test machine, but I don't expect any big surprises there. |
|
Thanks! |
|
Thank you! That's good enough for me to get this merged and release. |
|
This PR is on crates.io in |
I'm not completely aware how the code works, but I've tested this fix on aarch64_be. Previously,
memchrproduced many wrong results, but with this fix it's good.aarch64 (little-endian) and other big-endian architectures should be unaffected.
This bugfix is necessary in order to build a working rustc, standard library and cargo.