Description
Steps to reproduce
- Clone https://github.com/hsivonen/encoding_rs
cd encoding_rs
git checkout simd
rustup default 1.32.0
rustup target add armv7-unknown-linux-gnueabihf
RUSTC_BOOTSTRAP=1 RUSTFLAGS='-C target_feature=+neon,+thumb-mode,+thumb2' cargo rustc --target armv7-unknown-linux-gnueabihf --features simd-accel --release -- --emit asm
find target | grep -c '\.s$'
rm -rf target
git checkout packed_simd
rustup default nightly
rustup target add thumbv7neon-unknown-linux-gnueabihf
cargo rustc --target thumbv7neon-unknown-linux-gnueabihf --features simd-accel --release -- --emit asm
find target | grep -c '\.s$'
Actual results
In the simd + Rust 1.32 case, there is one .s file. In the packed_simd + Rust 1.34 case, encoding_rs is split across 31 .s files, all of them .rcgu.s files. Examining these files suggests less inlining within encoding_rs, although code from packed_simd and core::arch appears to have been inlined.
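To make the comparison between the two builds easier to repeat, the count from the steps above can be split into all .s files and the .rcgu.s subset. This is only a sketch: the helper name count_asm is hypothetical and not part of encoding_rs or cargo, and it assumes the build output is under target unless another directory is passed.

```shell
# Hypothetical helper: count emitted assembly files under a build directory.
# Usage: count_asm [dir]   (dir defaults to "target")
count_asm() {
    dir="${1:-target}"
    # All assembly files emitted by --emit asm.
    total=$(find "$dir" -type f -name '*.s' | wc -l | tr -d ' ')
    # The subset with the .rcgu.s suffix.
    rcgu=$(find "$dir" -type f -name '*.rcgu.s' | wc -l | tr -d ' ')
    echo "total=$total rcgu=$rcgu"
}
```

Running count_asm after each cargo rustc invocation, before rm -rf target, gives one line per build that can be compared directly.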
Expected result
Expected one .s file with the same level of inlining in the packed_simd + Rust 1.34 case as with the simd + Rust 1.32 case.
Additional info
When building an actual binary from a different top-level crate (encoding_bench), the packed_simd + Rust 1.34 case regresses performance relative to the simd + Rust 1.32 case on Exynos 5.