Opened on May 19, 2020
I tried this code:
pub fn count_non_ascii_sse2(buffer: &[u8]) -> u64 {
    let mut count = 0;
    let (prefix, simd, suffix) = unsafe { buffer.align_to::<core::arch::x86_64::__m128i>() };
    for &b in prefix {
        if b >= 0x80 {
            count += 1;
        }
    }
    for &s in simd {
        count += unsafe { core::arch::x86_64::_mm_movemask_epi8(s) }.count_ones() as u64;
    }
    for &b in suffix {
        if b >= 0x80 {
            count += 1;
        }
    }
    count
}
I expected to see this happen: the compiler concludes that prefix.len() < 16 and, therefore, does not autovectorize the first scalar loop.
Instead, this happened: the compiler autovectorized the first scalar loop, even though the prefix is never long enough for the vectorized path to be useful.
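For illustration, here is a minimal sketch of one way the length bound could be made explicit to the optimizer. It is not taken from the code above and is not a confirmed workaround: whether LLVM actually skips vectorizing the short loops with this shape would need to be checked in the generated assembly. The names `Chunk`, `count_non_ascii_scalar`, and `count_non_ascii_bounded` are hypothetical, and `Chunk` is a portable stand-in for `__m128i` so the sketch builds on any target. The fallback branch exists because the documentation of `align_to` does not strictly promise a minimal prefix, even though in practice the prefix and suffix are shorter than 16 bytes here.

```rust
/// Hypothetical stand-in for `__m128i`: 16 bytes with 16-byte alignment.
#[derive(Clone, Copy)]
#[repr(C, align(16))]
pub struct Chunk([u8; 16]);

/// Plain scalar reference: counts bytes with the high bit set.
pub fn count_non_ascii_scalar(bytes: &[u8]) -> u64 {
    bytes.iter().filter(|&&b| b >= 0x80).count() as u64
}

/// Same splitting strategy as the reported function, but with an explicit
/// early return that bounds the prefix and suffix lengths below 16.
pub fn count_non_ascii_bounded(buffer: &[u8]) -> u64 {
    // SAFETY: `Chunk` is plain bytes, so any 16 initialized bytes are a valid value.
    let (prefix, chunks, suffix) = unsafe { buffer.align_to::<Chunk>() };

    // Assumption: this branch is never taken in practice. It keeps the result
    // correct if `align_to` ever returns a long prefix or suffix, and past this
    // point both lengths are provably in 0..16.
    if prefix.len() >= 16 || suffix.len() >= 16 {
        return count_non_ascii_scalar(buffer);
    }

    let mut count = 0u64;
    for &b in prefix {
        if b >= 0x80 {
            count += 1;
        }
    }
    for chunk in chunks {
        // Portable replacement for `_mm_movemask_epi8(s).count_ones()`.
        count += count_non_ascii_scalar(&chunk.0);
    }
    for &b in suffix {
        if b >= 0x80 {
            count += 1;
        }
    }
    count
}
```

Comparing the output of `count_non_ascii_bounded` against `count_non_ascii_scalar` over slices with different start offsets exercises every prefix length from 0 to 15.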
Meta
rustc --version --verbose:
rustc 1.45.0-nightly (a74d1862d 2020-05-14)
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Area: SIMD (Single Instruction Multiple Data)
Area: Autovectorization, which can impact perf or code size
Category: This is a bug.
Issue: Problems and improvements with respect to performance of generated code.
Relevant to the compiler team, which will review and decide on the PR/issue.