align_to prefix max length not taken into account in optimization #72356

Open

Description

I tried this code:

pub fn count_non_ascii_sse2(buffer: &[u8]) -> u64 {
    let mut count = 0;
    let (prefix, simd, suffix) = unsafe { buffer.align_to::<core::arch::x86_64::__m128i>() };
    for &b in prefix {
        if b >= 0x80 {
            count += 1;
        }
    }
    for &s in simd {
        count += unsafe { core::arch::x86_64::_mm_movemask_epi8(s) }.count_ones() as u64;
    }
    for &b in suffix {
        if b >= 0x80 {
            count += 1;
        }
    }
    count
}

Godbolt link

I expected to see this happen: The compiler concludes that prefix.len() < 16 and therefore does not autovectorize the first scalar loop.

Instead, this happened: The compiler autovectorized the first scalar loop, even though the prefix is never long enough for the vectorized path to be useful.
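
The bound prefix.len() < 16 rests on an assumption: that align_to makes the middle slice as large as possible, so the prefix only holds the bytes before the first 16-byte boundary. A minimal sketch that checks this at runtime follows. `Chunk`, `count_non_ascii_scalar`, `count_non_ascii_split` and `max_prefix_len` are hypothetical names, not from the code above, and `Chunk` is a portable stand-in for `__m128i` so the sketch builds on any target.

```rust
/// Hypothetical stand-in for `__m128i`: 16 bytes, 16-byte aligned.
/// Every bit pattern is a valid `Chunk`, which keeps `align_to` sound here.
#[repr(C, align(16))]
struct Chunk([u8; 16]);

/// Scalar reference: number of bytes with the high bit set.
fn count_non_ascii_scalar(bytes: &[u8]) -> u64 {
    bytes.iter().filter(|&&b| b >= 0x80).count() as u64
}

/// Same three-way split as in the issue, without SIMD intrinsics.
/// Returns the count together with `prefix.len()` so the bound can be inspected.
fn count_non_ascii_split(buffer: &[u8]) -> (u64, usize) {
    // SAFETY: `Chunk` is plain bytes, so any initialized memory is a valid value.
    let (prefix, middle, suffix) = unsafe { buffer.align_to::<Chunk>() };
    let mut count = count_non_ascii_scalar(prefix);
    for chunk in middle {
        count += count_non_ascii_scalar(&chunk.0);
    }
    count += count_non_ascii_scalar(suffix);
    (count, prefix.len())
}

/// Largest prefix length observed over 32 consecutive start offsets,
/// which covers every possible misalignment relative to 16 bytes.
fn max_prefix_len() -> usize {
    let data: Vec<u8> = (0..=255u8).collect();
    let mut max_len = 0;
    for start in 0..32 {
        let (count, prefix_len) = count_non_ascii_split(&data[start..]);
        // The split must agree with the plain scalar count.
        assert_eq!(count, count_non_ascii_scalar(&data[start..]));
        max_len = max_len.max(prefix_len);
    }
    max_len
}
```

Calling `max_prefix_len()` from a test or from `main` reports the longest prefix seen. Under the assumption above it stays below 16, which is the fact the optimizer would need to know in order to skip vectorizing the prefix loop.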

Meta

rustc --version --verbose:

rustc 1.45.0-nightly (a74d1862d 2020-05-14)

Metadata

    Labels

    A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
    A-SIMD: Area: SIMD (Single Instruction Multiple Data)
    A-autovectorization: Area: Autovectorization, which can impact perf or code size
    C-bug: Category: This is a bug.
    I-slow: Issue: Problems and improvements with respect to performance of generated code.
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
