Missed autovectorization for slice.iter.fold, works for slice.iter.copied.fold #113789

Open
@jhorstmann

Description

I was looking into optimizing a function that checks whether all values in a slice are in range. It is not that surprising that the version with all does not get vectorized, because it returns early (although in theory Rust should be allowed to read more elements from the slice before breaking). It is surprising, however, that adding copied before folding makes a difference in autovectorization.

Sample code (https://rust.godbolt.org/z/5eznWbMcf):

pub fn check_range_all(keys: &[u32], max: u32) -> bool {
    keys.iter().all(|x| *x < max)
}

pub fn check_range_fold(keys: &[u32], max: u32) -> bool {
    keys.iter().fold(true, |a, x| a && *x < max)
}

pub fn check_range_copied_fold(keys: &[u32], max: u32) -> bool {
    keys.iter().copied().fold(true, |a, x| a && x < max)
}
  • check_range_all compares one element per loop iteration. Using copied does not change the assembly at all (both functions are merged).
  • check_range_fold unrolls the check 8 times. Each iteration is branchless, but it does not use any vector instructions.
  • check_range_copied_fold uses AVX instructions and checks 32 elements per loop iteration.
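
For comparison, here is a minimal sketch of two further variants that could be tried in the same godbolt session. Both function names (check_range_bitand, check_range_chunked) and the chunk size of 32 are assumptions made for illustration and are not part of the original report. The first replaces the short-circuiting `&&` with the non-short-circuiting `&`; the second folds over fixed-size chunks and only exits early between chunks. Whether either one is autovectorized depends on the compiler version and target features, so the generated assembly needs to be checked rather than assumed.

```rust
/// Hypothetical variant: same fold as check_range_fold, but with the
/// non-short-circuiting `&` so the closure body contains no lazy evaluation.
pub fn check_range_bitand(keys: &[u32], max: u32) -> bool {
    keys.iter().fold(true, |acc, x| acc & (*x < max))
}

/// Hypothetical variant: check fixed-size chunks without branching inside a
/// chunk, and return early only between chunks.
pub fn check_range_chunked(keys: &[u32], max: u32) -> bool {
    // Assumed chunk size; 32 mirrors the elements-per-iteration figure above.
    const CHUNK: usize = 32;

    let mut chunks = keys.chunks_exact(CHUNK);
    for chunk in chunks.by_ref() {
        let ok = chunk.iter().fold(true, |acc, x| acc & (*x < max));
        if !ok {
            return false;
        }
    }
    // Elements that did not fill a whole chunk are checked one by one.
    chunks.remainder().iter().all(|x| *x < max)
}
```

Both variants return the same result as check_range_all for any input, including the empty slice, for which they return true.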

Metadata

    Labels

      • A-autovectorization (Area: Autovectorization, which can impact perf or code size)
      • A-codegen (Area: Code generation)
      • C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
      • I-heavy (Issue: Problems and improvements with respect to binary size of generated code.)
      • I-slow (Issue: Problems and improvements with respect to performance of generated code.)
