Skip to content

Slice::contains generates suboptimal assembly code #88204

Closed
@SadiinsoSnowfall

Description

@SadiinsoSnowfall

Given val: u8, slice: &[u8; 8] and arr: [u8; 8], I expected the following statements to compile down to the same thing :

// a
slice.contains(&val);

// b
slice[0] == val
  || slice[1] == val
  || slice[2] == val
  || slice[3] == val
  || slice[4] == val
  || slice[5] == val
  || slice[6] == val
  || slice[7] == val;
  
// c
arr.contains(&val);

// d
arr[0] == val
  || arr[1] == val
  || arr[2] == val
  || arr[3] == val
  || arr[4] == val
  || arr[5] == val
  || arr[6] == val
  || arr[7] == val;

However, the resulting assembly differs quite a lot:

  • The a statement compiles down to a loop, checking one element at a time, except for T = u8|i8 and N < 16 where it instead call fall on the fast path of memchr which gets optimized a little bit better.
  • The b statement compiles down to a unrolled-loop, checking one element at a time in a branchless fashion. Most of the time it doesn't give any SIMD instructions.
  • The c statement always compiles down to a loop, checking one element at a time, except for T = u8|i8 and N >= 16 where it instead call memchr_general_case
  • The d statement always compiles down to a few branchless SIMD instructions for any primitive type used and any array size.

Because the slice/array size is known at compile-time and the type checker guarantees that it will be respected by any calling function, I expected the compiler to take this into account while optimizing the resulting assembly. However, this information seems to be lost at some point when using the contains method.

arr.contains(&val) and slice.contains(&val) are simplified as arr.as_ref().iter().any(|e| *e == val) and slice.iter().any(|e| *e == val) if I'm not mistaken (which is wierd because for some N and T, they don't yield the same assembly). The compiler does not seem to be able to unroll this case.

godbolt links for
T=u8; N=8
T=u16; N=8
T=u32; N=8
T=u64; N=8

T=u8; N=16
T=u16; N=16
T=u32; N=16
T=u64; N=16

Metadata

Metadata

Assignees

No one assigned

    Labels

    I-slowIssue: Problems and improvements with respect to performance of generated code.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions