Description
Given val: u8
, slice: &[u8; 8]
and arr: [u8; 8]
, I expected the following statements to compile down to the same thing :
// a
slice.contains(&val);
// b
slice[0] == val
|| slice[1] == val
|| slice[2] == val
|| slice[3] == val
|| slice[4] == val
|| slice[5] == val
|| slice[6] == val
|| slice[7] == val;
// c
arr.contains(&val);
// d
arr[0] == val
|| arr[1] == val
|| arr[2] == val
|| arr[3] == val
|| arr[4] == val
|| arr[5] == val
|| arr[6] == val
|| arr[7] == val;
However, the resulting assembly differs quite a lot:
- The
a
statement compiles down to a loop, checking one element at a time, except forT = u8|i8
andN < 16
where it instead call fall on the fast path ofmemchr
which gets optimized a little bit better. - The
b
statement compiles down to a unrolled-loop, checking one element at a time in a branchless fashion. Most of the time it doesn't give any SIMD instructions. - The
c
statement always compiles down to a loop, checking one element at a time, except forT = u8|i8
andN >= 16
where it instead callmemchr_general_case
- The
d
statement always compiles down to a few branchless SIMD instructions for any primitive type used and any array size.
Because the slice/array size is known at compile-time and the type checker guarantees that it will be respected by any calling function, I expected the compiler to take this into account while optimizing the resulting assembly. However, this information seems to be lost at some point when using the contains
method.
arr.contains(&val)
and slice.contains(&val)
are simplified as arr.as_ref().iter().any(|e| *e == val)
and slice.iter().any(|e| *e == val)
if I'm not mistaken (which is wierd because for some N and T, they don't yield the same assembly). The compiler does not seem to be able to unroll this case.
godbolt links for
T=u8; N=8
T=u16; N=8
T=u32; N=8
T=u64; N=8