Consider the following functions:
```rust
use std::arch::x86_64::*;

const C: usize = 1024;
const VAL: u64 = 1337;

#[target_feature(enable = "avx2")]
pub unsafe fn very_fast(x: fn(&[__m256i; C / 2])) {
    let slice = [_mm256_set_epi64x(0, 1337, 0, 1337); C / 2];
    x(&slice);
}

#[target_feature(enable = "avx2")]
pub unsafe fn eqfast(x: fn(&[__m128i; C])) {
    let slice = [_mm_set_epi64x(0, 1337); C];
    x(&slice);
}

pub unsafe fn fast(x: fn(&[__m128i; C])) {
    let slice = [_mm_set_epi64x(0, 1337); C];
    x(&slice);
}

pub fn slow(x: fn(&[u128; C])) {
    let slice = [VAL as u128; C];
    x(&slice);
}
```
Currently (Rust 1.70.0), the following can be observed:
- `slow` uses scalar instructions. It could use SSE2 instructions, but doesn't.
- `fast` and `eqfast` use SSE2 instructions. `eqfast` could use AVX instructions, but doesn't.
- `very_fast` uses AVX instructions.
It would likely be better if the following were true instead:
- `slow` and `fast` use SSE2 instructions.
- `eqfast` and `very_fast` use AVX instructions.
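
The suggestion that `slow` could use the same SSE2 stores as `fast` relies on both fills producing identical bytes. Below is a minimal sketch that checks this, assuming an x86_64 target (where SSE2 is part of the baseline). The helper names `fill_scalar` and `fill_sse2` are hypothetical and only serve this illustration; they are not part of the original report.

```rust
use std::arch::x86_64::*;
use std::mem::transmute;

const C: usize = 1024;
const VAL: u64 = 1337;

/// Builds the array the same way `slow` does, from plain `u128` values.
/// (Hypothetical helper, for illustration only.)
pub fn fill_scalar() -> [u128; C] {
    [VAL as u128; C]
}

/// Builds the array the same way `fast` does, from `__m128i` values,
/// then reinterprets the bytes as `u128`.
/// (Hypothetical helper, for illustration only.)
pub fn fill_sse2() -> [u128; C] {
    // SAFETY: SSE2 is always available on x86_64. `__m128i` and `u128`
    // are both 16 bytes wide and every bit pattern is valid for both,
    // so the two array types have the same size and the transmute is sound.
    unsafe {
        // `_mm_set_epi64x(high, low)`: the low 64 bits hold VAL, the high 64 bits are zero.
        let v = [_mm_set_epi64x(0, VAL as i64); C];
        transmute::<[__m128i; C], [u128; C]>(v)
    }
}
```

Comparing the two results with `fill_scalar() == fill_sse2()` should hold on x86_64, since the low lane carries the value and the target is little-endian. This only shows that the two initializations are equivalent in memory; it says nothing about which instructions the compiler actually emits for either.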