Description
This is a reduced example of a problem I've run into trying to make safe SIMD wrappers. The idea here is to have a newtype that can only be constructed when the capability is dynamically detected. However, the compiler seems to get confused about calling conventions when calling into code with target_feature
enabled from code that doesn't.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
#[target_feature(enable = "avx")]
unsafe fn avx_mul(a: __m256, b: __m256) -> __m256 {
_mm256_mul_ps(a, b)
}
#[target_feature(enable = "avx")]
unsafe fn avx_store(p: *mut f32, a: __m256) {
_mm256_storeu_ps(p, a)
}
#[target_feature(enable = "avx")]
unsafe fn avx_setr(a: f32, b: f32, c: f32, d: f32, e: f32, f: f32, g: f32, h: f32) -> __m256 {
_mm256_setr_ps(a, b, c, d, e, f, g, h)
}
#[target_feature(enable = "avx")]
unsafe fn avx_set1(a: f32) -> __m256 {
_mm256_set1_ps(a)
}
struct Avx(__m256);
fn mul(a: Avx, b: Avx) -> Avx {
unsafe { Avx(avx_mul(a.0, b.0)) }
}
fn set1(a: f32) -> Avx {
unsafe { Avx(avx_set1(a)) }
}
fn setr(a: f32, b: f32, c: f32, d: f32, e: f32, f: f32, g: f32, h: f32) -> Avx {
unsafe { Avx(avx_setr(a, b, c, d, e, f, g, h)) }
}
unsafe fn store(p: *mut f32, a: Avx) {
avx_store(p, a.0);
}
pub fn main() {
let mut result = [0.0f32; 8];
let a = mul(setr(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0), set1(0.25));
unsafe { store(result.as_mut_ptr(), a)}
println!("{:?}", result);
}
Output:
[0.0, 5.0, 12.0, 21.0, 0.0, 0.0, 0.0, 0.0]
Errors:
Compiling playground v0.0.1 (file:///playground)
Finished release [optimized] target(s) in 0.59s
Running `target/release/playground`
In a debug build, the answer is [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
as expected. Notice that the first 3 values of the miscompiled version are [0, 1, 2, 3] * [4, 5, 6, 7], suggesting that the halves are getting scrambled (and this is confirmed by looking at the generated asm).
Also, this just crashes on Windows.
Same miscompilation happens if I move the Avx() newtype wrapper up into the top four functions.
It's possible I don't understand the rules for what's safe to do in SIMD. If that's the case, the limitations on passing values across function boundaries should be documented.