Code
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
fn main() {
    unsafe {
        let f = _mm256_set_pd(2.0, 2.0, 2.0, 2.0);
        let r = _mm256_mul_pd(f, f);
        println!("{:?}", r);
    }
}
Output
The expected output is
__m256d(4.0, 4.0, 4.0, 4.0)
The actual output is
__m256d(4.0, 4.0, 0.0, 0.0)
Notes
- The code built in debug mode produces the expected result. This issue only occurs in a release build.
- Using _mm intrinsics instead of _mm256 intrinsics results in correct output in both debug and release mode.
- Code using only _mm intrinsics and code using only _mm256 intrinsics show the same performance (I have a separate piece of code for that and can provide it if needed).
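A possible cause, which is an assumption and not something confirmed in this report, is that `main` is compiled without the AVX target feature enabled. The `std::arch` documentation requires the feature to be available wherever an intrinsic is called. The sketch below is a variant of the reproduction that enables AVX only for a helper function and checks for it at runtime. The names `square_lanes_avx` and `square_lanes` are hypothetical and used only for illustration.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Squares `x` in all four f64 lanes of a 256-bit vector.
/// AVX is enabled for this function only, so the caller must make sure
/// the CPU supports it (hypothetical helper, not part of the report).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn square_lanes_avx(x: f64) -> [f64; 4] {
    let f = _mm256_set1_pd(x);
    let r = _mm256_mul_pd(f, f);
    // Copy the lanes into a plain array instead of printing the vector directly.
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), r);
    out
}

/// Safe wrapper: returns `None` when AVX cannot be used on this machine.
fn square_lanes(x: f64) -> Option<[f64; 4]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at runtime just above.
            return Some(unsafe { square_lanes_avx(x) });
        }
    }
    None
}

fn main() {
    match square_lanes(2.0) {
        Some(lanes) => println!("{:?}", lanes),
        None => println!("AVX is not available on this CPU"),
    }
}
```

If this variant prints four 4.0 lanes in a release build on an AVX-capable CPU, that would suggest the missing target feature is involved. Building the original code with `-C target-feature=+avx` in `RUSTFLAGS` would be another way to test the same assumption.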
Versions
The issue can be reproduced with 1.30.0-nightly (2018-09-24), 1.30.0-beta.7, 1.29.0.