Description
Looking for potential violations of `simd_*` intrinsic preconditions, I found this in stdarch:
/// Compute dot-product of BF16 (16-bit) floating-point pairs in a and b,
/// accumulating the intermediate single-precision (32-bit) floating-point elements
/// with elements in src, and store the results in dst using zeromask k
/// (elements are zeroed out when the corresponding mask bit is not set).
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=1769,1651,1654,1657,1660&avx512techs=AVX512_BF16&text=_mm_maskz_dpbf16_ps)
#[inline]
#[target_feature(enable = "avx512bf16,avx512vl")]
#[unstable(feature = "stdarch_x86_avx512", issue = "111137")]
#[cfg_attr(test, assert_instr("vdpbf16ps"))]
pub fn _mm_maskz_dpbf16_ps(k: __mmask8, src: __m128, a: __m128bh, b: __m128bh) -> __m128 {
    unsafe {
        let rst = _mm_dpbf16_ps(src, a, b).as_f32x4();
        let zero = _mm_set1_ps(0.0_f32).as_f32x4();
        transmute(simd_select_bitmask(k, rst, zero))
    }
}
`simd_select_bitmask` is documented to require that all the "extra"/"padding" bits in the mask (those not corresponding to a vector element) must be 0. Here, `rst` and `zero` are vectors of length 4, and the mask `k` is a `u8`, meaning there are 4 bits in `k` that must be 0. However, nothing in the function actually ensures that.
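To make the precondition concrete, here is a minimal sketch of the check the caller would have to pass. The helper name `has_padding_bits_set` is hypothetical and not part of stdarch or the compiler; it only models which bits of a `u8` mask count as padding for a vector with `lanes` elements.

```rust
/// Returns true if `k` has any bit set at or above position `lanes`,
/// i.e. a bit that does not correspond to a vector element.
/// Hypothetical helper for illustration; `lanes` is assumed to be at most 8.
fn has_padding_bits_set(k: u8, lanes: u32) -> bool {
    assert!(lanes <= 8, "a u8 mask covers at most 8 lanes");
    // Widen to u32 first so that `lanes == 8` is a valid shift amount.
    ((k as u32) >> lanes) != 0
}
```

For the 4-lane case above, any `k` greater than `0b0000_1111` makes this return `true`, and a safe caller can trivially pass such a value.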
I don't know the intended behavior of the intrinsic for that case (probably Intel promises to just ignore the extra bits?), but this function recently got marked as safe (in rust-lang/stdarch#1714), and that clearly contradicts our intrinsic docs. I assume the safety is correct, as the intrinsic should probably have no precondition; in that case we have to
- either explicitly mask out the higher bits
- or figure out if we can remove the UB from `simd_select_bitmask`
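The first option could look roughly like the following. This is a scalar model on plain arrays, not the actual stdarch code: `clear_padding_bits`, `select_bitmask_model`, and `maskz_model` are hypothetical names, and the assertion inside the model only stands in for the documented precondition of `simd_select_bitmask`.

```rust
/// Number of vector elements in the 128-bit `f32` case discussed above.
const LANES: usize = 4;

/// Clears every mask bit that does not correspond to one of the `LANES` elements.
fn clear_padding_bits(k: u8) -> u8 {
    k & ((1u8 << LANES) - 1)
}

/// Scalar model of a bitmask select: lane `i` is taken from `if_set` when
/// bit `i` of `k` is set, and from `if_clear` otherwise.
fn select_bitmask_model(k: u8, if_set: [f32; LANES], if_clear: [f32; LANES]) -> [f32; LANES] {
    // Stand-in for the documented precondition: padding bits must be zero.
    assert_eq!(k >> LANES, 0, "padding bits in the mask must be zero");
    let mut out = if_clear;
    for i in 0..LANES {
        if (k >> i) & 1 == 1 {
            out[i] = if_set[i];
        }
    }
    out
}

/// Zero-masking wrapper that accepts any `u8` by masking out the high bits first.
fn maskz_model(k: u8, rst: [f32; LANES]) -> [f32; LANES] {
    select_bitmask_model(clear_padding_bits(k), rst, [0.0; LANES])
}
```

With the masking in place, a call such as `maskz_model(0b1111_0101, [1.0, 2.0, 3.0, 4.0])` no longer trips the precondition and selects lanes 0 and 2 only. The cost in the real intrinsic would be one extra `and` on the mask, assuming the compiler does not fold it away.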