Open
Description
I suspect this is an issue in upstream LLVM. The SSE2 version and the unsigned version (_mm256_mulhi_epu16) show the same problem. If a wider register is available (xmm -> ymm -> zmm), it is used instead of splitting the values across two registers.
Code
https://godbolt.org/z/9Eqb45Keq
I tried this code:
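use std::arch::x86_64::*;

pub unsafe fn bad(a: __m256i) -> __m256i {
    // Clear the sign bit of every 16-bit lane, then take the high half of lane * 1000.
    let a = _mm256_and_si256(a, _mm256_set1_epi16(0x7FFF));
    _mm256_mulhi_epi16(a, _mm256_set1_epi16(1000))
}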
I expected to see this happen: more or less the same codegen as with -1000 as the multiplier.
Instead, this happened: the vector appears to be widened to i32 for no good reason.
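For anyone checking the output by hand, here is a minimal scalar sketch of what one 16-bit lane of the function above should compute. It is a reference model only, not the code from the godbolt link. The helper names mulhi_i16, bad_lane and neg_lane are made up for illustration, and neg_lane assumes the -1000 comparison differs from bad only in the constant.

```rust
/// Scalar model of one lane of `_mm256_mulhi_epi16`: the high 16 bits of the
/// signed 32-bit product of two signed 16-bit values.
pub fn mulhi_i16(a: i16, b: i16) -> i16 {
    // Widen to i32 so the product cannot overflow, then keep bits 16..32.
    ((a as i32 * b as i32) >> 16) as i16
}

/// Per-lane model of `bad`: clear the sign bit, then mulhi by 1000.
/// (Hypothetical helper, named here only for illustration.)
pub fn bad_lane(a: i16) -> i16 {
    mulhi_i16(a & 0x7FFF, 1000)
}

/// Per-lane model of the comparison case with -1000 as the multiplier.
/// (Assumption: the comparison differs from `bad` only in the constant.)
pub fn neg_lane(a: i16) -> i16 {
    mulhi_i16(a & 0x7FFF, -1000)
}
```

Each lane only needs the upper half of a 16x16 -> 32-bit product, which a single vpmulhw already provides, so widening the whole vector to i32 should not be necessary to get these results.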
Version it worked on
It most recently worked on: Rust 1.74
Version with regression
I checked on godbolt with 1.75-1.81 and whatever beta and nightly are today.
Metadata
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Category: An issue highlighting optimization opportunities or PRs implementing such
Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
Issue: Problems and improvements with respect to performance of generated code.
Medium priority
Untriaged performance or correctness regression.