Code
I tried this code:
use std::convert::TryInto;

pub fn mul3(previous: &[u8], current: &mut [u8]) {
    let mut c_bpp = [0; 4];
    for (chunk, b_bpp) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        let new_chunk = [
            chunk[0].wrapping_add(c_bpp[0]),
            chunk[1].wrapping_add(c_bpp[1]),
            chunk[2].wrapping_add(c_bpp[2]),
            chunk[3].wrapping_add(c_bpp[3]),
        ];
        *TryInto::<&mut [u8; 4]>::try_into(chunk).unwrap() = new_chunk;
        c_bpp = b_bpp.try_into().unwrap();
    }
}
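To make the intended behavior concrete, here is a small self-contained sketch that is not part of the original report. It reproduces `mul3` and compares it with a plain index-based version. The names `mul3_reference` and `run_example` are hypothetical helpers introduced only for illustration, and the sample bytes are arbitrary. Each 4-byte chunk of `current` gets the preceding 4-byte chunk of `previous` added to it with wrapping arithmetic, and the first chunk is left unchanged.

```rust
use std::convert::TryInto;

// The function from the report, reproduced for self-containment.
pub fn mul3(previous: &[u8], current: &mut [u8]) {
    let mut c_bpp = [0; 4];
    for (chunk, b_bpp) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        let new_chunk = [
            chunk[0].wrapping_add(c_bpp[0]),
            chunk[1].wrapping_add(c_bpp[1]),
            chunk[2].wrapping_add(c_bpp[2]),
            chunk[3].wrapping_add(c_bpp[3]),
        ];
        *TryInto::<&mut [u8; 4]>::try_into(chunk).unwrap() = new_chunk;
        c_bpp = b_bpp.try_into().unwrap();
    }
}

// Hypothetical reference (not from the report): the same computation written
// with explicit indices. Chunk 0 has zero added to it, so it is skipped.
pub fn mul3_reference(previous: &[u8], current: &mut [u8]) {
    // `zip` stops at the shorter of the two chunk sequences.
    let chunks = (previous.len() / 4).min(current.len() / 4);
    for i in 1..chunks {
        for k in 0..4 {
            current[4 * i + k] = current[4 * i + k].wrapping_add(previous[4 * (i - 1) + k]);
        }
    }
}

// Hypothetical helper: runs both versions on the same arbitrary sample row.
pub fn run_example() -> ([u8; 12], [u8; 12]) {
    let previous = [1u8, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    let mut a = [10u8, 20, 30, 40, 50, 60, 70, 80, 250, 251, 252, 253];
    let mut b = a;
    mul3(&previous, &mut a);
    mul3_reference(&previous, &mut b);
    // Both arrays end up as [10, 20, 30, 40, 51, 62, 73, 84, 255, 1, 3, 5];
    // the last three bytes wrap around past 255.
    (a, b)
}
```

A check like this only confirms that both compiler versions should produce the same results; it says nothing about whether the loop is vectorized.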
I expected to see this happen: The function runs quickly thanks to auto-vectorization.
Instead, this happened: The function is 60% slower than before because it is no longer auto-vectorized.
Godbolt comparison link: https://godbolt.org/z/8EhWdYc13
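For anyone who wants to observe the difference locally rather than in the assembly output, a rough timing harness might look like the following. This is only a sketch: `time_mul3` is a hypothetical helper that does not appear in the original report, the buffer contents are arbitrary, and a proper benchmarking framework would give more reliable numbers.

```rust
use std::convert::TryInto;
use std::hint::black_box;
use std::time::{Duration, Instant};

// The function from the report, reproduced for self-containment.
pub fn mul3(previous: &[u8], current: &mut [u8]) {
    let mut c_bpp = [0; 4];
    for (chunk, b_bpp) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        let new_chunk = [
            chunk[0].wrapping_add(c_bpp[0]),
            chunk[1].wrapping_add(c_bpp[1]),
            chunk[2].wrapping_add(c_bpp[2]),
            chunk[3].wrapping_add(c_bpp[3]),
        ];
        *TryInto::<&mut [u8; 4]>::try_into(chunk).unwrap() = new_chunk;
        c_bpp = b_bpp.try_into().unwrap();
    }
}

// Hypothetical helper (an assumption, not from the report): calls `mul3`
// `iters` times on buffers of `len` bytes and returns the elapsed wall-clock
// time together with the final contents of `current`.
pub fn time_mul3(len: usize, iters: u32) -> (Duration, Vec<u8>) {
    let previous = vec![1u8; len];
    let mut current = vec![0u8; len];
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` discourages the optimizer from specializing on the
        // known buffer contents or eliding the calls.
        mul3(black_box(previous.as_slice()), black_box(current.as_mut_slice()));
    }
    (start.elapsed(), current)
}
```

Calling something like `time_mul3(1 << 20, 1_000)` from a binary built in release mode with each toolchain, and comparing the returned durations, should show whether the slowdown reproduces on a given machine.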
Version it worked on
It most recently worked on: rustc 1.86.0 (which uses LLVM version 19.1.7)
Version with regression
rustc --version --verbose:
rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-unknown-linux-gnu
release: 1.87.0
LLVM version: 20.1.1
Other context
This is an attempted minimization of image-rs/image-png#598.
Metadata
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Category: An issue highlighting optimization opportunities or PRs implementing such
Issue: Indicates that prioritization has been requested for this issue.
Issue: Problems and improvements with respect to performance of generated code.
This issue may need triage. Remove it if it has been sufficiently triaged.
Performance or correctness regression from one stable version to another.