I've been looking at why some iterators fail to optimize well (#38038 & #80416), and it seems to boil down to branches in the `next()` call, which break loop unrolling. For context, the Iterator API almost always generates ideal assembly, so these look like some of the lowest-hanging fruit left in it.
This fails to unroll:
```rust
const LEN: usize = 100_000;

pub fn compute(vals0: &mut [f32; LEN / 2], vals1: &mut [f32; LEN / 2]) {
    struct Iter<'a> {
        vals: &'a mut [f32; LEN / 2],
        i: usize,
    }

    let mut iter = Iter { vals: vals0, i: 0 };
    let mut iters = Some(Iter { vals: vals1, i: 0 });
    loop {
        // Adding a likely hint here doesn't change the codegen.
        if let Some(val) = iter.vals.get_mut(iter.i) {
            *val = val.sqrt();
            iter.i += 1;
        } else {
            if let Some(new_iter) = iters.take() {
                iter = new_iter;
            } else {
                break;
            }
        }
    }
}
```
... while this equivalent version does unroll:
```rust
const LEN: usize = 100_000;

pub fn compute(vals0: &mut [f32; LEN / 2], vals1: &mut [f32; LEN / 2]) {
    for vals in [vals0, vals1] {
        for val in vals {
            *val = val.sqrt();
        }
    }
}
```
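The nested loop is the hand-written form of a flattened iterator. The sketch below expresses the same traversal with adapters; `for_each` is built on `fold`, so it drives the iterator internally rather than through repeated `next()` calls. Whether this form compiles to the same code as the nested loop would need checking in the generated assembly. `IntoIterator::into_iter` is called explicitly so the array is iterated by value regardless of edition.

```rust
const LEN: usize = 100_000;

pub fn compute(vals0: &mut [f32; LEN / 2], vals1: &mut [f32; LEN / 2]) {
    // Same traversal as the nested `for` loops: visit every element of
    // `vals0`, then every element of `vals1`.
    IntoIterator::into_iter([vals0, vals1])
        .flatten()
        // `for_each` consumes the iterator via `fold`, not `next()`.
        .for_each(|val| *val = val.sqrt());
}
```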
The first case can be unrolled manually by handling `AMOUNT` elements at a time in place of the single-element `get_mut` branch:
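```rust
const AMOUNT: usize = 128;
// The slice has exactly AMOUNT elements, so the inner loop has a fixed trip count.
if let Some(vals) = iter.vals.get_mut(iter.i..iter.i + AMOUNT) {
    for val in vals {
        *val = val.sqrt();
    }
    iter.i += AMOUNT;
}
// The final partial chunk (50_000 % 128 == 80 elements) still needs
// the single-element branch.
```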
... but this doesn't work with `Iterator::next`, which can only yield one element per call.
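Because `next()` yields one element per call, the "switch to the next array" branch sits in every call. One workaround, used by std adapters such as `Chain`, is internal iteration: override `fold` so the iterator runs one plain loop per underlying slice, and consumers built on `fold` (like `for_each`) never go through `next()`. A minimal sketch follows; `TwoSlices` is a hypothetical type modeled on the `Iter` above, not a std API. It only helps code that consumes the iterator via `fold`/`for_each`; a `for` loop still calls `next()`.

```rust
/// Hypothetical iterator over two slices in sequence.
pub struct TwoSlices<'a> {
    cur: std::slice::IterMut<'a, f32>,
    pending: Option<std::slice::IterMut<'a, f32>>,
}

impl<'a> Iterator for TwoSlices<'a> {
    type Item = &'a mut f32;

    // External iteration: one element per call, plus a branch to switch slices.
    fn next(&mut self) -> Option<&'a mut f32> {
        loop {
            if let Some(val) = self.cur.next() {
                return Some(val);
            }
            // Current slice exhausted: move to the pending one, or stop.
            self.cur = self.pending.take()?;
        }
    }

    // Internal iteration: fold each slice with its own loop, so the
    // "switch slices" check runs once per slice rather than per element.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        let mut acc = self.cur.fold(init, &mut f);
        if let Some(rest) = self.pending {
            acc = rest.fold(acc, &mut f);
        }
        acc
    }
}

pub fn compute(vals0: &mut [f32], vals1: &mut [f32]) {
    let iter = TwoSlices {
        cur: vals0.iter_mut(),
        pending: Some(vals1.iter_mut()),
    };
    // `for_each` is implemented on top of `fold`, so this uses the override.
    iter.for_each(|val| *val = val.sqrt());
}
```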