Don't try too hard to understand my first version; the second function is much simpler. It simply compares two 3x3x3 patches of data in an 11x11x11 image and returns a score (the sum of squared differences). I refactored this function from
fn test_loop(data: &[f64], m: usize, n: usize, o: usize) -> f64 {
    let pl = 3;
    let br = 4;
    let bl = 11;
    let bl2 = bl * bl;
    let mut sum = 0.0;
    for a in 0..pl {
        let idx1_a = (br + a) * bl2;
        let idx2_a = (m + a) * bl2;
        for b in 0..pl {
            let idx1_b = (br + b) * bl;
            let idx2_b = (n + b) * bl;
            for c in 0..pl {
                let idx1 = idx1_a + idx1_b + br + c;
                let idx2 = idx2_a + idx2_b + o + c;
                let diff = (data[idx1] - data[idx2]).powi(2);
                sum += diff;
            }
        }
    }
    sum
}
to
fn test_slice(data: &Array3<f64>, m: usize, n: usize, o: usize) -> f64 {
    let pl = 3;
    let br = 4;
    let s1 = data.slice(s![br..br+pl, br..br+pl, br..br+pl]);
    let s2 = data.slice(s![m..m+pl, n..n+pl, o..o+pl]);
    let mut sum = 0.0;
    azip!(s1, s2 in { sum += (s1 - s2).powi(2) });
    sum
}
Of course, I'm happy with the code quality now (!), but the clean version is surprisingly slow. I benchmarked both functions on the same data, passing &data.as_slice().unwrap() to test_loop instead of &data:
test bench_loop ... bench: 25 ns/iter (+/- 5)
test bench_slice ... bench: 142 ns/iter (+/- 1)
Are those results surprising to you? Neither version allocates, and both compute the indices (either in my code or inside the library) and the sum, etc. I fail to see why the clean version is almost 6 times slower. Is slice doing something really complex?
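
For anyone who wants to reproduce the computation without pulling in ndarray, here is a minimal std-only sketch of the same sum of squared differences, written with row slices and iterators instead of manual indexing. The names flat and test_rows are made up for this illustration, and it assumes the 11x11x11 image is stored contiguously in row-major order (which is what as_slice() on a standard-layout array gives). It is only meant to pin down what is being computed; it has not been benchmarked and makes no claim about speed.

```rust
// Sizes taken from the functions above.
const BL: usize = 11; // image side length
const PL: usize = 3; // patch side length
const BR: usize = 4; // start of the reference patch along each axis

/// Flat row-major index of voxel (i, j, k). Hypothetical helper.
fn flat(i: usize, j: usize, k: usize) -> usize {
    (i * BL + j) * BL + k
}

/// Sum of squared differences between the patch starting at (BR, BR, BR)
/// and the patch starting at (m, n, o). Hypothetical name; m, n and o
/// must each be at most BL - PL, otherwise the row slices panic.
fn test_rows(data: &[f64], m: usize, n: usize, o: usize) -> f64 {
    let mut sum = 0.0;
    for a in 0..PL {
        for b in 0..PL {
            // Each innermost run of PL elements is contiguous in memory.
            let start1 = flat(BR + a, BR + b, BR);
            let start2 = flat(m + a, n + b, o);
            let row1 = &data[start1..start1 + PL];
            let row2 = &data[start2..start2 + PL];
            sum += row1
                .iter()
                .zip(row2)
                .map(|(x, y)| (x - y).powi(2))
                .sum::<f64>();
        }
    }
    sum
}

fn main() {
    // Synthetic image: each voxel holds its own flat index.
    let data: Vec<f64> = (0..BL * BL * BL).map(|i| i as f64).collect();
    println!("{}", test_rows(&data, 4, 4, 5));
}
```

Note that this adds the differences one row at a time, so with arbitrary floating-point data the result can differ from test_loop in the last bits.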