Description
As noted in #126, and in my own usage of the library, compile time grows roughly factorially with the number of nested loops. In many applications, though, it is not hard to guess a loop ordering that is good enough. For example, consider this set of 6 nested loops:
```julia
for λx in 1:Kx, λy in 1:Ky, λvx in 1:Kvx, λvy in 1:Kvy
    for αx in 1:Nx, αv in 1:Nv
        ax = as_x[λx]
        ay = as_y[λy]
        avx = as_vx[λvx]
        avy = as_vy[λvy]
        # Do some work
    end
end
```
The indices `λx`, `λy`, etc. correspond to larger strides than the indices `αx` and `αv`, so we probably shouldn't even consider loop orders in which the α loops are outside the λ loops. As for ordering the outer loops, I think a human could do a pretty good job based purely on how the data is laid out in memory. It would be nice to take advantage of LV's automatic unrolling and vectorization without incurring a compile-time blowup.
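To put rough numbers on it, assuming the optimizer considers every permutation of the six loops (which is what the factorial scaling suggests; I haven't checked the internals), here is how the candidate count shrinks as more of the order is fixed:

```math
\begin{aligned}
\text{unconstrained: } & 6! = 720 \text{ orders} \\
\alpha \text{ loops kept innermost: } & 4! \cdot 2! = 24 \cdot 2 = 48 \text{ orders} \\
\text{order fixed by the user: } & 1 \text{ order}
\end{aligned}
```

Just keeping the α loops innermost would cut the number of candidates by a factor of 15.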
Would it be feasible, given how the library is currently structured, to let users specify (or at least constrain) the loop order instead of searching over all orderings?