Description
This line accounts for most of the run time here:
out .+= product .* view(u, J...)
(internally, it iterates over the view in a loop at runtime)
But we usually know a lot about J at compile time. Instead of taking a view, we should extract a StaticArray over static ranges and move those ranges over the grid using a runtime offset. This can be very fast for small windows, as the compiler can heavily optimize the unrolled maps over static arrays/ranges.
See Stencils.jl for an example of this in a similar context. Mostly because of its StaticArrays windows, mapstencil outperforms ImageFiltering.jl's view-based approach by an order of magnitude (ignoring the gains from using KernelAbstractions.jl).
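A rough 1D sketch of the static-window idea, to make the proposal concrete. It is not the package's code: `static_window` and `accumulate_windows` are hypothetical names, and plain Base tuples stand in for a StaticArrays `SVector` so the snippet runs without dependencies. The window length `N` is a compile-time constant, and only the offset is a runtime value.

```julia
# Hypothetical helper (not from any package): gather an N-element window of `u`
# starting at index `offset + 1`. N is passed as `Val(N)`, so it is known at
# compile time and `ntuple` can unroll the loads; `offset` is a runtime value.
@inline function static_window(u::AbstractVector, offset::Int, ::Val{N}) where {N}
    return ntuple(i -> u[offset + i], Val(N))
end

# Hypothetical stand-in for the hot loop: the same arithmetic as
# `out .+= product .* view(u, J...)`, but `out`, `product`, and the window
# are all fixed-size tuples, so the broadcast has a statically known length.
function accumulate_windows(product::NTuple{N,T}, u::AbstractVector{T}, offsets) where {N,T}
    out = ntuple(_ -> zero(T), Val(N))
    for o in offsets
        out = out .+ product .* static_window(u, o, Val(N))
    end
    return out
end

u = collect(1.0:10.0)
product = (1.0, 2.0, 3.0)
result = accumulate_windows(product, u, 0:2)
```

With StaticArrays the tuples would become `SVector`s, and for N-dimensional windows the offset would presumably be a `CartesianIndex` added to a static set of indices. Whether this beats the view-based broadcast in this package would need benchmarking.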
(I can PR this after #33 is in)