We should optimise code generation for transposes on Slate tensors by inlining the reversed indices into the local assembly kernel.
Lawrence's words:
Right now, a Tensor(form) is translated by the Slate compiler into something "opaque", I think. This is because compile_form spits back a loopy kernel object.
But think about what Transpose(Tensor(form)) does: it calls the generated loopy kernel to populate a tensor A (say), and then builds more GEM, namely ComponentTensor(A[i, j], (j, i)).
If the GEM that creates A were still around, we could inline that indexing transpose right into the kernel from TSFC. This would be morally equivalent to turning Transpose(Tensor(form)) into compile_form(form, transpose=True) (if compile_form had a transpose option).
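The inlining idea above can be illustrated with a toy sketch in pure Python (these are stand-in functions, not the actual TSFC/GEM/loopy machinery): the current scheme materialises A and then applies a separate index-transpose pass, whereas the proposed scheme writes each entry straight to its reversed position inside the assembly loop itself.

```python
def assemble(n):
    """Stand-in for the loopy kernel generated from the form."""
    return [[float(i * n + j) for j in range(n)] for i in range(n)]

def transpose_pass(A):
    """Stand-in for the extra GEM, ComponentTensor(A[i, j], (j, i)):
    a separate pass over the already-materialised tensor."""
    n = len(A)
    return [[A[i][j] for i in range(n)] for j in range(n)]

def assemble_transposed(n):
    """Proposed scheme: the reversed indices are inlined into the
    assembly loop, so no intermediate A is materialised."""
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[j][i] = float(i * n + j)  # reversed output indices
    return out

# Both routes produce the same local tensor.
assert transpose_pass(assemble(4)) == assemble_transposed(4)
```

The payoff in the real compiler would be avoiding the intermediate temporary and the extra loop nest, not just the cosmetic fusion shown here.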
Perhaps this is already done, because Transpose(Tensor(form)) can be turned into Tensor(adjoint(form)), and maybe you do this right now?
So what this is saying is that rather than:
1) gem_expr_for_form <- TSFC generates GEM from ufl form associated with Slate tensor
2) loopy_kernel_for_form <- TSFC generates loopy from gem_expr_for_form
3) gem_expr_for_slateops <- Slate compiler generates GEM for Slate operations
4) loopy_kernel_for_slateops <- TSFC generates loopy from gem_expr_for_slateops
where DiagonalTensors and Transposes are dealt with in step 3, we want to do
1) gem_expr_for_form <- TSFC generates GEM also from all modified terminals (e.g. Diagonals and Transposes)
2) loopy_kernel_for_form <- TSFC generates loopy from gem_expr_for_form
3) gem_expr_for_slateops <- Slate compiler generates GEM for Slate operations
4) loopy_kernel_for_slateops <- TSFC generates loopy from gem_expr_for_slateops
where DiagonalTensors, Transposes, and so forth are dealt with in step 1, and steps 3) and 4) only apply if there are more Slate operations on top of the transposes.
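The proposed pipeline can be sketched as a toy compile_form taking a hypothetical transpose flag (the real compile_form has no such flag; the names here are stand-ins only). The transpose is folded into the generated kernel's output indices, so steps 3) and 4) become unnecessary for a bare Transpose(Tensor(form)):

```python
def compile_form(form_entries, transpose=False):
    """Toy stand-in for compile_form: returns a 'kernel' that assembles
    the local tensor. With transpose=True, the reversed indices are
    inlined into the kernel rather than applied in a later Slate pass."""
    def kernel():
        n = len(form_entries)
        out = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if transpose:
                    out[j][i] = form_entries[i][j]  # inlined transpose
                else:
                    out[i][j] = form_entries[i][j]
        return out
    return kernel

entries = [[1.0, 2.0], [3.0, 4.0]]
A = compile_form(entries)()
AT = compile_form(entries, transpose=True)()
assert AT == [list(row) for row in zip(*A)]
```

In the real compiler the flag would instead steer which GEM the form translation emits, but the observable contract is the same: one kernel, no post-hoc transpose.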
sv2518 changed the title from "Slate optimisation" to "Slate codegen improvement" on Oct 29, 2021.