Slate codegen improvement #2262

Open
sv2518 opened this issue Oct 29, 2021 · 0 comments
sv2518 commented Oct 29, 2021

We should optimise code generation for transposes on Slate tensors by inlining the reversed indices into the local assembly kernel.

In Lawrence's words:

Right now, a Tensor(form) is translated by the slate compiler into something "opaque" I think. This is because compile_form spits back a loopy kernel object.

But think about what Transpose(Tensor(form)) does: it calls the generated loopy kernel to populate a tensor A (say), and then builds more GEM of the form CT(A[i, j], (j, i)).

If the gem that creates A were still around, we could inline that indexing transpose right into the kernel from tsfc. This would be morally equivalent to turning Transpose(Tensor(form)) into compile_form(form, transpose=True) (if compile_form had a transpose operation).

Perhaps this is already done because Transpose(Tensor(form)) can be turned into Tensor(adjoint(form)) and maybe you do this right now?
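To make the inlining idea concrete, here is a minimal pure-Python sketch. It is not TSFC or Slate code: `entry`, `assemble`, `transpose_after` and `assemble_transposed` are hypothetical names, and the "kernel" is a toy loop nest standing in for the generated loopy kernel. It only illustrates that populating A and then applying the index swap gives the same result as writing to the swapped indices inside the kernel itself.

```python
# Toy stand-in for a generated local assembly kernel (hypothetical integrand).
def entry(i, j):
    # Deliberately non-symmetric so that a transpose is observable.
    return 10 * i + j


def assemble(n_rows, n_cols):
    """Populate A[i][j], as the opaque kernel does today."""
    A = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            A[i][j] = entry(i, j)
    return A


def transpose_after(A):
    """Second pass over the populated tensor: the CT(A[i, j], (j, i)) step."""
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]


def assemble_transposed(n_rows, n_cols):
    """Single pass: the reversed indices are inlined into the kernel's store."""
    At = [[0] * n_rows for _ in range(n_cols)]
    for i in range(n_rows):
        for j in range(n_cols):
            At[j][i] = entry(i, j)
    return At


two_pass = transpose_after(assemble(2, 3))
one_pass = assemble_transposed(2, 3)
assert two_pass == one_pass
```

The single-pass version avoids materialising A and traversing it a second time, which is the saving the proposal is after.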

So what this is saying is that rather than:


1) gem_expr_for_form <- TSFC generates GEM from the UFL form associated with the Slate tensor
2) loopy_kernel_for_form <- TSFC generates loopy from gem_expr_for_form
3) gem_expr_for_slateops <- Slate compiler generates GEM for Slate operations
4) loopy_kernel_for_slateops <- TSFC generates loopy from gem_expr_for_slateops

where DiagonalTensors and Transposes are dealt with in step 3, we want to do

1) gem_expr_for_form <- TSFC generates GEM from the UFL form, including all modified terminals (e.g. Diagonals and Transposes)
2) loopy_kernel_for_form <- TSFC generates loopy from gem_expr_for_form
3) gem_expr_for_slateops <- Slate compiler generates GEM for Slate operations
4) loopy_kernel_for_slateops <- TSFC generates loopy from gem_expr_for_slateops

where DiagonalTensors, Transposes and so forth are dealt with in step 1, and steps 3 and 4 only apply if there are further Slate operations on top of the transposes.
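The proposed reordering can be pictured as a small rewriting pass over the expression tree. The sketch below is purely illustrative and assumes nothing about Slate's real classes: `Tensor`, `Transpose`, `Add`, the `transposed` flag, `push_transposes` and `needs_slate_kernel` are all hypothetical names. It folds a transpose that sits directly on a terminal into that terminal (so it would be handled in step 1) and reports whether steps 3 and 4 are still needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    form: str
    transposed: bool = False  # hypothetical "modified terminal" flag


@dataclass(frozen=True)
class Transpose:
    operand: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def push_transposes(expr):
    """Fold Transpose nodes that sit directly on a Tensor into the terminal."""
    if isinstance(expr, Transpose):
        inner = push_transposes(expr.operand)
        if isinstance(inner, Tensor):
            # Handled in step 1: the form kernel is generated with swapped indices.
            return Tensor(inner.form, not inner.transposed)
        return Transpose(inner)
    if isinstance(expr, Add):
        return Add(push_transposes(expr.left), push_transposes(expr.right))
    return expr


def needs_slate_kernel(expr):
    """Steps 3 and 4 are only required if non-terminal nodes remain."""
    return not isinstance(expr, Tensor)


simple = push_transposes(Transpose(Tensor("a")))
mixed = push_transposes(Add(Transpose(Tensor("a")), Tensor("b")))
```

Here `simple` collapses to a single modified terminal, so no Slate kernel would be generated, while `mixed` still contains an `Add` and therefore goes through steps 3 and 4.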

sv2518 changed the title from "Slate optimisation" to "Slate codegen improvement" on Oct 29, 2021
sv2518 self-assigned this on Oct 29, 2021