[Direct Connection] Group Contributions (probably) should not be summed #355

Open
kaushikcfd opened this issue Oct 13, 2022 · 2 comments

@kaushikcfd
Collaborator

I was looking at the code generated for the direct connection, and the expression has the form:

        _pt_temp_1[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0] =
          (from_el_present[iel_ensm0 + 256 * iface_ensm0] ?
            normal_1_b_all[from_el_indices[iel_ensm0 + 256 * iface_ensm0]] * 0.5 * (_pt_part_ph_id_0[4 * from_el_indices[iel_ensm0 + 256 * iface_ensm0] + _pt_data_3[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]] + -1.0 * _pt_part_ph_id_0[4 * from_el_indices[iel_ensm0 + 256 * iface_ensm0] + _pt_data_3[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]])
            : 0.0)
          + (from_el_present_0[iel_ensm0 + 256 * iface_ensm0] ?
              _pt_part_ph_id_1[4 * from_el_indices_0[iel_ensm0 + 256 * iface_ensm0] + _pt_data_4[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]] * normal_1_b_face_restr_interior[from_el_indices_0[iel_ensm0 + 256 * iface_ensm0]]
              : 0.0)
          + (from_el_present_1[iel_ensm0 + 256 * iface_ensm0] ?
              cse[4 * from_el_indices_2[iel_ensm0 + 256 * iface_ensm0] + _pt_data_6[idof_ensm0 + 4 * iel_ensm0 + 1024 * iface_ensm0]] * normal_1_b_BTAG_PARTITION[from_el_indices_2[iel_ensm0 + 256 * iface_ensm0]]
              : 0.0);

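That is, it has the form (A if B else 0) + (C if D else 0) + (E if F else 0). I think the more efficient way to write this would be A if B else (C if D else (E if F else 0)), which could save us some conditional computation, i.e. global memory reads: D and F (and the values behind them) no longer need to be loaded once an earlier condition holds. The two forms are only equivalent if at most one of B, D, F is true for any given entry, i.e. if each entry receives a contribution from at most one group.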
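A minimal sketch of the difference, modelling a single output entry in plain Python rather than the generated kernel: each group contributes a (presence flag, value) pair, and a counter stands in for global memory loads. The function names and the read-counting model are illustrative assumptions, not part of the generated code.

```python
from typing import List, Tuple

# One group's contribution to a single output entry: (presence flag, value).
# Hypothetical model; in the kernel these come from arrays such as
# from_el_present and _pt_part_ph_id_0.
Term = Tuple[bool, float]


def summed(terms: List[Term]) -> Tuple[float, int]:
    """(A if B else 0) + (C if D else 0) + (E if F else 0): every flag is loaded."""
    total, reads = 0.0, 0
    for present, value in terms:
        reads += 1  # load the flag (B, D, F)
        if present:
            reads += 1  # load the value (A, C, E)
            total += value
    return total, reads


def nested(terms: List[Term]) -> Tuple[float, int]:
    """A if B else (C if D else (E if F else 0)): stop at the first present group."""
    reads = 0
    for present, value in terms:
        reads += 1  # load the flag
        if present:
            reads += 1  # load the value and skip all later groups
            return value, reads
    return 0.0, reads


first_group_only = [(True, 2.0), (False, 5.0), (False, 7.0)]
print(summed(first_group_only))  # (2.0, 4)
print(nested(first_group_only))  # (2.0, 2)
```

If two flags can be true for the same entry, the results differ: summed([(True, 1.0), (True, 2.0)]) gives 3.0 while nested gives 1.0, so the rewrite relies on the groups not overlapping.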

@kaushikcfd
Collaborator Author

kaushikcfd commented Oct 13, 2022

On further thought, I think the current way of summing the contributions is too heavy on global memory. Storing the mapping in a single array instead should be more efficient:

A[..., ...] if which_term[iel, idof] == 0 else (B[..., ...] if which_term[iel, idof] == 1 else 0)

I think this should significantly reduce the global memory footprint of the expression.
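A sketch of this layout in plain Python, flattening (iel, idof) to a single entry index: the per-group presence masks and index arrays are merged into one which_term array and one from_indices array, so each output entry performs at most one read from one source array. The names build_which_term and gather and the -1 sentinel are hypothetical, and the sketch assumes each entry is fed by at most one group.

```python
from typing import List, Sequence, Tuple

NO_TERM = -1  # hypothetical sentinel: no group contributes to this entry


def build_which_term(
    present_masks: List[Sequence[bool]],
    index_arrays: List[Sequence[int]],
) -> Tuple[List[int], List[int]]:
    """Merge per-group (mask, index) arrays into one which_term and one index array."""
    n = len(present_masks[0])
    which_term = [NO_TERM] * n
    from_indices = [0] * n
    for group, (mask, idx) in enumerate(zip(present_masks, index_arrays)):
        for i in range(n):
            if mask[i]:
                if which_term[i] != NO_TERM:
                    # Merging is only valid if the groups do not overlap.
                    raise ValueError(
                        f"entry {i} covered by groups {which_term[i]} and {group}")
                which_term[i] = group
                from_indices[i] = idx[i]
    return which_term, from_indices


def gather(
    which_term: Sequence[int],
    from_indices: Sequence[int],
    sources: List[Sequence[float]],
) -> List[float]:
    """Same result as A[...] if which_term == 0 else (B[...] if which_term == 1 else 0)."""
    return [
        0.0 if term == NO_TERM else sources[term][idx]  # at most one source read
        for term, idx in zip(which_term, from_indices)
    ]


A = [10.0, 11.0]
B = [20.0, 21.0, 22.0]
which_term, from_indices = build_which_term(
    present_masks=[[True, False, False, False], [False, True, False, True]],
    index_arrays=[[1, 0, 0, 0], [0, 2, 0, 0]],
)
print(which_term, from_indices)  # [0, 1, -1, 1] [1, 2, 0, 0]
print(gather(which_term, from_indices, [A, B]))  # [11.0, 22.0, 0.0, 20.0]
```

In a kernel the source arrays are separate buffers, so sources[term] would turn back into the nested conditional chain on which_term shown above; the gain is that only one flag-like array and one index array are read per entry.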

@inducer
Owner

inducer commented Oct 14, 2022

I agree the sum is not lovely.

As long as none of the intermediates are materialized, the two things I can see wrong with it are:

  • The from_el_present arrays are likely avoidable
  • The from_el_indices_2 array is bigger than it needs to be

Is that your sense as well?
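One possible reading of both points, sketched in plain Python under the assumption that each group only touches a subset of the output entries: store, per group, only the (to, from) index pairs for entries that are actually present. The presence mask disappears and the index arrays shrink to the number of present entries. The function names are hypothetical, and the trade-off is that the kernel becomes a scatter over each group's entries rather than a gather over all output entries.

```python
from typing import List, Sequence, Tuple


def compact_group(
    present: Sequence[bool], from_el_indices: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Drop absent entries: no presence mask, and index arrays only as long as needed."""
    to_idx = [i for i, p in enumerate(present) if p]
    from_idx = [from_el_indices[i] for i in to_idx]
    return to_idx, from_idx


def apply_compact(
    n_out: int,
    groups: List[Tuple[List[int], List[int]]],
    sources: List[Sequence[float]],
) -> List[float]:
    """Scatter each group's contributions; entries no group covers stay 0."""
    out = [0.0] * n_out
    for (to_idx, from_idx), src in zip(groups, sources):
        for t, f in zip(to_idx, from_idx):
            out[t] = src[f]  # assumes groups write disjoint entries
    return out


group = compact_group([False, True, False, True], [0, 2, 0, 0])
print(group)  # ([1, 3], [2, 0])
print(apply_compact(4, [group], [[20.0, 21.0, 22.0]]))  # [0.0, 22.0, 0.0, 20.0]
```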
