havogt (Contributor) commented Oct 9, 2025

Verified in blueline with mch_ch2.
Note: based on add_some_backend_customization for no particular reason.

  • No idea how GHEX knows that it should exchange the horizontal dimension and replicate that for the second (aka vertical) dimension
  • Check the profile to see whether we actually overlap anything and, if so, whether enough of the exchange is overlapped

havogt requested a review from msimberg October 9, 2025 19:59
msimberg (Contributor) commented Oct 14, 2025

TL;DR: This might be useful once Python overheads are reduced, but not right now.

Adding some notes from trying this out for reference:

  • Passing the sliced top and bottom halves of the vn fields breaks the caching of GHEX patterns (the changes from https://github.com/C2SM/icon4py/pull/873/files#diff-9f03ac545e4d8a40eb657bdfb51101d652953ad6d8ee50c4c7373389ec5896f2), which made the halo exchanges even more expensive. (Is the caching based on the new object created when slicing instead of the underlying allocation + strides/offsets?) I temporarily worked around this by initializing the pattern in the cache before entering the halo exchange region, just to see the potential effect.
  • The program setup cost is now doubled, and this is the primary reason the change still leads to big gaps in the profile during halo exchanges.
  • I needed to add a second communication object so that the two half exchanges could actually be done independently.
  • I split up the halo exchange into "post recv + launch pack kernels" and "post send + launch unpack kernels". The first works well because it doesn't block on anything. The latter should likely be split up further to separate the send from the unpack, but it was easier to do a quick test with them combined.
  • Slicing the vn field to pass it to the exchange is also expensive, so it can't be done on demand.
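
To illustrate the caching question in the first bullet, here is a minimal sketch of the difference between keying a cache on the Python object and keying it on the underlying memory view. It is not the actual icon4py or GHEX cache: `buffer_key`, `PatternCache` and the array shape are made up for illustration, and plain NumPy arrays stand in for the real fields.

```python
import numpy as np


def buffer_key(arr: np.ndarray) -> tuple:
    """Hypothetical cache key that identifies the memory view, not the object."""
    ptr = arr.__array_interface__["data"][0]  # address of the first element
    return (ptr, arr.shape, arr.strides, arr.dtype.str)


class PatternCache:
    """Toy stand-in for a pattern cache; counts how often setup would run."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._store = {}
        self.setups = 0

    def get(self, arr):
        key = self._key_fn(arr)
        if key not in self._store:
            self.setups += 1  # the expensive pattern setup would happen here
            self._store[key] = ("pattern", key)
        return self._store[key]


vn = np.zeros((1000, 80))  # assumed (horizontal, vertical) layout
half = vn.shape[1] // 2

by_identity = PatternCache(id)         # keyed on the Python object
by_buffer = PatternCache(buffer_key)   # keyed on pointer + shape + strides

alive = []  # keep the views alive so CPython cannot reuse their ids
for _ in range(3):  # three "time steps", slicing again each time
    top, bottom = vn[:, :half], vn[:, half:]
    alive += [top, bottom]
    for view in (top, bottom):
        by_identity.get(view)
        by_buffer.get(view)

print("setups with identity key:", by_identity.setups)
print("setups with buffer key:  ", by_buffer.setups)
```

Every slice is a new Python object, so the identity-keyed cache runs its setup for each slice in each step, while the buffer-keyed cache only sets up once per half. Whether a key like this is appropriate for the real pattern cache is an open question; it assumes the allocation stays alive and is not reallocated at the same address with a different meaning.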

Edit:

  • I disabled the thread pool used to schedule the exchanges asynchronously. The overhead from moving work to the pool and waiting for work from the pool was much higher than any of the other overheads.
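
For reference, the kind of overhead described here can be estimated in isolation with a rough sketch like the one below. It uses the standard library `concurrent.futures.ThreadPoolExecutor`, which is an assumption and not necessarily the pool used in this PR; `tiny_task` is a hypothetical stand-in for a cheap, non-blocking call such as posting a receive.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def tiny_task(x: int) -> int:
    """Stand-in for a cheap call whose own cost is negligible."""
    return x + 1


def run_inline(n: int) -> tuple[list[int], float]:
    """Call the task directly on the calling thread."""
    start = time.perf_counter()
    results = [tiny_task(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_via_pool(n: int, pool: ThreadPoolExecutor) -> tuple[list[int], float]:
    """Hand each task to the pool and wait for its result before continuing."""
    start = time.perf_counter()
    results = [pool.submit(tiny_task, i).result() for i in range(n)]
    return results, time.perf_counter() - start


n = 2000
with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(tiny_task, 0).result()  # warm up so the worker thread exists
    pooled, t_pool = run_via_pool(n, pool)
inline, t_inline = run_inline(n)

print(f"inline: {t_inline / n * 1e6:.2f} us per call")
print(f"pool:   {t_pool / n * 1e6:.2f} us per call")
```

The absolute numbers depend on the machine and Python version. The point is only that the per-call cost of submitting and waiting can be compared against the cost of the work itself before deciding whether a pool pays off.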

Base automatically changed from add_some_backend_customization to main October 28, 2025 12:05
@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.
