Overlap halo exchanges on vertical split #900

havogt · 2025-10-09T19:58:49Z

Verified in blueline with mch_ch2.
Note: based on add_some_backend_customization for no particular reason.

No idea how GHEX knows that it should exchange the horizontal dimension and replicate that for the second (aka vertical) dimension.
Check the profile to see whether we actually overlap anything, and whether we overlap enough of the exchange.


msimberg · 2025-10-14T07:17:15Z

TL;DR: This might be useful once Python overheads are reduced, but not right now.

Adding some notes from trying this out for reference:

Passing the sliced top and bottom halves of the vn field breaks the caching of GHEX patterns (changes from https://github.com/C2SM/icon4py/pull/873/files#diff-9f03ac545e4d8a40eb657bdfb51101d652953ad6d8ee50c4c7373389ec5896f2), which made the halo exchanges even more expensive. (Presumably the cache is keyed on the new object created by slicing rather than on the underlying allocation + strides/offsets?) I temporarily worked around this by initializing the pattern in the cache before entering the halo exchange region, just to see the potential effect.
The program setup cost is now doubled, and this is the primary reason there are still big gaps in the profile during halo exchanges.
I needed to add a second communication object to make sure the two half exchanges could actually be done independently.
I split up the halo exchange into "post recv + launch pack kernels" and "post send + launch unpack kernels". The first works well because it doesn't block on anything. The latter should likely be split up further to separate the send and unpack, but it was easier to do a quick test with them combined.
The slicing of the vn field to pass it to the exchange is also expensive so can't be done on demand.
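If the pattern cache is indeed keyed on object identity, the miss is easy to reproduce: every slice creates a new view object even when it refers to exactly the same memory. The sketch below uses NumPy arrays as a stand-in for the actual gt4py/CuPy fields, and `buffer_key` and `PatternCache` are hypothetical helpers, not the GHEX or icon4py API. It keys the cache on data pointer, shape, strides and dtype instead of the Python object, and slices the halves once up front so the (expensive) slicing is not repeated on every exchange.

```python
import numpy as np


def buffer_key(arr: np.ndarray) -> tuple:
    # Identify the memory a view refers to, not the Python view object itself.
    # The data pointer already includes the view's offset into the allocation.
    data_ptr = arr.__array_interface__["data"][0]
    return (data_ptr, arr.shape, arr.strides, arr.dtype.str)


class PatternCache:
    """Hypothetical cache of per-field setup objects (e.g. exchange patterns)."""

    def __init__(self, make_pattern):
        self._make_pattern = make_pattern
        self._cache = {}
        self.misses = 0

    def get(self, arr: np.ndarray):
        key = buffer_key(arr)
        if key not in self._cache:
            self.misses += 1
            # Keep a reference to the view so the allocation stays alive and its
            # address cannot be reused by an unrelated array (stale cache hit).
            self._cache[key] = (arr, self._make_pattern(arr))
        return self._cache[key][1]


# Usage: a (edges, levels) field split along the second (vertical) dimension.
vn = np.zeros((10, 8))
nlev_half = vn.shape[1] // 2
# Slice once and reuse the views, instead of slicing on demand for each exchange.
vn_top, vn_bottom = vn[:, :nlev_half], vn[:, nlev_half:]

cache = PatternCache(make_pattern=lambda a: ("pattern", a.shape))
cache.get(vn_top)
cache.get(vn[:, :nlev_half])  # new view object, same memory: cache hit
cache.get(vn_bottom)  # different memory: cache miss
```

Keying on the buffer rather than the object trades one risk for another: a freed-and-reallocated buffer at the same address would produce a false hit, which is why this sketch pins the view in the cache.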

Edit:

I disabled the thread pool used to schedule the exchanges asynchronously. The overhead from moving work to the pool and waiting for work from the pool was much higher than any of the other overheads.
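To get a feel for the overhead of handing work to a pool and waiting for it, a minimal micro-benchmark sketch (not the code from this PR) compares calling a trivial task directly with submitting it to a `concurrent.futures.ThreadPoolExecutor` and blocking on the result. `tiny_task`, `time_direct` and `time_via_pool` are hypothetical names; the task does not model GHEX or GPU work, and absolute numbers depend on the machine and Python version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def tiny_task() -> int:
    # Hypothetical stand-in for a cheap call such as "post receives + launch pack kernels".
    return 1


def time_direct(n: int) -> float:
    # Average seconds per call when the task runs on the calling thread.
    start = time.perf_counter()
    for _ in range(n):
        tiny_task()
    return (time.perf_counter() - start) / n


def time_via_pool(n: int, pool: ThreadPoolExecutor) -> float:
    # Average seconds per call when each task is moved to a worker thread
    # (submit) and the caller then waits for it (result).
    start = time.perf_counter()
    for _ in range(n):
        pool.submit(tiny_task).result()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    n = 10_000
    with ThreadPoolExecutor(max_workers=1) as pool:
        direct = time_direct(n)
        pooled = time_via_pool(n, pool)
    print(f"direct: {direct * 1e6:.2f} us/call, via pool: {pooled * 1e6:.2f} us/call")
```

If the per-call round trip through the pool is comparable to or larger than the work it is meant to hide, scheduling the exchange asynchronously this way cannot pay off.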


github-actions · 2025-11-14T09:13:53Z

Mandatory Tests

Please make sure you run these tests via comment before you merge!

cscs-ci run default

Optional Tests

To run benchmarks you can use:

cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatests for static field computations) you can use:

cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

edopao and others added 17 commits October 1, 2025 13:22

switch to gt4py main

31e767b

Merge branch 'main' into blueline_integration

2f4692b

Enable custom backends for blueline

b4bd3e4

improve typing

055051d

fix diffusion

559106e

cleanup and fix allocator/backend

03e6a46

dace default, gtfn for vertically implicit

3b237a5

from measurement

6c6f9df

customize one

33fe48c

fix forwarding

4a3ad8f

cleanup

cde438c

Merge remote-tracking branch 'upstream/main' into add_some_backend_customization

575cb1c

fix log message

cede653

overlap experiment

fe21f91

swap exchange<->wait

3880d1a

fix comment

43cc58b

run exchange async

9554aad

havogt requested a review from msimberg October 9, 2025 19:59

Base automatically changed from add_some_backend_customization to main October 28, 2025 12:05

havogt added 10 commits November 10, 2025 10:14

Use experimental GHEX async scheduling

8085549

Merge remote-tracking branch 'upstream/main' into exchange_overlap

51099e9

cleanup

621517c

Merge remote-tracking branch 'upstream/main' into async_ghex

6a93895

Merge branch 'async_ghex' into exchange_overlap_mpi

e94560b

fix domain construction

64ba54c

fix domain

8b6ba50

cache half-fields

6cc3f53

try set_sync_marker

b78106c

fix return

ca23bb4

havogt added 2 commits November 11, 2025 23:21

Revert "try set_sync_marker"

4356e7a

This reverts commit b78106c.

Merge remote-tracking branch 'upstream/main' into exchange_overlap

293ab4e

4 participants

Overlap halo exchanges on vertical split #900

Are you sure you want to change the base?

Overlap halo exchanges on vertical split #900

Uh oh!

Conversation

havogt commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

msimberg commented Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Nov 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

havogt commented Oct 9, 2025 •

edited

Loading

msimberg commented Oct 14, 2025 •

edited

Loading