Skip to content

Conversation

nicolasaunai
Copy link
Contributor

This PR removes a condition that when is met wrongly prevents the calculation of an overlap.
In RefineSchedule, the code loops over level source box which box may overlap the fill box, creating a test_mask to pass to the user calculateOverlap() method.

If the destination fill box and the transformed source box do not overlap, test mask is empty, but data could live on border in which case the code tries to grow by one cell the destination fill box to see if border data may overlap the source.

However, in the current code, this growth is only happening if:

1- the ghost width is 0
OR
2- the fill box is the same as the destination box.

I guess that if using a default PatchLevelFillPattern the fill box is the destination ghost box (at least that's what we see) and so in these case conditions 1 and 2 are the same.

We have ghost width of 2 cells, the condition is therefore not met.

This happens to be a problem in some (rare?) cases. In regridding, we see sometimes the destination ghost box border aligned with one of the old level box border and for face data the overlap is not calculated.

See the following situation

image

On the left are old level patches (11,12,13,14,15) in red. On the right some new level patches 16 and 18.
patch 16 has its ghost box (ghost width = 2) represented on both panels.

As you can see, most of 16's ghost box overlaps old level patches, except from some part at its bottom.
The right face of the ghost box aligns with the right face of old patches 13 and 14, and on the right face of patch 15.

Where it intersects 13 and 14, the right border of 16 is well copied from 13 and 14 face values, respectively.
However, when it intersects only the right face of 15, the overlap is not calculated.

The reason is that for 13 and 14 overlaps, the first test_mask is not empty, since a lot of 16 overlaps 13 and 14.
However, when intersecting 16 and 15, the test mask is empty. And while data lives on border, the destination fill box is not grown since we have a non-zero ghost cell width and our ghost destination fill box is not equal to the destination box.

Below is a zoom-in view on the nodes of interest :

Capture d’écran 2025-06-27 à 09 52 02

test mask thus remains empty and calculateOverlap() is not called.
As a result, these nodes remain in unfilled boxes, and will get their values from refinement.

However, on the new level neighbor patch 18, these nodes will be copied from old patch 15, resulting in shared border nodes having different values on new level patches 16 and 18.

Removing the inner condition as proposed in this PR allows the growth of the destination box, making a non empty source mask and calling calculateOverlap.

@nselliott
Copy link
Collaborator

nselliott commented Jul 31, 2025

I need to evaluate this some more and see if this breaks any of our tests. Your analysis of what happens on the nodes highlighted in the last figure looks correct to me. I thought there was something implemented years ago to address just this type of thing in a different way, but I can't find it in the code right now.

A workaround that is probably unsatisfactory would be to subsequently execute a RefineSchedule from the new level to itself with no coarser level, and with that any nodes that are colocated between one patch's interior and another patch's ghosts would be overwritten with the interior value and become consistent. But that adds a new communication step just to clean up some error that the previous schedule created.

@nicolasaunai
Copy link
Contributor Author

Thanks for this answer. For the moment we indeed do the workaround with the single level schedule indeed.

@nicolasaunai
Copy link
Contributor Author

Adding a single level Schedule on the new level is currently what we do and that indeed helps.
I must admit the condition I erased in this PR is a bit obscure to me anyway. Why would we want to try to grow the box only if the ghost width is zero or the destination is the same as the fill box (which for most cases is the same as gw=0)

@nicolasaunai
Copy link
Contributor Author

nicolasaunai commented Aug 9, 2025

After further analysis it seems running a single level schedule is not a viable workaround in general.
It only solves the problem when the discrepancy occurs between the interior node of a patch on the new level and overlapped neighbor ghost nodes. Indeed in this case, the ghost nodes will get neighbor interior values.

However, the issue can also arise in situations where the resulting mismatch occurs between two neighbor patch level ghost nodes. And our tests indicate this actually occurs pretty easily. See the following:

image

Above figures shows a zoom in the hierarchy before (left) and after (right) regridding.
It's a two level hierarchy, L0 and L1. Patch A is on L1, its neighbor patch E is too.
Patches appearing below E and A are on L0.
The white dashed lines is the portion of A's ghost box.
patch B is on L0.
Color indicates MPI rank.

Patch A's ghost box overlaps old level's patches D, and C only at its border.
Therefore, the two nodes represented by the red markers should be obtained by copy from C.
But SAMRAI (on develop branch) fails to give us that overlap for reasons explained above.
Thus these nodes are obtained from refinement from B.

From E's perspective however, the ghost box overlaps C more, so that for E, these two nodes are obtained from C.

These two nodes thus get different values for A and E.
But as you can see, on the new level, they are both level ghost nodes, thus running a single level schedule will not solve this.

Running a "regular" schedule, that copies and refines, could, however, make both E and A refining from L0 to agree on the value they get. But that would discard pristine old level values that E was right to copy.

The version PR-ed gives this overlap to our code and the mismatch does not occur.

@nselliott
Copy link
Collaborator

We are likely to accept a version of this change, but please be patient as we verify that it will not cause a problem for other users.

To answer a question about why this is there, I believe it stems from the original SAMRAI design assuming that regridding would only move data to the interiors of the new level's patches, and then ghosts would be filled using a full boundary-filling RefineSchedule. As you have demonstrated, this doesn't have to be the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants