Add design doc for inactive top cells #811
base: ocean/develop
@cbegeman, @dengwirda, @mark-petersen, @philipwjones, @mattdturner, I'm looking for feedback on this late this week or early next week if possible.
* [ ] ``ocn_validate_state()``
This list is very much incomplete and I'll be fleshing it out in the coming days.
* [ ] ``ocn_GM_compute_Bolus_velocity()``
This list is also incomplete but less urgent.
* [ ] ``landIceFrictionVelocity``
I'll be adding surface variables like these to this list as I notice them.
* ``SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.anvil_intel``
* ``<<help>>``
@mattdturner, @mark-petersen and @philipwjones, I could use recommendations on other E3SM tests to make sure we include. We need pretty good code coverage to make sure we don't break anything.
I would like to see testing with all production resolutions and comp sets as well. I don't know the exact test setup lines, pinging @jonbob too for his input on tests.
@vanroekel and @jonbob, could you help us make a list of tests we need to run? We'd like to do this testing this week and then put in a PR.
@sbrus89, @vanroekel, @maltrud, @jonbob, mostly just giving you a heads up that we're planning to do a pretty disruptive thing here. But I also welcome your feedback. Please ping anyone else who might be affected.
@xylar thanks for the ping. I took a quick read and it all seems pretty reasonable to me. For me, as long as you hit BFB with no inactive cells and there's no performance hit, I'm good with the changes. A broader comment, though: I think I recall a conversation with @philipwjones in which he expressed a preference to move away from maxLevelCell to a loop over the full depth with a mask to discount the inactive cells. Phil, please correct me if I'm misremembering. I'm wondering, @xylar, if you've thought about a time-evolving mask to hit the design requirements.
@vanroekel, I'll wait to comment too much further until @philipwjones responds. But my assessment would be that a mask would not have many advantages and would be harder to implement. We would still need the min and max indices to get the top and bottom of the ocean (without having to iterate through the mask). We would never want inactive cells in the middle of the water column for ice-shelf cavities, and I don't think we would ever want this for any other purpose either.
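Xylar's point that index pairs suffice can be made concrete with a toy sketch (illustrative Python, not MPAS code; the function name is hypothetical). For a contiguous active column, a 0/1 mask and the (minLevel, maxLevel) pair carry the same information, and the indices can be recovered from the mask in a single pass:

```python
def min_max_level(mask):
    """Recover 1-based (minLevel, maxLevel) from a 0/1 activity mask,
    listed top (k=1) first. Assumes the active region is contiguous,
    as in an ice-shelf cavity column."""
    active = [k for k, m in enumerate(mask, start=1) if m == 1]
    if not active:
        return 0, -1          # no active cells (e.g. land)
    k_min, k_max = active[0], active[-1]
    # contiguity check: no inactive cells inside the water column
    assert k_max - k_min + 1 == len(active)
    return k_min, k_max

# example: two inactive top cells under an ice shelf, one inactive bottom cell
print(min_max_level([0, 0, 1, 1, 1, 0]))  # -> (3, 5)
```

Once the pair is in hand, loops can run over `k_min..k_max` directly without re-testing the mask at every level.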
* How time-consuming will it be to call
  ``ocn_init_routines_compute_min_max_level()`` each time ``minLevelCell`` has
  changed? Is there anything we want to do now to make sure it is efficient?
I can't think of any changes for the Phase 1 stage. I imagine for Phase 2 we could designate block(s) of cells that need updating (likely in a separate regrid routine).
I agree. This subroutine doesn't look as heavy as I feared. But we might still hesitate to run it every time step. I just don't want us to lose track of this long-term need.
first (potentially inactive) cell. Similarly, the distribution of surface
fluxes into the water column must begin from the top with the first active
cell, not the first (potentially inactive) cell.
This requirement is probably too simplistic. The actual requirement is that surface forcing is applied correctly in all cases. This may not be simply making sure it's applied at the top of the active column. When applying the forcing over a depth range, it's not clear what that depth range should be or whether we want the same depth range under ice shelves. Presumably the coupler takes care of part of this by attenuating/masking some fluxes at the ice/ocean interface. But the point is that it's not going to be just a simple shifting of forcing to the top active level and some thought is going to be required for some forcing options. This may also be true of some vertical mixing parameterizations and options.
@philipwjones, thanks for the comment. I think these concerns would still apply to the terrain-following top coordinate that we currently use. I agree with you that KPP and other parameterizations might deserve a different treatment under ice shelves, but I think we want to treat that as independent of the top index, and this is something we already have to handle. So it isn't clear to me how that would affect this design.
@xylar for KPP there are a bunch of hard coded 1 indices in the construction of the depth coordinate and surface layer averaging (see here for example - https://github.com/MPAS-Dev/MPAS-Model/blob/ocean/develop/src/core_ocean/shared/mpas_ocn_vmix_cvmix.F#L506-L534). This could be problematic if the work is just switching loop bounds, but pretty easy to deal with in my view. Something like
do i = 1, nEdgesOnCell(iCell)
  iEdge = edgesOnCell(i, iCell)
  deltaVelocitySquared(minLevelEdge(iEdge)) = ...
  ...
  do kIndexOBL = minLevelEdge(iEdge)+1, maxLevelCell(iCell)
So I do think the change is relevant to be aware of here, as it is a bit separate from the current terrain-following formulation considerations in KPP, but it is really easy to take care of in implementation. I think the same would hold for GM routines that have this type of structure.
I don't think we should make too many changes specific to ice shelf cavities, but either way, as you say @xylar, that is separate from this design.
@vanroekel, yes, we definitely plan to make those changes like you are describing (not just loop bounds). That's fully within the scope of this current work (though KPP is a low priority within that project).
I wanted to have "surface" fields and fluxes as a separate requirement precisely because it's easy to miss them when you're searching for loop bounds. The KPP example you show is exactly a case like that and what you propose is exactly what I was thinking, too.
Probably also need a requirement that boundary conditions are applied correctly as well. Thinking here about higher-order vertical interpolants in routines like tracer advection. Not only are loop limits changed, but some new edge cases appear if max-min is not large enough.
That's a very good point, thanks!
This is going to be hard to fully address in this particular part of the design. Some of the work on preventing pathological topography (e.g. 2 adjacent cells where each has the minimum allowed number of layers, say 3, but the layers don't overlap at all or don't overlap enough in the vertical to allow any transport across that edge) would have to be on the side of creating and maintaining the vertical coordinate itself, which is beyond the scope of this design.
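Phil's pathological case can be illustrated with a small sketch (illustrative Python; the function name is hypothetical, and the edge convention is assumed by analogy with the existing maxLevelEdgeTop-style variables): two adjacent columns can each satisfy a minimum-layer-count requirement while sharing no active levels across their edge, so the edge carries no transport at all.

```python
def edge_active_range(min_a, max_a, min_b, max_b):
    """Active k-range on the edge between cells A and B, taken as the
    intersection of the two columns' active ranges (max of the mins,
    min of the maxes). An empty range (k_min > k_max) means the edge
    admits no transport, even if both cells individually are 'valid'."""
    k_min = max(min_a, min_b)
    k_max = min(max_a, max_b)
    return k_min, k_max

# pathological topography: each column has 3 active layers (the assumed
# minimum), but the layers do not overlap at all in the vertical
k_min, k_max = edge_active_range(1, 3, 4, 6)
print(k_min > k_max)  # True: no levels in common across the edge
```

Guarding against this would live in the construction of the vertical coordinate rather than in the forward-model loops, which is why it falls outside this design.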
Regarding performance: Recalculating masks and index limits every step is a relatively small cost. But dynamic indices might have other implications for both restarts and subcycling/iteration if you're going to do it every step. BTW, we also convert the current integer masks to double in the ocn_mesh module to save the (sometimes non-trivial) type conversion costs when they are used as a multiplicative mask in loops. We really need to get rid of the integer forms. And Luke is correct that the form with min/max indices as loop limits is not ideal for either vector architectures or GPUs.

Using a multiplicative mask over the full vertical loop is one approach and is already being used here and there in MPAS. This results in a nice tightly nested loop that can be collapsed for more parallelism on GPU and is easier to vectorize. Also better load balance, since you're doing work everywhere. But it does result in some performance degradation on current CPUs due to the extra work being performed at non-ocean points, so we have to decide on a loop-by-loop basis. As Xylar noted, the JM EOS routine already does this (without a mask - it just computes everywhere) because we were able to gain enough performance in the loop itself that giving up a little for the extra work was fine.

Matt Norman has found that one compromise is to place the conditional just inside the nested loop, which you might think should be equivalent to putting the min/max level indices in the loop limits, but compilers tend to treat it better, especially when using directives for threading and GPUs. The loops can still be collapsed to create parallel threads, and the compiler just throws away results where the condition is not satisfied. The conditional should be around the whole loop body to gain this advantage.

Anyway, I guess the point is that having min/max as loop limits is the less ideal form. If you can get away with minimal performance impact for the mask form, that's probably best. But the conditional form may be the best compromise. Also, make nVertLevels a multiple of 4 or 8 or even 32/64 to match vector lengths, GPU warps, etc.
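As a rough illustration of the three loop forms discussed here (sketched in Python rather than Fortran, since only the structure matters), all three compute the same tendency; they differ only in where the activity test lives, which is what affects vectorization and GPU loop collapse:

```python
def tend_limits(phi, kmin, kmax):
    # form 1: min/max indices as loop limits (the current MPAS style)
    out = [0.0] * len(phi)
    for k in range(kmin, kmax + 1):
        out[k] = 2.0 * phi[k]
    return out

def tend_mask(phi, mask):
    # form 2: tight full-depth loop with a multiplicative (double) mask;
    # collapses/vectorizes well but does extra work at inactive points
    return [2.0 * p * m for p, m in zip(phi, mask)]

def tend_conditional(phi, kmin, kmax):
    # form 3: full-depth loop with the conditional wrapped around the
    # whole loop body; the compiler can still collapse the loop and
    # simply discard results where the condition fails
    out = [0.0] * len(phi)
    for k in range(len(phi)):
        if kmin <= k <= kmax:
            out[k] = 2.0 * phi[k]
    return out

phi = [1.0, 2.0, 3.0, 4.0, 5.0]
kmin, kmax = 1, 3  # one inactive top cell and one inactive bottom cell
mask = [1.0 if kmin <= k <= kmax else 0.0 for k in range(len(phi))]
a = tend_limits(phi, kmin, kmax)
b = tend_mask(phi, mask)
c = tend_conditional(phi, kmin, kmax)
print(a == b == c)  # True
```

Whether the extra arithmetic of form 2, or the branch of form 3, is cheaper than the short trip counts of form 1 depends on the architecture, which is why the decision has to be made loop by loop.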
@philipwjones, that's great stuff! Presumably, the form you recommended as a compromise (the conditional just inside the double loop) isn't necessarily going to produce bit-for-bit identical results to the current implementation, right? What would your approach be if that's the case? I've heard you and @mattdturner talk about compiling in debug mode to get bit-for-bit when that's not the case for optimized. What does that process look like?
If I think code should be BFB, but it fails some BFB checks in the nightly test suite in MPAS standalone, then I re-generate the baseline and test data with an executable compiled in debug mode. Testing for BFB w/ debug has been helpful in a few cases where the changes that have been made to the code allow the compiler to make additional (or just different) optimizations. By removing all compiler optimizations, it gives a cleaner 1-to-1 comparison of the code before and after modification.
I forgot to mention that I will also sometimes run BFB checks in E3SM w/ Debug enabled. I don't recall if E3SM has many baseline datasets for Debug configurations, so this might entail generating a Debug baseline first.
Thanks, @mattdturner. I assume those situations give you confidence that you haven't changed anything meaningfully, but that you still have to treat the final PR as non-bit-for-bit into E3SM, since it will ultimately be built with optimization?
Correct. If it is BFB with debug, then I'm usually willing to say that the changes are correct and the only differences that we see in the output are from compiler optimization. As long as the errors are near round-off, I don't investigate any deeper. Yes, the final E3SM PR needs to be treated as non-bit-for-bit at that point. But the BFB w/ Debug gives me enough confidence that the changes are not climate-changing.
Perfect, got it.
@xylar The loop optimizations above are often b4b even when optimized (and certainly b4b in debug), but on occasion the compiler does see an optimization that causes roundoff-level changes, so it's not guaranteed.
Okay, great, that's very good to know.
``ocn_forcing_build_fraction_absorbed_array`` is currently only called once
by ``ocn_init_routines`` and would need to be called multiple times to correctly
distribute surface fluxes, unless we use an alternative approach where the vertical
index of ``transmissionCoeff`` is the number of cells from ``minLevelCell`` rather than k-levels.
@cbegeman, I'm seeing this function getting called every time step here:
https://github.com/MPAS-Dev/MPAS-Model/blob/ocean/develop/src/core_ocean/mode_forward/mpas_ocn_forward_mode.F#L574
So I think even in the future when minLevelCell changes in time, this should work. Did I misunderstand something?
Also, I think we would definitely want to make the transmission coefficient 100% (absorption 0) for cells above the first valid cell because we use this infrastructure for fluxes under ice shelves.
@xylar Thanks for catching that. So it looks like we'd be fine on that front.
"Transmitting" the fluxes through the inactive cells sounds like a good safeguard, though if we change the loop limits in ocn_thick_surface_flux_tend we shouldn't need to use transmission coefficient values above minLevelCell.
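A minimal sketch of the safeguard being discussed (illustrative Python, not the actual ``ocn_forcing_build_fraction_absorbed_array`` code; the exponential decay profile and the function name are assumptions): inactive top cells absorb nothing, and the decay of the surface flux is measured from the top of the active column, so the full flux enters at minLevelCell rather than at k = 1.

```python
import math

def fraction_absorbed(interfaces, k_min, scale=10.0):
    """Fraction of a surface flux absorbed in each layer of one column.
    interfaces: layer interface depths, top to bottom (len = nlev + 1).
    k_min: 0-based index of the first active cell (minLevelCell - 1).
    The exp() decay is a stand-in for the real transmission profile."""
    decay = lambda z: math.exp(-z / scale)
    nlev = len(interfaces) - 1
    frac = [0.0] * nlev
    top = interfaces[k_min]              # top of the *active* column
    for k in range(k_min, nlev):
        # flux entering the layer minus flux leaving it
        frac[k] = decay(interfaces[k] - top) - decay(interfaces[k + 1] - top)
    # absorb whatever remains in the bottom active cell
    frac[nlev - 1] += decay(interfaces[-1] - top)
    return frac

# two inactive top cells (ice-shelf draft occupies the first 10 m)
frac = fraction_absorbed([0.0, 5.0, 10.0, 20.0, 30.0, 40.0], k_min=2)
print(frac[0] == 0.0 and frac[1] == 0.0)  # inactive cells absorb nothing
print(abs(sum(frac) - 1.0) < 1e-12)       # all flux lands in active cells
```

The key design point is only the `top = interfaces[k_min]` offset; everything above it is 100% transmissive.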
Force-pushed from fbd93aa to 390d688.
On #811 (comment) there were three options proposed. For 1 or 3 we could easily switch back and forth between the two. If the spacing is identical, a sed script could change the full code base between option 3 and option 1 automatically, so we could easily test on different machines and compilers.
Have you considered 'partial top cells', analogous to 'partial bottom cells'? It would make no difference in the forward model code changes, as we have a prognostic layer thickness equation. However, it does make a difference in the construction of the initial conditions. When constructing your initial domain with variable minLevelCell, you should have an option for partial or full top cells. The eventual goal is always partial top cells, to model the ice draft more realistically. But it is very useful to have a full-cell option in your standard test suite. If you initialize with constant T, S horizontally and no surface forcing, a zero initial velocity should remain exactly zero with full top and bottom cells. You can then move on to partial top and bottom cells, where layers are effectively tilted. You then get a pressure gradient error. Velocity will be nonzero but should remain small. This is similar to the sea mount test, where the velocity is a measure of the error. Thus I would recommend a progression of settings in the standard test suite.
Since the dynamics of these tests are simply no forcing, an advantage for development is that you can use this exact set of tests and begin with all terms turned off.
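Mark's zero-velocity check can be illustrated with a toy hydrostatic calculation (a Python sketch under the assumption of horizontally uniform density; the numbers are arbitrary). Comparing pressure at the same layer index across two adjacent columns gives exactly zero horizontal gradient when the layers are flat, and a small spurious gradient when top cells are partial and the layers tilt, which is the pressure gradient error he describes:

```python
RHO, G = 1026.0, 9.81   # constant density: horizontally uniform T, S

def layer_mid_pressures(interfaces):
    # hydrostatic pressure at each layer mid-depth for one column
    return [RHO * G * 0.5 * (interfaces[k] + interfaces[k + 1])
            for k in range(len(interfaces) - 1)]

# full cells: both columns share the same flat interfaces, so the
# along-layer pressure difference (and thus the force) is exactly zero
flat_a = layer_mid_pressures([0.0, 10.0, 20.0, 30.0])
flat_b = layer_mid_pressures([0.0, 10.0, 20.0, 30.0])
print(max(abs(a - b) for a, b in zip(flat_a, flat_b)))  # 0.0

# partial top cell: column b's layers are shifted under an ice draft,
# so differencing at fixed layer index produces a spurious gradient
tilt_a = layer_mid_pressures([0.0, 10.0, 20.0, 30.0])
tilt_b = layer_mid_pressures([2.0, 11.0, 20.5, 30.0])
err = max(abs(a - b) for a, b in zip(tilt_a, tilt_b))
print(err > 0.0)  # True: nonzero, but bounded by the layer tilt
```

In the full model the velocity driven by this residual force is the error measure, just as in the sea mount test.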
@mark-petersen, thanks for the comments. I have implemented a z-level version. I have modified the config options for partial bottom cells to also apply to partial top cells, so this is indeed already supported in my test case. I agree that your suggested progression for testing sounds useful. I guess that testing would be after loops are fully implemented and all terms are turned on?
No. That's the beauty of it. Right now, you could start with all terms turned off.
From @cbegeman on slack:
If the loop already goes to nVertLevels and uses a mask or other zero in the computation, then changing nVertLevels to maxLevelCell is really outside the scope of this project. It would also imply that the beginning should remain at 1, not minLevelCell, if masking is appropriate. In reality, of course, some of these ``1,nVertLevels`` loops might be oversights or poorly designed. Probably best to leave them, make sure the upper index is masked correctly, and then flag them (put a uniform greppable comment line before each) and give them to the performance team to decide. The code should definitely be reviewed by @mattdturner or @philipwjones, so we can ask them to consider those cases. For example, put such a comment before each one as you go through. That is easier than tagging each one in the PR, and they are easy to remove later.
@mark-petersen, @cbegeman, @mattdturner - the loop limit/ordering issue is turning out to be more variable than my initial response above would indicate. I've been running a bunch of experiments to try to come up with some better criteria or approaches and was hoping to have something yesterday, but should have something soon. If you need to start now, just start with the min/max loop limits and we can make a pass to modify them later - I don't know that tagging the loops is necessary, as a grep on "do k" will pick up most of them.
I'm glad to hear that. I think this tagging is going to be pretty annoying.
* ``PET_Ln9_P1024.ne30_oECv3_ICG.A_WCYCL1850S.<<machine>>_gnu``
* ``PEM_Ln9_P1024.ne30_oECv3_ICG.A_WCYCL1850S.<<machine>>_intel``
These tests no longer exist. Asking @jonbob for recommended replacements.
* ``SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.<<machine>>_intel``
* ``SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.<<machine>>_gnu``
These tests didn't catch E3SM-Project/E3SM#4152, so we probably want at least one additional (oECv3?) test against a baseline.
Add the option for inactive top cells #825: Inactive top cells are introduced to add flexibility to the vertical coordinate below ice shelves. This PR introduces phase 1 of the development documented in #811, which touches only the code needed to run a basic ice shelf test case. The variable minLevelCell and related minLevelEdge* and minLevelVertex* variables are introduced by analogy with maxLevel* variables. By default, there are no inactive top cells and minLevel* = 1. The main code changes are to k-loop limits and designating surface quantities as those at minLevel*.
Enhance inactive top cells feature #840: Here, we extend the capability for inactive top cells introduced by #825 to apply to additional model configurations besides those needed to run a basic ice shelf test case. More detail about the scope of this development is documented in #811. The variable minLevelCell and related minLevelEdge* and minLevelVertex* variables are added by analogy with maxLevel* variables. By default, there are no inactive top cells and minLevel* = 1. The main code changes are to k-loop limits and designating surface quantities as those at minLevel*.
Update mpas-source: Enhance inactive top cells feature (PR #4219): Brings in a new mpas-source submodule with changes only to the ocean core. It extends the capability for inactive top cells introduced by MPAS-Dev/MPAS-Model#825 to apply to additional model configurations besides those needed to run a basic ice shelf test case. More detail about the scope of this development is documented in MPAS-Dev/MPAS-Model#811. [BFB]
@cbegeman, would you like to make changes directly on this branch so we can finalize this design doc and get it merged? I'm happy to also make any changes you would like me to take care of, but at this point it would seem to make a lot more sense for you to put the final touches on this, since you know the implementation better than anyone else. If you accept my invitation to collaborate, you should have permission to push.
Force-pushed from 8f451ae to d57b09c.
I just rebased to take care of the conflict.
Thank you, @xylar. I made some updates to the design doc but haven't addressed everything yet. I have a Phase 3 added with the init mode changes. We don't have much detail there, specifically whether we should commit to the global ocean test case.
@cbegeman, okay, it sounds like we probably should wait until Phase 3 happens to merge this. I do like the updates you made. I would agree that, for now, we can leave that out.
We are proposing to add a top index, minLevelCell, throughout MPAS-Ocean. This would be a big change and will be rolled out quickly, hopefully starting Feb. 16th, to avoid disrupting too much work for too long.
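The core pattern of the proposed change can be sketched as follows (illustrative Python; in the Fortran, loops would read ``do k = minLevelCell(iCell), maxLevelCell(iCell)``): k-loops start at minLevelCell instead of 1, and "surface" quantities are taken at minLevelCell rather than at k = 1.

```python
def column_sum(phi, min_level, max_level):
    # k-loops run from minLevelCell to maxLevelCell (1-based, as in the
    # Fortran) instead of from 1 to maxLevelCell
    return sum(phi[k - 1] for k in range(min_level, max_level + 1))

def surface_value(phi, min_level):
    # "surface" quantities live at minLevelCell, not at k = 1
    return phi[min_level - 1]

# two inactive top cells under an ice shelf; by default minLevel* = 1
# and the behavior reduces to the current code
phi = [0.0, 0.0, 3.0, 2.0, 1.0]
print(column_sum(phi, 3, 5))   # 6.0
print(surface_value(phi, 3))   # 3.0
```

With the default minLevel* = 1, both functions reproduce the existing behavior, which is what the BFB requirement for the no-inactive-cells case checks.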