Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add design doc for inactive top cells #811

Open
wants to merge 10 commits into
base: ocean/develop
Choose a base branch
from

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Feb 11, 2021

We are proposing to add a top index throughout MPAS-Ocean, minLevelCell. This would be a big change and will be rolled out quickly, hopefully starting Feb. 16th, to avoid disrupting too much work for to much time.

@xylar xylar added Ocean Documentation only Changes that impact code comments and documentation, but no executable code labels Feb 11, 2021
@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2021

@cbegeman, @dengwirda, @mark-petersen, @philipwjones, @mattdturner, I'm looking for feedback on this late this week or early next week if possible.


* [ ] ``ocn_validate_state()``

...
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is very much incomplete and I'll be fleshing it out in the coming days.


* [ ] ``ocn_GM_compute_Bolus_velocity()``

...
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is also incomplete but less urgent.


* [ ] ``landIceFrictionVelocity``

...
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be adding surface variables like these to this list as I notice them.

Comment on lines 437 to 438
* ``SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.anvil_intel``
* ``<<help>>``
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattdturner, @mark-petersen and @philipwjones, I could use recommendations on other E3SM tests to make sure we include. We need pretty good code coverage to make sure we don't break anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see testing with all production resolutions and comp sets as well. I don't know the exact test setup lines, pinging @jonbob too for his input on tests.

Copy link
Collaborator Author

@xylar xylar Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanroekel and @jonbob, could you help us make a list of tests we need to run? We'd like to do this testing this week and put in a PR?

@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2021

@sbrus89, @vanroekel, @maltrud, @jonbob, mostly just giving you a heads up that we're planning to do a pretty disruptive thing here. But also welcome your feedback. Please ping anyone else that might be affected.

@vanroekel
Copy link
Contributor

@xylar thanks for the ping. I took a quick read and it all seems pretty reasonable to me. For me as long as you hit the BFB with no inactive cells and no performance hit I'm good with the changes.

A broader comment though. I think I recall a conversation with @philipwjones that he has a preference to move away from maxLevelCell to a loop over full depth with a mask to discount the inactive cells. Phil, please correct me if I'm misremembering. I'm wondering @xylar if you've thought about a time evolving mask to hit the design requirements.

@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2021

@vanroekel, I'll wait to comment too much further until @philipwjones responds. But my assessment would be that a mask would not have too many advantages and would be harder to implement. We would still need the min and max indices to get the top and bottom of the ocean (without having to iterate though the mask). We would never want inactive cells in the middle of the water column for ice-shelf cavities and I don't think we would ever want this for any other purpose either.

Comment on lines +336 to +607
* How time-consuming will it be to call
``ocn_init_routines_compute_min_max_level()`` each time ``minLevelCell`` has
changed? Is there anything we want to do now to make sure it is efficient?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any changes for the Phase 1 stage. I imagine for Phase 2 we could designate block(s) of cells that needs updating (likely in a separate regrid routine).

Copy link
Collaborator Author

@xylar xylar Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This subroutine doesn't look as heavy as I was afraid. But we might still hesitate to run it every time step. I just don't want us to lose track of this long-term need.

first (potentially inactive) cell. Similarly, the distribution of surface
fluxes into the water column must begin from the top with the first active
cell, not the first (potentially inactive) cell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requirement is probably too simplistic. The actual requirement is that surface forcing is applied correctly in all cases. This may not be simply making sure it's applied at the top of the active column. When applying the forcing over a depth range, it's not clear what that depth range should be or whether we want the same depth range under ice shelves. Presumably the coupler takes care of part of this by attenuating/masking some fluxes at the ice/ocean interface. But the point is that it's not going to be just a simple shifting of forcing to the top active level and some thought is going to be required for some forcing options. This may also be true of some vertical mixing parameterizations and options.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipwjones, thanks for the comment. I think these concerns would still apply to the terrain-following top coordinate that we currently use. I agree with you that KPP and other parameterizations might deserve a different treatment under ice shelves, but I think we want to treat that as independent of the top index, and this is something we already have to handle. So it isn't clear to me how that would affect this design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar for KPP there are a bunch of hard coded 1 indices in the construction of the depth coordinate and surface layer averaging (see here for example - https://github.com/MPAS-Dev/MPAS-Model/blob/ocean/develop/src/core_ocean/shared/mpas_ocn_vmix_cvmix.F#L506-L534). This could be problematic if the work is just switching loop bounds, but pretty easy to deal with in my view. Something like

do i=1,nEdgesOnCell(iCell)
  iEdge = edgesOnCell(iCell)
  deltaVelocitySquared(minLevelEdge(iEdge))
  ...
  do kIndexOBL = minLevelEdge(iEdge)+1,maxLevelelCell(iCell)

So I do think the change is relevant to be aware of here as it is a bit separate from the current terrain following formulation considerations in KPP, but really easy to take care of in implementation. I think the same would hold for GM routines that have this type of structure.

I don't think we should make too many changes specific to ice shelf cavities, but either way that as you say @xylar is separate from this design.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanroekel, yes, we definitely plan to make those changes like you are describing (not just loop bounds). That's fully within the scope of this current work (though KPP is a low priority within that project).

I wanted to have "surface" fields and fluxes as a separate requirement precisely because it's easy to miss them when you're searching for loop bounds. The KPP example you show is exactly a case like that and what you propose is exactly what I was thinking, too.

first (potentially inactive) cell. Similarly, the distribution of surface
fluxes into the water column must begin from the top with the first active
cell, not the first (potentially inactive) cell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also need a requirement that boundary conditions are applied correctly as well. Thinking here about higher-order vertical interpolants in routines like tracer advection. Not only are loop limits changed, but some new edge cases appear if max-min is not large enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point, thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be hard to fully address in this particular part of the design. Some of the work on preventing pathological topography (e.g. 2 adjacent cells where each has the minimum allowed number of layers, say 3, but the layers don't overlap at all or don't overlap enough in the vertical to allow any transport across that edge) would have to be on the side of creating and maintaining the vertical coordinate itself, which is beyond the scope of this design.

@philipwjones
Copy link
Contributor

philipwjones commented Feb 11, 2021

Regarding performance:

Recalculating masks and index limits every step is a relatively small cost. But dynamic indices might have other implications for both restarts and subcycling/iteration if you're going to do it every step. BTW, we also convert the current integer masks to double in the ocn_mesh module to save the (sometimes non-trivial) type conversion costs when used as a multiplicative mask in loops. We really need to get rid of the integer forms.

And Luke is correct that the form:

do iCell = 1,nCells
   do k=min_level_cell(iCell), max_level_cell(iCell)
      stuff
   end do
end do

is not ideal for either vector architectures or GPUs. Using a multiplicative mask is one approach and is already being used here and there in MPAS, namely:

do iCell=1,nCells
do k=1,nVertLevels
   stuff*cellMask(k,iCell)
end do
end do

This results in a nice tightly nested loop that can be collapsed for more parallelism on GPU and is easier to vectorize. Also better load balance since you're doing work everywhere. But it does result in some performance degradation on current CPUs due to the extra work being performed at non-ocean points, so we have to decide on a loop basis. As Xylar noted, the JM EOS routine already does this (without a mask - just computes everywhere) because we were able to gain enough performance in the loop itself so giving up a little performance for the extra work was fine.

Matt Norman has found that one compromise is to place the conditional just inside the nested loop like:

do iCell=1,nCells
do k=1,nVertLevels
   if (k >= min_level_cell(iCell) .and. k <= max_level_cell(iCell)) then
      stuff
   endif
end do
end do

which you might think should be equivalent to putting m**_level_cell in the loop limits, but compilers tend to treat better, especially when using directives for threading and GPUs. The loops can still be collapsed and create parallel threads and the compiler just throws away results where the condition is not satisfied. The conditional should be around the whole loop body to gain this advantage.

Anyway, I guess the point is that having min/max as loop limits is the less ideal form. If you can get away with minimal performance impact for the mask form, that's probably best. But the conditional form may get the best compromise.

Also, make nVertLevels a multiple of 4 or 8 or even 32/64 to match vector lengths, GPU warps, etc.

@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2021

@philipwjones, that's great stuff! Presumably, the form you recommended as a compromise (condition just inside the double loop) isn't necessarily going to produce bit-for-bit identical results to the current implementation, right? What would your approach be if that's the case? I've heard you and @mattdturner talk about compiling in debug mode to get bit-for-bit when that's not the case for optimized. What does that process look like?

@mattdturner
Copy link
Collaborator

I've heard you and @mattdturner talk about compiling in debug mode to get bit-for-bit when that's not the case for optimized. What does that process look like?

If I think code should be BFB, but it fails some BFB checks in the nightly test suite in MPAS standalone then I re-generate the baseline and test data with an executable where I compiled with DEBUG=true. On the rare occasion, I actually have to modify the Makefile to use -O0 for the particular target I am building in order to disable all optimizations (not all build targets specify -O0 for debug).

Testing for BFB w/ debug has been helpful in a few cases where the changes that have been made to the code allow the compiler to make additional (or just different) optimizations. By removing all compiler optimizations, it gives a cleaner 1-to-1 comparison of the code before and after modification.

@mattdturner
Copy link
Collaborator

I forgot to mention that I will also sometimes run BFB checks in E3SM w/ Debug enabled. I don't recall if E3SM has many baseline datasets for Debug configurations, so this might entail generating a baseline with master + Debug and then merging your code and running w/ Debug and compare to the baseline that you generated.

@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2021

Thanks, @mattdturner. I assume those situations give you confidence that you haven't changed anything meaningfully but that you still have to treat the final PR as non-bit-for-bit into E3SM, since it will ultimately be built with optimization?

@mattdturner
Copy link
Collaborator

Correct. If it is BFB with debug, then I'm usually willing to say that the changes are correct and the only differences that we see in the output are from compiler optimization. As long as the errors are near round-off, I don't investigate any deeper.

Yes, the final E3SM PR needs to be treated as non-bit-for-bit at that point. But the BFB w/ Debug gives me enough confidence that the changes are not climate-changing.

@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2021

Perfect, got it.

@philipwjones
Copy link
Contributor

@xylar The loop optimizations above are often b4b even when optimized (and certainly b4b in debug), but on occasion the compiler does see an optimization that causes roundoff-level changes, so it's not guaranteed.

@xylar
Copy link
Collaborator Author

xylar commented Feb 11, 2021

Okay, great, that's very good to know.

Comment on lines 325 to 328
``ocn_forcing_build_fraction_absorbed_array`` is currently only called once
by ``ocn_init_routines`` and would need to be called multiple times to correctly
distribute surface fluxes unless we use an alternative approach where the vertical
index of transmissionCoeff is number of cells from minLevelCell rather than k-levels
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbegeman, I'm seeing this function getting called every time step here:
https://github.com/MPAS-Dev/MPAS-Model/blob/ocean/develop/src/core_ocean/mode_forward/mpas_ocn_forward_mode.F#L574
So I think even in the future when minLevelCell changes in time, this should work. Did I misunderstand something?

Also, I think we would definitely want to make the transmission coefficient 100% (absorption 0) for cells above the first valid cell because we use this infrastructure for fluxes under ice shelves.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar Thanks for catching that. So it looks like we'd be fine on that front.

"Transmitting" the fluxes through the inactive cells sounds like a good safeguard, though if we change the loop limits in ocn_thick_surface_flux_tend we shouldn't need to use transmission coefficient values above minLevelCell.

@xylar xylar force-pushed the ocean/min_level_cell_design_doc branch from fbd93aa to 390d688 Compare February 16, 2021 15:21
@mark-petersen
Copy link
Contributor

On #811 (comment) there were three options proposed. For 1 or 3 we could easily switch back and forth between

do iCell=1,nCells
do k=1,nVertLevels   ! vertical loop do
   if (k >= min_level_cell(iCell) .and. k <= max_level_cell(iCell)) then    ! vertical loop if
      stuff
   endif  ! vertical loop endif
end do
end do

If the spacing is identical, a sed script could change the full code base between option 3 and option 1 automatically, so we could easily test on different machines and compilers.

@mark-petersen
Copy link
Contributor

Have you considered 'partial top cells', analogous to 'partial bottom cells'? It would make no difference in the forward model code changes, as we have a prognostic layer thickness equation. However, it does make a difference in the construction of the initial conditions.

When constructing your initial domain with variable minLevelCell, you should have an option for partial or full top cell. The eventual goal is always partial top cells, to model the ice draft more realistically. But it is very useful to have a full cell option in your standard test suite.

If you initialize with constant T,S horizontally and no surface forcing, a zero initial velocity should remain exactly zero with full top and bottom cells. You can then move on to partial top and bottom cells, where layers are effectively tilted. You then get a pressure gradient error. Velocity will be nonzero but should remain small. This is similar to the sea mount test, where the velocity is a measure of the error.

Thus I would recommend a progression of settings in the sub_ice_shelf_2D test. All have no surface forcing.

  1. T,S uniform in 3D. Full top and bottom cells. Verify min and max T,S remain exactly constant, velocity remains exactly zero.
  2. T,S horizontally uniform, stably stratified in vertical. Disable vertical tracer mixing. Full top and bottom cells. Verify T,S remains unchanged, velocity remains exactly zero.
  3. Same as (2) but partial top and bottom cells. Maximum velocity is error and should grow slowly at top and bottom (e.g. 1e-6m/s after a day).

Since the dynamics of these tests are simply no forcing, an advantage for development is that you can use these exact set of tests, begin with all terms turned off (using disable flags), and turn each term on as you add minLevelCell. Expected results will be the same (except rate of error growth in (3).

@xylar
Copy link
Collaborator Author

xylar commented Feb 16, 2021

@mark-petersen, thanks for the comments. I have implemented a z-level version of sub_ice_shelf_2D Here:
https://github.com/xylar/compass/tree/add_z_level_sub_ice_shelf_2D
and here
https://github.com/xylar/MPAS-Model/tree/ocean/add_z_level_sub_ice_shelf_2D

I have modified the config options for partial bottom cells to also apply to partial top cells, so this is indeed already supported in my test case.
https://github.com/xylar/MPAS-Model/blob/ocean/add_z_level_sub_ice_shelf_2D/src/core_ocean/mode_init/Registry.xml#L26-L39

https://github.com/xylar/MPAS-Model/blob/ocean/add_z_level_sub_ice_shelf_2D/src/core_ocean/mode_init/mpas_ocn_init_vertical_grids.F#L660-L712

I agree that your suggested progression for testing sounds useful. I guess that testing would be after loops are fully implemented and all terms are turned on?

@mark-petersen
Copy link
Contributor

I guess that testing would be after loops are fully implemented and all terms are turned on?

No. That's the beauty of it. Right now, you could start with z-level sub_ice_shelf_2D, disable ALL terms in all three equations (momentum, thickness, tracer), and it will pass tests 1, 2, 3 above and T,S,u remain constant. Then you add minLevelCell to one term: say del2 tracer. You enable that single term. It will again pass 1, 2, 3 with T,S,u constant in time. As you add each term, enable it, run these three tests! Test 3 will show non-zero velocity only when you turn on the pressure gradient, using a nonlinear eos.

@xylar xylar added the Ocean label Feb 18, 2021
@mark-petersen
Copy link
Contributor

From @cbegeman on slack:

We haven’t discussed whether we ought to fix k-limits (e.g., 1:nVertLevels to minLevelCell:maxLevelCell) whenever it seems appropriate to limit computations or only when it is necessary to avoid errors in the solution. I’m guessing the answer is the latter, in which case, should I add comments to flag these instances of potentially unnecessary computations?

If the loop already goes to nVertLevels and uses a mask or other zero in the computation, then changing nVertLevels to maxLevelCell is really outside of the scope of this project. It would also imply that the beginning should remain at 1, not minLevelCell, if masking is appropriate.

In reality, of course, some of these 1,nVertLevels might be oversights or poorly designed. Probably best to leave them, make sure the upper index is masked correctly, and then flag them (put uniform greppable comment line before it) and give them to the performance team to decide. The code should definitely be reviewed by @mattdturner or @philipwjones , so we can ask them to consider those cases.

For example, put

! nVertLevels loop, performance team review requested
for k = 1,nVertLevels

as you go through. That is easier than tagging each one in the PR, and they are easy to remove later.

@philipwjones
Copy link
Contributor

@mark-petersen , @cbegeman , @mattdturner - the loop limit/ordering issue is turning out to be more variable then my initial response above would indicate. I've been running a bunch of experiments to try and come up with some better criteria or approaches and was hoping to have something yesterday, but should have something soon. If you need to start now, just start with the min/max loop limits and we can make a pass to modify later - don't know that tagging the loops is necessary as a grep on "do k" will pick up most of them.

@xylar
Copy link
Collaborator Author

xylar commented Feb 18, 2021

don't know that tagging the loops is necessary as a grep on "do k" will pick up most of them.

I'm glad to hear that. I think this tagging is going to be pretty annoying. As an example, I really dislike the !{{{ !}}} all over for vim folds. (I also really dislike vim folds!)

Comment on lines 757 to 758
* ``PET_Ln9_P1024.ne30_oECv3_ICG.A_WCYCL1850S.<<machine>>_gnu``
* ``PEM_Ln9_P1024.ne30_oECv3_ICG.A_WCYCL1850S.<<machine>>_intel``
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests no longer exist. Asking @jonbob for recommended replacements.

Comment on lines 750 to 751
* ``SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.<<machine>>_intel``
* ``SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.<<machine>>_gnu``
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests didn't catch E3SM-Project/E3SM#4152 so we probably want at least one additional (oECv3?) test against a baseline.

mark-petersen added a commit that referenced this pull request Mar 22, 2021
…develop

Add the option for inactive top cells #825

Inactive top cells are introduced to add flexibility to the vertical
coordinate below ice shelves.

This PR introduces phase 1 in the development documented in #811, which
touches only the code needed to run a basic ice shelf test case.

The variable minLevelCell and related minLevelEdge* and minLevelVertex*
variables are introduced by analogy with maxLevel* variables. By
default, there are no inactive top cells and minLevel* = 1. The main
code changes are to k-loop limits and designating surface quantities as
those at minLevel*.
mark-petersen added a commit that referenced this pull request Apr 14, 2021
…n/develop

Enhance inactive top cells feature #840

Here, we extend the capability for Inactive top cells introduced by #825
to apply to additional model configurations besides those needed to run
a basic ice shelf test case. More detail about the scope of this
development is documented in #811.

The variable minLevelCell and related minLevelEdge* and minLevelVertex*
variables are added by analogy with maxLevel* variables. By default,
there are no inactive top cells and minLevel* = 1. The main code changes
are to k-loop limits and designating surface quantities as those at
minLevel*.
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Apr 20, 2021
…next (PR #4219)

Update mpas-source: Enhance inactive top cells feature

Brings in a new mpas-source submodule with changes only to the ocean
core. It extends the capability for inactive top cells introduced by
MPAS-Dev/MPAS-Model#825 to apply to additional model configurations
besides those needed to run a basic ice shelf test case. More detail
about the scope of this development is documented in
MPAS-Dev/MPAS-Model#811.

[BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Apr 21, 2021
…4219)

Update mpas-source: Enhance inactive top cells feature

Brings in a new mpas-source submodule with changes only to the ocean
core. It extends the capability for inactive top cells introduced by
MPAS-Dev/MPAS-Model#825 to apply to additional model configurations
besides those needed to run a basic ice shelf test case. More detail
about the scope of this development is documented in
MPAS-Dev/MPAS-Model#811.

[BFB]
@xylar
Copy link
Collaborator Author

xylar commented Apr 21, 2021

@cbegeman, would you like to make changes directly on this branch so we can finalize this design doc an get it merged? I'm happy to also make any changes you would like me to take care of but at this point it would seem to make a lot more sense for you to put the final touches on this, since you know the implementation better than anyone else.

If you accept my invitation to collaborate, you should have permission to push.

@xylar xylar force-pushed the ocean/min_level_cell_design_doc branch from 8f451ae to d57b09c Compare April 21, 2021 15:35
@xylar
Copy link
Collaborator Author

xylar commented Apr 21, 2021

I just rebased to take care of the conflict.

@cbegeman
Copy link

Thank you, @xylar. I made some updates to the design doc but haven't addressed the isomip_plus test case I ran because I'm not sure how relevant it is at this point.

I have a Phase 3 added with the init mode changes. We don't have much detail there, specifically whether we should commit to the global ocean test case with minLevelCell=2 and the number of active nVertLevels unchanged. I feel that it's worth doing as it's the best test I can think of for uncovering bugs. Open to suggestions though.

@xylar
Copy link
Collaborator Author

xylar commented Apr 22, 2021

@cbegeman, Okay, it sounds like we probably should wait until phase 3 happens to merge this.

I do like the minLevelCell=2 idea, that seems fine for catching a number of issues.

I would agree that, for now, we can leave out the isomip_puls result. It gave is some confidence that we can run with minLevelCell > 1 but isn't a super solid proof of concept yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation only Changes that impact code comments and documentation, but no executable code in progress Ocean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants