
allocate c and cd var in evp, reduce number of "if grid_ice". #778

Merged: 7 commits, Nov 17, 2022

Conversation

@TillRasmussen (Contributor) commented Oct 22, 2022

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

  • Short (1 sentence) summary of your PR:
  • Initialization of the dynamics is now done in a single call: init_evp, init_eap or init_vp. Allocations are moved to the relevant subroutines within the driver.
  • Removed the if (first_call) block in update_stress_function; its content moved to init_eap.
  • Only allocate C and CD grid variables when needed. For consistency, B grid variables are made allocatable as well.
  • fld2, fld3 and fld4 are only allocated when necessary.
  • init_dyn is renamed to init_dyn_shared.
  • Skip reading restart fields when the corresponding variables are not allocated.
  • Developer(s):
    @TillRasmussen
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.

results.log

  • How much do the PR code changes differ from the unmodified code?
    • bit for bit
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on Icepack or any other models?
    • Yes
    • No
  • Does this PR update the Icepack submodule? If so, the Icepack submodule must point to a hash on Icepack's main branch.
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:
    Variables in the stress subroutines were renamed (suffix _loc added) to avoid potential conflicts with module variables of the same name.

[edited to remove spaces from checkboxes so that they appear correctly]

@eclare108213 (Contributor) left a review comment

This looks okay to me at first glance. There must not really be a conflict between the _loc variables and the nonlocal ones, since the results are bit for bit -- is that change necessary or helpful for the 1D EVP solver? I'll approve but would like others to take a look too.

How close is this to just having separate subroutines (or even modules) for B, C and CD grids?

@apcraig (Contributor) commented Oct 27, 2022

Do we know how much this improves performance?

@TillRasmussen (Contributor, Author) commented
I did not check performance, and the gain may not be significant. However:
Allocating the C and CD grid data when running a B grid simulation seemed to be a waste of memory.
Some of the variables were allocated and deallocated at each time step. This is removed.
In order to have a separate function allocate the variables, two choices exist: either pass them through subroutine arguments or move them to module data. The latter was chosen. The disadvantage is that the same names were used in the stress functions, which is why I renamed them to *_loc. The same issue occurs in the EAP solution (not changed here).
The rearrangement of if statements is a step towards separating the B grid driver from the C/CD grid driver. There are pros and cons to doing this in full, but for this part I think it becomes easier to read (subjective opinion).
There will soon be commits more dedicated to the 1d solver.

@apcraig (Contributor) left a review comment

This all seems reasonable.

One comment. Rather than create a new method (alloc_dyn_evp), why don't we put the alloc calls in init_dyn. That way, we don't have to modify all the drivers. To be honest, for consistency, as part of that, I would migrate alloc_dyn_eap into init_eap too and get rid of alloc_dyn_eap completely. That moves the alloc into the "init" phase of all the dynamics options, where there are already alloc statements, makes things consistent, and ultimately reduces the number of calls and code changes that might have to be made in the drivers in the long term.

This also raises the question of whether init_dyn should be renamed init_evp and moved into dyn_evp and whether we need to review what data is in the dyn_shared module vs dyn_evp/eap/vp, what's common and what's not. We could add an init_evp subroutine and the init_evp/eap/vp could call init_dyn for common data allocation rather than having that call be in the drivers. For now, I also think we can add "if kdyn == 1" in init_dyn if needed as a first step. Overall, I'd prefer if we migrate towards the drivers calling "init_evp/eap/vp" only and then having those methods take care of allocs as well as calling init_dyn in the shared method. Can we at least not add alloc_dyn_evp now? Thoughts?

@TillRasmussen (Contributor, Author) commented
That makes sense. I will update and move the allocation to init_dyn.

@TillRasmussen (Contributor, Author) commented
A reflection on this:
ice_dyn_shared cannot call functions or use public variables from ice_dyn_{evp,eap,vp}, as this would result in circular dependencies.
Therefore I think the correct approach is, as you suggest, to create init_evp, init_eap and init_vp and let them call init_dyn. This leaves the if/else statement in CICE_Init.

@apcraig (Contributor) commented Oct 28, 2022

Right, any data allocated in ice_dyn_shared should live in ice_dyn_shared. Data in evp/eap/vp should live in those modules and be allocated in init_evp/eap/vp. At the top level, having something like this

if (kdyn == 1) then
   call init_evp
elseif (kdyn == 2) then
   call init_eap
elseif (kdyn == 3) then
   call init_vp
endif

And then each init_evp/eap/vp would/could call init_dyn_shared which lives in ice_dyn_shared. That makes sense to me and seems cleaner than what we have now. thanks!
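For concreteness, here is a minimal sketch of the structure being converged on: each dynamics module owns its data, allocates it in its init routine, and calls the shared init. The module and routine names follow the discussion, but the interfaces, kinds and array names are illustrative, not the actual CICE code.

module ice_dyn_evp
      ! sketch only: the real ice_dyn_evp has many more members
      use ice_kinds_mod,   only: dbl_kind
      use ice_blocks,      only: nx_block, ny_block
      use ice_domain_size, only: max_blocks
      use ice_grid,        only: grid_ice
      use ice_dyn_shared,  only: init_dyn_shared

      implicit none
      private
      public :: init_evp

      ! C/CD grid arrays live as module data, allocated only when used
      real (kind=dbl_kind), allocatable, dimension(:,:,:) :: &
         stresspT, stressmT, stress12T

      contains

      subroutine init_evp (dt)
         real (kind=dbl_kind), intent(in) :: dt

         call init_dyn_shared (dt)   ! common dynamics data, allocated once

         if (grid_ice == 'C' .or. grid_ice == 'CD') then
            allocate (stresspT (nx_block,ny_block,max_blocks), &
                      stressmT (nx_block,ny_block,max_blocks), &
                      stress12T(nx_block,ny_block,max_blocks))
         endif
      end subroutine init_evp

end module ice_dyn_evp

With this layout the drivers only ever call init_evp/eap/vp, and the shared allocation stays hidden behind init_dyn_shared.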

@apcraig (Contributor) commented Nov 7, 2022

@TillRasmussen, I think we're waiting for a round of updates. If you need me to help, just let me know. No rush, just checking in. Thanks.

@TillRasmussen (Contributor, Author) commented
EAP and EVP are bit for bit.
I removed the if (first_call) block in EAP, since the constants might as well be computed in the init call.
For the VP Picard solver it is bit for bit when compared to main, but not internally within the test. I get the same result for main. Does that agree with others' experience?

@TillRasmussen (Contributor, Author) commented
results.log

@apcraig (Contributor) commented Nov 10, 2022

@TillRasmussen, just trying to understand what was done in your testing. I assume you ran a test suite and added -s dynpicard to turn on vp for all tests? And you created a baseline first "picardrefac" that didn't have your mods in it? What are the bfbcomp results for the picardrefac version before your changes? If they are also failing, then we need to look into the dynpicard a bit more. If those were passing, then there must be a problem with the process.

@TillRasmussen (Contributor, Author) commented Nov 11, 2022 via email

@apcraig (Contributor) commented Nov 11, 2022

This looks fine to me. Several things were moved around, but I think it works well.

I don't love the "_loc" variables in the stressC* routines. I don't think there is a problem using the module variable name for a local variable; the compiler will keep it straight. The other thing is that maybe all that module data should be moved into subroutine evp and allocated like uocnU (which would not select for C or CD grid). I know this is also not ideal. Or it could be allocated on first_call. I know you don't like this, but it's more consistent with the allocation of uocnU. To make things even more consistent, I would treat uocnU and the other variables in that group the same way as the C/CD variables: make them allocatable and allocate on first call, or make them module data and allocate in init_evp. Or we could remove those variables from the calling arguments and just access the module data. There are many options to clean this up further. This could also be a next step after this PR.

I will run a test suite on cheyenne. But as I understand it, there still may be some problems with bit-for-bit on different pe counts and decomps for vp. As a general rule, we don't run test suites with the -s option, so we don't know which of those work or don't. And it's possible some of those are not setup properly or some of those are actually failing. We do have a test in omp_suite for dynpicard that should be picking up these bfb failures for different decomps, but maybe it's not comprehensive enough. Let me run some additional checks just to confirm.
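For reference on the shadowing point above: Fortran allows a local declaration to hide a module variable of the same name, and the compiler keeps the two separate. A self-contained demonstration (hypothetical names, not CICE code):

module shadow_demo
      implicit none
      real :: zetax2 = 1.0              ! module variable
contains
      subroutine stress_demo ()
         real :: zetax2                 ! local declaration hides the module variable
         zetax2 = 2.0                   ! modifies only the local copy
         print *, 'local  zetax2 =', zetax2
      end subroutine stress_demo
end module shadow_demo

program test_shadow
      use shadow_demo
      implicit none
      call stress_demo ()
      print *, 'module zetax2 =', zetax2   ! prints 1.0, untouched by the subroutine
end program test_shadow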

@apcraig (Contributor) commented Nov 11, 2022

I encourage @eclare108213 @phil-blain @JFLemieux73 to have a look at these mods; they are a step towards revising the dynamics driver, although they largely just deal with the initialization.

@apcraig (Contributor) commented Nov 11, 2022

I ran a full test suite and everything passes except the sumchk and gridavgchk unit tests. Those unit tests have a CICE_InitMod.F90 that needs to be updated. Please update that code.

I did not test the decomp_suite with dynpicard but will try that next. I assume I'll get some failures and we can create an issue for them if needed.

Stepping back a bit, it would be really nice to reduce the amount of redundant code in the various drivers, pushing more of the init and run sequence out of the drivers. But that may not be an easy task. Maybe we should create an issue for that too though.

@TillRasmussen (Contributor, Author) commented
@apcraig and others, regarding the failing test: is it preferred to call alloc_dyn_shared from CICE_Init or from init_dyn_shared? I prefer the latter, but I don't mind doing the former.
A different solution than *_loc is preferred. I was just not 100% sure how it is handled by the compiler.

With regards to allocation, I prefer to do it in init_evp, as this is a "first call" function. There is no need to add additional if statements.

@apcraig
Copy link
Contributor

apcraig commented Nov 12, 2022

I think we want to leave CICE_init as it is now. It looks clean now, we don't want to add more calls to that.

The second issue is if we want to initialize the arrays in a different routine than evp, then the data has to be module data. If we initialize the data in evp via first_call, then that data can live in the subroutine. I don't have a strong preference. I am not worried at all about a single "first_call" if test in evp. It's executed once per timestep, not once per subcycling or once per block or once per gridcell. The cost would be zero to do that if test.

But I'm also fine leaving the module data and init subroutine as they are now. Let's remove the _loc strings and verify it doesn't create any problems.

A final option, which I was reluctant to suggest before, would be to just get rid of arguments that exist as module data in evp. We really don't need to pass those down; they exist in the module already. But I think that would create some asymmetry in the B and C grid implementations that I'm not keen on.

So, I would say

  • fix the unit tests
  • get rid of the _loc strings
  • refactor the local B arrays (which are also used for C and CD) so they are also allocated like the C/CD arrays. Right now they are subroutine data that is automatically allocated on entry as local data. But for consistency, I'd move them to module data and allocate them explicitly in the init method.

@TillRasmussen (Contributor, Author) commented
@apcraig
I tried to run the following test with the main branch
./cice.setup --suite unittest_suite -m freya --testid unitrefac2 --bgen unitrefac2 -e intel,gnu
It gives me the following failures:
freyja-6 CICE/testsuite.unitrefac2> ./results.csh | grep FAIL
FAIL freya_intel_unittest_gx3_4x1x25x29x4_sumchk test
FAIL freya_intel_unittest_tx1_8x1_sumchk test
FAIL freya_gnu_unittest_gx3_4x1x25x29x4_sumchk test
FAIL freya_gnu_unittest_tx1_8x1_sumchk test
Can you reproduce that?

@apcraig (Contributor) commented Nov 13, 2022

On several machines, the sumchk test fails because MPI_16 isn't implemented correctly. Have a look at the end of the cice.runlog file for a summary of the tests. My guess is that it's failing on the "lsum16" tests. If so, that's not unusual: it passes on many machines and fails on others, so it must depend on the MPI implementation. Above the final results you will see errors like this,

 real sum + real mask                lsum16  0.33325338363647461        0.12
 **** ERROR   0.12490401628404557        3.0000000000000000     

indicating it's the lsum16 option that's failing in the sumchk.

So you should get the same results with your updated code and that's fine. I think it's worth testing the lsum16 option even if it fails on some machines. Feedback should be provided to the MPI implementation team.
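For background, lsum16 accumulates the global sum in 16-byte (quad precision) reals, so it depends on the MPI library supporting 16-byte reductions correctly. A standalone probe of that support might look like the following (hypothetical test program, not part of CICE; MPI_REAL16 is an optional MPI datatype and may be unsupported by a given implementation):

program probe_real16
      use mpi
      implicit none
      integer, parameter :: r16 = selected_real_kind(33, 4931)  ! quad precision
      real (kind=r16) :: local, global
      integer :: ierr, rank
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      local = 1.0_r16
      ! If 16-byte reductions are broken in the MPI library, this returns
      ! garbage -- the same failure mode as the lsum16 sumchk tests.
      call MPI_Allreduce(local, global, 1, MPI_REAL16, MPI_SUM, MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, 'sum =', global
      call MPI_Finalize(ierr)
end program probe_real16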

@TillRasmussen (Contributor, Author) commented Nov 13, 2022

*_loc removed.
Bug fixed; the errors should be resolved. There are still errors in the unit tests, but except for one they look like those described by @apcraig.
This check is not performed in the new version - maybe due to a slight version difference:
freya_intel_unittest_gx3_1x1x25x29x16_sumchk
dble sum vector 8.0000000 16.000000 24.000000
PASS dble sum vector
alloc_dyn_shared moved to init_dyn_shared.
Note: I am not sure whether this is of importance for NUOPC/CMEPS. Who is responsible for this?
B grid variables moved to allocatable.

@eclare108213 (Contributor) left a review comment

This all looks good to me. I definitely like making the structure of the evp/eap/vp modules/routines more similar, reusing code where possible, moving 'if (first_call)' tasks to init routines, and pulling if statements out of loops where possible. Thanks also for cleaning up column alignments, capitalization, etc.

The 'Conversation' in this PR is long and a little confusing -- some of what's listed in the initial PR template is no longer true, and other things changed that are mentioned later. So my main suggestion at this point is to make a concise list of what's actually in this PR, and edit the first (template) entry to include that list, e.g. under "additional information".

(Review thread on cicecore/cicedynB/dynamics/ice_dyn_evp.F90: outdated, resolved.)
@eclare108213 (Contributor) commented
Beautiful, thank you. This can be merged with approval from @apcraig or @phil-blain or @JFLemieux73.

@apcraig (Contributor) commented Nov 14, 2022

Retested on cheyenne, new problem: all boxslotcyl tests fail. Looks like it's seg faulting here,

cice               000000000081F7CD  ice_restart_drive         222  ice_restart_driver.F90
cice               0000000000409E1E  cice_runmod_mp_ci          85  CICE_RunMod.F90
cice               000000000040438D  MAIN__                     49  CICE.F90

My guess is that something isn't being allocated that was before? A simple test to debug should be

smoke gbox80 1x1 boxslotcyl

boxslotcyl seems to be the only test with "kdyn=-1". I think that means the dyn initialization is not being called and moving some of those arrays to allocatable is causing the seg fault. I guess the question is how to fix this. My preference would be that if kdyn=-1, work1 at line 222 is set to 1 everywhere and then written. Or maybe not write iceumask if kdyn=-1 and then add a query_field for iceumask in the read restart subroutine.

@phil-blain (Member) commented
The runtime of the CI suite has gone over 2 hours in this version and is still running. A previous run (https://github.com/CICE-Consortium/CICE/actions/runs/3457289294) finished in 4 hours (!). Might be worth investigating?

@phil-blain (Member) commented
OK, the previous run was indeed 4 hours but it was the "download input data" step that took 4 hours: https://github.com/CICE-Consortium/CICE/actions/runs/3457289294/jobs/5770617947.

But for the current one (https://github.com/CICE-Consortium/CICE/actions/runs/3462290692/jobs/5780984937) it's really the "run suite" step that is at 2 hours and counting...

@apcraig (Contributor) commented Nov 14, 2022

I restarted GHActions. I think it must be a cloud problem, but will try to keep an eye on it.

@TillRasmussen (Contributor, Author) commented
(Quoting @apcraig's cheyenne boxslotcyl report above.)

Alternatively, always allocate iceUmask, iceEmask and iceNmask through init_dyn_shared. I think these are the only arrays originating from the ice dynamics that are used regardless of kdyn.

@apcraig (Contributor) commented Nov 14, 2022

I think the problem is that init_dyn_shared will not be called if kdyn=-1. iceUmask is already allocated anytime init_dyn_shared is called and that's the first array that is trapped as unallocated later in the run.

Either treat iceUmask like iceEmask and iceNmask in the restart code. Or check for kdyn=-1 in the restart code. Or revert how iceUmask is allocated (but I prefer to continue forward if possible).
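A minimal sketch of the guard pattern eventually adopted: write a default mask when the array is not allocated, and skip the read when the field is absent. The routine names mirror CICE's restart code, but the exact argument lists and constants here are illustrative:

! writing (ice_restart_driver.F90, illustrative)
if (allocated(iceUmask)) then
   work1(:,:,:) = c0
   where (iceUmask) work1 = c1
else
   work1(:,:,:) = c1               ! kdyn=-1: dynamics masks never allocated
endif
call write_restart_field(nu_dump,0,work1,'ruf8','iceumask',1,diag)

! reading (illustrative)
if (query_field(nu_restart,'iceumask') .and. allocated(iceUmask)) then
   call read_restart_field(nu_restart,0,work1,'ruf8','iceumask',1,diag)
   iceUmask(:,:,:) = work1(:,:,:) > p5
endif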

@TillRasmussen (Contributor, Author) commented
Do we want to account for these changes when using restartfile_v4?

@apcraig (Contributor) commented Nov 14, 2022

I think we can leave restart_v4 as is. It's not part of our testing coverage and should probably be deprecated at some point. In addition, restart_v4 should be fine unless someone tries to restart with a v4 file AND kdyn=-1, not likely to happen.

@apcraig (Contributor) commented Nov 15, 2022

One other thing, for information. dynpicard is not bit-for-bit with different pe counts and decomps unless you also turn on bit-for-bit global sums. So you might want to run a test suite forcing dynpicard with

-s dynpicard,reprosum

I believe this will pass bfbcomp tests above that failed with just -s dynpicard.

@TillRasmussen (Contributor, Author) commented Nov 15, 2022

The restart files are now written including iceumask, icenmask and iceemask even when these fields are not allocated (primarily for binary reads of files). Reading is skipped when they don't exist.
The gridsys and omp suites were run. All passed except for a few tests that failed due to walltime.

@eclare108213 (Contributor) commented
> dynpicard is not bit-for-bit with different pe counts and decomps unless you also turn on bit-for-bit global sums... [quoting @apcraig above]

Is this fact noted in the documentation, e.g. in the testing section and/or Troubleshooting?

@phil-blain (Member) commented
I added a note about it in the "developer guide", in the section about global sums: https://cice-consortium-cice.readthedocs.io/en/main/developer_guide/dg_other.html#reproducible-sums

@TillRasmussen (Contributor, Author) commented
(Quoting the reprosum exchange above.)

Thanks, they are now bit for bit

@apcraig (Contributor) commented Nov 16, 2022

Ran a test suite on cheyenne, pio is hanging. I think it's the query_field subroutine. Let me try to debug a bit. More soon.

@apcraig (Contributor) commented Nov 16, 2022

OK, I think we need one more modification. This is my fault, query_field was never tested in pio. That subroutine in cicecore/cicedynB/infrastructure/io/io_pio2/ice_restart.F90 needs to be changed

original:

-      if (my_task == master_task) then
-         status = pio_inq_varid(File,trim(vname),vardesc)
-         if (status == PIO_noerr) query_field = .true.
-      endif
-      call broadcast_scalar(query_field,master_task)

new:

+      status = pio_inq_varid(File,trim(vname),vardesc)
+      if (status == PIO_noerr) query_field = .true.

Basically, pio_inq_varid should not be inside a master_task check, and then the broadcast is unnecessary. @TillRasmussen, could you make this change please? Thanks!
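For context, the corrected function might look like this as a whole; only the two replacement lines above come from the thread, and the surrounding declarations are illustrative (File and int_kind are assumed available from the enclosing ice_restart module):

      logical function query_field(nu, vname)

         use pio, only: var_desc_t, pio_inq_varid, PIO_noerr

         integer (kind=int_kind), intent(in) :: nu      ! unit number, kept for interface symmetry
         character (len=*),       intent(in) :: vname   ! field name to query

         type(var_desc_t)        :: vardesc
         integer (kind=int_kind) :: status

         query_field = .false.
         ! every task calls pio_inq_varid; restricting it to master_task
         ! (plus a broadcast) is what caused the hang reported above
         status = pio_inq_varid(File, trim(vname), vardesc)
         if (status == PIO_noerr) query_field = .true.

      end function query_field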

@apcraig (Contributor) commented Nov 16, 2022

Thanks @TillRasmussen! I think this is all working well now. I want to run a few more tests, then I think we can merge. I think this is a nice cleanup.

@apcraig (Contributor) left a review comment

Testing looks good.

@apcraig merged commit 99daf69 into CICE-Consortium:main Nov 17, 2022
@TillRasmussen deleted the evp-refactor branch Nov 18, 2023