ctsm5.3.051: Update submodules to cesm3_0_beta06 + MEGAN namelist (answer change) #3125
… Derecho for intel, to the one used in cesm3_0_alpha06d
…u with mpi-serial
…ks for the failing tests
…fig branch with derecho_gnu fixes
aux_clm testing on Izumi is as expected: there are no differences from baseline and all expected tests pass. On Derecho, however, there are unexpected results: three tests compare different to baseline, 22 tests are listed as pending (they were submitted but didn't seem to execute, which is odd), and three tests failed unexpectedly.

For the tests with differences: the difference in the first two nvhpc tests is probably because the nvhpc build was changed to use cray-libsci/24.03.0, which it didn't use before. The ISSP245Clm50BgcCrop compset changes answers because of the update to the CDEPS tag, where ISSP cases turn on anomaly forcing out of the box. So the answer changes are all expected.

For the pending tests: I resubmitted one and it acted the same, returning quickly. So I'll need to look into this further.

For the three fails: FUNIT and MKSURF both fail, probably because the latest submodules need some simple adjustments to recognize GPU_TYPE, which shouldn't be hard to fix. The datm_ssp126_anom_forc testmod probably fails because it needs #2686.
The pending tests do seem to fail legitimately, and early. They give little sign of where they die and no traceback. I turned on PET files and raised the ESMF verbosity level to the max, but that still gave nothing helpful. It does look like the failure is somewhere in the land initialization. These cases fail with both the gnu and intel compilers, and with DEBUG on and off. The commonality is that they are all mpi-serial. But I didn't see a difference in the build between the previous working version (#3111) and this one. So the next thing I will try is an incremental update of the cesm alpha tag submodules, testing alpha06e and alpha06f, to see where this behavior starts. This likely means that the problem is in either CMEPS or CDEPS. So tests to try:
The problem occurs between alpha06e and alpha06f. The difference in submodules is:
-fxtag = ccs_config_cesm1.0.32
+fxtag = ccs_config_cesm1.0.40
-fxtag = cime6.1.72
+fxtag = cime6.1.93
-fxtag = cmeps1.0.42
+fxtag = cmeps1.0.47
-fxtag = cdeps1.0.65
+fxtag = cdeps1.0.73
-fxtag = MPIserial_2.5.1
+fxtag = MPIserial_2.5.4
One way to divide it up is to group the build-related submodules together (ccs_config, cime, and mpi-serial) and the code submodules together (cmeps and cdeps). It failed when leaving the code behind and updating the build, but then passed when cime and ccs_config were backed off a bit. With ccs_config updated it still passes. So logically the problem was between cime6.1.87 and cime6.1.93, and I could use git-bisect to find the cime commit with the problem. It had to do with how much memory to ask for in the batch system.
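The submodule comparison above can be reproduced mechanically. The following is a minimal sketch, not the tooling used in this PR: it reads the `fxtag` entries from two `.gitmodules`-style texts and reports which ones changed. The function names and the submodule section names (`ccs_config`, `cime`, and so on) are assumptions made for illustration; only the tag values come from the comparison above.

```python
import re


def read_fxtags(gitmodules_text):
    """Map each submodule section name to its fxtag value."""
    tags = {}
    current = None
    for raw in gitmodules_text.splitlines():
        line = raw.strip()
        header = re.match(r'\[submodule "(.+)"\]$', line)
        if header:
            current = header.group(1)
            continue
        entry = re.match(r"fxtag\s*=\s*(\S+)$", line)
        if entry and current is not None:
            tags[current] = entry.group(1)
    return tags


def changed_fxtags(old_text, new_text):
    """Return {submodule: (old_tag, new_tag)} for tags that differ."""
    old, new = read_fxtags(old_text), read_fxtags(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Section names are assumed; tag values are those quoted in the comment above.
OLD = """
[submodule "ccs_config"]
    fxtag = ccs_config_cesm1.0.32
[submodule "cime"]
    fxtag = cime6.1.72
[submodule "cmeps"]
    fxtag = cmeps1.0.42
[submodule "cdeps"]
    fxtag = cdeps1.0.65
[submodule "mpi-serial"]
    fxtag = MPIserial_2.5.1
"""
NEW = """
[submodule "ccs_config"]
    fxtag = ccs_config_cesm1.0.40
[submodule "cime"]
    fxtag = cime6.1.93
[submodule "cmeps"]
    fxtag = cmeps1.0.47
[submodule "cdeps"]
    fxtag = cdeps1.0.73
[submodule "mpi-serial"]
    fxtag = MPIserial_2.5.4
"""

for name, (old_tag, new_tag) in changed_fxtags(OLD, NEW).items():
    print(f"{name}: {old_tag} -> {new_tag}")
```

Grouping the resulting names into build-related and code-related sets gives the two halves that were updated separately in the description above.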
The mpi-serial case fails here.
This passes for mpi-serial. This was something that was in a CESM commit to .gitmodules
This passes, which shows the problem is between cime6.1.87 and cime6.1.93, so it should be solvable with git-bisect.
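The search that git-bisect performs between a known-good and a known-bad version can be sketched as a binary search. This is an illustration of the idea only, not how git itself is implemented; the function name, the commit labels, and the `is_good` predicate are hypothetical stand-ins for "build this cime version and run the mpi-serial test".

```python
def first_bad(versions, is_good):
    """Return the first version for which is_good() is False.

    Assumes versions are ordered oldest to newest, versions[0] is known
    good, versions[-1] is known bad, and every version after the first
    bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Hypothetical commit labels; in this toy history "c5" introduces the failure.
commits = [f"c{i}" for i in range(8)]
print(first_bad(commits, lambda c: int(c[1:]) < 5))
```

Each step halves the remaining range, so only a few test builds are needed even when many commits separate the two tags.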
Now, with a cime branch to fix the mpi-serial issue, update submodules back up to the cesm3_0_beta06 versions. I ran a list of tests that worked, but will now run aux_clm again as well as ctsm_sci.
…m_ssp126_anom_forc test because it no longer works, and the changes in ESCOMP#2686 handle it
…change a spinup test to Clm60
Remove MKSURFDATAESMF from prealpha testing. Switch the prealpha plain ne30 test to ctsm_sci. Add a FATES NoComp test to prebeta.
Remove two Clm45 tests from prealpha and aux_cime_baselines, as well as two till tests from prebeta. Replace with Clm60 tests with ciso, izumi_nag, nldas, Fates, and DEBUG off mpi-serial for prebeta and the last for prealpha.
…lp with identifying mpi-serial build issues in the nightly testing with submodules updated
OK, submitted all the testing to Izumi and Derecho: aux_clm, ctsm_sci, and fates. We'll see what that shows tomorrow morning.
I got a few unexpected fails, but I have a fix for a couple of them, and will make one into an issue that we can probably fix on b4b-dev. The FATES cases seem to be due to not enough memory for partial nodes. There's a simple fix for them: I'll increase the memory requested per task for FATES cases. I'll file an issue for the FUNITCTSM problem; it's probably something in the cime update beyond cime6.1.100, where I last tested FUNITCTSM on Izumi.
This is needed on Derecho for single processor FATES cases.
There are a few baselines on Izumi for FATES tests that I can't compare to because of permissions: ERS_D_Ld30.f45_f45_mg37.HIST_DATM%CRUv7_CLM50%FATES_SICE_SOCN_SROF_SGLC_SWAV_SESP.izumi_nag.clm-FatesColdLandUse (BASELINE). Outside of that, answers are as expected: only MEGAN fields change for non-FATES tests, and results are otherwise identical on Izumi.
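Test names like the one above pack several fields into one dot-separated string. The sketch below splits such a name into its parts, assuming the layout is test type plus modifiers, grid, compset, machine_compiler, and an optional testmod; that layout is a reading of the name quoted above rather than a specification, and the `CimeTest` type and `parse_test_name` function are hypothetical helpers.

```python
from typing import List, NamedTuple, Optional


class CimeTest(NamedTuple):
    test_type: str
    modifiers: List[str]
    grid: str
    compset: str
    machine: Optional[str]
    compiler: Optional[str]
    testmod: Optional[str]


def parse_test_name(name):
    """Split a dot-separated test name into its assumed fields."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least type.grid.compset, got {name!r}")
    test_type, *modifiers = parts[0].split("_")
    machine = compiler = testmod = None
    if len(parts) > 3:
        # Assumes the compiler is the piece after the last underscore.
        machine, _, compiler = parts[3].rpartition("_")
    if len(parts) > 4:
        testmod = parts[4]
    return CimeTest(test_type, modifiers, parts[1], parts[2],
                    machine, compiler, testmod)


name = ("ERS_D_Ld30.f45_f45_mg37."
        "HIST_DATM%CRUv7_CLM50%FATES_SICE_SOCN_SROF_SGLC_SWAV_SESP."
        "izumi_nag.clm-FatesColdLandUse")
parsed = parse_test_name(name)
print(parsed.test_type, parsed.modifiers, parsed.machine, parsed.compiler)
```

Parsed this way, a list of failing tests can be grouped by compiler, modifier, or testmod to look for what they have in common.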
On Derecho, answer changes are as expected: derecho_intel and derecho_nvhpc change answers.
Fix Linux Podman; prefer Linux Docker; update docs docs Conflicts: doc/ChangeLog doc/ChangeSum
slevis-lmwg
left a comment
Thank you @ekluzek
Description of changes
Update submodules to cesm3_0_beta06
This starts from #3111 which brings in the answer changes for derecho_intel, by updating the compiler to use the intel-oneapi backend.
Contributors other than yourself, if any:
CTSM Issues Fixed (include github issue #):
Fixes #2710
Fixes #2476
Fixes #3135
Fixes #3108
Address some things in #3156
Are answers expected to change (and if so in what way)? Yes
derecho_intel and derecho_nvhpc
Any User Interface Changes (namelist or namelist defaults changes)? No
Does this create a need to change or add documentation? Did you do so? No No
Testing performed, if any: Running regular testing, plus the ctsm_sci and fates test lists