
Merge b4bdev 20260326 #3894

Open

slevis-lmwg wants to merge 59 commits into ESCOMP:master from slevis-lmwg:merge-b4bdev-20260326

Conversation

@slevis-lmwg
Contributor

@slevis-lmwg slevis-lmwg commented Mar 26, 2026

Description of changes


  • Update the ChangeLog based on the information above

Specific notes

Contributors other than yourself, if any:
@huiqi-wang @ekluzek @ijaguirre @mvdebolskiy

CTSM Issues Fixed (include github issue #):
Resolves #3256
Resolves #3234
Resolves #3703
Resolves #2997
Resolves #3541
Resolves #3507

Are answers expected to change (and if so in what way)?
No

Any User Interface Changes (namelist or namelist defaults changes)?
No

Does this create a need to change or add documentation? Did you do so?
Much of this work involves updates to the documentation

Testing performed, if any:

  • make black and lint
  • ./run_ctsm_py_tests -u and -s
  • ./build-namelist_test.pl
  • ./run_sys_tests -s aux_clm -c ctsm5.4.028 -g ctsm5.4.029 on derecho
  • ./run_sys_tests -s aux_clm -c ctsm5.4.028 -g ctsm5.4.029 on izumi

ijaguirre and others added 30 commits July 6, 2025 14:07
Merge b4b-dev to master

Purpose and description of changes
----------------------------------

Merge b4b-dev to master:

Important things coming in:

 - mizuRoute! (NOTE: mizuRoute is a River model (ROF in the CESM context) that can be run in place of MOSART or RTM)
 - Documentation updates
 - Move a few namelist parameters to the parameter file
 - move of X components for nuopc
 - mksurfdata_esmf for Gnu compiler
 - scripts for CRUJRA forcing to handle Antarctica/Greenland
 - SpinupStability script update to handle FATES and SE grids
 - Update cdeps for access to CMIP7 CO2
 - Tests for FATES Tree Recruitment Scheme (TRS)

Main grids to use with mizuRoute:

  Default mizuRoute grids will use the half degree land-only mizuRoute grid that is the same resolution as the MOSART grid.

  - 5x5_amazon_r05 - is the small Amazon region for testing
  - 5x5_amazon_rHDMA - is the small Amazon region using the HDMA for mizuRoute
  - nldas2_nldas2_rUSGS_mnldas2 - is the Continental US grid with USGS Geospatial Fabric
  - f09_f09_rHDMA_mg17 - is the 2 degree grid with the medium resolution HDMA grid
  - f09_f09_rHDMAlk_mg17 - is the 2 degree grid with the medium resolution HDMA grid that includes lakes
  - hcru_hcru_rMERIT_mt13 - is the half degree grid with the high resolution MERIT grid

Standard case to run with mizuRoute:

  grid=f09_f09_rHDMAlk_mg17 compset=I2000Clm60SpMizGs
slevis resolved conflicts:
tools/contrib/README.md
bfb: Bugfix for FatesSp compiled with intel
In tools/contrib remove popden.ncl and run_clmtowers; update the corresponding README
See if this removes the error in line 81
@slevis-lmwg

This comment was marked as outdated.

@slevis-lmwg slevis-lmwg requested a review from ekluzek March 27, 2026 00:37
@slevis-lmwg
Contributor Author

slevis-lmwg commented Mar 27, 2026

Numerous unexpected failures on derecho, unfortunately:

1 ERI_Ld41.f10_f10_mg37.I2000Clm60BgcCrop.derecho_gnu.clm-default COMPARE_branch_hybrid
2 SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_gnu.mizuroute-default NLCOMP
3 SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_gnu.mizuroute-default BASELINE ctsm5.4.028: DIFF
4 ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode BASELINE ctsm5.4.028: ERROR BFAIL some baseline files were missing
5 ERP_P64x2_D_Ld5.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.clm-ciso--clm-matrixcnOn_ignore_warnings RUN
6 ERS_D_Ld7_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropRs.derecho_intel.clm-decStart1851_noinitial RUN
7 ERS_D_Mmpi-serial_Ld5.5x5_amazon.I2000Clm60FatesRs.derecho_intel.clm-FatesCold RUN
8 ERS_Ly5_P128x1.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput--clm-matrixcnOn_ignore_warnings RUN
9 PET_P64x2_D.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel.clm-default--clm-matrixcnOn_ignore_warnings RUN
10 SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_intel.mizuroute-default NLCOMP
11 SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_intel.mizuroute-default BASELINE ctsm5.4.028: DIFF
12 SMS_D_Ly6_Mmpi-serial.1x1_smallvilleIA.IHistClm45BgcCropQianRs.derecho_intel.clm-cropMonthOutput RUN
13 SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_intel.clm-FatesFireLightningPopDens--clm-NEON-FATES-NIWO RUN
14 SMS_Lm3_D_Mmpi-serial.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdHydro RUN
15 SSPMATRIXCN_Ly5_Mmpi-serial.1x1_numaIA.I2000Clm60BgcCropQianRs.derecho_intel.clm-ciso_monthly BASELINE ctsm5.4.028: DIFF

Based on this PR's changes, I attribute the failures to:

  • the changes in .gitmodules, OR
  • the changes in SatellitePhenologyMod.F90, OR
  • flakiness (I should rerun the failures in case they pass next time...)

UPDATE: ALL CONCERNS RESOLVED:

The first failure in the list has these diffs in this file ERI_Ld41.f10_f10_mg37.I2000Clm60BgcCrop.derecho_gnu.clm-default.GC.0326-172426de_gnu.mosart.h0a.2004-01.nc.branch.cprnc.out AND it has occurred since ctsm5.3.078:

 RMS time                             8.9375E+00            NORMALIZED  4.4757E-01
 RMS time_bounds                      1.2640E+01            NORMALIZED  6.3297E-01
 RMS DIRECT_DISCHARGE_TO_OCEAN_ICE    2.7467E+00            NORMALIZED  8.8485E+00
 RMS DIRECT_DISCHARGE_TO_OCEAN_LIQ    4.2832E+00            NORMALIZED  1.3171E+00
 RMS RIVER_DISCHARGE_OVER_LAND_LIQ    3.7048E+03            NORMALIZED  4.7858E+00
 RMS RIVER_DISCHARGE_TO_OCEAN_LIQ     5.4230E+02            NORMALIZED  5.3018E+01
 RMS TOTAL_DISCHARGE_TO_OCEAN_ICE     2.7467E+00            NORMALIZED  8.8485E+00
 RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ     4.7998E+02            NORMALIZED  4.2900E+01

The second/third has these diffs in SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_gnu.mizuroute-default.GC.0326-172426de_gnu.*.2000-01-06-00000.nc.cprnc.out AND passed on second attempt:

 RMS TWS                              1.3186E+00            NORMALIZED  4.4715E-04
 RMS VOLR                             1.2424E+08            NORMALIZED  2.8238E-01
 RMS VOLRMCH                          1.2424E+08            NORMALIZED  2.8238E-01

 RMS lndExp_Flrr_volr                 1.3186E-03            NORMALIZED  2.8207E-01
 RMS lndExp_Flrr_volrmch              1.3186E-03            NORMALIZED  2.8207E-01
 RMS rofImp_Flrr_volr                 2.7567E+00            NORMALIZED  3.4586E+01
 RMS rofImp_Flrr_volrmch              2.7567E+00            NORMALIZED  3.4586E+01
 RMS rofImp_Forr_rofl                 5.3859E+02            NORMALIZED  2.5794E+00

 RMS basinID                                 NaN            NORMALIZED  0.0000E+00
 RMS reachID                                 NaN            NORMALIZED  0.0000E+00
 RMS time                             5.0000E-01            NORMALIZED  1.1765E-01
 RMS basRunoff                        4.4759E-05            NORMALIZED  8.3166E-01

The fifth test passed on the first manual ./case.submit.
Tests 6, 7, 12, 13, and 14 fail with forrtl: error (73): floating divide by zero in init at line 115 of ch4FInundatedStreamType.F90. The error repeated in 6 and 12 on the first manual ./case.submit; I didn't bother reproducing it in 7, 13, and 14. The problem first appears in ctsm5.4.024, as documented in the ChangeLog.
Test 8 ran out of wallclock three times; submitting now for 4 hrs... PASS
Test 9 reports an ERROR in SparseMatrixMultiplyMod.F90 at line 973 and passed on the second manual ./case.submit.
10/11 has these diffs in SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_intel.mizuroute-default.GC.0326-172426de_int.*.2000-01-06-00000.nc.cprnc.out AND passed on second attempt:

RMS TWS                              1.3186E+00            NORMALIZED  4.4715E-04
RMS VOLR                             1.2424E+08            NORMALIZED  2.8238E-01
RMS VOLRMCH                          1.2424E+08            NORMALIZED  2.8238E-01

RMS lndExp_Flrr_volr                 1.3186E-03            NORMALIZED  2.8207E-01
RMS lndExp_Flrr_volrmch              1.3186E-03            NORMALIZED  2.8207E-01
RMS rofImp_Flrr_volr                 2.7567E+00            NORMALIZED  3.4586E+01
RMS rofImp_Flrr_volrmch              2.7567E+00            NORMALIZED  3.4586E+01
RMS rofImp_Forr_rofl                 5.3859E+02            NORMALIZED  2.5794E+00

RMS basinID                                 NaN            NORMALIZED  0.0000E+00
RMS reachID                                 NaN            NORMALIZED  0.0000E+00
RMS time                             5.0000E-01            NORMALIZED  1.1765E-01
RMS basRunoff                        4.4759E-05            NORMALIZED  8.3166E-01

Test 15 has missing baselines.
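The cprnc RMS summaries quoted above can be compared mechanically rather than by eye. A minimal sketch (a hypothetical helper, not part of the CTSM tooling; the field names and tolerance are taken from this thread for illustration) that pulls field names and normalized RMS values out of a cprnc .out excerpt:

```python
import re

# Match lines of the form "RMS <field> <value> NORMALIZED <value>"
# as they appear in cprnc output. NaN entries (e.g. basinID/reachID)
# are kept so they can be flagged separately.
RMS_RE = re.compile(r"RMS\s+(\S+)\s+(\S+)\s+NORMALIZED\s+(\S+)")

def parse_rms(text):
    """Return {field: (rms, normalized_rms)} for each RMS line."""
    out = {}
    for m in RMS_RE.finditer(text):
        field, rms, norm = m.groups()
        out[field] = (float(rms), float(norm))
    return out

# Excerpt from the mizuroute-default diffs reported above.
sample = """
 RMS TWS                              1.3186E+00            NORMALIZED  4.4715E-04
 RMS VOLR                             1.2424E+08            NORMALIZED  2.8238E-01
 RMS basinID                                 NaN            NORMALIZED  0.0000E+00
"""

diffs = parse_rms(sample)
# Fields whose normalized RMS exceeds a (hypothetical) tolerance:
big = [f for f, (_, n) in diffs.items() if n > 1e-3]
print(big)  # ['VOLR']
```

This makes it easy to diff the RMS summaries from two test runs (e.g. the gnu and intel variants of the same test) field by field instead of scanning the logs manually.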

@slevis-lmwg
Contributor Author

slevis-lmwg commented Mar 27, 2026

Two unexpected failures on izumi:

  • One passed upon the first manual ./case.submit
  • The other is on the second manual ./case.submit...
  • chmod a+r -R ctsm5.4.029

@slevis-lmwg
Contributor Author

slevis-lmwg commented Mar 27, 2026

@ekluzek, summarizing the failures above that we need to discuss before further action:
the tests that I numbered 1, 2/3, and 10/11, because they end up with DIFFs.

UPDATE: Erik and I met for 5 minutes to go over this, and we will follow up on Monday.

@ekluzek
Collaborator

ekluzek commented Mar 28, 2026

@slevis-lmwg I looked over your test cases, mostly to see if your submodules were out of sync or something like that. They aren't, so it's not something that simple.

I think a next step would be to rerun the tests that are showing differences from the ctsm5.4.028 baselines in a vanilla checkout of ctsm5.4.028, to see if that differs from the baselines I made. There might be something wrong with the baselines, so it would be good to double-check that. I did a little checking on them, but didn't spot anything as problematic. Still, rerunning a few of these tests would be a good verification.

The updates in the submodules are pretty limited: only ccs_config, cime, and pio. cime and pio don't usually change answers at this point. There was a change to the ERI test for cime, but it looked benign. The changes in ccs_config were also innocuous in this case, so I don't think it's that.

But the failing ERI test might be due to the cime update, so you could try it without the cime update. The change for ERI was in cime6.1.157, so you could also just remove that change to the ERI test, or do something like trying cime6.1.156.

If the mizuRoute baselines look fine, you should try them without the submodule updates and then maybe we track down which of the updates is causing the difference.

So a few things to try and think about...

@slevis-lmwg
Contributor Author

slevis-lmwg commented Mar 30, 2026

The three tests with diffs, now being tested in vanilla ctsm5.4.025:

1 ./create_test ERI_Ld41.f10_f10_mg37.I2000Clm60BgcCrop.derecho_gnu.clm-default -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.4.025
2 ./create_test SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_gnu.mizuroute-default -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.4.025
3 ./create_test SMS_D_Ld5.5x5_amazon_rHDMA.I2000Clm60SpMizGs.derecho_intel.mizuroute-default -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.4.025

The first has diffs in this file ERI_Ld41.f10_f10_mg37.I2000Clm60BgcCrop.derecho_gnu.clm-default.C.20260330_112110_gvjhce.mosart.h0a.2004-01.nc.base.cprnc.out as in my b4b-dev test AND it has occurred since ctsm5.3.078:

 RMS time                             8.9375E+00            NORMALIZED  4.4757E-01
 RMS time_bounds                      1.2640E+01            NORMALIZED  6.3297E-01
 RMS DIRECT_DISCHARGE_TO_OCEAN_ICE    2.7467E+00            NORMALIZED  8.8485E+00
 RMS DIRECT_DISCHARGE_TO_OCEAN_LIQ    4.2832E+00            NORMALIZED  1.3171E+00
 RMS RIVER_DISCHARGE_OVER_LAND_LIQ    3.7048E+03            NORMALIZED  4.7858E+00
 RMS RIVER_DISCHARGE_TO_OCEAN_LIQ     5.4230E+02            NORMALIZED  5.3018E+01
 RMS TOTAL_DISCHARGE_TO_OCEAN_ICE     2.7467E+00            NORMALIZED  8.8485E+00
 RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ     4.7998E+02            NORMALIZED  4.2900E+01

The second and third,

  • both running in vanilla 025 and 028 do NOT have diffs relative to the 025 and 028 baselines, respectively
  • both running in the branch update_submodules_to_cesm30a08l do NOT have diffs relative to the 025 baseline
  • both in this PR but with cime6.1.156 fail in SETUP; that suggestion was for the ERI test anyway, so this is irrelevant
  • both with the locked b4b-dev branch do NOT have diffs relative to the 025 baseline
  • both in this PR to reproduce the original failures: I'm NOT reproducing the failures, so I reran them to generate corresponding baselines.

@slevis-lmwg
Contributor Author

@ekluzek this is ready for approval.


Labels

bfb bit-for-bit

Projects

Status: In Progress

6 participants