Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring rpointer changes to BLOM #513

Merged
merged 17 commits into from
Mar 20, 2025

Conversation

mvdebolskiy
Copy link
Collaborator

@mvdebolskiy mvdebolskiy commented Mar 12, 2025

Added rpointer-timestamps handling for BLOM with NUOPC.
MCT and standalone should be backwards compatible.

ERR_D_Ld5.TL319_tn14.NOIIAJRAOC.betzy_intel and ERI_D_Ld5.TL319_tn14.NOIIAJRAOC.betzy_intel.

Tests to be ran apart from aux_blom_noresm test-suite:

  • standalone
  • ERR,ERI tests with MCT (mct not available for testing or case-creation with current cime)
    • ERI test does not work for noresm2_3_alpha03. I was able to run after unzipping files manually
  • ERR, ERI tests with multiple blom instances.
  • Run a hybrid run from an older blom run with old rpointers.
  • Hybrid case with multiple blom instances starting from a case with one instance (old rpointers and new)

@TomasTorsvik @matsbn Can you help with the rest of the tests? Running NINST>0 even with v1.7.1.4. gives me errors

@mvertens
Copy link
Contributor

Is there still the expectation that BLOM needs to support MCT in these development versions? I thought that moving forwards we were going drop support for MCT as of some blom tag. Can someone please clarify where things stand with this?

@mvdebolskiy mvdebolskiy mentioned this pull request Mar 12, 2025
12 tasks
@mvdebolskiy
Copy link
Collaborator Author

@mvertens noresm2.3 is ran with mct and I assume, next blom version is still going into noresm2.3

@mvertens
Copy link
Contributor

@mvertens noresm2.3 is ran with mct and I assume, next blom version is still going into noresm2.3

Yes - that is right. I just realized that a full discussion of this is here #462. So my understanding is that support for mct will be dropped in v1.9.

@TomasTorsvik
Copy link
Contributor

@mvdebolskiy - I tested NorESM2_3_alpha03 + dev1.7.1.4 for ERR and ERI. ERR test pass and ERI fails with an error in the ref2 case:

ERROR: 
 GETFIL: FAILED to get ERI.f19_tn14.N1850.betzy_intel.G.20250312_202651_n0j661.r
 ef1.cam.i.0001-01-04-00000.nc

This looks strange to me, because I see there are restart files in ref1, so I don't understand how this goes wrong.

@matsbn
Copy link
Contributor

matsbn commented Mar 12, 2025

@mvertens noresm2.3 is ran with mct and I assume, next blom version is still going into noresm2.3

Yes - that is right. I just realized that a full discussion of this is here #462. So my understanding is that support for mct will be dropped in v1.9.

Hi @mvertens! The MCT support was discussed at a BLOM-core meeting today. There is still some reluctance, including from me, to give up on the MCT support, also after the NorESM2.3 release. There are some potential needs for updated BLOM coupled to older versions of NorESM. Should be discussed further.

@TomasTorsvik
Copy link
Contributor

@mvdebolskiy - I tested NorESM2_3_alpha03 + dev1.7.1.4 for ERR and ERI. ERR test pass and ERI fails with an error in the ref2 case:

ERROR: 
 GETFIL: FAILED to get ERI.f19_tn14.N1850.betzy_intel.G.20250312_202651_n0j661.r
 ef1.cam.i.0001-01-04-00000.nc

This looks strange to me, because I see there are restart files in ref1, so I don't understand how this goes wrong.

The restart files are compressed .gz files, while ERI test is looking for uncompressed input files.
For the ERR case, some of the restart files are stored uncompressed.

@mvdebolskiy , @gold2718 - it seems that ERI is not set up correctly for noresm2_3_develop branch of cime.

@mvdebolskiy
Copy link
Collaborator Author

@TomasTorsvik I've run just the noresm2_3_develop checkout and can confirm that ERI tests do not work (hybrid case fails).
Though, if you just follow the instructions on noresm-docs for setting up simulation and don't set GET_REFCASE, and instead unzip files - hybrid and branch runs should work.

@mvertens
Copy link
Contributor

@mvertens noresm2.3 is ran with mct and I assume, next blom version is still going into noresm2.3

Yes - that is right. I just realized that a full discussion of this is here #462. So my understanding is that support for mct will be dropped in v1.9.

Hi @mvertens! The MCT support was discussed at a BLOM-core meeting today. There is still some reluctance, including from me, to give up on the MCT support, also after the NorESM2.3 release. There are some potential needs for updated BLOM coupled to older versions of NorESM. Should be discussed further.

Hi @matsbn! I'm happy to be able to finally work a bit this week. I'd love to be part of further discussions if I will be working when those occur. I understand that there are competing needs. However, I think that continuing to keep the MCT code support in development versions will make it much harder to incorporate the type of new capabilities that I was hoping to bring in - such as run time interpolation of forcing datasets (e.g. riverine input) where you can just use a native resolution of the data and interpolate at run time to the BLOM grid. These are all outline in #462 for v1.9. I think the code will continue to get more difficult to maintain and test as new capabilities are brought in. There is a possible workflow that maintains MCT support for older versions of BLOM without restricting new development that require NUOPC. A discussion of that was started in #439. Is this still a possible planned path forwards?

@TomasTorsvik
Copy link
Contributor

@mvdebolskiy - I get an error when I try to build noresm2_5_alpha09a with your changes in BLOM

/cluster/projects/nn9560k/tomast/NorESM/src/hybrid_fatessp.2_5a09a.BLOMrpointers-update-blom/components/blom/drivers/nuopc/ocn_comp_nuopc.F90(40): error #6580: Name in only-list does not exist or is not accessible.   [SHR_GET_RPOINTER_NAME]
                                alarmInit, shr_get_rpointer_name
-------------------------------------------^
/cluster/projects/nn9560k/tomast/NorESM/src/hybrid_fatessp.2_5a09a.BLOMrpointers-update-blom/components/blom/drivers/nuopc/ocn_comp_nuopc.F90(470): error #6632: Keyword arguments are invalid without an explicit interface.   [RC]
      call shr_get_rpointer_name(gcomp, 'ocn', ymd, tod, rpfile, 'write', rc=rc)
--------------------------------------------------------------------------^
compilation aborted for /cluster/projects/nn9560k/tomast/NorESM/src/hybrid_fatessp.2_5a09a.BLOMrpointers-update-blom/components/blom/drivers/nuopc/ocn_comp_nuopc.F90 (code 1)

@mvdebolskiy
Copy link
Collaborator Author

mvdebolskiy commented Mar 14, 2025

@TomasTorsvik You will get an error with just noresm2_5alpha09a since that one point to a different version of share.
I will push couple of commits in a minute, that fix ERI tests for nuopc.
Also, @gold2718 needs to approve share/cmeps/cdeps PRs before I do a draft PR to NorESMhub/NorESM.
Had to split blom_init into two routines. I will make an issue to refactor namelist reading within blom, so the whole thing is more flexible. Config variables reading and multi-processor controls are set up in a separate routine, that does not involve any reading of netcdf-files.

@jmaerz jmaerz mentioned this pull request Mar 14, 2025
Copy link
Contributor

@mvertens mvertens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial file for hybrid runs should be read in from ICFILE which is not being set correctly for hybrid runs. A hyrid run is an initial run where continue_run is false - so you will never be reading the rpointer file for that.
I have summarized the issue in #518.

@mvdebolskiy mvdebolskiy requested a review from mvertens March 15, 2025 12:51
@mvdebolskiy
Copy link
Collaborator Author

mvdebolskiy commented Mar 17, 2025

@TomasTorsvik @matsbn this is basically ready to go. Since ci workflows path, I assume standalone-standalone runs are ok.
I fixed gnu compiler, and the only thing I am waiting for is WW3 update. Plus these:

  • Add/switch gnu tests
  • Add/switch ERI_C2_ test

@TomasTorsvik
Copy link
Contributor

@mvdebolskiy - The CI workflow checks standalone, so that should be OK.

I set up some tests using the GNU compiler, see PR #519 , but so far I didn't get them to work.

What do you mean by "ERI_C2"?

@TomasTorsvik
Copy link
Contributor

@matsbn , @milicak - Is there a preference, if this should be merged before or after PR #515 ?

@mvdebolskiy
Copy link
Collaborator Author

@TomasTorsvik You will need an updated ccs_config for gnu to work. THere was a typo in the cprnc path

@TomasTorsvik
Copy link
Contributor

@mvdebolskiy - I manage to get the GNU test working when changing

<env name="CPRNC_PREF">/cluster/shared/noresm/tools/foss2023b</env>

The tests with WW3 are failing at the build stage for the "wav" component.

@matsbn
Copy link
Contributor

matsbn commented Mar 18, 2025

@matsbn , @milicak - Is there a preference, if this should be merged before or after PR #515 ?

I do not anticipate major merge conflicts either way, so I do not have a strong preference.

Additionally, I can confirm that standalone BLOM with MNP2 grid and "ben02syn" expcnf still works well after the recent commits to this PR.

@TomasTorsvik
Copy link
Contributor

@mvdebolskiy - Is this ready to merge, or should we still wait for WW3 update? Also, I didn't get what the "ERI_C2_test" should be.

@TomasTorsvik TomasTorsvik self-assigned this Mar 18, 2025
@TomasTorsvik TomasTorsvik linked an issue Mar 18, 2025 that may be closed by this pull request
@mvdebolskiy
Copy link
Collaborator Author

adding _C<x>_ to the test makes it run with <x> instances.
I will wait a little before merging for ww3, until tomorrow's meeting.

@TomasTorsvik
Copy link
Contributor

adding _C<x>_ to the test makes it run with <x> instances. I will wait a little before merging for ww3, until tomorrow's meeting.

OK, thanks for explaining. You now have write access to the BLOM repository, so you can push it when you think it is ready.

@mvdebolskiy
Copy link
Collaborator Author

@TomasTorsvik I am getting ERP test fails at COMPARE_BASE_REST.
Is this supposed to happen?

@TomasTorsvik
Copy link
Contributor

@mvdebolskiy - yes, we are getting fail on ERP_Ld3_P256.T62_tn14.NOINYOC tests. I haven't figured out why, but we have had this since we started testing with ERP.

@jmaerz
Copy link
Collaborator

jmaerz commented Mar 19, 2025

@TomasTorsvik is there an issue on this (and some description?) - is it #510?

@mvdebolskiy
Copy link
Collaborator Author

OK, I am adding an ExpectedTestFails.xml to use with -x option in cs.status.

@TomasTorsvik
Copy link
Contributor

@TomasTorsvik is there an issue on this (and some description?) - is it #510?

No, #510 is something else, that is a new test that I added recently. There is a description in #501 .
We have had two tests failing regularly, the ERP and the one using the T62_tn21 grid. The grid issue should be resolved in noresm3_0_alpha01.

@mvdebolskiy
Copy link
Collaborator Author

@TomasTorsvik Go ahead and merge/tag this.
New baselines are in /cluster/shared/noresm/noresm_baselines/blom_develop/noresm3_0_alpha01.
When you run Tests next time, use -x <path-to-blom>/cime_config/testdefs/ExpectedTestFails.xml to check if tests unexpectedly pass/fail.

@TomasTorsvik TomasTorsvik merged commit 55514ab into NorESMhub:master Mar 20, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
5 participants