
make sure that ocn bgc baselines are captured #253


Merged: 3 commits merged on Apr 28, 2025


jedwards4b
Contributor

Fixes #249. Tested using:

./create_test SMS_Ld3.TL319_t232.G1850MARBL_JRA.derecho_intel --baseline-root $SCRATCH/bltest/ --generate tryagain

jedwards4b requested a review from mnlevy1981 April 25, 2025 15:19
jedwards4b self-assigned this Apr 25, 2025
Collaborator

mnlevy1981 left a comment


I'm happy to make a few changes based on my comments here and push them to your branch.

Comment on lines 16 to 17
<hist_file_ext_regex>h.bgc.z(\._\d*)?</hist_file_ext_regex>
<hist_file_ext_regex>h.bgc.native(\._\d*)?</hist_file_ext_regex>
Collaborator


I think a cleaner fix would be to update the line above so that it captures both string.string and string.string.string extensions (e.g. h.z as well as h.bgc.z):

     <!-- match filenames of the form
          ic.(date string)[._optional instance number].nc[.optional tile number] -->
     <hist_file_extension>ic.[-\d+]+(\._\d*)?\.nc(\.\d*)?$</hist_file_extension>
-    <hist_file_ext_regex>\w+\.\w+(\._\d*)?</hist_file_ext_regex>
+    <hist_file_ext_regex>\w+\.\w+(\.\w+)?(\._\d*)?</hist_file_ext_regex>
     <rpointer>
       <rpointer_file>rpointer.ocn$NINST_STRING.$DATENAME</rpointer_file>
       <rpointer_content>$CASE.mom6$NINST_STRING.r.$DATENAME.nc</rpointer_content>
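The effect of this suggestion can be checked by testing both patterns against the stream extensions in question. This is only an illustrative sketch: it assumes the archiver needs the pattern to account for the whole stream extension, which it models with Python's `re.fullmatch`; the real matching logic lives in CIME and may anchor the pattern differently.

```python
import re

# hist_file_ext_regex before and after the suggested change
OLD_EXT_REGEX = r"\w+\.\w+(\._\d*)?"
NEW_EXT_REGEX = r"\w+\.\w+(\.\w+)?(\._\d*)?"


def matches(pattern: str, ext: str) -> bool:
    # Assumption: the whole stream extension must match (modeled with fullmatch).
    return re.fullmatch(pattern, ext) is not None


for ext in ["h.z", "h.native", "h.bgc.z", "h.bgc.native"]:
    print(f"{ext:14} old={matches(OLD_EXT_REGEX, ext)} new={matches(NEW_EXT_REGEX, ext)}")
```

Because `\w` does not match `.`, the old pattern can span only one dot before the optional `(\._\d*)?` instance suffix, so `h.bgc.z` and `h.bgc.native` fall through; the extra `(\.\w+)?` group admits exactly one more dot-separated component.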

<hist_file_ext_regex>\w+\.\w+(\._\d*)?</hist_file_ext_regex>
<rpointer>
<rpointer_file>rpointer.ocn$NINST_STRING.$DATENAME</rpointer_file>
<rpointer_content>$CASE.mom6$NINST_STRING.r.$DATENAME.nc</rpointer_content>
</rpointer>
<test_file_names>
<tfile disposition="copy">rpointer.ocn.1976-01-01-00000</tfile>
<tfile disposition="move">casename.mom6.h.1976-01-01-00000.nc</tfile>
Collaborator


MOM doesn't generate a mom6.h stream, and I suspect a file with this name would not be handled correctly by the archiver. The non-BGC daily streams MOM6 generates in a CESM test are:

mom6.h.native.0001-01-05.nc
mom6.h.rho2.0001-01-05.nc
mom6.h.sfc.0001-01-05.nc
mom6.h.z.0001-01-05.nc

There are also a few files without date stamps:

mom6.h.ocean_geometry.nc
mom6.h.static.nc

Lastly, it looks like we also compare mom6.ic.0001-01-01.nc, though it's unclear how that file gets added to the baseline directory, since it won't match hist_file_ext_regex.
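The point about the IC file can be checked directly against the pattern currently in the config. This sketch assumes the archiver requires `hist_file_ext_regex` to cover the whole extension string (modeled with Python's `re.fullmatch`); how CIME actually slices the file name is not shown here, so the IC name is tested both as `ic` alone and as `ic` plus its date stamp. Either way it does not match: `ic` alone has only one word component, and the hyphens in the date stamp are not `\w` characters. The stream extensions come from the file names listed in this comment.

```python
import re

# hist_file_ext_regex as it stood before this PR
HIST_FILE_EXT_REGEX = r"\w+\.\w+(\._\d*)?"


def ext_matches(ext: str) -> bool:
    # Assumption: the pattern must cover the whole extension string.
    return re.fullmatch(HIST_FILE_EXT_REGEX, ext) is not None


# Extensions from the MOM6 file names listed above, plus two ways of slicing
# the IC file name ("ic" alone, or "ic" with its date stamp).
candidates = ["h.native", "h.rho2", "h.sfc", "h.z", "h.ocean_geometry", "h.static",
              "ic", "ic.0001-01-01"]

for ext in candidates:
    print(f"{ext:18} {'matches' if ext_matches(ext) else 'no match'}")
```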

@jedwards4b
Contributor Author

@mnlevy1981 Feel free to push your changes.

Now streams that match string.string OR string.string.string will get archived, so we don't need to explicitly list h.bgc.z and h.bgc.native.

Also cleaned up the test_file_names section to include a restart split across two restart files and the current expected stream files.
@mnlevy1981
Collaborator

mnlevy1981 commented Apr 25, 2025

14b181e fits my needs:

  1. both mom6.h.bgc streams show up in the baselines directory
  2. ./case.st_archive --test-all passes as well

However, I was looking at a recent test directory and noticed the IC file is split over multiple files:

$ ls *.mom6.ic.*.nc
SMS.TL319_t232.G1850MARBL_JRA.derecho_intel.GC.20250425_113802_a47vtv.mom6.ic.0001-01-01_1.nc
SMS.TL319_t232.G1850MARBL_JRA.derecho_intel.GC.20250425_113802_a47vtv.mom6.ic.0001-01-01_2.nc
SMS.TL319_t232.G1850MARBL_JRA.derecho_intel.GC.20250425_113802_a47vtv.mom6.ic.0001-01-01.nc

So I tried adding another test file:

diff --git a/cime_config/config_archive.xml b/cime_config/config_archive.xml
index c45149e..72a630a 100644
--- a/cime_config/config_archive.xml
+++ b/cime_config/config_archive.xml
@@ -31,6 +31,7 @@
         <tfile disposition="move">casename.mom6.h.static.nc</tfile>
         <tfile disposition="move">casename.mom6.h.z.1976-01-01-00000.nc</tfile>
         <tfile disposition="move">casename.mom6.ic.1976-01-01-00000.nc</tfile>
+        <tfile disposition="move">casename.mom6.ic.1976-01-01-00000_1.nc</tfile>
       </test_file_names>
   </comp_archive_spec>
 </components>

and got an error from ./case.st_archive --test-all:

Checking testfile casename.mom6.ic.1976-01-01-00000_1.nc with disposition move

ERROR: Failed to move file casename.mom6.ic.1976-01-01-00000_1.nc to archive
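One likely reason for this error is the IC `hist_file_extension` pattern quoted in the review above: after the date stamp, the only optional piece it allows before `\.nc` is the `(\._\d*)?` instance suffix, which must start with a dot, so `_1` has nowhere to go. The sketch below assumes the pattern is applied with Python's `re.search` to the full test-file name; CIME's actual check may differ.

```python
import re

# IC pattern from config_archive.xml before the proposed fix
IC_EXT = r"ic.[-\d+]+(\._\d*)?\.nc(\.\d*)?$"

for name in ["casename.mom6.ic.1976-01-01-00000.nc",
             "casename.mom6.ic.1976-01-01-00000_1.nc"]:
    # Assumption: the archiver searches the file name for the pattern.
    found = re.search(IC_EXT, name) is not None
    print(f"{name}: {'accepted' if found else 'rejected'}")
```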

@jedwards4b
Contributor Author

In your test you have 00000_1.nc; the _1 is not allowed. If you think it should be allowed, I suggest you remove this test for now and open an issue; we will address it in the near future and then re-add the test.

@mnlevy1981
Collaborator

It's interesting, because that _1 seems okay in the restart files:

        <tfile disposition="copy">casename.mom6.r.1976-01-01-00000.nc</tfile>
        <tfile disposition="copy">casename.mom6.r.1976-01-01-00000_1.nc</tfile>

But I'll approve this, and we can think about the multiple IC files later. It does seem to be an issue. /glade/derecho/scratch/mlevy/archive/g.e30a5e.G1850MARBL_JRA.TL319_t232.test_latest_fms_mom6_stochphysics/ocn/hist has

g.e30a5e.G1850MARBL_JRA.TL319_t232.test_latest_fms_mom6_stochphysics.mom6.ic.0001-01-01.nc

but /glade/derecho/scratch/mlevy/g.e30a5e.G1850MARBL_JRA.TL319_t232.test_latest_fms_mom6_stochphysics/run has these two files, which were not archived:

g.e30a5e.G1850MARBL_JRA.TL319_t232.test_latest_fms_mom6_stochphysics.mom6.ic.0001-01-01_1.nc
g.e30a5e.G1850MARBL_JRA.TL319_t232.test_latest_fms_mom6_stochphysics.mom6.ic.0001-01-01_2.nc

@mnlevy1981
Collaborator

mnlevy1981 commented Apr 25, 2025

Testing a possible fix for #254:

@@ -11,8 +11,8 @@
          h.bgc.*[._optional instance number].nc -->
     <hist_file_extension>h\.bgc\..*?.?[_\d+]+.nc$</hist_file_extension>
     <!-- match filenames of the form
-         ic.(date string)[._optional instance number].nc[.optional tile number] -->
-    <hist_file_extension>ic.[-\d+]+(\._\d*)?\.nc(\.\d*)?$</hist_file_extension>
+         ic.(date string[_optional id for splitting date over many files)[._optional instance number].nc[.optional tile number] -->
+    <hist_file_extension>ic.[-\d(_\d)?+]+(\._\d*)?\.nc(\.\d*)?$</hist_file_extension>
     <hist_file_ext_regex>\w+\.\w+(\.\w+)?(\._\d*)?</hist_file_ext_regex>
     <rpointer>
       <rpointer_file>rpointer.ocn$NINST_STRING.$DATENAME</rpointer_file>
@@ -31,6 +31,7 @@
         <tfile disposition="move">casename.mom6.h.static.nc</tfile>
         <tfile disposition="move">casename.mom6.h.z.1976-01-01-00000.nc</tfile>
         <tfile disposition="move">casename.mom6.ic.1976-01-01-00000.nc</tfile>
+        <tfile disposition="move">casename.mom6.ic.1976-01-01-00000_1.nc</tfile>
       </test_file_names>
   </comp_archive_spec>
 </components>

This passes ./case.st_archive --test-all; I'm now running an SMS test to see whether all three mom6.ic files get copied to the baseline directory.
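A note on the proposed IC pattern, sketched in Python and assuming the pattern is applied with `re.search` to the full file name: inside square brackets, `(`, `)`, `?` and `+` are literal characters, so `[-\d(_\d)?+]` is a single character class (hyphen, digits, underscore, parentheses, `?`, `+`) rather than an optional `_\d` group. That is why `_1` is now accepted, but the class also admits repeated underscores and stray punctuation. `INTENDED_IC_EXT` below is a hypothetical alternative that writes the optional split-file suffix as a group; it has not been tested with the archiver.

```python
import re

# Pattern from the diff above
PROPOSED_IC_EXT = r"ic.[-\d(_\d)?+]+(\._\d*)?\.nc(\.\d*)?$"
# Hypothetical alternative: escaped dot, date of digits/hyphens, optional _N suffix.
INTENDED_IC_EXT = r"ic\.[-\d]+(_\d+)?(\._\d*)?\.nc(\.\d*)?$"

names = [
    "casename.mom6.ic.1976-01-01-00000.nc",
    "casename.mom6.ic.1976-01-01-00000_1.nc",
    "casename.mom6.ic.1976-01-01-00000_1_2.nc",  # repeated suffix
    "casename.mom6.ic.(1)?.nc",                  # matches only via the character class
]

for name in names:
    # Assumption: the archiver searches the file name for the pattern.
    proposed = re.search(PROPOSED_IC_EXT, name) is not None
    intended = re.search(INTENDED_IC_EXT, name) is not None
    print(f"{name}: proposed={proposed} intended={intended}")
```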

@mnlevy1981
Collaborator

@jedwards4b The above fix (also in 004ff1a) appears to archive all mom6.ic files (that's good!), but it only copies the last file to the baseline directory (mom6.ic.0001-01-01_2.nc, in my testing). Is there something we can do to ensure all mom6.ic files get copied to baselines rather than just one?

@jedwards4b
Contributor Author

I'm not sure; we'll need to look into how to do that.

@mnlevy1981
Collaborator

I think we have three options:

  1. Merge this as-is, and only compare the last mom6.ic file until we can find a fix
  2. Merge 14b181e and continue to only archive one mom6.ic file until we can find a fix
  3. Keep this PR open until a fix is found

I'm leaning towards (1); for baseline testing I think it's okay to only compare a subset of the tracer initial conditions and it's more important to ensure all IC files are archived properly as we start doing more runs with the MARBL tracers (the only situation I know of where mom6.ic spills into multiple files). @alperaltuntas, what do you think? Can we talk about it on Monday?

alperaltuntas self-assigned this Apr 28, 2025
alperaltuntas added this to the cesm3_0_beta06 milestone Apr 28, 2025
alperaltuntas merged commit 59ed733 into ESCOMP:main Apr 28, 2025
5 checks passed
Successfully merging this pull request may close these issues.

Not all MARBL streams are archived
3 participants