Modified config_archive.xml to archive CAM+DART files. #1302

Open: wants to merge 3 commits into base: cam_development

kdraeder

This would be a fix for issue #1301
I'll provide background or details as needed.

Comment on lines -7 to -8
<hist_file_extension>h\d*.*\.nc$</hist_file_extension>
<hist_file_extension>i\..*\.nc$</hist_file_extension>
Collaborator

I'm wondering about the implications of removing this line: <hist_file_extension>i\..*\.nc$</hist_file_extension> on other CAM jobs. I will admit that I am by no means an expert on this file and archiving in general, but it appears that we'd no longer be archiving instantaneous history files? Should this line perhaps remain?

Author

That line was replaced by a <rest_file_extension>i\..* line.
This line handles the CAM "initial file", which is the only '.i.' file that I know of.
There may be '.i[a-z0-9]+.' files (someday), but they would be handled differently.
But I would like to hear the opinion of CAM developers about this.
DART needs to have the .i. files archived with the restarts because that's the file
containing the model state which DART needs to use to start CAM at the beginning of each DA cycle.

Collaborator

I want to be sure we are talking about the same thing.
Instantaneous files are the .h0i., .h1i., ... files, which are different from the initial file, .i.

Where will these files go?

Author

I believe that the .h[0-9]i. files will still be handled by the XML line:
<hist_file_extension>h\d*.*\.nc(\.gz)?$</hist_file_extension>
which I parse as "h+any_digit+any_number_of_any_characters".

The '.i.' file is currently handled by the line:
<hist_file_extension>i\.\d.*\.nc(\.gz)?$</hist_file_extension>
I believe that the parsing code adds '.' before the regex listed in these .xml files,
so this line only applies to '.i.' and not to the .h[0-9]i. files.
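
The reasoning above can be checked with a short Python sketch. It assumes, as the comment does, that the archiver prepends a literal '.' to each pattern before matching; that behavior is not confirmed here, and the helper `matches` and the file names are hypothetical, chosen only for illustration.

```python
import re

# Extension patterns as quoted in the comment above.
HIST_H = r"h\d*.*\.nc(\.gz)?$"
HIST_I = r"i\.\d.*\.nc(\.gz)?$"


def matches(extension_regex, filename, dot_prefix=True):
    """Return True if filename matches the extension pattern.

    dot_prefix=True models the ASSUMPTION that the archiver prepends a
    literal '.' to each pattern before matching.
    """
    pattern = (r"\." if dot_prefix else "") + extension_regex
    return re.search(pattern, filename) is not None


# Hypothetical file names, for illustration only.
instantaneous = "case.cam.h0i.1995-01-01-00000.nc"
initial = "case.cam.i.1995-01-01-00000.nc"

for name in (instantaneous, initial):
    print(name,
          "| h-pattern:", matches(HIST_H, name),
          "| i-pattern:", matches(HIST_I, name))

# Without the leading '.', the i-pattern would also catch the .h0i. file.
print("i-pattern, no prefix:", matches(HIST_I, instantaneous, dot_prefix=False))
```

Under that assumption, the h-pattern matches only the .h0i. name and the i-pattern matches only the .i. name. If no '.' were prepended, the i-pattern would also match the .h0i. name, so the conclusion depends on how the patterns are actually applied.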

Collaborator

Ditto @cecilehannay . We need to know if this change moves the location of history output and breaks everyone's analysis scripts. Fixable, yes, but it would be good to know in advance.

Author

In my test the .h0i. files ended up in archive/atm/hist, archive/rest/$date, and in $rundir.
I only requested 1 CAM history file.
The .i. files ended up in archive/rest/$date and $rundir.
That's what I intended, but someone else should test it too, especially if I failed to create a CAM file type.
The archive directory is /glade/derecho/scratch/raeder/St_BHISTC_LTso-SE_st-arch/archive.
Thanks for looking into this!

Collaborator

@cecilehannay - Would you be able to check out @kdraeder's branch, make a quick run similar to the ones you are currently making, and make sure that all the history files end up where they should?

Collaborator

@cecilehannay - I have two CAM checkouts for you to test:

  • /glade/derecho/scratch/cacraig/cam6_4_089 - A straight checkout of cam6_4_089
  • /glade/derecho/scratch/cacraig/cam6_4_089_raeder - A checkout of cam6_4_089 with the DART changes

Please let us know what your runs indicate for both the history and the initial condition files (and any other files for that matter if they change)

@cecilehannay
Collaborator

I did a few runs to test the new code and I see an issue with the code when more frequent initial conditions are required (for instance I use: inithist = 'MONTHLY')

First I did a run with the default inithist, and everything seems fine:

  • In: /glade/derecho/scratch/hannay/archive/f.e30_cam6_4_089.FHISTC_LTso.ne30.baseline156.001
    the *.i.* files are in both rest and atm/hist.
  • while in:
    /glade/derecho/scratch/hannay/archive/f.e30_cam6_4_089_raeder.FHISTC_LTso.ne30.baseline156.001
    the *.i.* files are only in rest. I am fine with this.

But if I use inithist = 'MONTHLY'

  • In: /glade/derecho/scratch/hannay/archive/f.e30_cam6_4_089.FHISTC_LTso.ne30.baseline156.002
    the extra *.i.* are in atm/hist.
  • while in:
    /glade/derecho/scratch/hannay/archive/f.e30_cam6_4_089_raeder.FHISTC_LTso.ne30.baseline156.002
    the extra *.i.* files are not moved to the archive directory; they stay in the run directory. This is an issue.
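
A comparison like the one above can be repeated with a small script that lists where the *.i.* files ended up. This is only a sketch: the helper `locate_files` is hypothetical, and the root directory you pass in (an archive or run directory) is up to you.

```python
from collections import defaultdict
from pathlib import Path


def locate_files(root, glob="*.i.*"):
    """Map each subdirectory of root (as a relative path) to the names of
    the files in it that match the glob. Hypothetical helper."""
    root = Path(root)
    found = defaultdict(list)
    for path in sorted(root.rglob(glob)):
        if path.is_file():
            found[path.parent.relative_to(root).as_posix()].append(path.name)
    return dict(found)


# Example use (the path is a placeholder, not a real location):
# for directory, names in locate_files("/path/to/archive").items():
#     print(directory, names)
```

Running it on the baseline archive and on the modified one, and comparing the two results, shows which directories gained or lost initial files.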

@cacraigucar
Collaborator

@kdraeder - Are you going to supply a fix for this? If not, do you no longer need this PR to go into CAM?

@kdraeder
Author

@cacraigucar I was just working on this!

@cecilehannay Thanks for running those tests.

In the first pair (inithist = default)

  • the last .i. file is still in $rundir and in rest/1996-01-01-00000, which is good for DART.
  • The default code also copies it to atm/hist, while the raeder code does not.
  • In the default code the intermediate file (1995-01-01) is not in rundir, but is in the rest/1995-01-01-00000 directory. It is also in atm/hist. So it's treated as a restart and a history file.

In the last test (inithist = MONTHLY, raeder's archive mods, run for 4 years instead of 2)

  • the .i. files from the restart time (the last date of the run) are copied to the rest archive, and are left in $rundir. This is the functionality that DART needs.
  • As Cecile points out, the intermediate .i. files are not copied or moved.

How are the intermediate .i. files used?
If it's only to start runs, then it seems that atm/hist is not the right place for them.
If they're also treated like history files to be statistically analyzed like .h#. files, then the best place to store them is ambiguous, and there's a conflict with DART's use of them.
When inithist is not MONTHLY, then they are archived in 2 places, which satisfies all the hoped-for uses.
So the question seems to be "Can we make MONTHLY (and shorter) behave the same way?"
If someone can point me to the code where this is handled, it would speed up my attempt to find the way.

@cacraigucar
Collaborator

@brian-eaton and/or @jedwards4b - Can one of you direct @kdraeder to the code location where he might need to make his mods?

@jedwards4b

Have you tried adding .i files to both restart and history? That is add back:
<hist_file_extension>i..*.nc$</hist_file_extension>

@kdraeder
Author

My test using @jedwards4b's suggestion results in copies of the intermediate .i. files in archive/atm/hist
and final files in $rundir, archive/atm/hist, and archive/rest.
So there's redundancy in the final files, but they aren't frequent or huge, so this looks like a reasonable and easy solution.

I actually tested
<hist_file_extension>i\..*\.nc(\.gz)?$</hist_file_extension>
to include the need for DART to handle compressed files and to make the pattern more restricted
to '.i.' files. If that's not misguided, I'll include this form in a PR.
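
The difference between the two forms can be illustrated with a short Python sketch. It compares the suggested pattern exactly as it is displayed above (its backslashes may have been lost in rendering) with the tested pattern. The leading '.' added by `would_match` is an assumption about how the archiver applies these patterns, and the file names are hypothetical.

```python
import re

SUGGESTED = r"i..*.nc$"             # as displayed in the suggestion
TESTED = r"i\..*\.nc(\.gz)?$"       # the form that was actually tested


def would_match(extension_regex, filename):
    """Hypothetical helper; ASSUMES a literal '.' is prepended to the pattern."""
    return re.search(r"\." + extension_regex, filename) is not None


# Hypothetical file names, for illustration only.
names = [
    "case.cam.i.1996-01-01-00000.nc",      # plain initial file
    "case.cam.i.1996-01-01-00000.nc.gz",   # compressed initial file
    "case.cam.ix.1996-01-01-00000.nc",     # a possible future '.i[a-z0-9]+.' file
]

for name in names:
    print(name,
          "| suggested:", would_match(SUGGESTED, name),
          "| tested:", would_match(TESTED, name))
```

In this sketch both patterns match the plain initial file, only the tested pattern matches the compressed file, and only the unescaped pattern matches the '.ix.' name, which is the sense in which the escaped form is more restricted.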

@jedwards4b

@kdraeder sounds good, thank you.

@kdraeder
Author

If that's not misguided, I'll include this form in a PR.

I should have written "I'll commit this change and push it for review"