update test coverage and test options #129

apcraig · 2018-04-27T20:43:47Z

Add new setting options and add new tests. Add ability to generate a set of test options using the test_files.$option format.

Developer(s): tcraig
Are the code changes bit for bit, different at roundoff level, or more substantial? bit-for-bit
Is the documentation being updated with this PR? (Y/N) Y
Other Relevant Details:

apcraig · 2018-04-27T20:47:40Z

First look at the changes. I have added 7 coverage tests to the base suite and these are passing on conrad with intel. Are we happy with the test names, alt01, alt02, boxdyn, etc? These name match the spreadsheet. Also, the tests may not cover every option specified in the spreadsheet, but I believe most of the settings are.

These tests should be further reviewed to make sure they are setup correctly and changes should be suggested. I also plan to update the documentation before this PR is merged.

apcraig · 2018-04-27T21:05:05Z

I added some documentation and will build on readthedocs to make sure it looks ok. when that is ready, I'll add a link.

apcraig · 2018-04-27T21:19:08Z

http://apcraig-cice.readthedocs.io/en/tests/developer_guide/dg_scripts.html#test-options

eclare108213

set_nml.fbotxfercdn, set_nml.highfreq need a return at the end of the file
set_nml.pondoff is also 'ageoff'
calc_tsfc and tr_pond_* were removed from set_nml.swccsm3, but these actually need to be set this way for this kind of shortwave. How is that enforced now?
dg_scripts.rst needs more editing but this can be cleaned up later

Overall this looks okay to me. I'd like to run the suites on one of the computers here and look at the output to check for conflicts, reasonable output, etc. @apcraig did you search the diagnostic output for "WARNING"? I suspect that's how issue #130 came about.

apcraig · 2018-05-01T22:00:29Z

I did not look thru the log files for WARNING messages. I will make the corrections suggested and then have a look at the log files. In some ways, this is a problem with me setting up these configurations, I really don't know how different namelist interact or even what they do in many cases.

If there are namelist that have to go together, we should think about how to set them. For instance, if with ccsm3 shortwave, calc_tsfc has to be set a certain way, but for other shortwave options, calc_tsfc can be set various ways, what should we do? I would argue we should have an option to set the shortwave and a separate option to set the calc_tsfc and if those are inconsistent, the model would abort (or be "fixed") at runtime. Otherwise, we run into problems where a user sets swccsm3 (with calc_tsfc also set) and then also chooses a calc_tsfc option separately with --sets and depending on the order that they are set in the list, the namelist will be different with the run sometimes working and other times not.

What makes sense to me is that (1) different options are clearly separated if they have any independence (2) options that depend only on each other, can go together. For instance revised_evp is only used when evp is set, so we can have a evp and revp option that also sets the revised_evp flag and we would not have a revised_evp option. (3) the model will abort if options are not consistent. That way the options a user chooses are unlikely to conflict and cause an "order dependency", and the model will only run if reasonable combinations are chosen. This is the best way to guarantee we're testing what we expect to be testing and that users understand what namelist settings they are using. Just my thoughts.

I also think we should write down what namelist settings depend on each other and/or conflict with each other. That way we can implement all the necessary checks in the code and also come up with a cleaner set of test configurations. It would be good for that information to also end up in the documentation.

apcraig · 2018-05-03T01:59:56Z

I have updated the tests and am now running a full test suite on conrad. Will post results when they are done.

apcraig · 2018-05-03T15:32:09Z

Test results are here under 6065f5c,
https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks

apcraig · 2018-05-03T15:56:40Z

There are lots of several warning messages in the cice.run log files. The box tests and alt01 and alt03 basically have

WARNING: runtype, restart, ice_ic are inconsistent:
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions
WARNING - Zero-layer thermodynamics

alt02 has

WARNING: runtype, restart, ice_ic are inconsistent:
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions
WARNING: ITD required for ncat > 1
WARNING: Setting kitd and kcatbound to default values
WARNING: tr_pond_cesm=T
WARNING: frzpnd, dpscale not used
WARNING: Must use dEdd shortwave
WARNING: with tr_pond and calc_tsfc=T.
WARNING: Setting shortwave = dEdd

alt04 has

WARNING: runtype, restart, ice_ic are inconsistent:
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions
WARNING: ktherm = 1 and tfrz_option = mushy
WARNING: For consistency, set tfrz_option = linear_salt

eclare108213 · 2018-05-03T16:46:11Z

Okay, thanks for checking. I'll work through these and suggest changes.

eclare108213 · 2018-05-04T18:33:24Z

Working through the warnings, one set at a time...

pinto_intel_restart_gx3_8x2_diag1_pondcesm.apc/logs/cice.runlog.180504-104149: WARNING: tr_pond_cesm=T
pinto_intel_restart_gx3_8x2_diag1_pondcesm.apc/logs/cice.runlog.180504-104149: WARNING: frzpnd, dpscale not used

namelist combination:

tr_pond_cesm = .true.
dpscale = 1.e-3
frzpnd = 'hlid'

code:

 if (tr_pond_cesm .and. trim(frzpnd) /= 'cesm') then
    if (my_task == master_task) then
       write (nu_diag,*) 'WARNING: tr_pond_cesm=T'
       write (nu_diag,*) 'WARNING: frzpnd, dpscale not used'
    endif
    frzpnd = 'cesm'
 endif

Recommendation:
Remove this warning completely and add a statement to the documentation, noting that dpscale and frzpnd are only used for tr_pond_lvl and their values are ignored for the other pond schemes.

eclare108213 · 2018-05-04T18:55:00Z

pinto_intel_restart_gx3_4x2_alt03.apc1/logs/cice.runlog.180504-105205: WARNING: runtype, restart, ice_ic are inconsistent:
pinto_intel_restart_gx3_4x2_alt03.apc1/logs/cice.runlog.180504-105205: WARNING: Need ice_ic = .
pinto_intel_restart_gx3_4x2_alt03.apc1/logs/cice.runlog.180504-105205: WARNING: Initializing using ice_ic conditions

namelist combination:

runtype = 'continue'
ice_ic = 'default'
restart = .true.

The problem here is an inconsistency between the namelist settings and the test suite settings.
alt03 has

icdefault ice_ic = 'default'

which changes the default ice_in from

runtype = 'initial'
ice_ic = './restart/iced_gx3_v5.nc'
restart = .true.

to

runtype = 'initial'
ice_ic = 'default'
restart = .true.

which the code would normally accept (it ignores the value of restart in this case).
However, the test suite settings are

restart gx3 4x2 alt03

which makes

runtype = 'continue'

which is inconsistent with the default initial condition (ice_ic).
Recommendation: make alt03 and alt04 smoke tests.
EDIT: changed the recommendation below to set restart = .false.

eclare108213 · 2018-05-04T18:57:10Z

Recommendation: make alt01 and alt02 smoke tests also.
EDIT: changed the recommendation below to set restart = .false.

dabail10 · 2018-05-04T19:02:20Z

One that I've run into is tests with histfreq changed to 'd','x','x','x','x'. There are no variables on the daily stream, so we need at least one as:

f_aice = 'd'

Dave

eclare108213 · 2018-05-04T19:07:36Z

Actually, I think the idea was that alt01-alt04 should be restart tests, since this is the only opportunity to check restarts for those options and doing this would cover the smoke test. You can try to do that by removing the icdefault option. I’m not sure whether the other options would still be consistent. There are cases where you cannot restart from an existing restart file (e.g. if the number of layers or categories is different), but if you're doing both runs together (the initial one and then the restart run) it should work. EDIT: changed the recommendation below to set restart = .false.

eclare108213 · 2018-05-04T19:19:19Z

Continuing with alt03 and alt04...

I see now that the test is doing two runs, first 'initial' then 'continue'. The inconsistency is actually with the value of restart:

WARNING: runtype, restart, ice_ic are inconsistent:
initial T default
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions

Recommendation:
Set restart = .false. and
add WARNING: to the beginning of the line printing the three values so it shows up with grep.

eclare108213 · 2018-05-04T19:25:38Z

pinto_intel_restart_gx3_4x4_alt04.apc1/logs/cice.runlog.180504-105505: WARNING: ktherm = 1 and tfrz_option = mushy
pinto_intel_restart_gx3_4x4_alt04.apc1/logs/cice.runlog.180504-105505: WARNING: For consistency, set tfrz_option = linear_salt

code:

 if (ktherm == 1 .and. trim(tfrz_option) /= 'linear_salt') then
    if (my_task == master_task) then
    write (nu_diag,*) &
    'WARNING: ktherm = 1 and tfrz_option = ',trim(tfrz_option)
    write (nu_diag,*) &
    'WARNING: For consistency, set tfrz_option = linear_salt'
    endif
 endif

This does not reset anything. It runs this way, and the warning is appropriate, per the physics.
Recommendation: Set tfrz_option = 'linear_salt'.

eclare108213 · 2018-05-04T19:29:23Z

pinto_intel_restart_gx3_6x2_alt01.apc1/logs/cice.runlog.180504-104635: WARNING - Zero-layer thermodynamics

code:

    if (.not.heat_capacity) then
       write (nu_diag,*) 'WARNING - Zero-layer thermodynamics'

The intent here is "Are you sure you really want to do this?"
We can remove this warning completely (it's in cicedynB/general/ice_init.F90).

eclare108213 · 2018-05-04T19:32:59Z

pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: ITD required for ncat > 1
pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: Setting kitd and kcatbound to default values

code:

 if (ncat /= 1 .and. kcatbound == -1) then
    if (my_task == master_task) then
       write (nu_diag,*) &
          'WARNING: ITD required for ncat > 1'
       write (nu_diag,*) &
          'WARNING: Setting kitd and kcatbound to default values'
    endif
    kitd = 1
    kcatbound = 0
 endif

If this is the only test with kcatbound=-1, then we need to set ncat=1. Otherwise set kcatbound to a different value.

eclare108213 · 2018-05-04T19:40:06Z

pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: Must use dEdd shortwave
pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: with tr_pond and calc_tsfc=T.
pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: Setting shortwave = dEdd

namelist combination:

tr_pond_cesm = .true.
shortwave = 'ccsm3'
calc_Tsfc = .true.

code:

 if (trim(shortwave) /= 'dEdd' .and. tr_pond .and. calc_tsfc) then
    if (my_task == master_task) then
       write (nu_diag,*) 'WARNING: Must use dEdd shortwave'
       write (nu_diag,*) 'WARNING: with tr_pond and calc_tsfc=T.'
       write (nu_diag,*) 'WARNING: Setting shortwave = dEdd'
    endif
    shortwave = 'dEdd'
 endif

Recommendation: Set tr_pond_cesm = .false. and make sure that option is tested somewhere else (with shortwave = 'dEdd').

apcraig · 2018-05-08T21:18:53Z

I will update these with some reference to discussion #130.

@eclare108213, just a clarification on the restart with ice_ic='none' or 'default'. If that is the case, we want restart=.false. on the initial run? Then we want restart=.true. on the continue run?

And if I understand the logic, that means when runtype=initial, the ice_ic is used and the restart flag is ignored. When ice_ic=continue, the ice_ic is ignored and the restart file is defined in pointer_file=ice.restart_file. And in all cases, the dumpfreq defines whether restarts are written and that is independent of the restart and runtype. Do I understand correctly? I don't think I do. Can you just summarize how runtype, ice_ic, and restart interact with each other and what happens in various settings.

eclare108213 · 2018-05-08T21:26:43Z

There is a table in the documentation that mapped out the various combinations of parameters for initializing and/or continuing a run. If ice_ic = 'none' or 'default' then we do want restart=F. For continuation runs restart=T. We can also use restart=T for a run starting from a restart file from some other run, when we aren't continuing the previous run (e.g. the data could be different). In that case, runtype='initial' and ice_ic is set to the restart filename and restart=T. That's actually my standard starting configuration for CICE, using the default restart file that's available with the other input data. pointer_file is used for continuation runs. dumpfreq is about writing restart files, so yes it's independent of restart and runtype values. I hope I'm remember all of this correctly...

apcraig · 2018-05-08T21:38:17Z

Thanks Elizabeth, that helps. I will try to fix the testing.

apcraig · 2018-05-09T17:09:10Z

I found the table, http://cice-consortium-cice.readthedocs.io/en/master/user_guide/ug_implementation.html#initialization-and-coupling.
We might need to update this a bit.

apcraig · 2018-05-10T16:28:09Z

My latest update has alt01-05 passing exact restart with no namelist input ERRORS. There are a couple warning messages (reduce no of blocks) and then for alt04,

WARNING: ktherm = 1 and tfrz_option = mushy
WARNING: For consistency, set tfrz_option = linear_salt

The box smoke tests are all working, I will try changing these to restart tests. The gx1prod is failing with

Current forcing data year = 1978
forrtl: severe (36): attempt to access non-existent record, unit 13, file /p/work1/RASM_data/cice_consortium/CICE_data/forcing/gx1/COREII/4XDAILY/t_10.1977.dat

I think this is progress overall. We still need a working advection=upwind case. I will run a full test suite now on conrad.

eclare108213 · 2018-05-10T16:41:02Z

Thanks @apcraig
Try setting tfrz_option = 'linear_salt' for alt04. I don't see that in the spreadsheet, probably because I considered it an Icepack test and deleted it from the list, although it should still be available in the CICE namelist file.
We provide only 5 years of gx1 data, 2005-2009, so the gx1prod configuration will not work as it is. The "prod" is for production, which means running with more years of data. I'd like to have that configuration available as an option, even if we never test it as part of the standard suites, so that when I want to run it I don't have to figure out how to set up the forcing namelist values... :)

CORRECTION: tfrz_option is in the spreadsheet, under forcing_nml. I changed the value for alt04.

apcraig · 2018-05-10T22:11:24Z

Test results look fine. The decomp tests are failing because the frz_onset field is different, but all other fields are the same. It wasn't doing that before. Any idea why that is happening?

I think we could consider merging this PR to make some progress. There is still more to do.

apcraig · 2018-05-11T02:43:49Z

I've looked at the frz_onset non-bit-for-bit with different decomps and can't understand it. It looks like frz_onset is just a diagnostics. The runs are bit-for-bit physics and looking closer, only one gridcell is different in the frz_onset field in two different runs. I think we should add an issue and defer for now.

apcraig · 2018-05-11T02:44:33Z

I think this is ready for merge. I will run a full suite with 4 compilers on conrad tonight.

eclare108213 · 2018-05-11T15:15:58Z

If the conrad tests are successful I'll go ahead and merge this.
Is the grid cell in which frz_onset is different located along the coast? I'm wondering if this is related to the land mask / forcing discrepancies that @dabail10 pointed out. Is that grid cell in a halo for either the working test or the decomp test that is failing?

apcraig · 2018-05-11T16:26:13Z

Test results from conrad are here,

https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks

hash 09df2f6

There are 3 outstanding issues

frz_onset is not bfb with different decompositions
the boxrestore case does not restart exactly
conrad_intel_restart_gx1_40x4_debug_droundrobin is aborting in a global reduction
I will create issues for the first two and have a quick look at the 3rd issue.

I think the problem grid cell with frz_onset is near land and is on the edge of the block for some of the decompositions (but not all). We need to debug further. I don't have a good sense of what might be causing this. I did have a quick look at the implementation and nothing jumps out at me other than the field seems to be uninitialized, but when I fixed that, it did not change answers.

I recommend we merge this at this point. We also still need an upwind test case.

eclare108213 · 2018-05-11T17:20:38Z

Cross-referencing associated issues:
#137 frz_onset not BFB
#138 box restart fails

update test coverage and test options

6c9e57d

apcraig requested a review from eclare108213 April 27, 2018 20:44

apcraig self-assigned this Apr 27, 2018

update documentation and fix boxadv settings

6065f5c

eclare108213 reviewed May 1, 2018

View reviewed changes

update tests based on review

b7092b8

update decomp test maxblocks

f5f261b

apcraig mentioned this pull request May 8, 2018

Namelist conflict aborts #130

Closed

update test coverage and namelist error/warning messages

ad50933

update test suites

09df2f6

eclare108213 merged commit 3c30e23 into CICE-Consortium:master May 11, 2018

apcraig deleted the tests branch August 17, 2022 20:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update test coverage and test options #129

update test coverage and test options #129

apcraig commented Apr 27, 2018

apcraig commented Apr 27, 2018

apcraig commented Apr 27, 2018

apcraig commented Apr 27, 2018

eclare108213 left a comment

apcraig commented May 1, 2018

apcraig commented May 3, 2018

apcraig commented May 3, 2018

apcraig commented May 3, 2018

eclare108213 commented May 3, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018 •

edited

Loading

eclare108213 commented May 4, 2018 •

edited

Loading

dabail10 commented May 4, 2018

eclare108213 commented May 4, 2018 via email •

edited

Loading

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

apcraig commented May 8, 2018

eclare108213 commented May 8, 2018

apcraig commented May 8, 2018

apcraig commented May 9, 2018

apcraig commented May 10, 2018

eclare108213 commented May 10, 2018 •

edited

Loading

apcraig commented May 10, 2018

apcraig commented May 11, 2018

apcraig commented May 11, 2018

eclare108213 commented May 11, 2018

apcraig commented May 11, 2018

eclare108213 commented May 11, 2018

update test coverage and test options #129

update test coverage and test options #129

Conversation

apcraig commented Apr 27, 2018

apcraig commented Apr 27, 2018

apcraig commented Apr 27, 2018

apcraig commented Apr 27, 2018

eclare108213 left a comment

Choose a reason for hiding this comment

apcraig commented May 1, 2018

apcraig commented May 3, 2018

apcraig commented May 3, 2018

apcraig commented May 3, 2018

eclare108213 commented May 3, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018 • edited Loading

eclare108213 commented May 4, 2018 • edited Loading

dabail10 commented May 4, 2018

eclare108213 commented May 4, 2018 via email • edited Loading

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

eclare108213 commented May 4, 2018

apcraig commented May 8, 2018

eclare108213 commented May 8, 2018

apcraig commented May 8, 2018

apcraig commented May 9, 2018

apcraig commented May 10, 2018

eclare108213 commented May 10, 2018 • edited Loading

apcraig commented May 10, 2018

apcraig commented May 11, 2018

apcraig commented May 11, 2018

eclare108213 commented May 11, 2018

apcraig commented May 11, 2018

eclare108213 commented May 11, 2018

eclare108213 commented May 4, 2018 •

edited

Loading

eclare108213 commented May 4, 2018 •

edited

Loading

eclare108213 commented May 4, 2018 via email •

edited

Loading

eclare108213 commented May 10, 2018 •

edited

Loading