Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update test coverage and test options #129

Merged
merged 6 commits into from
May 11, 2018

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Apr 27, 2018

Add new setting options and add new tests. Add ability to generate a set of test options using the test_files.$option format.

  • Developer(s): tcraig

  • Are the code changes bit for bit, different at roundoff level, or more substantial? bit-for-bit

  • Is the documentation being updated with this PR? (Y/N) Y

  • Other Relevant Details:

@apcraig apcraig requested a review from eclare108213 April 27, 2018 20:44
@apcraig apcraig self-assigned this Apr 27, 2018
@apcraig
Copy link
Contributor Author

apcraig commented Apr 27, 2018

First look at the changes. I have added 7 coverage tests to the base suite and these are passing on conrad with intel. Are we happy with the test names, alt01, alt02, boxdyn, etc? These name match the spreadsheet. Also, the tests may not cover every option specified in the spreadsheet, but I believe most of the settings are.

These tests should be further reviewed to make sure they are setup correctly and changes should be suggested. I also plan to update the documentation before this PR is merged.

@apcraig
Copy link
Contributor Author

apcraig commented Apr 27, 2018

I added some documentation and will build on readthedocs to make sure it looks ok. when that is ready, I'll add a link.

@apcraig
Copy link
Contributor Author

apcraig commented Apr 27, 2018

Copy link
Contributor

@eclare108213 eclare108213 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • set_nml.fbotxfercdn, set_nml.highfreq need a return at the end of the file
  • set_nml.pondoff is also 'ageoff'
  • calc_tsfc and tr_pond_* were removed from set_nml.swccsm3, but these actually need to be set this way for this kind of shortwave. How is that enforced now?
  • dg_scripts.rst needs more editing but this can be cleaned up later

Overall this looks okay to me. I'd like to run the suites on one of the computers here and look at the output to check for conflicts, reasonable output, etc. @apcraig did you search the diagnostic output for "WARNING"? I suspect that's how issue #130 came about.

@apcraig
Copy link
Contributor Author

apcraig commented May 1, 2018

I did not look thru the log files for WARNING messages. I will make the corrections suggested and then have a look at the log files. In some ways, this is a problem with me setting up these configurations, I really don't know how different namelist interact or even what they do in many cases.

If there are namelist that have to go together, we should think about how to set them. For instance, if with ccsm3 shortwave, calc_tsfc has to be set a certain way, but for other shortwave options, calc_tsfc can be set various ways, what should we do? I would argue we should have an option to set the shortwave and a separate option to set the calc_tsfc and if those are inconsistent, the model would abort (or be "fixed") at runtime. Otherwise, we run into problems where a user sets swccsm3 (with calc_tsfc also set) and then also chooses a calc_tsfc option separately with --sets and depending on the order that they are set in the list, the namelist will be different with the run sometimes working and other times not.

What makes sense to me is that (1) different options are clearly separated if they have any independence (2) options that depend only on each other, can go together. For instance revised_evp is only used when evp is set, so we can have a evp and revp option that also sets the revised_evp flag and we would not have a revised_evp option. (3) the model will abort if options are not consistent. That way the options a user chooses are unlikely to conflict and cause an "order dependency", and the model will only run if reasonable combinations are chosen. This is the best way to guarantee we're testing what we expect to be testing and that users understand what namelist settings they are using. Just my thoughts.

I also think we should write down what namelist settings depend on each other and/or conflict with each other. That way we can implement all the necessary checks in the code and also come up with a cleaner set of test configurations. It would be good for that information to also end up in the documentation.

@apcraig
Copy link
Contributor Author

apcraig commented May 3, 2018

I have updated the tests and am now running a full test suite on conrad. Will post results when they are done.

@apcraig
Copy link
Contributor Author

apcraig commented May 3, 2018

@apcraig
Copy link
Contributor Author

apcraig commented May 3, 2018

There are lots of several warning messages in the cice.run log files. The box tests and alt01 and alt03 basically have

WARNING: runtype, restart, ice_ic are inconsistent:
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions
WARNING - Zero-layer thermodynamics

alt02 has

WARNING: runtype, restart, ice_ic are inconsistent:
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions
WARNING: ITD required for ncat > 1
WARNING: Setting kitd and kcatbound to default values
WARNING: tr_pond_cesm=T
WARNING: frzpnd, dpscale not used
WARNING: Must use dEdd shortwave
WARNING: with tr_pond and calc_tsfc=T.
WARNING: Setting shortwave = dEdd

alt04 has

WARNING: runtype, restart, ice_ic are inconsistent:
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions
WARNING: ktherm = 1 and tfrz_option = mushy
WARNING: For consistency, set tfrz_option = linear_salt

@eclare108213
Copy link
Contributor

Okay, thanks for checking. I'll work through these and suggest changes.

@eclare108213
Copy link
Contributor

Working through the warnings, one set at a time...

pinto_intel_restart_gx3_8x2_diag1_pondcesm.apc/logs/cice.runlog.180504-104149: WARNING: tr_pond_cesm=T
pinto_intel_restart_gx3_8x2_diag1_pondcesm.apc/logs/cice.runlog.180504-104149: WARNING: frzpnd, dpscale not used

namelist combination:

tr_pond_cesm = .true.
dpscale = 1.e-3
frzpnd = 'hlid'

code:

 if (tr_pond_cesm .and. trim(frzpnd) /= 'cesm') then
    if (my_task == master_task) then
       write (nu_diag,*) 'WARNING: tr_pond_cesm=T'
       write (nu_diag,*) 'WARNING: frzpnd, dpscale not used'
    endif
    frzpnd = 'cesm'
 endif

Recommendation:
Remove this warning completely and add a statement to the documentation, noting that dpscale and frzpnd are only used for tr_pond_lvl and their values are ignored for the other pond schemes.

@eclare108213
Copy link
Contributor

eclare108213 commented May 4, 2018

pinto_intel_restart_gx3_4x2_alt03.apc1/logs/cice.runlog.180504-105205: WARNING: runtype, restart, ice_ic are inconsistent:
pinto_intel_restart_gx3_4x2_alt03.apc1/logs/cice.runlog.180504-105205: WARNING: Need ice_ic = .
pinto_intel_restart_gx3_4x2_alt03.apc1/logs/cice.runlog.180504-105205: WARNING: Initializing using ice_ic conditions

namelist combination:

runtype = 'continue'
ice_ic = 'default'
restart = .true.

The problem here is an inconsistency between the namelist settings and the test suite settings.
alt03 has

icdefault ice_ic = 'default'

which changes the default ice_in from

runtype = 'initial'
ice_ic = './restart/iced_gx3_v5.nc'
restart = .true.

to

runtype = 'initial'
ice_ic = 'default'
restart = .true.

which the code would normally accept (it ignores the value of restart in this case).
However, the test suite settings are

restart gx3 4x2 alt03

which makes

runtype = 'continue'

which is inconsistent with the default initial condition (ice_ic).
Recommendation: make alt03 and alt04 smoke tests.
EDIT: changed the recommendation below to set restart = .false.

@eclare108213
Copy link
Contributor

eclare108213 commented May 4, 2018

Recommendation: make alt01 and alt02 smoke tests also.
EDIT: changed the recommendation below to set restart = .false.

@dabail10
Copy link
Contributor

dabail10 commented May 4, 2018

One that I've run into is tests with histfreq changed to 'd','x','x','x','x'. There are no variables on the daily stream, so we need at least one as:

f_aice = 'd'

Dave

@eclare108213
Copy link
Contributor

eclare108213 commented May 4, 2018 via email

@eclare108213
Copy link
Contributor

Continuing with alt03 and alt04...

I see now that the test is doing two runs, first 'initial' then 'continue'. The inconsistency is actually with the value of restart:

WARNING: runtype, restart, ice_ic are inconsistent:
initial T default
WARNING: Need ice_ic = .
WARNING: Initializing using ice_ic conditions

Recommendation:
Set restart = .false. and
add WARNING: to the beginning of the line printing the three values so it shows up with grep.

@eclare108213
Copy link
Contributor

pinto_intel_restart_gx3_4x4_alt04.apc1/logs/cice.runlog.180504-105505: WARNING: ktherm = 1 and tfrz_option = mushy
pinto_intel_restart_gx3_4x4_alt04.apc1/logs/cice.runlog.180504-105505: WARNING: For consistency, set tfrz_option = linear_salt

code:

 if (ktherm == 1 .and. trim(tfrz_option) /= 'linear_salt') then
    if (my_task == master_task) then
    write (nu_diag,*) &
    'WARNING: ktherm = 1 and tfrz_option = ',trim(tfrz_option)
    write (nu_diag,*) &
    'WARNING: For consistency, set tfrz_option = linear_salt'
    endif
 endif

This does not reset anything. It runs this way, and the warning is appropriate, per the physics.
Recommendation: Set tfrz_option = 'linear_salt'.

@eclare108213
Copy link
Contributor

pinto_intel_restart_gx3_6x2_alt01.apc1/logs/cice.runlog.180504-104635: WARNING - Zero-layer thermodynamics

code:

    if (.not.heat_capacity) then
       write (nu_diag,*) 'WARNING - Zero-layer thermodynamics'

The intent here is "Are you sure you really want to do this?"
We can remove this warning completely (it's in cicedynB/general/ice_init.F90).

@eclare108213
Copy link
Contributor

pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: ITD required for ncat > 1
pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: Setting kitd and kcatbound to default values

code:

 if (ncat /= 1 .and. kcatbound == -1) then
    if (my_task == master_task) then
       write (nu_diag,*) &
          'WARNING: ITD required for ncat > 1'
       write (nu_diag,*) &
          'WARNING: Setting kitd and kcatbound to default values'
    endif
    kitd = 1
    kcatbound = 0
 endif

If this is the only test with kcatbound=-1, then we need to set ncat=1. Otherwise set kcatbound to a different value.

@eclare108213
Copy link
Contributor

pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: Must use dEdd shortwave
pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: with tr_pond and calc_tsfc=T.
pinto_intel_restart_gx3_8x2_alt02.apc1/logs/cice.runlog.180504-104935: WARNING: Setting shortwave = dEdd

namelist combination:

tr_pond_cesm = .true.
shortwave = 'ccsm3'
calc_Tsfc = .true.

code:

 if (trim(shortwave) /= 'dEdd' .and. tr_pond .and. calc_tsfc) then
    if (my_task == master_task) then
       write (nu_diag,*) 'WARNING: Must use dEdd shortwave'
       write (nu_diag,*) 'WARNING: with tr_pond and calc_tsfc=T.'
       write (nu_diag,*) 'WARNING: Setting shortwave = dEdd'
    endif
    shortwave = 'dEdd'
 endif

Recommendation: Set tr_pond_cesm = .false. and make sure that option is tested somewhere else (with shortwave = 'dEdd').

@apcraig apcraig mentioned this pull request May 8, 2018
@apcraig
Copy link
Contributor Author

apcraig commented May 8, 2018

I will update these with some reference to discussion #130.

@eclare108213, just a clarification on the restart with ice_ic='none' or 'default'. If that is the case, we want restart=.false. on the initial run? Then we want restart=.true. on the continue run?

And if I understand the logic, that means when runtype=initial, the ice_ic is used and the restart flag is ignored. When ice_ic=continue, the ice_ic is ignored and the restart file is defined in pointer_file=ice.restart_file. And in all cases, the dumpfreq defines whether restarts are written and that is independent of the restart and runtype. Do I understand correctly? I don't think I do. Can you just summarize how runtype, ice_ic, and restart interact with each other and what happens in various settings.

@eclare108213
Copy link
Contributor

There is a table in the documentation that mapped out the various combinations of parameters for initializing and/or continuing a run. If ice_ic = 'none' or 'default' then we do want restart=F. For continuation runs restart=T. We can also use restart=T for a run starting from a restart file from some other run, when we aren't continuing the previous run (e.g. the data could be different). In that case, runtype='initial' and ice_ic is set to the restart filename and restart=T. That's actually my standard starting configuration for CICE, using the default restart file that's available with the other input data. pointer_file is used for continuation runs. dumpfreq is about writing restart files, so yes it's independent of restart and runtype values. I hope I'm remember all of this correctly...

@apcraig
Copy link
Contributor Author

apcraig commented May 8, 2018

Thanks Elizabeth, that helps. I will try to fix the testing.

@apcraig
Copy link
Contributor Author

apcraig commented May 9, 2018

@apcraig
Copy link
Contributor Author

apcraig commented May 10, 2018

My latest update has alt01-05 passing exact restart with no namelist input ERRORS. There are a couple warning messages (reduce no of blocks) and then for alt04,

WARNING: ktherm = 1 and tfrz_option = mushy
WARNING: For consistency, set tfrz_option = linear_salt

The box smoke tests are all working, I will try changing these to restart tests. The gx1prod is failing with

Current forcing data year = 1978
forrtl: severe (36): attempt to access non-existent record, unit 13, file /p/work1/RASM_data/cice_consortium/CICE_data/forcing/gx1/COREII/4XDAILY/t_10.1977.dat

I think this is progress overall. We still need a working advection=upwind case. I will run a full test suite now on conrad.

@eclare108213
Copy link
Contributor

eclare108213 commented May 10, 2018

Thanks @apcraig
Try setting tfrz_option = 'linear_salt' for alt04. I don't see that in the spreadsheet, probably because I considered it an Icepack test and deleted it from the list, although it should still be available in the CICE namelist file.
We provide only 5 years of gx1 data, 2005-2009, so the gx1prod configuration will not work as it is. The "prod" is for production, which means running with more years of data. I'd like to have that configuration available as an option, even if we never test it as part of the standard suites, so that when I want to run it I don't have to figure out how to set up the forcing namelist values... :)

CORRECTION: tfrz_option is in the spreadsheet, under forcing_nml. I changed the value for alt04.

@apcraig
Copy link
Contributor Author

apcraig commented May 10, 2018

Test results look fine. The decomp tests are failing because the frz_onset field is different, but all other fields are the same. It wasn't doing that before. Any idea why that is happening?

I think we could consider merging this PR to make some progress. There is still more to do.

@apcraig
Copy link
Contributor Author

apcraig commented May 11, 2018

I've looked at the frz_onset non-bit-for-bit with different decomps and can't understand it. It looks like frz_onset is just a diagnostics. The runs are bit-for-bit physics and looking closer, only one gridcell is different in the frz_onset field in two different runs. I think we should add an issue and defer for now.

@apcraig
Copy link
Contributor Author

apcraig commented May 11, 2018

I think this is ready for merge. I will run a full suite with 4 compilers on conrad tonight.

@eclare108213
Copy link
Contributor

If the conrad tests are successful I'll go ahead and merge this.
Is the grid cell in which frz_onset is different located along the coast? I'm wondering if this is related to the land mask / forcing discrepancies that @dabail10 pointed out. Is that grid cell in a halo for either the working test or the decomp test that is failing?

@apcraig
Copy link
Contributor Author

apcraig commented May 11, 2018

Test results from conrad are here,

https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks

hash 09df2f6

There are 3 outstanding issues

  • frz_onset is not bfb with different decompositions
  • the boxrestore case does not restart exactly
  • conrad_intel_restart_gx1_40x4_debug_droundrobin is aborting in a global reduction
    I will create issues for the first two and have a quick look at the 3rd issue.

I think the problem grid cell with frz_onset is near land and is on the edge of the block for some of the decompositions (but not all). We need to debug further. I don't have a good sense of what might be causing this. I did have a quick look at the implementation and nothing jumps out at me other than the field seems to be uninitialized, but when I fixed that, it did not change answers.

I recommend we merge this at this point. We also still need an upwind test case.

@eclare108213
Copy link
Contributor

Cross-referencing associated issues:
#137 frz_onset not BFB
#138 box restart fails

@eclare108213 eclare108213 merged commit 3c30e23 into CICE-Consortium:master May 11, 2018
@apcraig apcraig deleted the tests branch August 17, 2022 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants