-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nml settings #208
Nml settings #208
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me why the one case is failing. roundrobin seems to be rather touchy.
Why reorder the forcing_nml namelist? Does this match something else better? These were ordered roughly (imperfectly) with general items first, then atmo, then ocean, then ice restoring. I recommend keeping the order the same as in Icepack, to the extent possible.
I think you can simply replace sil_data_type etc with bgc_data_type and then simplify the logic in the code (remove duplicates). This removes the ability to mix and match bgc data sets, but the tests that we have now don't mix and match... The one section of the code that wouldn't work would be where nit_data_type is allowed to be 'sss'. We could add logic so that the other blocks are activated if bgc_data_type is 'clim' or 'ISPOL' or 'sss', or we could abandon that (sss) functionality in the interest of simplicity. @njeffery ? HOWEVER since @njeffery has a couple of PRs waiting, let's wait on this change, to avoid getting even more conflicts in the others.
I also didn't know why just the one test is failing. I'm hoping @apcraig might lend some insight here. I ordered the ice_in &forcing_nml section to roughly parallel the icepack_in &forcing_nml section (see below, very bottom of comment). They aren't one-to-one, but I could make them as close as possible. That's very minor, so just let me know what you think. My concern with the {sil,nit,fe}_data_type switch to bgc_data_type was exactly as you describe. THe mixing and matching would go away and I'm not sure how critical or not that is and that @njeffery would likely need to be involved and I wasn't sure she had the time now. I think if we just keep it in the current format and maybe add an "improvement" issue for future modification that is the best step at this time. I think we've converged that this will wait. &forcing_nml (from icepack_in for icepack) &forcing_nml (from ice_in from cice) |
@apcraig The error I'm getting on the failed test is this: WARNING: evp_prep calculates surface tilt (abort_ice)ABORTED: Log is here: In the log file from the CICE_master test suite roundrobin simulation I don't see anything at all about trying to read the oceanmixed_ice_depth.nc file and this didn't show up in any other tests. It looks to me like somehow the 'ocn_data_dir' is somehow automatically being set to this file. I don't think this is something I messed up in coding it up (if so I can't find it in the code I changed). My best guess is that the option is set incorrectly here: configuration/scripts/options/set_nml.gx1 where we have 'ocn_data_dir = 'oceanmixed_ice_depth.nc'. I think maybe this should be: That being said, there isn't an oceanmixed_ice_depth.nc file in the forcing directory structure listed above. I did do some poking around and found it here: /glade/p/cesm/pcwg_dev/other_files/ but this seems odd. If we need it for the test suite shouldn't I include it in the CICE_data regular structure/tarball and if it is needed, why aren't other tests blowing up. Does this seem like the right path or is something else going on? |
@eclare108213 I just updated so that the namelist order now matches the icepack namelist order. I re-ran the test suite and it's failing on the same roundrobin test: The error message is the same (see path below), so @apcraig hopefully you can help me unravel the issue I highlighted in the previous message to this thread. |
Sorry, was on travel but back now. I had a quick look. This is the only gx1 test and it is the only one failing. I do see a possible typo, bgc_data_dir = /uknown_bgc_data_dir I don't know if that is creating some problems (missing n). The string "unknown" is a special string in a few places. That should probably be fixed. Comparing the gx3 and gx1 set_nml files, there is a difference that the ocn_data is set for gx1 but not gx3. It actually does look like the ocean mixed layer forcing data file is not correct. The error message says, ocean mixed layer forcing data file = So there is a problem with the namelist. The namelist has
I think ocn_data_dir is supposed to be a directory while oceanmixed_file should be the nc file or something like that? I think it comes down to these settings in set_nml.gx1, ocn_data_format = 'nc' I note the set_nml.gx1 has not changed at all. Is that right? And that something there is not set properly. I also see that ocn_data_dir is set to /unknown... where it used to be just unknown... in the default ice_in file. Is that correct? My next step will be to run some cases with old and new sandboxes and try to understand what's different better and debug further. Let me know if you want me to do that. |
Just a quick clarification that we have an ocean forcing data file for gx1 but not for gx3. |
@apcraig I am still running into some testing issues with the roundrobin case.
(abort_ice)ABORTED:
forrtl: severe (174): SIGSEGV, segmentation fault occurred |
@eclare108213 Yes, there is a forcing file for gx1 but not gx3. However we currently don't include this file as part of our ftp tarballs, so I don't think it should be necessary for tests. I guess what I'm confused about is why this roundrobin test fails for me but not in the main test suite. I have checked the master test suite against which I'm comparing and it has the typo on the ice_in namelist: |
If you diff ice_in from master and your current run, there are several diffs and they probably matter. < is in the master, > is in the new case. These should be reviewed. I think the namelist setup has changed. In particular, the new namelist has "ocn_data_type=ncar" while the old one had "sst_data_type=default"?
|
@apcraig Yes, there are differences in the namelist since some of these are specific changes @eclare108213 requested in issue #123 . Specifically,
Of the changes you list everything looks okay except sst_data_type being set from 'default' in the master case to 'ncar' in the test I'm running. I'll look into this more. |
The erroneous data directory is getting set with the gx1 grid option:
which obviously needs to be fixed to include both ocn_data_dir and ocn_data_file, correctly. This file also sets ocn_data_type = 'ncar', as it should. I don't know why that's not crashing all of the gx1 runs, unless the logic in the code that handles ocn_data_dir and ocn_data_file is wrong. So there might be several things that need to be fixed. My initial suspicion is that some parameter is not getting set properly, or there's a memory conflict, or something like that, which changes depending on the parallelization/decomposition. I suggest starting by turning on the debug option, to rule out that kind of thing. At some point it might be necessary to step through the code with a debugger. I can do this in my office (where I'm not, at the moment)... And then we should take a good look at the options files to make sure that the configurations make sense. |
I don't think this is a decomposition issue. This is the only gx1 test, it just happens to be roundrobin. But I don't believe the decomp is playing a role. I think the question is why is this broken now and not on the master. If "ocn_data_type=ncar" and "sst_data_type=default" are not the same setting, then that is probably playing a bit role. I'm not clear whether the gx1 should or should not be using the ocn_data_type data. It could be that we have never run the ocn_data_type setting on any cases before and it would be broken on the master in the same way if we turned it on. If that's the case, it's really just a matter of getting the settings and filenames in the namelist correct. |
That makes sense. The gx1prod option should use the ocean forcing data file -- but it might not have been run since the bug was introduced into gx1 option file (in May). |
I have confirmed that if in gx1, we set sst_data_type to 'ncar', the test case fails. We had ocn_data_type='ncar' and that was doing nothing to the namelist. I am also modifying the parse_namelist stuff so we catch errors in the namelist options if they occur. |
@apcraig Just to clarify. It sounds like the error was that in the current consortium master we set ocn_data_type in the configuration/scripts/options/set_nml.gx1 and since that wasn't a valid namelist option it was ignored and the test ran fine. But when you modified this to be sst_data_type, which is a valid namelist option, and set it to 'ncar' the consortium master gx1 roundrobin fails. Is this all right? I did not see sst_data_type in any of the set_nml.X options. ocn_data_type was already set in set_nml.gx1, set_nml.bgcNICE, and set_nml.bgcISPOL. I have not tested all these to make sure all pass and that ocn_data_type works properly. Just to mention. I did try to go through and look at all the bgc_data_dir and {bgc,sil,nit,fe_data_type} options and this was a big, confusing mess. We don't provide half those files and it's unclear whether we should be able to mix and match datasets easily in CICE. It should be cleaned up, I just didn't want to mess too much with BGC stuff since I'm not sure exactly what other developments are in the works. Just my thoughts on this for the future. @apcraig I just want to clarify next steps:
|
After chatting with @njeffery and exploring the tracer issues a bit, I have a pretty good idea of what needs to be done for the bgc. There are extra options in the code, which we can leave in there and not support, or comment out, or even delete. Once we have the basic tests working, I think we should merge the changes so far, and then work on cleaning it up further. |
Ok. @apcraig when you can weigh in on how to proceed I will try to get everything squared up tomorrow so it's ready to merge. @eclare108213 do you want this done before or after #207 ? Does it matter? |
It probably doesn't matter. If you have it ready to go, we should merge it and then deal with the consequences to #207. |
I will create a PR shortly to bring in a few changes including an update to parse_namelist. I think we can merge this PR now knowing the gx1 case is failing. We need to fix the gx1 case, but lets take that separately. Maybe I will wait for this to merge, update my PR, and try to fix the gx1. Separately, I'll let @eclare108213 take the lead on the bgc stuff. We'll want to merge that when we can. |
Got verbal approval to move ahead
When the 'default_season' namelist setting was added in 01494c7 (Nml settings (CICE-Consortium#208), 2018-10-19) to replace 'l_winter' and 'l_spring', a call to 'broadcast_scalar' was forgotten, such that the 'default_season' value from the namelist is only use on the first MPI process; all other processes get the hardcoded default value 'winter' defined in 'ice_init::input_data', resulting in different initialization across the grid for several variables if anything other than 'winter' is used in the namelist. Fix that by broadcasting 'default_season' to all MPI procs.
When the 'default_season' namelist setting was added in 01494c7 (Nml settings (CICE-Consortium#208), 2018-10-19) to replace 'l_winter' and 'l_spring', a call to 'broadcast_scalar' was forgotten, such that the 'default_season' value from the namelist is only use on the first MPI process; all other processes get the hardcoded default value 'winter' defined in 'ice_init::input_data', resulting in different initialization across the grid for several variables if anything other than 'winter' is used in the namelist. Fix that by broadcasting 'default_season' to all MPI procs.
When the 'default_season' namelist setting was added in 01494c7 (Nml settings (CICE-Consortium#208), 2018-10-19) to replace 'l_winter' and 'l_spring', a call to 'broadcast_scalar' was forgotten, such that the 'default_season' value from the namelist is only used on the first MPI process; all other processes get the hardcoded default value 'winter' defined in 'ice_init::input_data', resulting in different initialization across the grid for several variables if anything other than 'winter' is used in the namelist. Fix that by broadcasting 'default_season' to all MPI procs.
When the 'default_season' namelist setting was added in 01494c7 (Nml settings (#208), 2018-10-19) to replace 'l_winter' and 'l_spring', a call to 'broadcast_scalar' was forgotten, such that the 'default_season' value from the namelist is only used on the first MPI process; all other processes get the hardcoded default value 'winter' defined in 'ice_init::input_data', resulting in different initialization across the grid for several variables if anything other than 'winter' is used in the namelist. Fix that by broadcasting 'default_season' to all MPI procs.
When the 'default_season' namelist setting was added in 01494c7 (Nml settings (CICE-Consortium#208), 2018-10-19) to replace 'l_winter' and 'l_spring', a call to 'broadcast_scalar' was forgotten, such that the 'default_season' value from the namelist is only use on the first MPI process; all other processes get the hardcoded default value 'winter' defined in 'ice_init::input_data', resulting in different initialization across the grid for several variables if anything other than 'winter' is used in the namelist. Fix that by broadcasting 'default_season' to all MPI procs.
A number of namelist changes to address issue #123
Developer(s): Alice DuVivier
Please suggest code Pull Request reviewers in the column at right.
Are the code changes bit for bit, different at roundoff level, or more substantial?
They should be bit for bit, but a few tests are failing that I can't figure out why and am hoping @apcraig can help here.
Please include the link to test results or paste the summary block from the bottom of the testing output below.
https://github.com/CICE-Consortium/Test-Results/wiki/8512623310.cheyenne.intel.181015.231518
Does this PR create or have dependencies on Icepack or any other models? No
Is the documentation being updated with this PR? (Y/N) Y
If not, does the documentation need to be updated separately at a later time? (Y/N) N/A
Other Relevant Details:
These variable were already modified on the namelist, so don't need to be addressed (update namelist and documentation #123 list was in error):
emissivity
tfrz_option: mushy_layer --> mushy
Namelist re-naming/moving/adding:
default_season (Added)
sst_data_type --> ocn_data_type (Renamed)
sss_data_type --> bgc_data_type (Renamed)
restore_sst --> restore_ocn (Renamed)
shortwave: default --> ccsm3 (option renamed)
albedo_type: default --> ccsm3 (option renamed)
bgc_data_dir --> forcing_nml (moved consistent w/ Icepack)
sil_data_type, nit_data_type, and fe_data_type --> forcing_nml (moved consistent with icepack)
I did not change the bgc data types (Sil, nit, fe) because despite @njeffery's comment it was a lot less clear how this should be done cleanly given the way these are implemented in the code. I'm open to further discussion on this since it still feels a bit messy to me.