-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated CICE_InitMod.F90 for cice driver to set all tracer values to … #16
Conversation
…c0 over land after initialization. This supports bit-for-bit testing on different pe counts and different decompositions where land block elmination might be different and results in all land tracer values to be zero on history and restart files. Updated the test scripts and added a new -td feature that allows test results to be compared with each other. Refactored some of the scripts to make them more flexible for the -td option. Changed the test name convention so machine is the first item in the name. This is to make it easier to use the -td option. Modified tests so now there are only smoke and restart tests. Added new settings (run5day, run10day, run1year, etc) to support different length smoke tests. Refactored the test scripts so there is a test_smoke.script and a test_restart.script now. Moved the main test script from under tests to the scripts directory and renamed it to cice.test.setup.csh. Updated the README.test. Updated the base_suite file to add new tests and make testnames conform to new convention. Changed the restart test to be a 10 day run with restart at day 5. Added other new options to support different batch wall times (short, medium long), support modifications for grids (gx1, gx3), to turn on the bfbflag (bfbflagT). Added some new error checks to make sure create.case aborts if there is a problem detected. Modified create.case -h so it now automatically generates the list of -s options and the list of -t tests as part of the documentation. These results are bit-for-bit over the ocean and not bit-for-bit over land for some tracers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me. I haven't pulled and tested the changes, but will early next week.
The only thing I noticed that needs changing is in README.test, lines 161-162 and 168-169. Both of these pass "-m conrad" to create.case but then cd into "cheyenee_".
Thanks Matt. When you give it a try, let me know if there is anything you don't like or understand. We should start to put together a developers guide for the scripts once we feel the implementation is mature enough. |
I started running through the tests, following the new README.test instructions. The first one, setting up a baseline to compare to, worked fine. The second one fails, with no change to the code: Here is the standard out: CICE rundir is /net/scratch3/eclare/CICE_RUNS/wolf_smoke_gx3_4x1.t01 CICE COMPLETED SUCCESSFULLY Regression Compare Mode: and the standard error output: cmp: /net/scratch3/eclare/CICE_RUNS/wolf_smoke_gx3_4x1.t01/restart/: Is a directory So it looks like the filenames aren't included in the compare command? |
I will test the cases in the README.test and update that document again. I was running the base suite. Maybe you can try that and see if it works. I agree that the README.test examples should ultimately be documented properly, but it's a work in progress :). |
Will do, and as long as you are updating README.test, there's still a mixture of conrad and cheyenne examples. Please pick one, and state somewhere near the first instance that it's a machine name, to be perfectly clear. |
Do the regression tests check or otherwise ensure that the model configuration used for the new version is the same as the old version, or provide an option for them to be different? For the Icepack restart implementation, we have been assuming that they would be (forced to be) the same and all tracers associated with a given configuration would be written for both the baseline and the comparison run. |
So for regressions testing, the idea is that the default namelist and setup should not change. What we will do is have different settings, that can be specified via the -s command line option, to turn on different features and namelist. I imagine, we'll test 10-20 different configurations once we have things setup as part of a test suite. Each of those cases will have a different test name and each test is saved and checked separately. If the defaults change from version to version, or if a non-bit-for-bit change is introduced, the regression testing should pick that up. And those failures might be expected or not expected and the team can respond as needed. |
ok. Find indices of diagnostic pointsPrimary job terminated normally, but 1 process returned
|
I just updated the scripts and README.test. That should be consistent now. All of the tests pass on cheyenne. The tests that are failing on wolf are using threads. The all MPI tests on wolf are passing. My guess is that the modules, batch settings, or compile settings on wolf are not set properly for threading. Maybe you haven't tried that yet? |
That's probably true -- I usually don't mess with threads. I'll look into it. |
Running the base_suite on Conrad, all tests pass except conrad_smoke_gx3_4x1_diag1_run5day_thread, which fails to run. The runlog shows:
The log also shows:
So, for some reason when CICE_THREADED is set to true, it fails to find one of the diagnostic points |
Matt, Are the tests with mpi/openmp working on conrad? If so, the all MPI compile with treading explicitly on in the compile should also work in my experience. I can login to conrad if you want me to have a look as well. I don't think this means the current scripts implementation is bad, I think this means our test suite is being extended and we're catching new problems in the code/compile/etc that we need to fix. That's a good thing. I think we'll want to rapidly extend the test suite in the next few weeks more which I'm sure will lead to more work. It took me a while to get bit-for-bit restarts with different pe counts and decomps due to values over "land", but now that seems to be fixed. This is all good progress. |
Tony,
Yes, the hybrid MPI+OpenMP tests are working on Conrad:
I agree that the error I am getting points to an issue in the code/compile and not in the testing scripts. Perhaps I should submit this as an issue since it shouldn't delay/prevent the pull request from being merged. |
@mattdturner. That makes sense to me. The issue should be specific, "single threaded runs on conrad failing" or something like that. As we extend the testing, these kinds of issues are probably going to become a good portion of our efforts at getting the model more robust. thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worked through the test examples in README.test, and they were all successful on wolf using the updated Macro file (https://github.com/CICE-Consortium/CICE/pull/18/commits). I like the test suite, including plots of the output! As far as I'm concerned, this is ready to merge.
…c0 over land after
initialization. This supports bit-for-bit testing on different pe counts and different
decompositions where land block elmination might be different and results in all land tracer
values to be zero on history and restart files.
Updated the test scripts and added a new -td feature that allows test results to be compared
with each other. Refactored some of the scripts to make them more flexible for the -td option.
Changed the test name convention so machine is the first item in the name. This is to make
it easier to use the -td option.
Modified tests so now there are only smoke and restart tests. Added new settings (run5day,
run10day, run1year, etc) to support different length smoke tests. Refactored the test
scripts so there is a test_smoke.script and a test_restart.script now. Moved the main
test script from under tests to the scripts directory and renamed it to cice.test.setup.csh.
Updated the README.test.
Updated the base_suite file to add new tests and make testnames conform to new convention.
Changed the restart test to be a 10 day run with restart at day 5.
Added other new options to support different batch wall times (short, medium long), support
modifications for grids (gx1, gx3), to turn on the bfbflag (bfbflagT).
Added some new error checks to make sure create.case aborts if there is a problem detected.
Modified create.case -h so it now automatically generates the list of -s options and the
list of -t tests as part of the documentation.
These results are bit-for-bit over the ocean and not bit-for-bit over land for some tracers.