
Basic test setup #80

Open
nordmoen opened this issue Apr 20, 2021 · 14 comments

Comments

@nordmoen
Collaborator

It would be nice to set up some basic testing for BLOM, both for continuous integration (to ensure that the output of the different compilers actually works) and for assurance that nothing breaks when updating the code. With a good test setup it would also be easier to make performance improvements, since there would be a way to verify that the new code is up to specification.

Meson has built-in support for unit testing which we could leverage; however, the structure of these unit tests is quite free. Ideally the executable used for a test should be self-contained (meaning it should test a few relevant factors), deterministic (the data which is tested against should not be affected by changes to the code) and quite quick to run.
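
For reference, registering such a test with Meson's `test()` facility might look roughly like the sketch below. The target name, source file, and library variable are placeholders, not actual names from the BLOM build files:

```meson
# Hedged sketch of registering a self-contained BLOM test with Meson.
# 'blom_basic_test', 'test_basic.F90' and 'blom_lib' are hypothetical names.
test_exe = executable('blom_basic_test',
  'test_basic.F90',
  link_with: blom_lib)

test('basic smoke test', test_exe,
  timeout: 300,        # keep the case quick, per the requirements above
  is_parallel: false)  # run alone to keep results deterministic
```

`meson test` would then build and run the executable, treating a non-zero exit code as failure.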

Several avenues are open to us when implementing this:

  1. We could implement the executable in Fortran. The executable would then call into BLOM with predefined inputs and known valid output and compare the results.
  2. We could implement the executable in a different language (e.g. Python) and use existing facilities in BLOM that output statistics during a run to compare against. This requires us to generate an initial set of known good values for different inputs, but it requires minimal changes to BLOM.
    • One challenge with this is that the test executable needs to run the new version of BLOM, something that is easier to automate with suggestion 1.
  3. Other suggestions?
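
Option 2 could be sketched as a small script that parses the statistics BLOM writes during a run and compares them against stored reference values. All file formats and field names here are assumptions for illustration, not actual BLOM output conventions:

```python
# Hypothetical sketch for option 2: compare run statistics against a
# known-good reference within a tolerance. JSON files and field names
# are assumptions, not actual BLOM output formats.
import json
import math
import sys

def compare_stats(run_stats: dict, ref_stats: dict, rel_tol: float = 1e-12) -> list:
    """Return the names of fields that are missing or differ beyond the tolerance."""
    failures = []
    for name, ref_value in ref_stats.items():
        run_value = run_stats.get(name)
        if run_value is None or not math.isclose(run_value, ref_value, rel_tol=rel_tol):
            failures.append(name)
    return failures

if __name__ == "__main__" and len(sys.argv) >= 3:
    run = json.load(open(sys.argv[1]))  # statistics from the new run
    ref = json.load(open(sys.argv[2]))  # known-good reference values
    sys.exit(1 if compare_stats(run, ref) else 0)  # non-zero exit fails the test
```

A script like this could be invoked directly from a Meson `test()` entry, which is where the exit code matters.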

I would be available to help with this, especially the Meson and CI integration, but I would require some help defining test cases and figuring out how to implement it in BLOM.

@nordmoen
Collaborator Author

I'm trying to set it up so that Meson can run a few test cases (to begin with, simply running the binary). One thing that is not clear to me is what the namelist file that phy/rdlim.F checks for is, or where it could be found in the repository. Is this something I could generate based on files in the repository?

@AleksiNummelin
Collaborator

At the moment the limits file (which is what phy/rdlim.F expects) is not included in the repository. In the NorESM context, it is created by cime_config/buildnml, but I don't think one can easily run that outside NorESM since it needs some XML files. I created a pull request just today that includes a limits.json and a limits_from_json.py that will create the limits file. There is also a simple submission script for running on Betzy. Note that these are all specific to the channel setup.

@nordmoen
Collaborator Author

> At the moment the limits file (which is what phy/rdlim.F expects) is not included in the repository. In the NorESM context, it is created by cime_config/buildnml, but I don't think one can easily run that outside NorESM since it needs some XML files. I created a pull request just today that includes a limits.json and a limits_from_json.py that will create the limits file. There is also a simple submission script for running on Betzy. Note that these are all specific to the channel setup.

Thanks for the quick reply @AleksiNummelin

Reading through the responses in your PR, it seems that this is something that needs to be decided before I can move forward with testing.

@nordmoen
Collaborator Author

Alternatively, for testing we would just need a minimal workable limits file for each of the test cases. If the channel setup is more or less standalone, I can use it for testing.

@AleksiNummelin
Collaborator

AleksiNummelin commented Apr 28, 2021

We discussed this today in the BLOM-core meeting. For now, one can use for example the channel setup for testing, but we agreed that a first proper test case should be built upon the fuk95 case. In this context, I'd imagine testing would imply building, running, and checking diagnostics against a reference (these cases could also be used for testing scalability etc.). Therefore a full-fledged test case should also include checks on the physics (tracer conservation, matching reference kinetic and potential energy budgets, etc.). Related to this discussion, we thought there is also a need to move some of the idealized cases to another folder. I created another issue for this, #86, since that discussion might be a bit different from the focus here.

@nordmoen
Collaborator Author

> We discussed this today in the BLOM-core meeting. For now, one can use for example the channel setup for testing ...

Could you help me set this up? I don't know what would be needed, but I have the general setup for testing and generating an individual dimension.F file for each test. "All" I need now is some test data so the program will actually run, and then we can start to think about checking correctness.

@matsbn
Contributor

matsbn commented Apr 29, 2021

I have placed a Fortran namelist (“limits”) for the fuk95 test case here: https://gist.github.com/matsbn/718c1419cc1ecc064d78d18f5687439f

This test case does not need any other input files. To shorten the integration time, "NDAY2" can be reduced from 10 to say 1.
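
Shortening the integration could be scripted for the test setup, e.g. with sed. The exact whitespace formatting of the NDAY2 entry in the limits namelist is an assumption here:

```shell
# Sketch: shorten the fuk95 integration from 10 days to 1 by editing the
# NDAY2 entry of the downloaded "limits" namelist. The entry's formatting
# ("NDAY2 = 10") is an assumption; adjust the pattern to the real file.
[ -f limits ] || printf '&LIMITS\n NDAY2 = 10\n/\n' > limits  # stand-in file for illustration
sed -i 's/NDAY2 *= *10/NDAY2 = 1/' limits
grep 'NDAY2' limits
```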

@nordmoen
Collaborator Author

I downloaded the limits file and it worked after some configuration updates. One snag is that the program crashes when I use OpenMP (it runs fine single-threaded).

Thread 1 "fuk95_blom" received signal SIGSEGV, Segmentation fault.
0x00000000005016df in mod_dia::diaacc (m=<error reading variable: Cannot access memory at address 0x7fffff46f978>,
    n=<error reading variable: Cannot access memory at address 0x7fffff46f970>,
    mm=<error reading variable: Cannot access memory at address 0x7fffff46f968>,
    nn=<error reading variable: Cannot access memory at address 0x7fffff46f960>,
    k1m=<error reading variable: Cannot access memory at address 0x7fffff46f958>,
    k1n=<error reading variable: Cannot access memory at address 0x7fffff46f950>) at ../phy/mod_dia.F:972
972           subroutine diaacc(m,n,mm,nn,k1m,k1n)

I'm developing the unit tests on the branch feature_unit_tests if any of you have time to help me debug the OpenMP problem. Once this is fixed I think we can start looking into how to check the results.

@nordmoen
Collaborator Author

nordmoen commented May 3, 2021

> I downloaded the limits file and it worked with some configuration updates. One snag is that the program crashes when I use OpenMP ...

I was able to overcome this locally by increasing the stack size with ulimit -s unlimited. Now it is only a matter of configuring CI with the same setup.
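
For the CI job, the stack settings might look roughly like this. OMP_STACKSIZE is the standard OpenMP variable for worker-thread stacks, but the 512m value and the thread count are guesses that may need tuning:

```shell
# Hedged sketch of stack settings for a CI job running the OpenMP build.
# The sizes and thread count are guesses; the run line is commented out
# because the binary name (from the backtrace above) is build-specific.
ulimit -s unlimited || echo "warning: could not raise the stack limit"
export OMP_STACKSIZE=512m   # per-thread stack for OpenMP worker threads
export OMP_NUM_THREADS=4    # hypothetical thread count for the test
# ./fuk95_blom              # then run the test binary as before
```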

The next step is probably to create some scripts that can be run to check the results of the run. I can implement this if I know what the different files mean, their expected values, and their types.

@nordmoen
Collaborator Author

To follow up on this.

Now that the initial test can be run, we need to check that the output is as expected. For me it would be easiest to write a small script in Python that could check the output, but I need some help with the file types and what the expected output should look like.

@TomasTorsvik
Contributor

There is a tool called "cprnc" that has been created for comparison of output netCDF files for CESM. The cime source that is bundled with NorESM is a bit old, but it seems to work. The most recent version is available at
https://github.com/ESMCI/cime/tree/master/tools/cprnc

There is also a Python version of this tool, but it does not seem to be maintained:
https://github.com/NCAR/cprnc_python

Basically, the tool takes two netCDF files as input and creates a report on the differences in the data, while ignoring any information related to the specific run.

I compiled a version of the tool on Betzy (using the source code from NorESM2.0.4), which is available from
/cluster/shared/noresm/diagnostics/cprnc/

Maybe something along this line would be useful as a check on the output?
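
A comparison step with cprnc might look roughly like this. Only the tool path is taken from above; the history file names and the exact wording of the report are assumptions:

```shell
# Hedged sketch of a cprnc comparison step. baseline.nc/candidate.nc are
# hypothetical history files; the path is the Betzy one quoted above.
CPRNC=/cluster/shared/noresm/diagnostics/cprnc/cprnc
if [ -x "$CPRNC" ]; then
  "$CPRNC" baseline.nc candidate.nc > cprnc_report.txt
  grep -i "identical" cprnc_report.txt || echo "files differ, see cprnc_report.txt"
else
  echo "cprnc not available on this machine"
fi
```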

@AleksiNummelin
Collaborator

A bit late with a response here, but for a more comprehensive test in the future, it would perhaps be interesting to use a package like xarray (if we can afford having a conda environment that is fast to install). There is some nice functionality (coming from numpy) that allows for very basic checks as well, for example checking whether two variables are equal within a tolerance: http://xarray.pydata.org/en/stable/generated/xarray.testing.assert_allclose.html

With xarray it would be easy to implement checks for dynamical consistency (energy levels etc. we've talked about).
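
Since xarray's assert_allclose follows the same pattern as numpy's, the idea can be sketched with plain numpy arrays (variable names here are hypothetical; the xarray form is shown in a comment):

```python
# Sketch of a tolerance-based check in the spirit of
# xarray.testing.assert_allclose, using the underlying
# numpy.testing.assert_allclose. Data values are made up for illustration.
import numpy as np

reference = np.array([1.0, 2.0, 3.0])       # known-good values
candidate = reference * (1.0 + 1e-9)        # e.g. output from a refactored build

# Passes: differences are within the relative tolerance.
np.testing.assert_allclose(candidate, reference, rtol=1e-7)

# With xarray and netCDF output the same check would read roughly:
#   import xarray as xr
#   xr.testing.assert_allclose(xr.open_dataset("run.nc"),
#                              xr.open_dataset("ref.nc"), rtol=1e-7)
```

The tolerance would have to be chosen per metric, e.g. tighter for conserved tracers than for kinetic energy.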

@matsbn
Contributor

matsbn commented May 27, 2021

Also very late with my response here, sorry for that! If one wants to test for bit-identical simulations, I think the available checksum functionality in BLOM should work well. Each BLOM simulation dumps " chksum: dp: 0x ... " to stdout at the end of the simulation, which I have found very reliable in detecting simulation differences. This could be extended, e.g. by adding a checksum for a sensitive iHAMOCC field. There is of course value in actually checking the output files as well, since the generation of output can also be erroneous.

For detecting simulation differences within an acceptable tolerance, I promised to implement some energy diagnostics in BLOM that would dump, say, global kinetic and potential energy sums to stdout. For "simple" metrics like that, I believe this approach would be easier to integrate into a CI framework than relying on external tools to obtain the metrics. More sophisticated metrics might well be more convenient to develop in something other than Fortran. Unfortunately, I have had no time to implement these energy metrics yet, but hopefully I can take a stab at it very soon.

@nordmoen
Collaborator Author

I think both of these tracks should be followed. Bit-identical checksums are excellent for CI, where we just need to verify that changes do not affect the output of the simulation. However, for day-to-day development tolerance-based tests would be better, so that one can gauge the effect of changes while developing. Tolerance-based testing is also essential for moving to GPUs, where bit-identical results will be difficult (maybe even impossible) to achieve, and there it would be good to be able to measure the difference in accuracy.
