add timing thresholds to test codes #532

Draft · wants to merge 1 commit into base: develop

Conversation

jbathegit (Collaborator)

re: some other discussions we've been having in spack-stack#589, and possibly also related to #527 and elsewhere, here's an idea for how to start moving forward with adding timing thresholds to the NCEPLIBS-bufr test codes.

This is just an example of how to do this in one of our many test codes, which I picked at random; the same thing could be done in all of the other test/intest* and test/outtest* codes as well. The idea is that we could eventually remove the extraneous 'print *,' statements for these values and instead compare each final cpu and wall time to some threshold, then just do a stop with a nonzero return inside of the code if the threshold is exceeded. Of course we'd also need to do some baseline runs to figure out what those thresholds should even be. But before doing any of that I wanted to put this out there to give you all a chance to comment, in case you have a better idea or know of a better or more standard way of doing this(?)
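For concreteness, here's a minimal sketch of that idea (this is not the actual patch in this PR; the program name, variable names, and threshold values are placeholders, and the real thresholds would come from the baseline runs mentioned above):

```fortran
program intest_timing_sketch
  implicit none
  real :: tcpu1, tcpu2, twall
  integer :: icount1, icount2, irate
  real, parameter :: cpu_threshold  = 5.0   ! seconds (placeholder value)
  real, parameter :: wall_threshold = 10.0  ! seconds (placeholder value)

  call cpu_time(tcpu1)
  call system_clock(icount1, irate)

  ! ... the existing body of the test code would go here ...

  call cpu_time(tcpu2)
  call system_clock(icount2)
  twall = real(icount2 - icount1) / real(irate)

  ! compare the final cpu and wall times to the thresholds, and stop with a
  ! nonzero return code so the test is flagged as failed
  if ((tcpu2 - tcpu1) > cpu_threshold .or. twall > wall_threshold) then
    print *, 'timing threshold exceeded'
    stop 1
  end if
end program intest_timing_sketch
```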

At some point we should probably set up similar timing tests for all of our utilities via the test_debufr.sh, test_sinv.sh, etc. scripts. But those would be a bit different, because those utilities are used by the wider community, so we wouldn't want to embed any cpu_time or system_clock calls directly in those codes. Instead, my thinking was that maybe we use the UNIX time command whenever we run each test of that utility inside of the associated test_(utilityname).sh script, and then check the thresholds within the script itself. But since the UNIX time command writes its results to stderr, we'd probably have to use something a bit kludgey along the lines of { time ../utils/debufr $args_n 2> /dev/null; } 2> timings_n to capture those timings and then compare them to some threshold within the script. Or maybe there's a better way to do that as well?
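To illustrate that script-side approach, here's a minimal sketch, assuming bash's time keyword with the POSIX -p output format (so the "real" line is easy to parse) and a placeholder threshold; the timings_n file name and the debufr arguments are just carried over from the snippet above:

```bash
# placeholder wall-clock threshold in seconds; actual values would come
# from baseline runs
threshold=10

# capture the output of the time command (which goes to stderr) in timings_n,
# while discarding the utility's own stderr
{ time -p ../utils/debufr $args_n 2> /dev/null ; } 2> timings_n

# compare the "real" (wall) time against the threshold and fail the test
# with a nonzero exit if it's exceeded
awk -v t="$threshold" '/^real/ { if ($2 + 0 > t + 0) exit 1 }' timings_n || {
  echo "debufr wall time exceeded ${threshold}s threshold" >&2
  exit 1
}
```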

jbathegit (Collaborator, Author)

@edwardhartnett @jack-woollen @AlexanderRichert-NOAA sorry I forgot to tag you all in the initial post, but when you get a chance please take a look at this suggestion of a way forward, and let me know what you think (thanks!)

edwardhartnett (Contributor) commented Oct 25, 2023

I have some suggestions based on doing this for netCDF.

  • Don't just take out all the print statements; instead, add a verbose flag variable and have the print statements still work if the verbose flag is set. That way, you can turn on all the prints for debugging and turn them all off for timing. For this I would just use an integer in the code, set it to 0 when checking the code in, and change it to 1 on my machine for debugging. (In other words, not a command-line argument to the test, just an internal variable; see the sketch after this list.)
  • I don't think all tests should be timed like this, because most tests don't do what the user really cares about, which is moving data. So select or create some tests that do what a user does and write/read a reasonable amount of data, and use those for performance. Other tests might do all kinds of crazy things with metadata, or test lots of failure modes, and we don't care about timing those.
  • Add a CMake flag ENABLE_BENCHMARKS, and only run your benchmarks if that flag is present; that way you can make your benchmarks beefy, which is bound to slow down testing a lot.
  • Whether you can automatically detect declining performance numbers on GitHub is an interesting question. I think having performance tests is very useful, even if they are only run occasionally by the developers and power users.
  • With netCDF, I add tailored performance testing for NOAA. For example, when there was a performance question about writing the FV3 ATM files in netCDF, I added a performance test to netcdf-c which mimics the FV3 ATM writes, with some fake data. The goal is to simulate the operational data flow in a test as closely as possible.
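A minimal sketch of that internal verbose switch idea (the program name, the verbose variable, and tcpu are placeholders, not names from the existing test codes):

```fortran
program verbose_sketch
  implicit none
  ! set to 0 when checking the code in; flip to 1 locally for debugging
  integer, parameter :: verbose = 0
  real :: tcpu

  call cpu_time(tcpu)

  ! prints stay in the code but only fire when verbose is enabled
  if (verbose == 1) print *, 'cpu time so far = ', tcpu
end program verbose_sketch
```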
