add timing thresholds to test codes #532

Draft · wants to merge 1 commit into base: develop

Conversation

jbathegit (Collaborator)

re: some other discussions we've been having in spack-stack#589, and possibly also related to #527 and elsewhere, here's an idea for how to start moving forward with adding timing thresholds to the NCEPLIBS-bufr test codes.

This is just an example of how to do this in one of our many test codes, which I picked at random; the same thing could be done in all of the other test/intest* and test/outtest* codes as well. The idea is that we could eventually remove the extraneous 'print *,' statements for these values and instead compare each final cpu and wall time to some threshold, then just do a stop with a nonzero return inside of the code if the threshold is exceeded. Of course we'd also need to do some baseline runs to figure out what those thresholds should even be. But before doing any of that I wanted to put this out there to give you all a chance to comment, in case you have a better idea or know of a better or more standard way of doing this(?)
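For concreteness, here's a minimal sketch of that idea (this is not the actual patch in this PR; the program name, variable names, and threshold values are placeholders, and the real thresholds would come from the baseline runs mentioned above):

```fortran
program intest_timing_sketch
  implicit none
  real :: tcpu1, tcpu2, twall
  integer :: icount1, icount2, irate
  real, parameter :: cpu_threshold  = 5.0   ! seconds (placeholder value)
  real, parameter :: wall_threshold = 10.0  ! seconds (placeholder value)

  call cpu_time(tcpu1)
  call system_clock(icount1, irate)

  ! ... the existing body of the test code would go here ...

  call cpu_time(tcpu2)
  call system_clock(icount2)
  twall = real(icount2 - icount1) / real(irate)

  ! compare the final cpu and wall times to the thresholds, and stop with a
  ! nonzero return code so the test is flagged as failed
  if ((tcpu2 - tcpu1) > cpu_threshold .or. twall > wall_threshold) then
    print *, 'timing threshold exceeded'
    stop 1
  end if
end program intest_timing_sketch
```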

At some point we should probably set up similar timing tests for all of our utilities via the test_debufr.sh, test_sinv.sh, etc. scripts. But those would be a bit different, because those utilities are used by the wider community, so we wouldn't want to embed any cpu_time or system_clock calls directly in those codes. Instead, my thinking was that maybe we use the UNIX time command whenever we run each test of that utility inside of the associated test_(utilityname).sh script, and then check the thresholds within the script itself. But since the UNIX time command writes its results to stderr, we'd probably have to use something a bit kludgey along the lines of { time ../utils/debufr $args_n 2> /dev/null; } 2> timings_n to capture those timings and then compare them to some threshold within the script. Or maybe there's a better way to do that as well?
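To illustrate that script-side approach, here's a minimal sketch, assuming bash's time keyword with the POSIX -p output format (so the "real" line is easy to parse) and a placeholder threshold; the timings_n file name and the debufr arguments are just carried over from the snippet above:

```bash
# placeholder wall-clock threshold in seconds; actual values would come
# from baseline runs
threshold=10

# capture the output of the time command (which goes to stderr) in timings_n,
# while discarding the utility's own stderr
{ time -p ../utils/debufr $args_n 2> /dev/null ; } 2> timings_n

# compare the "real" (wall) time against the threshold and fail the test
# with a nonzero exit if it's exceeded
awk -v t="$threshold" '/^real/ { if ($2 + 0 > t + 0) exit 1 }' timings_n || {
  echo "debufr wall time exceeded ${threshold}s threshold" >&2
  exit 1
}
```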

jbathegit (Collaborator, Author)

@edwardhartnett @jack-woollen @AlexanderRichert-NOAA sorry I forgot to tag you all in the initial post, but when you get a chance please take a look at this suggestion of a way forward, and let me know what you think (thanks!)

edwardhartnett (Contributor) commented Oct 25, 2023

I have some suggestions based on doing this for netCDF.

  • Don't just take out all the print statements; instead, add a verbose flag variable and have the print statements still work if the verbose flag is set. That way, you can turn on all the prints for debugging and turn them all off for timing. For this I would just use an integer in the code, set it to 0 when checking the code in, and change it to 1 on my machine for debugging. (In other words, not a command-line argument to the test, just an internal variable; see the sketch after this list.)
  • I don't think all tests should be timed like this, because most tests don't do what the user really cares about, which is moving data. So select or create some tests that do what a user does and write/read a reasonable amount of data, and use those for performance. Other tests might do all kinds of crazy things with metadata, or test lots of failure modes, and we don't care about timing those.
  • Add a CMake flag ENABLE_BENCHMARKS, and only run your benchmarks if that flag is present; that way you can make your benchmarks beefy, which is bound to slow down testing a lot.
  • Whether you can automatically detect declining performance numbers on GitHub is an interesting question. I think having performance tests is very useful, even if they are only run occasionally by the developers and power users.
  • With netCDF, I add tailored performance testing for NOAA. For example, when there was a performance question about writing the FV3 ATM files in netCDF, I added a performance test to netcdf-c which mimics the FV3 ATM writes, with some fake data. The goal is to simulate the operational data flow in a test as closely as possible.
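A minimal sketch of that internal verbose switch idea (the program name, the verbose variable, and tcpu are placeholders, not names from the existing test codes):

```fortran
program verbose_sketch
  implicit none
  ! set to 0 when checking the code in; flip to 1 locally for debugging
  integer, parameter :: verbose = 0
  real :: tcpu

  call cpu_time(tcpu)

  ! prints stay in the code but only fire when verbose is enabled
  if (verbose == 1) print *, 'cpu time so far = ', tcpu
end program verbose_sketch
```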
