cmor tool improvements #205

ilaflott · 2024-10-11T19:47:40Z

I ran my code
I tried to make my code readable
I tried to comment my code
I wrote a new test, if applicable
I wrote new instructions/documentation, if applicable
I ran pytest and inspected it's output
I ran pylint and attempted to implement some of it's feedback

This PR...

overhauls the README.md with fre-cli oriented usage and removes outdated doc
refactors cmor_mixer.py heavily to be more readable, have more modularity to it to clearly illustrate who-does-what and why, with an emphasis on clearly exposing data movement inherent in the routine
functions now have doc strings with argument descriptions in most places
better run-time feedback of functioning/IO/logic etc during runtime, hopefully not too much
better comments where possible
new test case where key:value pair in var_list argument differ
more test case cleanup/setup, more specific test case success conditions
new CLI/click.testing.CliRunner test for the new functionality of the fre.cmor module
change of the way the click module interfaces with the cmor run subtool module

…change test succeess condition to reflect WHAT WE WANT and NOT what is currently happening...

…cmor_mixer, refactor the old function netcdf_var, but most importantly, RENAMING THINGS for readability. push all input netCDF4.Dataset reading for that function as close to the start as possible in most cases. refactor vertical dimension logic. several new helper functions to enhance logic parsing of the bulk routines.

…s, and ignore incoming cmor module warnings thrown by pylint that we cannot prevent nor control

…what kinds of objects are passed to which function, etc. theres definitely some easy confusion regarding the key/value pair in varlist

…nto functions, delving into data movement finally a bit. additioanl test case for both the modular functionality of the python package, and the cli calls. probably not in a completley working state, but want to jot a note before i get even more ahead of myself.

…p command line options the same while adjusting python module argument names, switch from context forward to context invoke to map the arg names

…by click.echo of result.stderr to remind myself why thats commented out in the first place. not capd sep err

…hen trying to remove tmp dir or outdir...), the output files ARE still being recreated at least. better control over output directories now as well.

… when they cleave scope. netcdf4 datasets explicitly closed now too. removed useless cmor.close() call. remove outdated os.system calls and replace with subprocess which... cant quite believe im saying it, but is preferable to this way of making system calls. add absolutely dreadful absolute-path resolving right before rewrite_netcdf_var... but it does get cmor to do its temporary file processing in the right spot

.github/workflows/create_test_conda_env.yml

fre/cmor/README.md

fre/cmor/frecmor.py

fre/tests/test_files/varlist_local_target_vars_differ

fre/cmor/cmor_mixer.py

ilaflott · 2024-10-22T19:59:32Z

fre/cmor/tests/test_cmor_run_subtool.py

+    # VERY ANNOYING !!! FYI WARNING TODO
+    if Path(TMPDIR).exists():
+        try:
+            shutil.rmtree(TMPDIR)
+        except OSError as exc:
+            print(f'WARNING: TMPDIR={TMPDIR} could not be removed.')
+            print( '         this does not matter that much, but is unfortunate.')
+            print( '         supicion: something the cmor module is using is not being closed')
+
+    #assert not Path(TMPDIR).exists()    # VERY ANNOYING !!! FYI WARNING TODO
+
+    # VERY ANNOYING !!! FYI WARNING TODO
+    if Path(OUTDIR).exists():
+        try:
+            shutil.rmtree(OUTDIR)
+        except OSError as exc:
+            print(f'WARNING: OUTDIR={OUTDIR} could not be removed.')
+            print( '         this does not matter that much, but is unfortunate.')
+            print( '         supicion: something the cmor module is using is not being closed')
+
+    #assert not Path(OUTDIR).exists()    # VERY ANNOYING !!! FYI WARNING TODO


these cleanup steps throw some very annoying error messages if you don't let the exception go-

... ... ... fre/cmor/tests/test_cmor_run_subtool.py::test_fre_cmor_run_subtool_case1_output_compare_metadata PASSED [ 55%] fre/cmor/tests/test_cmor_run_subtool.py::test_setup_fre_cmor_run_subtool_case2 FAILED [ 66%] ======================================================================================== FAILURES ======================================================================================== _________________________________________________________________________ test_setup_fre_cmor_run_subtool_case2 __________________________________________________________________________ capfd = <_pytest.capture.CaptureFixture object at 0x2b0f0acad790> def test_setup_fre_cmor_run_subtool_case2(capfd): ''' make a copy of the input file to the slightly different name. checks for outputfile from prev pytest runs, removes it if it's present. this routine also checks to make sure the desired input file is present''' if Path(FULL_OUTPUTFILE).exists(): Path(FULL_OUTPUTFILE).unlink() assert not Path(FULL_OUTPUTFILE).exists() if Path(OUTDIR+'/CMIP6').exists(): shutil.rmtree(OUTDIR+'/CMIP6') assert not Path(OUTDIR+'/CMIP6').exists() # VERY ANNOYING !!! FYI WARNING TODO if Path(TMPDIR).exists(): #try: > shutil.rmtree(TMPDIR) fre/cmor/tests/test_cmor_run_subtool.py:156: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../conda/envs/fre_cli/lib/python3.9/shutil.py:734: in rmtree _rmtree_safe_fd(fd, path, onerror) ../../conda/envs/fre_cli/lib/python3.9/shutil.py:667: in _rmtree_safe_fd _rmtree_safe_fd(dirfd, fullname, onerror) ... ... ... ../../conda/envs/fre_cli/lib/python3.9/shutil.py:690: in _rmtree_safe_fd onerror(os.unlink, fullname, sys.exc_info()) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ topfd = 28, path = 'fre/tests/test_files/outdir/tmp/CMIP6/CMIP6/ISMIP6/PCMDI/PCMDI-test-1-0/piControl-withism/r3i1p1f1/Omon/sos/gn/v20241022' onerror = <function rmtree.<locals>.onerror at 0x2b0f0accbe50> def _rmtree_safe_fd(topfd, path, onerror): ... ... ... ... > os.unlink(entry.name, dir_fd=topfd) E OSError: [Errno 16] Device or resource busy: '.nfs00000000a9f8735600000090' ../../conda/envs/fre_cli/lib/python3.9/shutil.py:688: OSError ================================================================================ short test summary info ================================================================================= FAILED fre/cmor/tests/test_cmor_run_subtool.py::test_setup_fre_cmor_run_subtool_case2 - OSError: [Errno 16] Device or resource busy: '.nfs00000000a9f8735600000090' !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ============================================================================== 1 failed, 5 passed in 2.65s ===============================================================================

leaving this unresolved for now. the directory re-creation for the automatic test pipeline is a nice-to-have, not essential. as if you remove the directory by hand after erroring, and restart at this failed test, the test succeeds as expected.

largely that re-tests the directory creation that already succeeded in the prev test, so its even a tad redundant.

another reason to leave it unresolved- i suspect this is cmor's fault...

fre/cmor/cmor_mixer.py

ilaflott · 2024-10-22T20:05:18Z

this will close #204

… gimme the bugs and i squish em

ilaflott · 2024-10-23T13:28:20Z

come a long way havent we...

…using the included test files are shown. dependency on external configs from PCMDI made clear

…changed

ilaflott · 2024-10-23T16:50:39Z

regarding meniton in NOAA-GFDL/CatalogBuilder#79

links in commit point to https://github.com/NOAA-GFDL/MDTF-diagnostics/tree/main/data/gfdl-cmor-tables

note submodule dep: https://github.com/NOAA-GFDL/MDTF-diagnostics/tree/main/data/cmip6-cmor-tables comes from https://github.com/PCMDI/cmip6-cmor-tables

ceblanton

Your code deserved a better review than what I did. I didn't look hard at the testing additions, but the rest looked great with the few comments I had. I don't think any of the comments are deal-breakers tho for taking this.

fre/cmor/cmor_mixer.py

ceblanton · 2024-10-29T17:13:56Z

fre/cmor/cmor_mixer.py

+            print(f'(cmor_run_subtool) WARNING: local_var == {local_var} '
+                  f'!= {target_var} == target_var')
+            print(f'i am expecting {local_var} to be in the filename, and i expect the variable'
+                  f' in that file to be {target_var}')


I think I'm following but disagree with the message here. Shouldn't it be:

I am expecting {local_var} to be in the filename, and I expect the variable in that file to be {local_var}. The output file will be named {target_var}.

filenames are just filenames- they can be whatever people want! the variable in the filename doesnt have to match what is in that file.

I understood the var-list input to be key/value pairs where the key could be named var or varV2 or varFIX, etc so the model runner in mind has some flexibility. The value of that key should always be the variable that the MIP table wants/expects, and the cmor module sets the output filename based on that table

ilaflott · 2024-10-29T18:41:21Z

@ceblanton has good feedback here, most of which is actually not straightforward to address, or involves some creative decision making. given that this PR is gigantic but functional, i'm inclined to push that feedback forward for now, and spin those up into issues and/or dev efforts on the next PR #218

fix datetime array built from files targeted

944eb07

ilaflott self-assigned this Oct 11, 2024

ilaflott linked an issue Oct 11, 2024 that may be closed by this pull request

Improve fre cmor run functionality #204

Closed

ilaflott added 15 commits October 16, 2024 11:38

refactor filename loop/routines in cmor_mixer a little bit. slightly …

c69b307

…change test succeess condition to reflect WHAT WE WANT and NOT what is currently happening...

new test cases, another test text file

0926ccd

adjust pylint calls to be a little more kind on the max number of arg…

797417a

…s, and ignore incoming cmor module warnings thrown by pylint that we cannot prevent nor control

updates to variable names and some print statements. argument names, …

52b6729

…what kinds of objects are passed to which function, etc. theres definitely some easy confusion regarding the key/value pair in varlist

fix readme to display content as markdown correctly.

ebebae9

Merge branch 'main' into 204-improve-fre-cmor-run-functionality

e9b73d7

more detailed doc strings iwth argument descriptions

933dd6b

more detailed doc strings iwth argument descriptions

55f6137

adjust click-based call to _cmor_run_subtool entrypoint function. kee…

b2c3d89

…p command line options the same while adjusting python module argument names, switch from context forward to context invoke to map the arg names

adjust cli test calls, in particular the full_outputdir. put comment …

e00e90c

…by click.echo of result.stderr to remind myself why thats commented out in the first place. not capd sep err

make case2 checks as detailed. clean up best i can (nfs mount error w…

242f13a

…hen trying to remove tmp dir or outdir...), the output files ARE still being recreated at least. better control over output directories now as well.

pylint feedback

4478d51

ilaflott requested a review from ceblanton October 22, 2024 19:35