Redesigned dataset_compliance w/ standard names validation#373
Redesigned dataset_compliance w/ standard names validation#373sadielbartholomew wants to merge 168 commits intoNCAS-CMS:mainfrom
dataset_compliance w/ standard names validation#373Conversation
f57edc9 to
eacd492
Compare
…advised (for now)
|
Linting CI job is failing due to issues with the service ("Our services aren't available right now") so please ignore that - I have run pre-commit on the final PR pre-review state as above and (excluding doc-string formatting which I've chosen to ignore in the interest of time) everything passes. |
davidhassell
left a comment
There was a problem hiding this comment.
Hi Sadie - a first pass.
Logically looks good (I think - netcdfread.py changes are pretty hard to follow :).
Structurally, I think checker.py should be in read_write/netcdf/, as commented.
I'm going to submit this now, but would still like to look some more at netcdfread.py for my own education.
|
|
||
| def _make_ugrid_1(filename): | ||
| """Create a UGRID file with a 2-d mesh topology.""" | ||
| def _make_ugrid_1(filename, standard_names): |
There was a problem hiding this comment.
Can we make it so that the correct standard names are in this function? Then do stuff like
if standard_names:
Mesh2_node_x.standard_name = standard_names[0]
else:
Mesh2_node_x.standard_name = "longitude"There was a problem hiding this comment.
That would be better, so happy to otherwise, but actually by this time that I am getting back onto this work, with v1.13.1.0 we have the ability to write UGRID datasets so I can avoid having to change the create_test_files code at all and produce the UGRID test file with changed named programmatically in the test (which is nicer, unless you can foresee a reason that we'd want to keep the standard names changing kwarg on the function).
I'll make this change in a new commit (or two) and point to those in a fresh comment, once done.
| def _check_cell_methods(self, field_ncvar, cell_method): | ||
| """Check the cell methods. | ||
|
|
||
| .. versionadded:: (cfdm) NEXTVERSION | ||
|
|
||
| """ | ||
| # TODO SLB unclear how to check on cell methods, will leave | ||
| # for now. | ||
| # self._check_standard_names( | ||
| # field_ncvar, | ||
| # field_ncvar, | ||
| # # self.read_vars["variable_attributes"][field_ncvar]["cell_methods"], | ||
| # ) |
There was a problem hiding this comment.
This is currently sort embroiled within _parse_cell_methods
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
Close #366 by setting up discussed data structure to close #365, reporting invalid standard names in the new output structure, as indicated in #365 (comment).
Is quite a hefty PR with a tragic amount of commits, so happy to squash down the first ~50-100 of these, which were mostly development (and/or investigative behaviour) commits which were incrementally updated as we revised our idea for the Conformance Data Model (see UML diagram in #365 (comment)).
Some minor follow on work when we have time to restart conformance work is to:
Outstanding questions
Aspects I am unsure about / questions:
1..*NonConformanceCaseforAttributeNonConformanceas per our UML - I think in practice the non-conformance could be further down the chain, not a direct association - so I think this should be0..*and that is what this PR code assumes (does that make sense?).Review guidance
Structure of new
conformancemoduleUML diagrams generated with
pyreverse, though note they only include theconformancemodule separate to the wholecfdmmodule, so don't pick up on external connections notably to all dataset reading logic especiallyNetCDFRead. But could be a useful overview:Packages
Classes
Notes on PR and approach
conformancewhich is based on a Conformance Data Model._check_*and_ugrid_check_*method fromnetcdfreadto the new dedicated submoduleconformance.checker.conformance.reporting.as_report_fragmentis the ultimate main method from thedatamodelmodule to note fordataset_compliance- it generates adictby recursively operating on all relevant*NonConformanceobjects with the same method defined, to generate a structure from all of the dict fragments resulting in the possibly (heavily-)nested output.Advice on how to review
conformance.checker._check_*and_ugrid_check_*methods, which is just to add_check_standard_nameand_include_component_reportcalls in the right places. To make reviewing easier I copied themainpost-merge state of those methods innetcdfreadand then made any changes to those once moved in 33786f5, with some further additions necessary for tweaks and fixes, so please rungit diff 0a3be736b85cebea58587844cc887beff9cfc497 checker.pyto see and review all updates to the checking methods previously living innetcdfread.Representative outputs
As per the new test module
test_compliance_checking.py, we test on a non-UGRID 'kitchen sink' and a UGRID field with the expected outputs as follows, abiding by the Conformance Data Model:Kitchen sink non-UGRID field
{'CF version': '1.13', 'ta': {'attributes': {'ancillary_variables': {'value': 'air_temperature_standard_error', 'variables': {'air_temperature_standard_error': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_air_temperature_standard_error'}}}}}, 'cell_measures': {'value': 'cell_measure', 'variables': {'cell_measure': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_cell_measure'}}}}}, 'coordinates': {'value': 'time', 'variables': {'auxiliary': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_auxiliary'}}}, 'latitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_latitude_1'}}}, 'longitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_longitude_1'}}}, 'time': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_time'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has a ' 'value ' 'that ' 'is ' 'not a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_ta'}}}}UGRID field
{'CF version': '1.13', 'pa': {'attributes': {'mesh': {'value': 'Mesh2', 'variables': {'Mesh2': {'attributes': {'edge_node_connectivity': {'value': 'Mesh2_edge_nodes', 'variables': {'Mesh2_edge_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_edge_nodes'}}}}}, 'face_face_connectivity': {'value': 'Mesh2_face_links', 'variables': {'Mesh2_face_links': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_face_links'}}}}}, 'face_node_connectivity': {'value': 'Mesh2_face_nodes', 'variables': {'Mesh2_face_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_face_nodes'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has a ' 'value ' 'that ' 'is ' 'not a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_air_pressure'}}}}