
Redesigned dataset_compliance w/ standard names validation#373

Open
sadielbartholomew wants to merge 168 commits into NCAS-CMS:main from
sadielbartholomew:validate-standard-names

Member

@sadielbartholomew sadielbartholomew commented Dec 19, 2025

Closes #366 by setting up the discussed data structure, which also closes #365, and reports invalid standard names in the new output structure, as indicated in #365 (comment).

This is quite a hefty PR with a tragic number of commits, so I am happy to squash down the first ~50-100 of them. Those were mostly development (and/or investigative) commits, incrementally updated as we revised our idea for the Conformance Data Model (see the UML diagram in #365 (comment)).

Some minor follow-on work, for when we have time to restart conformance work, is to:

Outstanding questions

Aspects I am unsure about / questions:

  • cell methods, and how to report issues with them;
  • whether the Data Model should have 1..* NonConformanceCase for AttributeNonConformance, as per our UML. I think in practice the non-conformance could sit further down the chain rather than being a direct association, so I think this should be 0..*, which is what the code in this PR assumes (does that make sense?).

Review guidance

Structure of new conformance module

The UML diagrams below were generated with pyreverse. Note that they cover only the conformance module, in isolation from the rest of cfdm, so they don't pick up external connections, notably to the dataset reading logic and especially NetCDFRead. They could still be a useful overview:

Packages

packages_conf-final-conformancedir

Classes

classes_conf-final-conformancedir

Notes on PR and approach

  1. As discussed in person during development, the new conformance checking logic is implemented in a new submodule, conformance, which is based on a Conformance Data Model.
  2. Towards separation of concerns, I have moved all _check_* and _ugrid_check_* methods from netcdfread to the new dedicated submodule conformance.checker.
  3. All reporting-related functionality is in conformance.reporting. The main method to note for dataset_compliance is as_report_fragment, from the datamodel module. It generates a dict by recursing through all relevant *NonConformance objects, each of which defines the same method, and combines the resulting dict fragments into the possibly heavily nested output.
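To illustrate the recursion described in (3), here is a minimal sketch, not the code in this PR. The class names NonConformanceCase and AttributeNonConformance come from the Data Model discussion; VariableNonConformance, all field names, and the constructor signatures are assumptions made for illustration only. The `cases` list defaults to empty, reflecting the 0..* multiplicity this PR assumes.

```python
from dataclasses import dataclass, field


@dataclass
class NonConformanceCase:
    """A single reported problem (field names are assumed)."""
    code: int
    reason: str

    def as_report_fragment(self):
        return {"code": self.code, "reason": self.reason}


@dataclass
class AttributeNonConformance:
    """An attribute, its value, and problems on it or beneath it."""
    name: str
    value: str
    cases: list = field(default_factory=list)      # 0..* direct cases
    variables: list = field(default_factory=list)  # referenced variables

    def as_report_fragment(self):
        fragment = {"value": self.value}
        if self.cases:
            fragment["non-conformance"] = [
                case.as_report_fragment() for case in self.cases
            ]
        if self.variables:
            # Recurse into each referenced variable's own fragment.
            fragment["variables"] = {
                var.name: var.as_report_fragment() for var in self.variables
            }
        return fragment


@dataclass
class VariableNonConformance:
    """Hypothetical container for a variable's problematic attributes."""
    name: str
    attributes: list = field(default_factory=list)

    def as_report_fragment(self):
        return {
            "attributes": {
                attr.name: attr.as_report_fragment()
                for attr in self.attributes
            }
        }


bad_name = NonConformanceCase(400022, "standard_name attribute has a value "
                              "that is not a valid name contained in the "
                              "current standard name table")
cell_measure = VariableNonConformance("cell_measure", [
    AttributeNonConformance("standard_name", "badname_cell_measure",
                            cases=[bad_name]),
])
ta = VariableNonConformance("ta", [
    AttributeNonConformance("cell_measures", "cell_measure",
                            variables=[cell_measure]),
])
report = {"CF version": "1.13", "ta": ta.as_report_fragment()}
```

In this sketch the cell_measures attribute carries no direct case of its own: the only case sits on the standard_name of the variable it references, which is the "further down the chain" situation raised in the outstanding questions.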

Advice on how to review

  • It is best to review the code changes as a whole, not commit by commit (there are too many commits, and they are a bit of a mess because the development goals kept moving, sorry!), though note the point below about reviewing conformance.checker.
  • Given (2) above, I realised during later merge conflict resolution that it would be difficult to see what I changed in the _check_* and _ugrid_check_* methods, which is just adding _check_standard_name and _include_component_report calls in the right places. To make reviewing easier, I copied the main post-merge state of those methods from netcdfread and then made any changes to them, once moved, in 33786f5, with some further additions needed for tweaks and fixes. Please run git diff 0a3be736b85cebea58587844cc887beff9cfc497 checker.py to see and review all updates to the checking methods that previously lived in netcdfread.
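For reviewers unfamiliar with what the added _check_standard_name calls are checking, the idea can be sketched roughly as below. This is not the method in conformance.checker: the function name, its signature, and the use of a plain set for the standard name table are all assumptions for illustration. Only the code 400022 and the reason text are taken from the outputs shown in this PR.

```python
# Code and reason text as they appear in the representative outputs.
INVALID_STANDARD_NAME_CODE = 400022
INVALID_STANDARD_NAME_REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)


def standard_name_case(value, valid_names):
    """Return a non-conformance case dict, or None if the name is valid.

    `valid_names` stands in for the current standard name table; how
    that table is actually loaded is outside the scope of this sketch.
    """
    if value in valid_names:
        return None
    return {
        "code": INVALID_STANDARD_NAME_CODE,
        "reason": INVALID_STANDARD_NAME_REASON,
    }


# A tiny stand-in table, for illustration only.
valid_names = {"air_temperature", "time", "latitude", "longitude"}

ok = standard_name_case("air_temperature", valid_names)
bad = standard_name_case("badname_ta", valid_names)
```

Here `ok` is None and `bad` is a dict with the same 'code' and 'reason' keys as the entries under 'non-conformance' in the outputs.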

Representative outputs

As per the new test module test_compliance_checking.py, we test on a non-UGRID 'kitchen sink' field and a UGRID field. The expected outputs, which abide by the Conformance Data Model, are as follows:

Kitchen sink non-UGRID field

{'CF version': '1.13',
 'ta': {'attributes': {'ancillary_variables': {'value': 'air_temperature_standard_error',
                                               'variables': {'air_temperature_standard_error': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                      'reason': 'standard_name '
                                                                                                                                                                'attribute '
                                                                                                                                                                'has '
                                                                                                                                                                'a '
                                                                                                                                                                'value '
                                                                                                                                                                'that '
                                                                                                                                                                'is '
                                                                                                                                                                'not '
                                                                                                                                                                'a '
                                                                                                                                                                'valid '
                                                                                                                                                                'name '
                                                                                                                                                                'contained '
                                                                                                                                                                'in '
                                                                                                                                                                'the '
                                                                                                                                                                'current '
                                                                                                                                                                'standard '
                                                                                                                                                                'name '
                                                                                                                                                                'table'}],
                                                                                                                                 'value': 'badname_air_temperature_standard_error'}}}}},
                       'cell_measures': {'value': 'cell_measure',
                                         'variables': {'cell_measure': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                              'reason': 'standard_name '
                                                                                                                                        'attribute '
                                                                                                                                        'has '
                                                                                                                                        'a '
                                                                                                                                        'value '
                                                                                                                                        'that '
                                                                                                                                        'is '
                                                                                                                                        'not '
                                                                                                                                        'a '
                                                                                                                                        'valid '
                                                                                                                                        'name '
                                                                                                                                        'contained '
                                                                                                                                        'in '
                                                                                                                                        'the '
                                                                                                                                        'current '
                                                                                                                                        'standard '
                                                                                                                                        'name '
                                                                                                                                        'table'}],
                                                                                                         'value': 'badname_cell_measure'}}}}},
                       'coordinates': {'value': 'time',
                                       'variables': {'auxiliary': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                         'reason': 'standard_name '
                                                                                                                                   'attribute '
                                                                                                                                   'has '
                                                                                                                                   'a '
                                                                                                                                   'value '
                                                                                                                                   'that '
                                                                                                                                   'is '
                                                                                                                                   'not '
                                                                                                                                   'a '
                                                                                                                                   'valid '
                                                                                                                                   'name '
                                                                                                                                   'contained '
                                                                                                                                   'in '
                                                                                                                                   'the '
                                                                                                                                   'current '
                                                                                                                                   'standard '
                                                                                                                                   'name '
                                                                                                                                   'table'}],
                                                                                                    'value': 'badname_auxiliary'}}},
                                                     'latitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                          'reason': 'standard_name '
                                                                                                                                    'attribute '
                                                                                                                                    'has '
                                                                                                                                    'a '
                                                                                                                                    'value '
                                                                                                                                    'that '
                                                                                                                                    'is '
                                                                                                                                    'not '
                                                                                                                                    'a '
                                                                                                                                    'valid '
                                                                                                                                    'name '
                                                                                                                                    'contained '
                                                                                                                                    'in '
                                                                                                                                    'the '
                                                                                                                                    'current '
                                                                                                                                    'standard '
                                                                                                                                    'name '
                                                                                                                                    'table'}],
                                                                                                     'value': 'badname_latitude_1'}}},
                                                     'longitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                           'reason': 'standard_name '
                                                                                                                                     'attribute '
                                                                                                                                     'has '
                                                                                                                                     'a '
                                                                                                                                     'value '
                                                                                                                                     'that '
                                                                                                                                     'is '
                                                                                                                                     'not '
                                                                                                                                     'a '
                                                                                                                                     'valid '
                                                                                                                                     'name '
                                                                                                                                     'contained '
                                                                                                                                     'in '
                                                                                                                                     'the '
                                                                                                                                     'current '
                                                                                                                                     'standard '
                                                                                                                                     'name '
                                                                                                                                     'table'}],
                                                                                                      'value': 'badname_longitude_1'}}},
                                                     'time': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                    'reason': 'standard_name '
                                                                                                                              'attribute '
                                                                                                                              'has '
                                                                                                                              'a '
                                                                                                                              'value '
                                                                                                                              'that '
                                                                                                                              'is '
                                                                                                                              'not '
                                                                                                                              'a '
                                                                                                                              'valid '
                                                                                                                              'name '
                                                                                                                              'contained '
                                                                                                                              'in '
                                                                                                                              'the '
                                                                                                                              'current '
                                                                                                                              'standard '
                                                                                                                              'name '
                                                                                                                              'table'}],
                                                                                               'value': 'badname_time'}}}}},
                       'standard_name': {'non-conformance': [{'code': 400022,
                                                              'reason': 'standard_name '
                                                                        'attribute '
                                                                        'has a '
                                                                        'value '
                                                                        'that '
                                                                        'is '
                                                                        'not a '
                                                                        'valid '
                                                                        'name '
                                                                        'contained '
                                                                        'in '
                                                                        'the '
                                                                        'current '
                                                                        'standard '
                                                                        'name '
                                                                        'table'}],
                                         'value': 'badname_ta'}}}}
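Because the output above nests deeply, a small helper that flattens it can make it easier to scan. The following is a sketch only and is not part of this PR: the function name iter_non_conformances is hypothetical, and it assumes nothing beyond the dict layout shown above ('non-conformance' lists sitting beside a 'value' key). The sample dict is a trimmed copy of the kitchen sink output.

```python
def iter_non_conformances(fragment, path=()):
    """Yield (path, value, case) for every case in a nested report."""
    if not isinstance(fragment, dict):
        return
    # Cases attached directly at this level, alongside the 'value' key.
    for case in fragment.get("non-conformance", []):
        yield path, fragment.get("value"), case
    # Recurse into every other key, extending the path as we go.
    for key, child in fragment.items():
        if key != "non-conformance":
            yield from iter_non_conformances(child, path + (key,))


reason = ("standard_name attribute has a value that is not a valid name "
          "contained in the current standard name table")
sample = {
    "CF version": "1.13",
    "ta": {"attributes": {
        "cell_measures": {
            "value": "cell_measure",
            "variables": {"cell_measure": {"attributes": {
                "standard_name": {
                    "non-conformance": [{"code": 400022, "reason": reason}],
                    "value": "badname_cell_measure",
                },
            }}},
        },
        "standard_name": {
            "non-conformance": [{"code": 400022, "reason": reason}],
            "value": "badname_ta",
        },
    }},
}

found = list(iter_non_conformances(sample))
for path, value, case in found:
    print("/".join(path), value, case["code"])
```

The path keeps the structural 'attributes' and 'variables' keys so that each entry can be traced back to its exact position in the nested output.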

UGRID field

{'CF version': '1.13',
 'pa': {'attributes': {'mesh': {'value': 'Mesh2',
                                'variables': {'Mesh2': {'attributes': {'edge_node_connectivity': {'value': 'Mesh2_edge_nodes',
                                                                                                  'variables': {'Mesh2_edge_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_edge_nodes'}}}}},
                                                                       'face_face_connectivity': {'value': 'Mesh2_face_links',
                                                                                                  'variables': {'Mesh2_face_links': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_face_links'}}}}},
                                                                       'face_node_connectivity': {'value': 'Mesh2_face_nodes',
                                                                                                  'variables': {'Mesh2_face_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_face_nodes'}}}}},
                                                                       'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                              'reason': 'standard_name '
                                                                                                                        'attribute '
                                                                                                                        'has '
                                                                                                                        'a '
                                                                                                                        'value '
                                                                                                                        'that '
                                                                                                                        'is '
                                                                                                                        'not '
                                                                                                                        'a '
                                                                                                                        'valid '
                                                                                                                        'name '
                                                                                                                        'contained '
                                                                                                                        'in '
                                                                                                                        'the '
                                                                                                                        'current '
                                                                                                                        'standard '
                                                                                                                        'name '
                                                                                                                        'table'}],
                                                                                         'value': 'badname_Mesh2'}}}}},
                       'standard_name': {'non-conformance': [{'code': 400022,
                                                              'reason': 'standard_name '
                                                                        'attribute '
                                                                        'has a '
                                                                        'value '
                                                                        'that '
                                                                        'is '
                                                                        'not a '
                                                                        'valid '
                                                                        'name '
                                                                        'contained '
                                                                        'in '
                                                                        'the '
                                                                        'current '
                                                                        'standard '
                                                                        'name '
                                                                        'table'}],
                                         'value': 'badname_air_pressure'}}}}
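
The nested report above can be flattened for quick inspection. The following is a minimal sketch, not part of the PR: `collect_non_conformance` and the trimmed `sample` dictionary are hypothetical. It assumes only what the output shows, namely that any node may carry a 'non-conformance' list of entries with 'code' and 'reason' alongside a 'value', with further nodes nested under keys such as 'attributes' and 'variables'.

```python
REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)


def collect_non_conformance(node, path=()):
    """Yield (path, value, code, reason) for each non-conformance entry."""
    if not isinstance(node, dict):
        # Leaves such as the 'value' strings carry no nested report.
        return
    for case in node.get("non-conformance", []):
        yield path, node.get("value"), case["code"], case["reason"]
    for key, child in node.items():
        if key != "non-conformance":
            yield from collect_non_conformance(child, path + (key,))


# Hypothetical, heavily trimmed sample modelled on the output above.
sample = {
    "attributes": {
        "face_node_connectivity": {
            "value": "Mesh2_face_nodes",
            "variables": {
                "Mesh2_face_nodes": {
                    "attributes": {
                        "standard_name": {
                            "value": "badname_Mesh2_face_nodes",
                            "non-conformance": [
                                {"code": 400022, "reason": REASON}
                            ],
                        }
                    }
                }
            },
        },
        "standard_name": {
            "value": "badname_Mesh2",
            "non-conformance": [{"code": 400022, "reason": REASON}],
        },
    }
}

found = list(collect_non_conformance(sample))
for path, value, code, reason in found:
    print(" > ".join(path), value, code)
```

Each yielded path records the chain of keys leading to the offending attribute, so a report on a variable reached through another variable's attribute stays traceable to its parent.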

@sadielbartholomew sadielbartholomew marked this pull request as ready for review January 28, 2026 21:35
@sadielbartholomew
Member Author

The linting CI job is failing because of a problem with the service ("Our services aren't available right now"), so please ignore it. I have run pre-commit on the final pre-review state of the PR, as above, and everything passes (excluding docstring formatting, which I've chosen to ignore in the interest of time).

@davidhassell davidhassell added this to the NEXTVERSION milestone Feb 18, 2026
Contributor

@davidhassell davidhassell left a comment

Hi Sadie - a first pass.

Logically looks good (I think - netcdfread.py changes are pretty hard to follow :).

Structurally, I think checker.py should be in read_write/netcdf/, as commented.

I'm going to submit this now, but would still like to look some more at netcdfread.py for my own education.

Comment thread cfdm/conformance/datamodel.py Outdated
Comment thread cfdm/conformance/standardnames.py Outdated
Comment thread Changelog.rst Outdated

def _make_ugrid_1(filename):
"""Create a UGRID file with a 2-d mesh topology."""
def _make_ugrid_1(filename, standard_names):
Contributor

Can we make it so that the correct standard names are in this function? Then do stuff like

            if standard_names:
                Mesh2_node_x.standard_name = standard_names[0]
            else:
                Mesh2_node_x.standard_name = "longitude"
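
One way to read this suggestion, sketched under assumptions: keep the valid names as defaults inside the function and apply overrides only when the caller supplies them. `DEFAULT_STANDARD_NAMES` and `resolve_standard_names` are hypothetical names, the sketch uses a mapping keyed by variable name rather than the positional list in the snippet above, and the "latitude" entry for Mesh2_node_y is an assumed default (only "longitude" for Mesh2_node_x appears in the comment).

```python
# Hypothetical defaults: only the Mesh2_node_x entry comes from the review
# comment; the Mesh2_node_y entry is an assumption for illustration.
DEFAULT_STANDARD_NAMES = {
    "Mesh2_node_x": "longitude",
    "Mesh2_node_y": "latitude",
}


def resolve_standard_names(overrides=None):
    """Return the standard name to set on each variable.

    Valid defaults are used unless the caller overrides specific entries.
    """
    names = dict(DEFAULT_STANDARD_NAMES)  # copy so the defaults stay intact
    if overrides:
        names.update(overrides)
    return names


valid = resolve_standard_names()
invalid = resolve_standard_names({"Mesh2_node_x": "badname_Mesh2_node_x"})
```

With a mapping, a test can invalidate a single variable's name without needing to know the order in which the function assigns names.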

Member Author

@sadielbartholomew sadielbartholomew Apr 24, 2026

That would be better, and I'm happy to do it otherwise. However, now that I am getting back to this work, v1.13.1.0 gives us the ability to write UGRID datasets, so I can avoid changing the create_test_files code at all and instead produce the UGRID test file with changed names programmatically in the test. That is nicer, unless you can foresee a reason we'd want to keep the standard-names-changing kwarg on the function.

I'll make this change in a new commit (or two) and point to those in a fresh comment, once done.

Comment thread cfdm/conformance/checker.py
Comment thread cfdm/conformance/datamodel.py Outdated
Comment thread cfdm/conformance/datamodel.py Outdated
Comment thread cfdm/conformance/datamodel.py Outdated
Comment thread cfdm/conformance/checker.py Outdated
Comment thread cfdm/conformance/checker.py Outdated
Comment on lines +336 to +348
    def _check_cell_methods(self, field_ncvar, cell_method):
        """Check the cell methods.

        .. versionadded:: (cfdm) NEXTVERSION

        """
        # TODO SLB unclear how to check on cell methods, will leave
        # for now.
        # self._check_standard_names(
        #     field_ncvar,
        #     field_ncvar,
        #     # self.read_vars["variable_attributes"][field_ncvar]["cell_methods"],
        # )
Contributor

This is currently sort of embroiled within _parse_cell_methods.

Labels

conformance
enhancement (New feature or request)
UGRID (Relating to UGRID mesh topologies)

Development

Successfully merging this pull request may close these issues.

Compliance reporting: flag any invalid standard names
Output for Field.dataset_compliance towards a CF Checker

2 participants