Add a reader for NWC SAF GEO HRW data #3070

Merged (15 commits, Mar 21, 2025)

pnuu
Member

@pnuu pnuu commented Feb 21, 2025

This PR adds a reader for the High Resolution Winds data from NWC SAF GEO.

The data structure is very complex, and due to the unsupported compound data type, the files can't be opened with xr.open_dataset(). Because there are 259 datasets, I've made the dataset definitions dynamic instead of putting them into the reader YAML. The code is in a separate file because the internal structure is completely different from the other NWC SAF GEO products (see the linked issue).
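As a rough illustration of why the dataset definitions can be generated from the file instead of being written into the YAML: h5py returns an HDF5 compound dataset as a NumPy structured array, and each field of that array can be exposed as a dataset of its own. The sketch below uses an in-memory structured array as a stand-in; the three field names and the helper `split_compound` are hypothetical and not the reader's actual code.

```python
import numpy as np

# Hypothetical compound dtype: each record is one wind vector with several fields.
# The real HRW files contain many more fields than these three.
WIND_DTYPE = np.dtype([
    ("air_pressure", "f4"),
    ("wind_speed", "f4"),
    ("wind_from_direction", "f4"),
])


def split_compound(records):
    """Split a structured (compound) array into one plain array per field."""
    # dtype.names lists the field names in the order they were declared
    return {name: np.array(records[name]) for name in records.dtype.names}


# Usage: three wind vectors, only the wind speed filled in
records = np.zeros(3, dtype=WIND_DTYPE)
records["wind_speed"] = [5.0, 7.5, 10.0]
fields = split_compound(records)
```

Because the field list comes from `dtype.names`, adding or removing fields in the product would not require touching any static configuration.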

By default the file handler reads the datasets separately for each imaging channel. That is, the datasets are named wind_vis06_air_pressure, wind_hrvis_wind_speed, and so on. The prefix is the name of the channel within the files.
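The per-channel naming amounts to joining the channel name stored in the file with the variable name. A minimal sketch, where `channel_dataset_names` is a hypothetical helper and the two channels are just the examples mentioned above:

```python
def channel_dataset_names(channels, fields):
    """Build per-channel dataset names as "<channel>_<field>".

    `channels` are the channel names as stored in the file (e.g. "wind_vis06");
    `fields` are the variable names shared by all channels.
    """
    return [f"{channel}_{field}" for channel in channels for field in fields]


names = channel_dataset_names(["wind_vis06", "wind_hrvis"], ["air_pressure", "wind_speed"])
print(names)
```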

The user can also supply reader_kwargs={"merge_channels": True} to collect all the data together. In this case the datasets are named without the prefix, such as air_pressure, wind_speed, etc.
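One way to picture what merging means: the per-channel arrays of the same variable are concatenated, so that, for example, `wind_speed` holds the observations from every channel. This is a minimal sketch assuming the per-channel data are 1-D NumPy arrays; the function name and dict layout are hypothetical, and the reader may combine the data differently (for example by also keeping track of the source channel).

```python
import numpy as np


def merge_channels(per_channel, field):
    """Concatenate one field from all channels into a single 1-D array.

    `per_channel` maps a channel name (e.g. "wind_vis06") to a dict of
    per-field arrays for that channel. Channels lacking the field are skipped.
    """
    arrays = [fields[field] for fields in per_channel.values() if field in fields]
    if not arrays:
        raise KeyError(f"No channel provides {field!r}")
    return np.concatenate(arrays)


per_channel = {
    "wind_vis06": {"wind_speed": np.array([5.0, 7.5])},
    "wind_hrvis": {"wind_speed": np.array([10.0])},
}
wind_speed = merge_channels(per_channel, "wind_speed")
```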

@pnuu pnuu added enhancement code enhancements, features, improvements component:readers labels Feb 21, 2025
@pnuu pnuu self-assigned this Feb 21, 2025

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 97.73756% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.15%. Comparing base (128bb9e) to head (6e690d9).
Report is 47 commits behind head on main.

Files with missing lines          Patch %   Lines
satpy/readers/nwcsaf_hrw_nc.py    96.00%    5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3070    +/-   ##
========================================
  Coverage   96.14%   96.15%            
========================================
  Files         383      385     +2     
  Lines       55798    56021   +223     
========================================
+ Hits        53649    53867   +218     
- Misses       2149     2154     +5     
Flag             Coverage Δ
behaviourtests   3.86% <0.00%> (-0.02%) ⬇️
unittests        96.24% <97.73%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@pnuu
Member Author

pnuu commented Feb 24, 2025

I reduced the number of data rows written to the test data. With the original 1234 data points, the tests took 19 seconds to run on my laptop. With 12345 points they took 170 seconds, and with the current 123 only 2 seconds.
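The timings above suggest the test runtime grows roughly linearly with the number of rows. A quick check of the time per row, using only the numbers reported in the comment:

```python
# Reported test runtimes: number of rows -> seconds on the author's laptop
timings = {123: 2, 1234: 19, 12345: 170}

ms_per_row = {rows: 1000 * seconds / rows for rows, seconds in timings.items()}
for rows in sorted(ms_per_row):
    print(f"{rows:>6} rows: {timings[rows]:>4} s, {ms_per_row[rows]:.1f} ms/row")
```

All three cases come out at roughly 14 to 16 ms per row, so cutting the rows tenfold gives roughly a tenfold speed-up.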

@pnuu
Member Author

pnuu commented Feb 24, 2025

I'll have a look at adding a kwarg to merge the different channel observations, so the user could do something like

from satpy import Scene

scn = Scene(reader="nwcsaf-geo", filenames=filenames, reader_kwargs={"merge_channels": True})
scn.load(["wind_speed", "wind_from_direction"])

instead of loading each of the channels (wind_vis06_wind_speed, wind_vis08_wind_speed, etc.) separately.

@pnuu
Member Author

pnuu commented Feb 24, 2025

Also some documentation added.

@coveralls

coveralls commented Feb 27, 2025

Pull Request Test Coverage Report for Build 13971092899

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 218 of 223 (97.76%) changed or added relevant lines in 2 files are covered.
  • 23 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.006%) to 96.259%

Changes missing coverage          Covered lines   Changed/added lines   %
satpy/readers/nwcsaf_hrw_nc.py    121             126                   96.03%

Files with coverage reduction     New missed lines   %
satpy/writers/__init__.py         23                 90.91%

Totals
Change from base Build 13624409281: 0.006%
Covered Lines: 54116
Relevant Lines: 56219

💛 - Coveralls

FILETYPE_INFO = {"file_type": "nc_nwcsaf_geo_hrw"}


@pytest.fixture
Member

Suggested change
@pytest.fixture
@pytest.fixture(scope="module")

Unless I misunderstand the fixture, this should make it be created only once for all the tests in this module.

Member

I agree with the scope, but this will break tmp_path; you need to use tmp_path_factory instead: https://docs.pytest.org/en/stable/how-to/tmp_path.html#the-tmp-path-factory-fixture

Member Author

Adjusted in ba584f3
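For reference, a module-scoped fixture built on tmp_path_factory could look like the sketch below. `write_test_file` and `hrw_file` are hypothetical names, and the placeholder text file stands in for the NetCDF/HDF5 test file the real fixture writes.

```python
import pytest


def write_test_file(directory):
    """Hypothetical helper standing in for the code that writes the HRW test file."""
    path = directory / "test_hrw.nc"
    path.write_text("placeholder")  # the real fixture writes a NetCDF/HDF5 file
    return path


@pytest.fixture(scope="module")
def hrw_file(tmp_path_factory):
    # tmp_path is function-scoped, so a module-scoped fixture cannot request it.
    # tmp_path_factory is session-scoped and can be used from any scope.
    return write_test_file(tmp_path_factory.mktemp("hrw"))


def test_file_is_written(hrw_file):
    # hrw_file is created once and shared by every test in the module
    assert hrw_file.exists()
```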

Member

@djhoese djhoese left a comment

I had a suggestion or two, but I really don't think I should have the final say on this, as I have no experience with the NWC SAF readers. I'm marking my review as approved even though I requested a few changes. Not following my suggestions does not mean this file handler is broken or that it can't be merged, but it isn't as good as it could be 😉

with suppress(OSError):
    self.h5f.close()

def available_datasets(self, configured_datasets=None):
Member

The configured_datasets are not "forwarded on" as suggested in the base file handler:

for is_avail, ds_info in (configured_datasets or []):
    if is_avail is not None:
        # some other file handler said it has this dataset
        # we don't know any more information than the previous
        # file handler so let's yield early
        yield is_avail, ds_info
        continue
    yield self.file_type_matches(ds_info["file_type"]), ds_info

Without this users will not be able to statically define datasets in the YAML.
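A minimal sketch of an available_datasets that first forwards the configured datasets, following the base-class snippet above, and only then appends the dynamically generated ones. `SketchFileHandler`, its `dynamic_names` attribute, and the rule that YAML entries take precedence over same-named dynamic ones are assumptions for illustration, not the code that was merged.

```python
class SketchFileHandler:
    """Minimal stand-in for a satpy file handler (not the real base class)."""

    def __init__(self, file_type, dynamic_names):
        self.filetype_info = {"file_type": file_type}
        # Names discovered by inspecting the file, e.g. "wind_vis06_wind_speed"
        self.dynamic_names = dynamic_names

    def file_type_matches(self, ds_ftype):
        # True if the YAML entry belongs to this file type, otherwise None
        if not isinstance(ds_ftype, (list, tuple)):
            ds_ftype = [ds_ftype]
        return True if self.filetype_info["file_type"] in ds_ftype else None

    def available_datasets(self, configured_datasets=None):
        handled = set()
        # 1. Forward the statically configured (YAML) datasets first
        for is_avail, ds_info in (configured_datasets or []):
            handled.add(ds_info["name"])
            if is_avail is not None:
                # Another file handler already decided; pass its answer on
                yield is_avail, ds_info
                continue
            yield self.file_type_matches(ds_info["file_type"]), ds_info
        # 2. Then add the dynamically discovered datasets not already configured
        for name in self.dynamic_names:
            if name not in handled:
                yield True, {"name": name, "file_type": self.filetype_info["file_type"]}


handler = SketchFileHandler("nc_nwcsaf_geo_hrw", ["wind_vis06_wind_speed", "static_var"])
configured = [
    (None, {"name": "static_var", "file_type": "nc_nwcsaf_geo_hrw"}),
    (None, {"name": "other", "file_type": "another_type"}),
]
result = list(handler.available_datasets(configured))
```

With this ordering, a dataset defined statically in the YAML is reported once, with its YAML metadata, even if the file handler would also have generated it.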

Member Author

So what should I change?

Member

@mraspaud mraspaud left a comment

Just a couple of small things, but otherwise LGTM

scn = Scene(reader="nwcsaf-geo", filenames=[filename])
pprint.pprint(scn.available_dataset_names())

This print all the available datasets. The truncated output of this is::
Member

Suggested change
This print all the available datasets. The truncated output of this is::
This prints all the available datasets. The truncated output of this is::

Member Author

Fixed in 6e690d9

Comment on lines +257 to +258
except ValueError:
    logger.warning("Reading %s is not supported.", dataset_name)
Member

I would have thought this raises a KeyError...

Member Author

It raises ValueError when the data are there but the compound datatype is unreadable.
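The distinction discussed here can be sketched as two separate except clauses: a KeyError for a name that is not in the file at all, and, per the author's reply, a ValueError when the dataset exists but its compound datatype cannot be read. `read_field` is a hypothetical helper; the excerpt under review only shows the ValueError branch.

```python
import logging

logger = logging.getLogger(__name__)


def read_field(container, dataset_name):
    """Return the data of dataset_name, or None if it is missing or unreadable.

    `container` is anything indexable by name, e.g. an open h5py.File.
    """
    try:
        return container[dataset_name][()]
    except KeyError:
        # The name is not in the file at all
        logger.warning("%s is not present in the file.", dataset_name)
    except ValueError:
        # The dataset exists, but its compound datatype cannot be read
        logger.warning("Reading %s is not supported.", dataset_name)
    return None
```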

Member

@mraspaud mraspaud left a comment

LGTM

@mraspaud mraspaud merged commit 1ded1dd into pytroll:main Mar 21, 2025
18 checks passed
@pnuu pnuu deleted the feature-nwc-geo-hrw branch March 21, 2025 12:39
Development

Successfully merging this pull request may close these issues.

Reader for NWC SAF HRW (high resolution winds) data
5 participants