-
Notifications
You must be signed in to change notification settings - Fork 16
Adds land-sea mask generation #783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #783 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 16 18 +2
Lines 1782 1958 +176
==========================================
+ Hits 1782 1958 +176 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@tomvothecoder @lee1043 @pochedls @chengzhuzhang As it was pointed out last meeting using |
ed5c9f2 to
993ae15
Compare
|
Notes from today's meeting (07/30/25)
|
|
@tomvothecoder @lee1043 @pochedls @chengzhuzhang Option 1: ds.geomask.regionmask("pr", keep="sea", ...)
ds.geomask.pcmdi("pr", keep="land", threshold1=0.5, threshold2=0.8, source=improved_navy_land)Option 2: ds.geomask.mask_land("pr", method="pcmdi", options={"threshold1": 0.5, "threshold2": 0.8, source: improved_navy_land}) |
|
I'd prefer a generic function where you specify the method (i.e., Option 2). I think there must be a way of avoiding passing in the arguments as a def mask_land(vid, method='pcmdi', threshold1=0.5, threshold2=0.8, source=improved_navy_land):
"""
...
Parameters:
------------
vid ...
method ...
threshold1 (float) : [...description...]. Parameter used with 'pcmdi' method (otherwise ignored).
Default value is 0.5.
...
"""
if method == 'pcmdi':
return _pcmdi_land_sea_mask(..., threshold1=threshold1, threshold2=threshold2, ...)
elif method == 'regionmask'
...
else:
raise ValueError('Invalid method supplied...')If the user supplies |
|
@pochedls I could do that but I'm worried it could be a little confusing/overwhelming. Confusing in the sense of no clear indication which method the arguments work for besides being in the docstring. Overwhelming if we end up adding additional methods or arguments for the methods. I could do it similar to how it was done in the regridder methods, where it's a def mask_sea(
self,
data_var: str,
method: str = "regionmask",
criteria: float | None = None,
mask: xr.DataArray | None = None,
output_mask: bool = False,
**options: Any,
): |
I agree with you @jasonb5. It can be confusing and hard for users to validate which parameters apply to which
I learned that this is a "hybrid dispatcher pattern" that is a very common and accepted approach in scientific Python libraries (e.g., Xarray, MatPlotLib, etc.). There is one public API with multiple backends that have different implements. It keeps the main entry point simple by defining the shared parameters upfront with optional arguments passed via I think this is a reasonable approach, as long as we document it well, use strict option checking, and structure the internal dispatch clearly. We already use this pattern in the regridder too. |
|
I am good with whatever way you choose, but I have to admit that it is inconvenient that the regridder arguments are embedded across several docstrings (that users may have trouble locating). In this issue/case, the docstring wouldn't actually be that long (see xr.open_mfdataset for an example of a pretty long docstring). I don't find it confusing that some arguments would apply under some conditions but not others (if it is noted in the docstring). With that said, I'm okay with docstrings being broken up, but I do have a preference for a main mask call (e.g., mask_land) versus two calls based on method (pcmdi or regionmask). |
|
@tomvothecoder Do we have a documentation section for methods not associated with an accessor? |
If you're referring to stand-alone functions, there is a Top-level API Functions section. We might want to split this section to sub-sections at some point, as it is growing pretty long. |
|
@tomvothecoder The function I want to add is not top-level. It's a function in |
I see. If you want to list that function in the docs, we can create a separate section for module-level functions. |
|
@tomvothecoder @lee1043 This can be tested now. I updated the documentation to include the module-level methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for generating land-sea masks using the regionmask and pcmdi methods. It introduces a new .geomask accessor for xarray Datasets that provides land and sea masking functionality.
- Adds land-sea mask generation with two methods: regionmask (default) and PCMDI
- Introduces a new MaskAccessor class with
.mask_land()and.mask_sea()methods - Refactors grid accessor code to extract reusable functions for grid dataset creation
Reviewed Changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| xcdat/mask.py | New module implementing land-sea mask generation functionality |
| xcdat/regridder/grid.py | Adds add_bounds parameter and dtype fix for Gaussian axis bounds |
| xcdat/regridder/accessor.py | Refactors grid property to use extracted helper functions |
| xcdat/init.py | Imports MaskAccessor to register the .geomask accessor |
| tests/test_mask.py | Comprehensive test suite for mask functionality |
| pyproject.toml | Adds regionmask dependency and package data configuration |
| docs/api.rst | Documents new mask API functions and accessor methods |
| conda-env/dev.yml | Adds regionmask dependency for development environment |
| conda-env/ci.yml | Adds regionmask dependency for CI environment |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
xcdat/mask.py
Outdated
| Generate a land-sea mask using the PCMDI method with a custom high resolution source: | ||
| >>> highres_ds = xcdata.open_dataset("/path/to/file") |
Copilot
AI
Sep 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"xcdata" should be "xcdat" in the example code.
| >>> highres_ds = xcdata.open_dataset("/path/to/file") | |
| >>> highres_ds = xcdat.open_dataset("/path/to/file") |
|
@jasonb5 Can you provide minimum reproducible examples with the two methods, ideally with some test data? This will help expedite the code review process. |
Actually, I can probably just create a script using the xcdat-data repo. Feel free to provide one if you have it though. |
a744528 to
a5407e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasonb5 My commit dcd1980 (#783) moves the navy_land.nc file to xcdat/resources so that it doesn't float at the top-level of the directory. It also simplifies the _get_resource_path() function and moves it to the utils.py file (with updated unit tests in test_utils.py)
I left review comments on the code diff for further clarification. Let me know if you agree with the changes. I am still reviewing changes to other files.
Changes
- Move
navy_land.nctoresources/and_get_resource_path()toutils.py - Refactor
_get_resource_path()and expand unit tests - Update
pcmdi_land_sea_mask()docstring and disable time decoding fornavy_land.nc - Adjust
.gitignoreandpyproject.tomlto include resources
EDIT: Explore hosting navy_land.nc file externally and downloading based on user need (my comment here).
|
@jasonb5 @lee1043 The |
@tomvothecoder Agreed, hosting the data somewhere and download only when it is needed sounds a good solution to me. |
I pushed |
|
@tomvothecoder Thanks for your work on this, I can add an example shortly. |
- Updated pcmdi_land_sea_mask() docstring and disabled time decoding for navy_land.nc - Replaced bundled navy_land.nc with pooch fetch-and-cache system - Removed mask_get_resource_path() and its tests - Added _data._get_pcmdi_mask_path() with corresponding tests - Updated environment configs to require pooch >=1.8 - Refreshed related docstrings
2671c8d to
56f0d61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasonb5 I just squashed my changes into a single commit so it is easier to review. Can you review my changes to your PR (56f0d61) and let me know if you agree.
Please refer to the commit message for more info. The code itself looks good to me. Thank you for working on this!
@lee1043 will test the functionality in the PMP. When he confirms it is good to go, we can proceed with merging.
|
This is awesome and will be a hit feature. I got this working with: import xcdat as xc
ds = xc.tutorial.open_dataset("tas_amon_access")
xc.mask.generate_mask(ds, 'tas', keep='sea')I was expecting that |
I just analyzed the code and confirm you're correct. I'm guessing the reason why @jasonb5 implemented I'm not opposed to adding a wrapper accessor method that enables @xr.register_dataset_accessor("geomask")
class MaskAccessor:
def __init__(self, ds):
self._ds = ds
def generate(self, var, keep="sea"):
return generate_mask(self._ds, var, keep=keep) |
|
It seems like this should be a dataset accessor (maybe under |
|
@tomvothecoder @pochedls That's a miss on my side. If this seems reasonable I'll update the accessor.
|
|
Decisions from today's meeting (09/24/25) --
|
|
Thank you @jasonb5 for the hard work! It works nicely on my end! It is very nice to have this feature enabled in xCDAT. For a note, it looks like Here I am sharing my test: test.html
|
@lee1043 and @jasonb5 if both of you are available for today's office hours, let's discuss these items. |
|
@lee1043 About the issue concerning |
|
@jasonb5 thanks for taking a look. I think |
|
@lee1043 Fixed generating a mask from a grid. I've also added an example notebook, if you'd like to review that as well. |
Description
Adds support for generating land-sea mask using the
regionmaskandpcmdimethods.Checklist
If applicable: