Skip to content

Conversation

@jasonb5
Copy link
Collaborator

@jasonb5 jasonb5 commented Jul 28, 2025

Description

Adds support for generating land-sea mask using the regionmask and pcmdi methods.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@codecov
Copy link

codecov bot commented Jul 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (92b3d2e) to head (b8e1e40).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #783    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           16        18     +2     
  Lines         1782      1958   +176     
==========================================
+ Hits          1782      1958   +176     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Jul 28, 2025

@tomvothecoder @lee1043 @pochedls @chengzhuzhang As it was pointed out last meeting using mask as the accessor name will not work. It's currently named "geo_mask" temporarily. Any suggestions on a name or should we just moved the methods under another existing accessor?

@jasonb5 jasonb5 force-pushed the feature/576-landsea-mask branch from ed5c9f2 to 993ae15 Compare July 29, 2025 16:55
@tomvothecoder
Copy link
Collaborator

Notes from today's meeting (07/30/25)

  • Rename accessor to geomask
  • Rename accessor methods sea() to mask_sea() and land() to mask_land()
  • Open a separate GitHub issue to explore performance enhancements using numba

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Jul 30, 2025

@tomvothecoder @lee1043 @pochedls @chengzhuzhang
This just occurred to me, it's currently non-trivial to pass method (regionmask or pcmdi) specific options.

Option 1:
I can change the accessor methods to be named by method and add keep argument.

ds.geomask.regionmask("pr", keep="sea", ...)

ds.geomask.pcmdi("pr", keep="land", threshold1=0.5, threshold2=0.8, source=improved_navy_land)

Option 2:
Add an options argument that will be passed to the underlying method.

ds.geomask.mask_land("pr", method="pcmdi", options={"threshold1": 0.5, "threshold2": 0.8, source: improved_navy_land})

@pochedls
Copy link
Collaborator

I'd prefer a generic function where you specify the method (i.e., Option 2). I think there must be a way of avoiding passing in the arguments as a Dict. Couldn't you do something like this (or would this violate best practices in some way):

def mask_land(vid, method='pcmdi', threshold1=0.5, threshold2=0.8, source=improved_navy_land):
    """
    ...
    Parameters:
    ------------
    vid ...
    method ...
    threshold1 (float) : [...description...]. Parameter used with 'pcmdi' method (otherwise ignored).
                            Default value is 0.5.
    ...
    """
    if method == 'pcmdi':
        return _pcmdi_land_sea_mask(..., threshold1=threshold1, threshold2=threshold2, ...)
    elif method == 'regionmask'
        ...
    else:
        raise ValueError('Invalid method supplied...')

If the user supplies method1 for regionmask, it just isn't passed in (and this is noted in the docstring). If the user uses the pcmdi method and supplies [does not supply] an argument for method1 then the their value [the default value] is used.

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Jul 31, 2025

@pochedls I could do that but I'm worried it could be a little confusing/overwhelming. Confusing in the sense of no clear indication which method the arguments work for besides being in the docstring. Overwhelming if we end up adding additional methods or arguments for the methods.

I could do it similar to how it was done in the regridder methods, where it's a ** argument that just gets passed down and the there's a reference in the documentation to see the individual method docstrings for details.

    def mask_sea(
        self,
        data_var: str,
        method: str = "regionmask",
        criteria: float | None = None,
        mask: xr.DataArray | None = None,
        output_mask: bool = False,
        **options: Any,
    ):

@tomvothecoder
Copy link
Collaborator

@pochedls I could do that but I'm worried it could be a little confusing/overwhelming. Confusing in the sense of no clear indication which method the arguments work for besides being in the docstring. Overwhelming if we end up adding additional methods or arguments for the methods.

I agree with you @jasonb5. It can be confusing and hard for users to validate which parameters apply to which method, resulting in unexpected results. Also the function signature can be too long and IDE/type hints will be less useful.

I could do it similar to how it was done in the regridder methods, where it's a ** argument that just gets passed down and the there's a reference in the documentation to see the individual method docstrings for details.

    def mask_sea(
        self,
        data_var: str,
        method: str = "regionmask",
        criteria: float | None = None,
        mask: xr.DataArray | None = None,
        output_mask: bool = False,
        **options: Any,
    ):

I learned that this is a "hybrid dispatcher pattern" that is a very common and accepted approach in scientific Python libraries (e.g., Xarray, MatPlotLib, etc.). There is one public API with multiple backends that have different implements. It keeps the main entry point simple by defining the shared parameters upfront with optional arguments passed via **options.

I think this is a reasonable approach, as long as we document it well, use strict option checking, and structure the internal dispatch clearly. We already use this pattern in the regridder too.

@pochedls
Copy link
Collaborator

I am good with whatever way you choose, but I have to admit that it is inconvenient that the regridder arguments are embedded across several docstrings (that users may have trouble locating). In this issue/case, the docstring wouldn't actually be that long (see xr.open_mfdataset for an example of a pretty long docstring).

I don't find it confusing that some arguments would apply under some conditions but not others (if it is noted in the docstring).

With that said, I'm okay with docstrings being broken up, but I do have a preference for a main mask call (e.g., mask_land) versus two calls based on method (pcmdi or regionmask).

@github-actions github-actions bot added the type: docs Updates to documentation label Jul 31, 2025
@jasonb5
Copy link
Collaborator Author

jasonb5 commented Aug 1, 2025

@tomvothecoder Do we have a documentation section for methods not associated with an accessor?

@tomvothecoder
Copy link
Collaborator

@tomvothecoder Do we have a documentation section for methods not associated with an accessor?

If you're referring to stand-alone functions, there is a Top-level API Functions section. We might want to split this section to sub-sections at some point, as it is growing pretty long.

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Aug 1, 2025

@tomvothecoder The function I want to add is not top-level. It's a function in mask.py that contains the documentation for the PCMDI method options. I was going to just link the function in the accessor methods docstring.

@tomvothecoder
Copy link
Collaborator

@tomvothecoder The function I want to add is not top-level. It's a function in mask.py that contains the documentation for the PCMDI method options. I was going to just link the function in the accessor methods docstring.

I see. If you want to list that function in the docs, we can create a separate section for module-level functions.

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Aug 14, 2025

@tomvothecoder @lee1043 This can be tested now. I updated the documentation to include the module-level methods.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for generating land-sea masks using the regionmask and pcmdi methods. It introduces a new .geomask accessor for xarray Datasets that provides land and sea masking functionality.

  • Adds land-sea mask generation with two methods: regionmask (default) and PCMDI
  • Introduces a new MaskAccessor class with .mask_land() and .mask_sea() methods
  • Refactors grid accessor code to extract reusable functions for grid dataset creation

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
xcdat/mask.py New module implementing land-sea mask generation functionality
xcdat/regridder/grid.py Adds add_bounds parameter and dtype fix for Gaussian axis bounds
xcdat/regridder/accessor.py Refactors grid property to use extracted helper functions
xcdat/init.py Imports MaskAccessor to register the .geomask accessor
tests/test_mask.py Comprehensive test suite for mask functionality
pyproject.toml Adds regionmask dependency and package data configuration
docs/api.rst Documents new mask API functions and accessor methods
conda-env/dev.yml Adds regionmask dependency for development environment
conda-env/ci.yml Adds regionmask dependency for CI environment

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

xcdat/mask.py Outdated
Generate a land-sea mask using the PCMDI method with a custom high resolution source:
>>> highres_ds = xcdata.open_dataset("/path/to/file")
Copy link

Copilot AI Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"xcdata" should be "xcdat" in the example code.

Suggested change
>>> highres_ds = xcdata.open_dataset("/path/to/file")
>>> highres_ds = xcdat.open_dataset("/path/to/file")

Copilot uses AI. Check for mistakes.
@tomvothecoder
Copy link
Collaborator

@jasonb5 Can you provide minimum reproducible examples with the two methods, ideally with some test data? This will help expedite the code review process.

@tomvothecoder
Copy link
Collaborator

@jasonb5 Can you provide minimum reproducible examples with the two methods, ideally with some test data? This will help expedite the code review process.

Actually, I can probably just create a script using the xcdat-data repo. Feel free to provide one if you have it though.

@tomvothecoder tomvothecoder force-pushed the feature/576-landsea-mask branch from a744528 to a5407e3 Compare September 17, 2025 20:17
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasonb5 My commit dcd1980 (#783) moves the navy_land.nc file to xcdat/resources so that it doesn't float at the top-level of the directory. It also simplifies the _get_resource_path() function and moves it to the utils.py file (with updated unit tests in test_utils.py)

I left review comments on the code diff for further clarification. Let me know if you agree with the changes. I am still reviewing changes to other files.

Changes

  • Move navy_land.nc to resources/ and _get_resource_path() to utils.py
  • Refactor _get_resource_path() and expand unit tests
  • Update pcmdi_land_sea_mask() docstring and disable time decoding for navy_land.nc
  • Adjust .gitignore and pyproject.toml to include resources

EDIT: Explore hosting navy_land.nc file externally and downloading based on user need (my comment here).

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Sep 17, 2025

@jasonb5 @lee1043 The navy_land.nc file is 8.96mb, which bloats the repo. We should host this file in xcdat-data and use pooch to download this file with caching if the user calls the pcmdi masking functionality. I don't think that many people will be using this masking method, considering it is specialized and regionmask is the default.

@lee1043
Copy link
Collaborator

lee1043 commented Sep 17, 2025

@jasonb5 @lee1043 The navy_land.nc file is 8.96mb, which bloats the repo. We should host this file in xcdat-data and use pooch to download this file with caching if the user calls the pcmdi masking functionality. I don't think that many people will be using this masking method, considering it is specialized and regionmask is the default.

@tomvothecoder Agreed, hosting the data somewhere and download only when it is needed sounds a good solution to me.

@tomvothecoder
Copy link
Collaborator

@jasonb5 @lee1043 The navy_land.nc file is 8.96mb, which bloats the repo. We should host this file in xcdat-data and use pooch to download this file with caching if the user calls the pcmdi masking functionality. I don't think that many people will be using this masking method, considering it is specialized and regionmask is the default.

I pushed 04cc38d (#783) to use pooch for fetching and caching navy_land.nc from xcdat-data (here) if the PCMDI land sea mask functionality is used. This makes the code cleaner and the bundle size is kept small.

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Sep 18, 2025

@tomvothecoder Thanks for your work on this, I can add an example shortly.

- Updated pcmdi_land_sea_mask() docstring and disabled time decoding for navy_land.nc
- Replaced bundled navy_land.nc with pooch fetch-and-cache system
- Removed mask_get_resource_path() and its tests
- Added _data._get_pcmdi_mask_path() with corresponding tests
- Updated environment configs to require pooch >=1.8
- Refreshed related docstrings
@tomvothecoder tomvothecoder force-pushed the feature/576-landsea-mask branch from 2671c8d to 56f0d61 Compare September 18, 2025 21:23
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasonb5 I just squashed my changes into a single commit so it is easier to review. Can you review my changes to your PR (56f0d61) and let me know if you agree.

Please refer to the commit message for more info. The code itself looks good to me. Thank you for working on this!

@lee1043 will test the functionality in the PMP. When he confirms it is good to go, we can proceed with merging.

@pochedls
Copy link
Collaborator

This is awesome and will be a hit feature. I got this working with:

import xcdat as xc

ds = xc.tutorial.open_dataset("tas_amon_access")
xc.mask.generate_mask(ds, 'tas', keep='sea')

I was expecting that generate_mask would operate on the dataset object, e.g., ds.mask.generate_mask('tas'), but it doesn't seem like mask is added as an accessor. Am I wrong about that?

@tomvothecoder
Copy link
Collaborator

This is awesome and will be a hit feature. I got this working with:

import xcdat as xc

ds = xc.tutorial.open_dataset("tas_amon_access")
xc.mask.generate_mask(ds, 'tas', keep='sea')

I was expecting that generate_mask would operate on the dataset object, e.g., ds.mask.generate_mask('tas'), but it doesn't seem like mask is added as an accessor. Am I wrong about that?

I just analyzed the code and confirm you're correct. I'm guessing the reason why @jasonb5 implemented generate_mask as a separate function is because it returns a new dataset object, rather than modify or extend the original dataset (aka "stateless").

I'm not opposed to adding a wrapper accessor method that enables ds.geomask.generate_mask("tas") like so:

@xr.register_dataset_accessor("geomask")
class MaskAccessor:
    def __init__(self, ds):
        self._ds = ds

    def generate(self, var, keep="sea"):
        return generate_mask(self._ds, var, keep=keep)

@pochedls
Copy link
Collaborator

It seems like this should be a dataset accessor (maybe under .spatial?). Also – if I am remembering right – this function does more than generate the mask; it also masks the data. Maybe the function should be called apply_mask or something?

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Sep 24, 2025

@tomvothecoder @pochedls That's a miss on my side. If this seems reasonable I'll update the accessor.

  • mask_land: Returns Dataset with land masked on data_var.
  • mask_sea: Returns Dataset with sea masked on data_var.
  • generate_mask: Returns DataArray with mask.
  • Rename generate_mask to apply_mask.

@tomvothecoder
Copy link
Collaborator

Decisions from today's meeting (09/24/25) --

  • Move existing geomask accessor methods to spatial to improve usability
  • Keep shared utilities under mask.py for possible future extensibility (even if unlikely)

@lee1043
Copy link
Collaborator

lee1043 commented Oct 8, 2025

Thank you @jasonb5 for the hard work! It works nicely on my end! It is very nice to have this feature enabled in xCDAT.

For a note, it looks like generate_mask is renamed to generate_land_sea_mask, which is fine, just leaving a note for future reference.

Here I am sharing my test: test.html

  • It looks like ds.spatial.generate_land_sea_mask('tas', keep='sea') and ds.spatial.generate_land_sea_mask('tas', keep='land').plot() return identical mask, not sure if that was intended or not.
  • ds.spatial.mask_land('tas') and ds.spatial.mask_sea('tas') work as expected.
  • After generating grid using grid = xc.create_uniform_grid(-90, 90, 1, 0, 359, 1), couldn't find a way to generate a mask for it, wondering if there is any workaround.

@tomvothecoder
Copy link
Collaborator

Thank you @jasonb5 for the hard work! It works nicely on my end! It is very nice to have this feature enabled in xCDAT.

For a note, it looks like generate_mask is renamed to generate_land_sea_mask, which is fine, just leaving a note for future reference.

Here I am sharing my test: test.html

* It looks like `ds.spatial.generate_land_sea_mask('tas', keep='sea')` and `ds.spatial.generate_land_sea_mask('tas', keep='land').plot()` return identical mask, not sure if that was intended or not.

* `ds.spatial.mask_land('tas')` and `ds.spatial.mask_sea('tas')` work as expected.

* After generating grid using `grid = xc.create_uniform_grid(-90, 90, 1, 0, 359, 1)`, couldn't find a way to generate a mask for it, wondering if there is any workaround.

@lee1043 and @jasonb5 if both of you are available for today's office hours, let's discuss these items.

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Oct 8, 2025

@lee1043 About the issue concerning ds.spatial.generate_land_sea_mask('tas', keep='sea') and ds.spatial.generate_land_sea_mask('tas', keep='land') producing the same output. This is actually the correct behavior. For this function keep is not a valid argument. This method just generates the mask, to either have a land or sea mask you'd need to use xr.where which is what generate_and_apply_land_sea_mask does. Would it provide better usability if we had two explicit functions; generate_land_mask and generate_sea_mask?

@lee1043
Copy link
Collaborator

lee1043 commented Oct 8, 2025

@jasonb5 thanks for taking a look. I think generate_land_sea_mask without keep should be fine.

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Oct 9, 2025

@lee1043 Fixed generating a mask from a grid. I've also added an example notebook, if you'd like to review that as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: docs Updates to documentation type: enhancement New enhancement request

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

[Feature]: Explore land sea mask generation

4 participants