
Commit 663d0c9

Merge branch 'master' into drop_duplicates
2 parents a1ce19d + c54ec94


51 files changed: +2342 −429 lines

.github/dependabot.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+version: 2
+updates:
+  - package-ecosystem: 'github-actions'
+    directory: '/'
+    schedule:
+      # Check for updates once a week
+      interval: 'weekly'

.github/workflows/cancel-duplicate-runs.yaml

Lines changed: 1 addition & 1 deletion
@@ -9,6 +9,6 @@ jobs:
     name: Cancel previous runs
     runs-on: ubuntu-latest
     steps:
-      - uses: styfle/cancel-workflow-action@0.8.0
+      - uses: styfle/cancel-workflow-action@0.9.0
         with:
           workflow_id: ${{ github.event.workflow.id }}

.github/workflows/ci-additional.yaml

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ jobs:
           python xarray/util/print_versions.py
       - name: Run mypy
         run: |
-          python -m mypy xarray
+          python -m mypy .

   min-version-policy:
     name: Minimum Version Policy

.github/workflows/ci-pre-commit.yml

Lines changed: 1 addition & 1 deletion
@@ -13,4 +13,4 @@ jobs:
     steps:
       - uses: actions/checkout@v2
       - uses: actions/setup-python@v2
-      - uses: pre-commit/action@v2.0.0
+      - uses: pre-commit/action@v2.0.2

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ repos:
     rev: v0.812
     hooks:
       - id: mypy
+        # Copied from setup.cfg
         exclude: "properties|asv_bench"
   # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194
   # - repo: https://github.com/asottile/pyupgrade

ci/requirements/mypy_only

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 # used for the "Type checking (mypy)" CI run
 # version must correspond to the one in .pre-commit-config.yaml
+# See https://github.com/pydata/xarray/issues/4881 for more details.
 mypy=0.812

doc/howdoi.rst

Lines changed: 13 additions & 1 deletion
@@ -23,6 +23,8 @@ How do I ...
      - :py:meth:`Dataset.set_coords`
    * - change the order of dimensions
      - :py:meth:`DataArray.transpose`, :py:meth:`Dataset.transpose`
+   * - reshape dimensions
+     - :py:meth:`DataArray.stack`, :py:meth:`Dataset.stack`
    * - remove a variable from my object
      - :py:meth:`Dataset.drop_vars`, :py:meth:`DataArray.drop_vars`
    * - remove dimensions of length 1 or 0
@@ -34,7 +36,9 @@ How do I ...
    * - rename a variable, dimension or coordinate
      - :py:meth:`Dataset.rename`, :py:meth:`DataArray.rename`, :py:meth:`Dataset.rename_vars`, :py:meth:`Dataset.rename_dims`,
    * - convert a DataArray to Dataset or vice versa
-     - :py:meth:`DataArray.to_dataset`, :py:meth:`Dataset.to_array`
+     - :py:meth:`DataArray.to_dataset`, :py:meth:`Dataset.to_array`, :py:meth:`Dataset.to_stacked_array`, :py:meth:`DataArray.to_unstacked_dataset`
+   * - extract variables that have certain attributes
+     - :py:meth:`Dataset.filter_by_attrs`
    * - extract the underlying array (e.g. numpy or Dask arrays)
      - :py:attr:`DataArray.data`
    * - convert to and extract the underlying numpy array
@@ -43,6 +47,8 @@ How do I ...
      - :py:func:`dask.is_dask_collection`
    * - know how much memory my object requires
      - :py:attr:`DataArray.nbytes`, :py:attr:`Dataset.nbytes`
+   * - get the axis number for a dimension
+     - :py:meth:`DataArray.get_axis_num`
    * - convert a possibly irregularly sampled timeseries to a regularly sampled timeseries
      - :py:meth:`DataArray.resample`, :py:meth:`Dataset.resample` (see :ref:`resampling` for more)
    * - apply a function on all data variables in a Dataset
@@ -51,6 +57,8 @@ How do I ...
      - :py:func:`Dataset.to_netcdf`, :py:func:`DataArray.to_netcdf` specifying ``engine="h5netcdf", invalid_netcdf=True``
    * - make xarray objects look like other xarray objects
      - :py:func:`~xarray.ones_like`, :py:func:`~xarray.zeros_like`, :py:func:`~xarray.full_like`, :py:meth:`Dataset.reindex_like`, :py:meth:`Dataset.interp_like`, :py:meth:`Dataset.broadcast_like`, :py:meth:`DataArray.reindex_like`, :py:meth:`DataArray.interp_like`, :py:meth:`DataArray.broadcast_like`
+   * - make sure my datasets have values at the same coordinate locations
+     - ``xr.align(dataset_1, dataset_2, join="exact")``
    * - replace NaNs with other values
      - :py:meth:`Dataset.fillna`, :py:meth:`Dataset.ffill`, :py:meth:`Dataset.bfill`, :py:meth:`Dataset.interpolate_na`, :py:meth:`DataArray.fillna`, :py:meth:`DataArray.ffill`, :py:meth:`DataArray.bfill`, :py:meth:`DataArray.interpolate_na`
    * - extract the year, month, day or similar from a DataArray of time values
@@ -59,3 +67,7 @@ How do I ...
      - ``obj.dt.ceil``, ``obj.dt.floor``, ``obj.dt.round``. See :ref:`dt_accessor` for more.
    * - make a mask that is ``True`` where an object contains any of the values in an array
      - :py:meth:`Dataset.isin`, :py:meth:`DataArray.isin`
+   * - index using a boolean mask
+     - :py:meth:`Dataset.query`, :py:meth:`DataArray.query`, :py:meth:`Dataset.where`, :py:meth:`DataArray.where`
+   * - preserve ``attrs`` during (most) xarray operations
+     - ``xr.set_options(keep_attrs=True)``
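
For quick reference, a minimal sketch of two of the newly listed recipes (the data here is made up for illustration):

.. code-block:: python

    import numpy as np
    import xarray as xr

    da = xr.DataArray(
        np.arange(4.0), dims="x", coords={"x": [10, 20, 30, 40]}, attrs={"units": "m"}
    )

    # preserve ``attrs`` during (most) operations
    with xr.set_options(keep_attrs=True):
        assert (da + 1).attrs["units"] == "m"

    # index using a boolean mask, dropping the unselected points
    print(da.where(da > 1, drop=True).values)  # [2. 3.]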

doc/internals/how-to-add-new-backend.rst

Lines changed: 21 additions & 18 deletions
@@ -32,16 +32,19 @@ This is what a ``BackendEntrypoint`` subclass should look like:

 .. code-block:: python

+    from xarray.backends import BackendEntrypoint
+
+
     class MyBackendEntrypoint(BackendEntrypoint):
         def open_dataset(
             self,
             filename_or_obj,
             *,
             drop_variables=None,
             # other backend specific keyword arguments
+            # `chunks` and `cache` DO NOT go here, they are handled by xarray
         ):
-            ...
-            return ds
+            return my_open_dataset(filename_or_obj, drop_variables=drop_variables)

         open_dataset_parameters = ["filename_or_obj", "drop_variables"]
@@ -50,7 +53,7 @@ This is what a ``BackendEntrypoint`` subclass should look like:
                 _, ext = os.path.splitext(filename_or_obj)
             except TypeError:
                 return False
-            return ext in {...}
+            return ext in {".my_format", ".my_fmt"}

 ``BackendEntrypoint`` subclass methods and attributes are detailed in the following.
@@ -74,20 +77,19 @@ The following is an example of the high level processing steps:
         decode_times=True,
         decode_timedelta=True,
         decode_coords=True,
-        my_backend_param=None,
+        my_backend_option=None,
     ):
         vars, attrs, coords = my_reader(
             filename_or_obj,
             drop_variables=drop_variables,
-            my_backend_param=my_backend_param,
+            my_backend_option=my_backend_option,
         )
         vars, attrs, coords = my_decode_variables(
             vars, attrs, decode_times, decode_timedelta, decode_coords
         )  # see also conventions.decode_cf_variables

-        ds = xr.Dataset(vars, attrs=attrs)
-        ds = ds.set_coords(coords)
-        ds.set_close(store.close)
+        ds = xr.Dataset(vars, attrs=attrs, coords=coords)
+        ds.set_close(my_close_method)

         return ds
@@ -98,9 +100,9 @@ method shall be set by using :py:meth:`~xarray.Dataset.set_close`.


 The input of ``open_dataset`` method are one argument
-(``filename``) and one keyword argument (``drop_variables``):
+(``filename_or_obj``) and one keyword argument (``drop_variables``):

-- ``filename``: can be a string containing a path or an instance of
+- ``filename_or_obj``: can be any object but usually it is a string containing a path or an instance of
   :py:class:`pathlib.Path`.
 - ``drop_variables``: can be `None` or an iterable containing the variable
   names to be dropped when reading the data.
@@ -117,7 +119,7 @@ should implement in its interface the following boolean keyword arguments, called
 - ``decode_coords``

 Note: all the supported decoders shall be declared explicitly
-in backend ``open_dataset`` signature.
+in backend ``open_dataset`` signature and adding a ``**kwargs`` is not allowed.

 These keyword arguments are explicitly defined in Xarray
 :py:func:`~xarray.open_dataset` signature. Xarray will pass them to the
@@ -241,7 +243,7 @@ How to register a backend

 Define a new entrypoint in your ``setup.py`` (or ``setup.cfg``) with:

-- group: ``xarray.backend``
+- group: ``xarray.backends``
 - name: the name to be passed to :py:meth:`~xarray.open_dataset` as ``engine``
 - object reference: the reference of the class that you have implemented.
@@ -251,9 +253,7 @@ You can declare the entrypoint in ``setup.py`` using the following syntax:

     setuptools.setup(
         entry_points={
-            "xarray.backends": [
-                "engine_name=your_package.your_module:YourBackendEntryClass"
-            ],
+            "xarray.backends": ["my_engine=my_package.my_module:MyBackendEntryClass"],
         },
     )
@@ -263,18 +263,18 @@ in ``setup.cfg``:

     [options.entry_points]
     xarray.backends =
-        engine_name = your_package.your_module:YourBackendEntryClass
+        my_engine = my_package.my_module:MyBackendEntryClass


 See https://packaging.python.org/specifications/entry-points/#data-model
 for more information

-If you are using [Poetry](https://python-poetry.org/) for your build system, you can accomplish the same thing using "plugins". In this case you would need to add the following to your ``pyproject.toml`` file:
+If you are using `Poetry <https://python-poetry.org/>`_ for your build system, you can accomplish the same thing using "plugins". In this case you would need to add the following to your ``pyproject.toml`` file:

 .. code-block:: toml

     [tool.poetry.plugins."xarray_backends"]
-    "engine_name" = "your_package.your_module:YourBackendEntryClass"
+    "my_engine" = "my_package.my_module:MyBackendEntryClass"

 See https://python-poetry.org/docs/pyproject/#plugins for more information on Poetry plugins.
@@ -328,6 +328,9 @@ This is an example ``BackendArray`` subclass implementation:

 .. code-block:: python

+    from xarray.backends import BackendArray
+
+
     class MyBackendArray(BackendArray):
         def __init__(
             self,
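
With the entry-point group corrected to ``xarray.backends``, an installed backend becomes selectable by its registered name. A short usage sketch, reusing the placeholder names from the guide above (``my_engine`` and ``.my_format`` are documentation placeholders, not a real backend):

.. code-block:: python

    import xarray as xr

    # The plugin machinery discovers the installed entry point, so the
    # engine can be selected by the name declared in setup.py/setup.cfg:
    ds = xr.open_dataset("data.my_format", engine="my_engine")

    # Decoders declared explicitly in the backend's open_dataset signature
    # can be toggled per call:
    ds_raw = xr.open_dataset("data.my_format", engine="my_engine", decode_times=False)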

doc/user-guide/data-structures.rst

Lines changed: 1 addition & 1 deletion
@@ -239,7 +239,7 @@ to access any variable in a dataset, datasets have four key properties:
   used in ``data_vars`` (e.g., arrays of numbers, datetime objects or strings)
 - ``attrs``: :py:class:`dict` to hold arbitrary metadata

-The distinction between whether a variables falls in data or coordinates
+The distinction between whether a variable falls in data or coordinates
 (borrowed from `CF conventions`_) is mostly semantic, and you can probably get
 away with ignoring it if you like: dictionary like access on a dataset will
 supply variables found in either category. However, xarray does make use of the

doc/whats-new.rst

Lines changed: 5 additions & 0 deletions
@@ -67,6 +67,11 @@ New Features
 - Implement :py:meth:`Dataset.drop_duplicate_coords` and :py:meth:`DataArray.drop_duplicate_coords`
   to remove duplicate coordinate values (:pull:`5089`).
   By `Andrew Huang <https://github.com/ahuang11>`_.
+- Add typing information to unary and binary arithmetic operators operating on
+  :py:class:`~core.dataset.Dataset`, :py:class:`~core.dataarray.DataArray`,
+  :py:class:`~core.variable.Variable`, :py:class:`~core.groupby.DatasetGroupBy` or
+  :py:class:`~core.groupby.DataArrayGroupBy` (:pull:`4904`).
+  By `Richard Kleijn <https://github.com/rhkleijn>`_.
 - Add a ``combine_attrs`` parameter to :py:func:`open_mfdataset` (:pull:`4971`).
   By `Justus Magin <https://github.com/keewis>`_.
 - Disable the `cfgrib` backend if the `eccodes` library is not installed (:pull:`5083`). By `Baudouin Raoult <https://github.com/b8raoult>`_.
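
Since this merge targets the ``drop_duplicates`` branch, a sketch of the new deduplication method may be useful. The ``keep`` argument shown is an assumption modeled on ``pandas.DataFrame.drop_duplicates``; the actual signature is defined in :pull:`5089`:

.. code-block:: python

    import xarray as xr

    # coordinate "x" contains a duplicated label
    da = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0, 1, 1]})

    # drop repeated coordinate values, keeping the first occurrence
    deduped = da.drop_duplicate_coords("x", keep="first")
    print(deduped.x.values)  # [0 1]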

setup.cfg

Lines changed: 2 additions & 1 deletion
@@ -162,6 +162,8 @@ default_section = THIRDPARTY
 known_first_party = xarray

 [mypy]
+exclude = properties|asv_bench|doc
+files = xarray/**/*.py
 show_error_codes = True

 # Most of the numerical computing stack doesn't have type annotations yet.
@@ -238,7 +240,6 @@ ignore_missing_imports = True
 [mypy-xarray.core.pycompat]
 ignore_errors = True

-
 [aliases]
 test = pytest

xarray/backends/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -4,7 +4,7 @@
 formats. They should not be used directly, but rather through Dataset objects.
 """
 from .cfgrib_ import CfGribDataStore
-from .common import AbstractDataStore
+from .common import AbstractDataStore, BackendArray, BackendEntrypoint
 from .file_manager import CachingFileManager, DummyFileManager, FileManager
 from .h5netcdf_ import H5NetCDFStore
 from .memory import InMemoryDataStore
@@ -18,6 +18,8 @@

 __all__ = [
     "AbstractDataStore",
+    "BackendArray",
+    "BackendEntrypoint",
     "FileManager",
     "CachingFileManager",
     "CfGribDataStore",

xarray/backends/api.py

Lines changed: 4 additions & 13 deletions
@@ -28,7 +28,7 @@
 from ..core.dataset import Dataset, _get_chunk, _maybe_chunk
 from ..core.utils import is_remote_uri
 from . import plugins
-from .common import AbstractDataStore, ArrayWriter
+from .common import AbstractDataStore, ArrayWriter, _normalize_path
 from .locks import _get_scheduler

 if TYPE_CHECKING:
@@ -109,16 +109,6 @@ def _get_default_engine(path: str, allow_remote: bool = False):
     return engine


-def _normalize_path(path):
-    if isinstance(path, Path):
-        path = str(path)
-
-    if isinstance(path, str) and not is_remote_uri(path):
-        path = os.path.abspath(os.path.expanduser(path))
-
-    return path
-
-
 def _validate_dataset_names(dataset):
     """DataArray.name and Dataset keys must be a string or None"""

@@ -375,10 +365,11 @@ def open_dataset(
         scipy.io.netcdf (only netCDF3 supported). Byte-strings or file-like
         objects are opened by scipy.io.netcdf (netCDF3) or h5py (netCDF4/HDF).
     engine : {"netcdf4", "scipy", "pydap", "h5netcdf", "pynio", "cfgrib", \
-        "pseudonetcdf", "zarr"}, optional
+        "pseudonetcdf", "zarr"} or subclass of xarray.backends.BackendEntrypoint, optional
         Engine to use when reading files. If not provided, the default engine
         is chosen based on available dependencies, with a preference for
-        "netcdf4".
+        "netcdf4". A custom backend class (a subclass of ``BackendEntrypoint``)
+        can also be used.
     chunks : int or dict, optional
         If chunks is provided, it is used to load the new dataset into dask
         arrays. ``chunks=-1`` loads the dataset with dask using a single
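
The docstring change above records that ``engine`` may now also be a ``BackendEntrypoint`` subclass rather than a registered name. A sketch under that reading (``MyBackendEntrypoint`` is the hypothetical class from the backend guide, not a shipped backend):

.. code-block:: python

    import xarray as xr

    from my_package.my_module import MyBackendEntrypoint  # hypothetical backend

    # pass the backend class itself, bypassing entry-point registration
    ds = xr.open_dataset("data.my_format", engine=MyBackendEntrypoint)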

xarray/backends/cfgrib_.py

Lines changed: 5 additions & 2 deletions
@@ -11,6 +11,7 @@
     AbstractDataStore,
     BackendArray,
     BackendEntrypoint,
+    _normalize_path,
 )
 from .locks import SerializableLock, ensure_lock
 from .store import StoreBackendEntrypoint
@@ -22,9 +23,10 @@
 except ModuleNotFoundError:
     has_cfgrib = False
 # cfgrib throws a RuntimeError if eccodes is not installed
-except RuntimeError:
+except (ImportError, RuntimeError):
     warnings.warn(
-        "Failed to load cfgrib - most likely eccodes is missing. Try `import cfgrib` to get the error message"
+        "Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. "
+        "Try `import cfgrib` to get the full error message"
     )
     has_cfgrib = False
@@ -120,6 +122,7 @@ def open_dataset(
         time_dims=("time", "step"),
     ):

+        filename_or_obj = _normalize_path(filename_or_obj)
         store = CfGribDataStore(
             filename_or_obj,
             indexpath=indexpath,
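
In isolation, the import-guard pattern this hunk adjusts looks roughly like the following: the backend is disabled with a flag either way, but a broken or missing ecCodes library now produces a warning instead of an unhandled ImportError (a simplified sketch of the code above):

.. code-block:: python

    import warnings

    try:
        import cfgrib  # optional dependency wrapping the ecCodes C library

        has_cfgrib = True
    except ModuleNotFoundError:
        # cfgrib itself is not installed: quietly disable the backend
        has_cfgrib = False
    # cfgrib raises ImportError or RuntimeError when eccodes cannot be loaded
    except (ImportError, RuntimeError):
        warnings.warn(
            "Failed to load cfgrib - most likely there is a problem accessing the "
            "ecCodes library. Try `import cfgrib` to get the full error message"
        )
        has_cfgrib = False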

xarray/backends/common.py

Lines changed: 13 additions & 1 deletion
@@ -1,14 +1,16 @@
 import logging
+import os.path
 import time
 import traceback
+from pathlib import Path
 from typing import Any, Dict, Tuple, Type, Union

 import numpy as np

 from ..conventions import cf_encoder
 from ..core import indexing
 from ..core.pycompat import is_duck_dask_array
-from ..core.utils import FrozenDict, NdimSizeLenMixin
+from ..core.utils import FrozenDict, NdimSizeLenMixin, is_remote_uri

 # Create a logger object, but don't add any handlers. Leave that to user code.
 logger = logging.getLogger(__name__)
@@ -17,6 +19,16 @@
 NONE_VAR_NAME = "__values__"


+def _normalize_path(path):
+    if isinstance(path, Path):
+        path = str(path)
+
+    if isinstance(path, str) and not is_remote_uri(path):
+        path = os.path.abspath(os.path.expanduser(path))
+
+    return path
+
+
 def _encode_variable_name(name):
     if name is None:
         name = NONE_VAR_NAME
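
The helper moves here from ``api.py`` so individual backends can normalize paths themselves. A quick behavioral sketch (``_normalize_path`` is private API, and the printed path depends on the local environment):

.. code-block:: python

    from pathlib import Path

    from xarray.backends.common import _normalize_path

    # Path objects become strings; local paths are expanded and made absolute
    print(_normalize_path(Path("~/data/file.nc")))  # e.g. /home/user/data/file.nc

    # remote URIs are returned unchanged
    print(_normalize_path("https://example.com/data.nc"))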

xarray/backends/h5netcdf_.py

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,7 @@
     BACKEND_ENTRYPOINTS,
     BackendEntrypoint,
     WritableCFDataStore,
+    _normalize_path,
     find_root_and_group,
 )
 from .file_manager import CachingFileManager, DummyFileManager
@@ -366,6 +367,7 @@ def open_dataset(
         decode_vlen_strings=True,
     ):

+        filename_or_obj = _normalize_path(filename_or_obj)
         store = H5NetCDFStore.open(
             filename_or_obj,
             format=format,
