forked from pydata/xarray

Merge branch 'main' into groupby-reduce
* main:
  Add typing_extensions as a required dependency (pydata#5911)
  pydata#5740 follow up: suppress xr.ufunc warnings in tests (pydata#5914)
  Avoid accessing slow .data in unstack (pydata#5906)
  Add wradlib to ecosystem in docs (pydata#5915)
  Use .to_numpy() for quantified facetgrids (pydata#5886)
  [test-upstream] fix pd skipna=None (pydata#5899)
  Add var and std to weighted computations (pydata#5870)
  Check for path-like objects rather than Path type, use os.fspath (pydata#5879)
  Handle single `PathLike` objects in `open_mfdataset()` (pydata#5884)
dcherian committed Oct 29, 2021
2 parents 85b63b6 + bcb96ce commit fe870e5
Showing 29 changed files with 440 additions and 148 deletions.
1 change: 1 addition & 0 deletions ci/requirements/environment-windows.yml
@@ -39,6 +39,7 @@ dependencies:
- setuptools
- sparse
- toolz
- typing_extensions
- zarr
- pip:
- numbagg
1 change: 1 addition & 0 deletions ci/requirements/environment.yml
@@ -43,6 +43,7 @@ dependencies:
- setuptools
- sparse
- toolz
- typing_extensions
- zarr
- pip:
- numbagg
1 change: 1 addition & 0 deletions ci/requirements/py37-bare-minimum.yml
@@ -13,3 +13,4 @@ dependencies:
- numpy=1.17
- pandas=1.0
- setuptools=40.4
- typing_extensions=3.7
1 change: 1 addition & 0 deletions ci/requirements/py37-min-all-deps.yml
@@ -47,6 +47,7 @@ dependencies:
- setuptools=40.4
- sparse=0.8
- toolz=0.10
- typing_extensions=3.7
- zarr=2.4
- pip:
- numbagg==0.1
1 change: 1 addition & 0 deletions ci/requirements/py38-all-but-dask.yml
@@ -39,6 +39,7 @@ dependencies:
- setuptools
- sparse
- toolz
- typing_extensions
- zarr
- pip:
- numbagg
6 changes: 6 additions & 0 deletions doc/api.rst
@@ -779,12 +779,18 @@ Weighted objects

core.weighted.DataArrayWeighted
core.weighted.DataArrayWeighted.mean
core.weighted.DataArrayWeighted.std
core.weighted.DataArrayWeighted.sum
core.weighted.DataArrayWeighted.sum_of_squares
core.weighted.DataArrayWeighted.sum_of_weights
core.weighted.DataArrayWeighted.var
core.weighted.DatasetWeighted
core.weighted.DatasetWeighted.mean
core.weighted.DatasetWeighted.std
core.weighted.DatasetWeighted.sum
core.weighted.DatasetWeighted.sum_of_squares
core.weighted.DatasetWeighted.sum_of_weights
core.weighted.DatasetWeighted.var


Coarsen objects
1 change: 1 addition & 0 deletions doc/ecosystem.rst
@@ -37,6 +37,7 @@ Geosciences
- `Spyfit <https://spyfit.readthedocs.io/en/master/>`_: FTIR spectroscopy of the atmosphere
- `windspharm <https://ajdawson.github.io/windspharm/index.html>`_: Spherical
harmonic wind analysis in Python.
- `wradlib <https://wradlib.org/>`_: An Open Source Library for Weather Radar Data Processing.
- `wrf-python <https://wrf-python.readthedocs.io/>`_: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
- `xarray-simlab <https://xarray-simlab.readthedocs.io>`_: xarray extension for computer model simulations.
- `xarray-spatial <https://makepath.github.io/xarray-spatial>`_: Numba-accelerated raster-based spatial processing tools (NDVI, curvature, zonal-statistics, proximity, hillshading, viewshed, etc.)
1 change: 1 addition & 0 deletions doc/getting-started-guide/installing.rst
@@ -8,6 +8,7 @@ Required dependencies

- Python (3.7 or later)
- setuptools (40.4 or later)
- ``typing_extensions`` (3.7 or later)
- `numpy <http://www.numpy.org/>`__ (1.17 or later)
- `pandas <http://pandas.pydata.org/>`__ (1.0 or later)

20 changes: 17 additions & 3 deletions doc/user-guide/computation.rst
@@ -263,7 +263,7 @@ Weighted array reductions

:py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` array reduction methods. They currently
- support weighted ``sum`` and weighted ``mean``.
+ support weighted ``sum``, ``mean``, ``std`` and ``var``.

.. ipython:: python
@@ -298,13 +298,27 @@ The weighted sum corresponds to:
weighted_sum = (prec * weights).sum()
weighted_sum
- and the weighted mean to:
+ the weighted mean to:

.. ipython:: python
weighted_mean = weighted_sum / weights.sum()
weighted_mean
the weighted variance to:

.. ipython:: python
weighted_var = weighted_prec.sum_of_squares() / weights.sum()
weighted_var
and the weighted standard deviation to:

.. ipython:: python
weighted_std = np.sqrt(weighted_var)
weighted_std
However, the functions also take missing values in the data into account:

.. ipython:: python
@@ -327,7 +341,7 @@ If the weights add up to 0, ``sum`` returns 0:
data.weighted(weights).sum()
- and ``mean`` returns ``NaN``:
+ and ``mean``, ``std`` and ``var`` return ``NaN``:

.. ipython:: python
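(For reference — a runnable distillation of the walkthrough above. The data values are illustrative, and the snippet assumes a build of xarray that includes this commit, i.e. one where ``Weighted.var``, ``Weighted.std`` and ``Weighted.sum_of_squares`` exist:)

    import numpy as np
    import xarray as xr

    # Illustrative data: monthly precipitation weighted by days per month.
    prec = xr.DataArray(np.random.rand(12), dims="month")
    weights = xr.DataArray(
        [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dims="month"
    )
    weighted_prec = prec.weighted(weights)

    # The manual formulas from the walkthrough agree with the built-ins:
    weighted_mean = (prec * weights).sum() / weights.sum()
    weighted_var = weighted_prec.sum_of_squares() / weights.sum()
    assert np.allclose(weighted_prec.mean(), weighted_mean)
    assert np.allclose(weighted_prec.var(), weighted_var)
    assert np.allclose(weighted_prec.std(), np.sqrt(weighted_var))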
11 changes: 11 additions & 0 deletions doc/whats-new.rst
@@ -23,6 +23,8 @@ v0.19.1 (unreleased)
New Features
~~~~~~~~~~~~
- Add :py:meth:`var`, :py:meth:`std` and :py:meth:`sum_of_squares` to :py:meth:`Dataset.weighted` and :py:meth:`DataArray.weighted`.
By `Christian Jauvin <https://github.com/cjauvin>`_.
- Added a :py:func:`get_options` method to xarray's root namespace (:issue:`5698`, :pull:`5716`)
By `Pushkar Kopparla <https://github.com/pkopparla>`_.
- Xarray now does a better job rendering variable names that are long LaTeX sequences when plotting (:issue:`5681`, :pull:`5682`).
@@ -80,6 +82,15 @@ Bug fixes
By `Jimmy Westling <https://github.com/illviljan>`_.
- Numbers are properly formatted in a plot's title (:issue:`5788`, :pull:`5789`).
By `Maxime Liquet <https://github.com/maximlt>`_.
- Faceted plots will no longer raise a `pint.UnitStrippedWarning` when a `pint.Quantity` array is plotted,
and will correctly display the units of the data in the colorbar (if there is one) (:pull:`5886`).
By `Tom Nicholas <https://github.com/TomNicholas>`_.
- With backends, check for path-like objects rather than the ``pathlib.Path``
  type, using ``os.fspath`` (:pull:`5879`).
By `Mike Taves <https://github.com/mwtoews>`_.
- ``open_mfdataset()`` now accepts a single ``pathlib.Path`` object (:issue:`5881`).
By `Panos Mavrogiorgos <https://github.com/pmav99>`_.
- Improved performance of :py:meth:`Dataset.unstack` (:pull:`5906`). By `Tom Augspurger <https://github.com/TomAugspurger>`_.

Documentation
~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion requirements.txt
@@ -5,4 +5,4 @@
numpy >= 1.17
pandas >= 1.0
setuptools >= 40.4
- typing-extensions >= 3.10
+ typing-extensions >= 3.7
1 change: 1 addition & 0 deletions setup.cfg
@@ -78,6 +78,7 @@ python_requires = >=3.7
install_requires =
numpy >= 1.17
pandas >= 1.0
typing_extensions >= 3.7
setuptools >= 40.4 # For pkg_resources

[options.extras_require]
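(Context, not part of the diff: ``typing_extensions`` becomes a required dependency because on Python 3.7 newer typing features such as ``Protocol`` and ``TypedDict`` exist only in the backport. A sketch of the usual version-guarded import — the exact pattern xarray uses internally is not shown in this diff:)

    import sys

    # Protocol was added to the stdlib ``typing`` module in Python 3.8;
    # on 3.7 it must come from the typing_extensions backport.
    if sys.version_info >= (3, 8):
        from typing import Protocol
    else:
        from typing_extensions import Protocol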
21 changes: 11 additions & 10 deletions xarray/backends/api.py
@@ -2,7 +2,6 @@
from glob import glob
from io import BytesIO
from numbers import Number
- from pathlib import Path
from typing import (
TYPE_CHECKING,
Callable,
@@ -808,7 +807,7 @@ def open_mfdataset(
- "override": if indexes are of same size, rewrite indexes to be
those of the first object with that dimension. Indexes for the same
dimension must have the same size in all objects.
- attrs_file : str or pathlib.Path, optional
+ attrs_file : str or path-like, optional
Path of the file used to read global attributes from.
By default global attributes are read from the first file provided,
with wildcard matches sorted by filename.
@@ -865,8 +864,10 @@
)
else:
paths = sorted(glob(_normalize_path(paths)))
+ elif isinstance(paths, os.PathLike):
+     paths = [os.fspath(paths)]
  else:
-     paths = [str(p) if isinstance(p, Path) else p for p in paths]
+     paths = [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]

if not paths:
raise OSError("no files to open")
@@ -958,8 +959,8 @@ def multi_file_closer():

# read global attributes from the attrs_file or from the first dataset
if attrs_file is not None:
- if isinstance(attrs_file, Path):
-     attrs_file = str(attrs_file)
+ if isinstance(attrs_file, os.PathLike):
+     attrs_file = os.fspath(attrs_file)
combined.attrs = datasets[paths.index(attrs_file)].attrs

return combined
@@ -992,8 +993,8 @@ def to_netcdf(
The ``multifile`` argument is only for the private use of save_mfdataset.
"""
- if isinstance(path_or_file, Path):
-     path_or_file = str(path_or_file)
+ if isinstance(path_or_file, os.PathLike):
+     path_or_file = os.fspath(path_or_file)

if encoding is None:
encoding = {}
@@ -1134,7 +1135,7 @@ def save_mfdataset(
----------
datasets : list of Dataset
List of datasets to save.
- paths : list of str or list of Path
+ paths : list of str or list of path-like objects
List of paths to which to save each corresponding dataset.
mode : {"w", "a"}, optional
Write ("w") or append ("a") mode. If mode="w", any existing file at
@@ -1302,7 +1303,7 @@ def check_dtype(var):

def to_zarr(
dataset: Dataset,
- store: Union[MutableMapping, str, Path] = None,
+ store: Union[MutableMapping, str, os.PathLike] = None,
chunk_store=None,
mode: str = None,
synchronizer=None,
@@ -1326,7 +1327,7 @@ def to_zarr(
if v.size == 0:
v.load()

- # expand str and Path arguments
+ # expand str and path-like arguments
store = _normalize_path(store)
chunk_store = _normalize_path(chunk_store)

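(Taken together, the ``api.py`` changes above mean any ``os.PathLike`` object is now accepted wherever a path was expected, not just ``pathlib.Path``. A short usage sketch; the file names are hypothetical:)

    from pathlib import Path

    import xarray as xr

    # A single path-like object now works; previously open_mfdataset()
    # required a str glob pattern or an explicit list of paths.
    ds = xr.open_mfdataset(Path("data/file1.nc"))

    # Mixed lists of str and path-like entries are normalized via os.fspath().
    ds = xr.open_mfdataset(["data/file1.nc", Path("data/file2.nc")])

    # to_netcdf() and to_zarr() convert path-like arguments the same way.
    ds.to_netcdf(Path("out.nc"))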
7 changes: 3 additions & 4 deletions xarray/backends/common.py
@@ -1,8 +1,7 @@
import logging
- import os.path
+ import os
import time
import traceback
- from pathlib import Path
from typing import Any, Dict, Tuple, Type, Union

import numpy as np
@@ -20,8 +19,8 @@


def _normalize_path(path):
- if isinstance(path, Path):
-     path = str(path)
+ if isinstance(path, os.PathLike):
+     path = os.fspath(path)

if isinstance(path, str) and not is_remote_uri(path):
path = os.path.abspath(os.path.expanduser(path))
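(Roughly, the rewritten helper behaves as below — a sketch of the expected behavior using the private ``xarray.backends.common._normalize_path``; outputs are illustrative:)

    from pathlib import Path

    from xarray.backends.common import _normalize_path

    # Any os.PathLike is converted with os.fspath(), then local str paths
    # are expanded and made absolute.
    _normalize_path(Path("~/data.nc"))   # e.g. '/home/user/data.nc'
    _normalize_path("~/data.nc")         # same result
    # Remote URIs pass through untouched.
    _normalize_path("https://example.com/data.nc")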
3 changes: 1 addition & 2 deletions xarray/backends/netCDF4_.py
@@ -1,7 +1,6 @@
import functools
import operator
import os
- import pathlib
from contextlib import suppress

import numpy as np
@@ -346,7 +345,7 @@ def open(
autoclose=False,
):

- if isinstance(filename, pathlib.Path):
+ if isinstance(filename, os.PathLike):
filename = os.fspath(filename)

if not isinstance(filename, str):
3 changes: 1 addition & 2 deletions xarray/backends/zarr.py
@@ -1,5 +1,4 @@
import os
- import pathlib
import warnings
from distutils.version import LooseVersion

@@ -346,7 +345,7 @@ def open_group(
):

# zarr doesn't support pathlib.Path objects yet. zarr-python#601
- if isinstance(store, pathlib.Path):
+ if isinstance(store, os.PathLike):
store = os.fspath(store)

open_kwargs = dict(
66 changes: 33 additions & 33 deletions xarray/core/dataset.py
@@ -7,7 +7,7 @@
from html import escape
from numbers import Number
from operator import methodcaller
- from pathlib import Path
+ from os import PathLike
from typing import (
TYPE_CHECKING,
Any,
@@ -1832,7 +1832,7 @@ def to_netcdf(
Parameters
----------
- path : str, Path or file-like, optional
+ path : str, path-like or file-like, optional
Path to which to save this dataset. File-like objects are only
supported by the scipy engine. If no path is provided, this
function returns the resulting netCDF file as bytes; in this case,
@@ -1914,8 +1914,8 @@

def to_zarr(
self,
- store: Union[MutableMapping, str, Path] = None,
- chunk_store: Union[MutableMapping, str, Path] = None,
+ store: Union[MutableMapping, str, PathLike] = None,
+ chunk_store: Union[MutableMapping, str, PathLike] = None,
mode: str = None,
synchronizer=None,
group: str = None,
@@ -1944,9 +1944,9 @@ def to_zarr(
Parameters
----------
- store : MutableMapping, str or Path, optional
+ store : MutableMapping, str or path-like, optional
      Store or path to directory in local or remote file system.
- chunk_store : MutableMapping, str or Path, optional
+ chunk_store : MutableMapping, str or path-like, optional
Store or path to directory in local or remote file system only for Zarr
array chunks. Requires zarr-python v2.4.0 or later.
mode : {"w", "w-", "a", "r+", None}, optional
@@ -4153,34 +4153,34 @@ def unstack(
)

result = self.copy(deep=False)
- for dim in dims:
-
-     if (
-         # Dask arrays don't support assignment by index, which the fast unstack
-         # function requires.
-         # https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
-         any(is_duck_dask_array(v.data) for v in self.variables.values())
-         # Sparse doesn't currently support (though we could special-case
-         # it)
-         # https://github.com/pydata/sparse/issues/422
-         or any(
-             isinstance(v.data, sparse_array_type)
-             for v in self.variables.values()
-         )
-         or sparse
-         # Until https://github.com/pydata/xarray/pull/4751 is resolved,
-         # we check explicitly whether it's a numpy array. Once that is
-         # resolved, explicitly exclude pint arrays.
-         # # pint doesn't implement `np.full_like` in a way that's
-         # # currently compatible.
-         # # https://github.com/pydata/xarray/pull/4746#issuecomment-753425173
-         # # or any(
-         # #     isinstance(v.data, pint_array_type) for v in self.variables.values()
-         # # )
-         or any(
-             not isinstance(v.data, np.ndarray) for v in self.variables.values()
-         )
-     ):
+ # We want to avoid allocating an object-dtype ndarray for a MultiIndex,
+ # so we can't just access self.variables[v].data for every variable.
+ # We only check the non-index variables.
+ # https://github.com/pydata/xarray/issues/5902
+ nonindexes = [
+     self.variables[k] for k in set(self.variables) - set(self.xindexes)
+ ]
+ # Notes for each of these cases:
+ # 1. Dask arrays don't support assignment by index, which the fast unstack
+ #    function requires.
+ #    https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
+ # 2. Sparse doesn't currently support assignment by index either
+ #    (though we could special-case it).
+ #    https://github.com/pydata/sparse/issues/422
+ # 3. pint requires checking whether the data is a NumPy array until
+ #    https://github.com/pydata/xarray/pull/4751 is resolved; once it is,
+ #    explicitly exclude pint arrays instead (pint doesn't implement
+ #    `np.full_like` in a way that's currently compatible).
+ needs_full_reindex = sparse or any(
+     is_duck_dask_array(v.data)
+     or isinstance(v.data, sparse_array_type)
+     or not isinstance(v.data, np.ndarray)
+     for v in nonindexes
+ )
+
+ for dim in dims:
+     if needs_full_reindex:
result = result._unstack_full_reindex(dim, fill_value, sparse)
else:
result = result._unstack_once(dim, fill_value)
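(The net effect of this refactor: the fast-path/slow-path decision is made once, over only the non-index variables, instead of re-scanning every variable — including the MultiIndex, whose ``.data`` access allocates an expensive object-dtype array — on each loop iteration. A rough standalone sketch of the decision logic, simplified and with illustrative names:)

    import numpy as np

    def needs_full_reindex(nonindex_arrays, force_sparse=False):
        # Mirrors the merged check: use the slow full-reindex unstack path
        # if any non-index variable is backed by dask, sparse, or another
        # non-NumPy duck array (all of which fail the isinstance test).
        return force_sparse or any(
            not isinstance(arr, np.ndarray) for arr in nonindex_arrays
        )

    # Plain NumPy-backed variables keep the fast _unstack_once() path:
    assert not needs_full_reindex([np.arange(4), np.ones((2, 2))])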