API for N-dimensional combine #2616

Merged
merged 111 commits into from
Jun 25, 2019
Commits
88ee12a
concatenates along a single dimension
TomNicholas Nov 5, 2018
1aaa075
Wrote function to find correct tile_IDs from nested list of datasets
TomNicholas Nov 6, 2018
dbb371d
Wrote function to check that combined_tile_ids structure is valid
TomNicholas Nov 7, 2018
cc4d743
Added test of 2d-concatenation
TomNicholas Nov 7, 2018
d2fc7e7
Tests now check that dataset ordering is correct
TomNicholas Nov 8, 2018
e3f3699
Test concatentation along a new dimension
TomNicholas Nov 8, 2018
55bf685
Started generalising auto_combine to N-D by integrating the N-D conca…
TomNicholas Nov 9, 2018
845206c
All unit tests now passing
TomNicholas Nov 9, 2018
fb66626
Merge branch 'real_master' into feature/nd_combine
TomNicholas Nov 10, 2018
f4e9aad
Fixed a failing test which I didn't notice because I don't have pseud…
TomNicholas Nov 10, 2018
00004a1
Began updating open_mfdataset to handle N-D input
TomNicholas Nov 14, 2018
b41e374
Refactored to remove duplicate logic in open_mfdataset & auto_combine
TomNicholas Nov 14, 2018
8672a79
Implemented Shoyers suggestion in #2553 to rewrite the recursive nest…
TomNicholas Nov 14, 2018
4f56b24
--amend
TomNicholas Nov 14, 2018
4cfaf2e
Now raises ValueError if input not ordered correctly before concatena…
TomNicholas Nov 14, 2018
9fd1413
Added some more prototype tests defining desired behaviour more clearly
TomNicholas Nov 22, 2018
8ad0121
Now raises informative errors on invalid forms of input
TomNicholas Nov 24, 2018
4b2c544
Refactoring to alos merge along each dimension
TomNicholas Nov 25, 2018
3d0061e
Refactored to literally just apply the old auto_combine along each di…
TomNicholas Nov 25, 2018
60c93ba
Added unit tests for open_mfdatset
TomNicholas Nov 26, 2018
1824538
Removed TODOs
TomNicholas Nov 26, 2018
d380815
Removed format strings
TomNicholas Nov 30, 2018
c4bb8d0
test_get_new_tile_ids now doesn't assume dicts are ordered
TomNicholas Nov 30, 2018
6b7f889
Fixed failing tests on python3.5 caused by accidentally assuming dict…
TomNicholas Nov 30, 2018
58a3648
Test for getting new tile id
TomNicholas Nov 30, 2018
a12a34a
Fixed itertoolz import so that it's compatible with older versions
TomNicholas Nov 30, 2018
ada1f4a
Increased test coverage
TomNicholas Dec 1, 2018
ef0a30e
Added toolz as an explicit dependency to pass tests on python2.7
TomNicholas Dec 1, 2018
3be70bc
Updated 'what's new'
TomNicholas Dec 1, 2018
f266bc3
No longer attempts to shortcut all concatenation at once if concat_di…
TomNicholas Dec 1, 2018
cf49c2b
Merge branch 'master' into feature/nd_combine
TomNicholas Dec 1, 2018
878e1f9
Rewrote using itertools.groupby instead of toolz.itertoolz.groupby to…
TomNicholas Dec 1, 2018
7dea14f
Merged changes from master
TomNicholas Dec 1, 2018
e6f25a3
Fixed erroneous removal of utils import
TomNicholas Dec 1, 2018
f856485
Updated docstrings to include an example of multidimensional concaten…
TomNicholas Dec 2, 2018
6305d83
Clarified auto_combine docstring for N-D behaviour
TomNicholas Dec 5, 2018
ce59da1
Added unit test for nested list of Datasets with different variables
TomNicholas Dec 10, 2018
9fb34cf
Minor spelling and pep8 fixes
TomNicholas Dec 10, 2018
83dedb3
Started working on a new api with both auto_combine and manual_combine
TomNicholas Dec 11, 2018
de199a0
Merged master
TomNicholas Dec 17, 2018
3e64a83
Wrote basic function to infer concatenation order from coords.
TomNicholas Jan 3, 2019
963c794
Attempt at finalised version of public-facing API.
TomNicholas Jan 4, 2019
1a66530
No longer uses entire old auto_combine internally, only concat or merge
TomNicholas Jan 4, 2019
38d265e
Merged v0.11.1 and v0.11.2 changes
TomNicholas Jan 4, 2019
7525b23
Updated what's new
TomNicholas Jan 4, 2019
92e120a
Removed uneeded addition to what's new for old release
TomNicholas Jan 4, 2019
13a7f75
Fixed incomplete merge in docstring for open_mfdataset
TomNicholas Jan 4, 2019
b76e681
Tests for manual combine passing
TomNicholas Jan 6, 2019
c09df8b
Tests for auto_combine now passing
TomNicholas Jan 6, 2019
953d572
xfailed weird behaviour with manual_combine trying to determine conca…
TomNicholas Jan 6, 2019
b7bf1ad
Add auto_combine and manual_combine to API page of docs
TomNicholas Jan 6, 2019
855d819
Tests now passing for open_mfdataset
TomNicholas Jan 6, 2019
de7965e
Attempted to merge master in, but #2648 has stumped me
TomNicholas Jan 6, 2019
bfcb4e3
Completed merge so that #2648 is respected, and added tests.
TomNicholas Jan 7, 2019
eb053cc
Separated the tests for concat and both combines
TomNicholas Jan 7, 2019
97e508c
Some PEP8 fixes
TomNicholas Jan 7, 2019
410b138
Pre-empting a test which will fail with opening uamiv format
TomNicholas Jan 7, 2019
02b6d05
Satisfy pep8speaks bot
TomNicholas Jan 7, 2019
0d6f13a
Python 3.5 compatibile after changing some error string formatting
TomNicholas Jan 7, 2019
18e0074
Order coords using pandas.Index objects
TomNicholas Jan 7, 2019
67f11f3
Fixed performance bug from GH #2662
TomNicholas Jan 15, 2019
3b843f5
Removed ToDos about natural sorting of string coords
TomNicholas Jan 23, 2019
540d3d4
Merged master into branch
TomNicholas Jan 23, 2019
bb98d54
Generalized auto_combine to handle monotonically-decreasing coords too
TomNicholas Jan 24, 2019
e3f7523
Added more examples to docstring for manual_combine
TomNicholas Jan 28, 2019
fc36b74
Merged master - includes py2 deprecation
TomNicholas Jan 28, 2019
d96595e
Added note about globbing aspect of open_mfdataset
TomNicholas Jan 28, 2019
79f09c0
Removed auto-inferring of concatenation dimension in manual_combine
TomNicholas Jan 28, 2019
e32adb3
Added example to docstring for auto_combine
TomNicholas Jan 28, 2019
da4d605
Minor correction to docstring
TomNicholas Jan 28, 2019
c4fe22c
Another very minor docstring correction
TomNicholas Jan 28, 2019
66b4c4f
Added test to guard against issue #2777
TomNicholas Feb 27, 2019
90f0c1d
Started deprecation cycle for auto_combine
TomNicholas Mar 2, 2019
0990dd4
Fully reverted open_mfdataset tests
TomNicholas Mar 3, 2019
d6277be
Updated what's new to match deprecation cycle
TomNicholas Mar 3, 2019
b81e77a
Merge branch 'real_master' into feature/nd_combine_new_api
TomNicholas Mar 3, 2019
bf7d549
Reverted uamiv test
TomNicholas Mar 3, 2019
f00770f
Removed dependency on itertools
TomNicholas Mar 3, 2019
c7c1746
Deprecation tests fixed
TomNicholas Mar 3, 2019
f6192ca
Satisfy pycodestyle
TomNicholas Mar 3, 2019
88f089e
Started deprecation cycle of auto_combine
TomNicholas Mar 18, 2019
2849559
merged changes from master for v0.12
TomNicholas Mar 18, 2019
535bc31
Added specific error for edge case combine_manual can't handle
TomNicholas Mar 18, 2019
5d818e0
Check that global coordinates are monotonic
TomNicholas Mar 18, 2019
42cd05d
Highlighted weird behaviour when concatenating with no data variables
TomNicholas Mar 18, 2019
8a83814
Added test for impossible-to-auto-combine coordinates
TomNicholas Mar 18, 2019
e4acbdc
Removed uneeded test
TomNicholas Mar 18, 2019
8e767e2
Satisfy linter
TomNicholas Mar 18, 2019
3d04112
Added airspeedvelocity benchmark for combining functions
TomNicholas Mar 18, 2019
06ecef6
Benchmark will take longer now
TomNicholas Mar 18, 2019
513764f
Updated version numbers in deprecation warnings to fit with recent re…
TomNicholas Mar 18, 2019
13364ff
Updated api docs for new function names
TomNicholas May 18, 2019
ddfc6dd
Fixed docs build failure
TomNicholas May 18, 2019
e471a42
Revert "Fixed docs build failure"
TomNicholas May 19, 2019
2d5b90f
Updated documentation with section explaining new functions
TomNicholas May 19, 2019
8cbf5e1
Merged master
TomNicholas May 19, 2019
9ead34e
Suppressed deprecation warnings in test suite
TomNicholas May 20, 2019
fab3586
Resolved ToDo by pointing to issue with concat, see #2975
TomNicholas May 20, 2019
9d5e29f
Various docs fixes
TomNicholas May 20, 2019
9a33ac6
Merged master, resolving conflicts with #2964
TomNicholas May 28, 2019
ae7b811
Slightly renamed tests to match new name of tested function
TomNicholas May 28, 2019
f4fc03d
Included minor suggestions from shoyer
TomNicholas May 28, 2019
917ebee
Removed trailing whitespace
TomNicholas May 28, 2019
1e537ba
Simplified error message for case combine_manual can't handle
TomNicholas May 29, 2019
7d6845b
Removed filter for deprecation warnings, and added test for if user d…
TomNicholas May 29, 2019
5083471
Simple fixes suggested by shoyer
TomNicholas Jun 21, 2019
4cc70ae
Change deprecation warning behaviour
TomNicholas Jun 21, 2019
537c405
Merged in recent changes to master
TomNicholas Jun 21, 2019
2f54127
Merge branch 'master' into feature/nd_combine_new_api
dcherian Jun 25, 2019
357531f
linting
TomNicholas Jun 25, 2019
e006875
Merge branch 'feature/nd_combine_new_api' of https://github.com/TomNi…
TomNicholas Jun 25, 2019
37 changes: 37 additions & 0 deletions asv_bench/benchmarks/combine.py
@@ -0,0 +1,37 @@
import numpy as np
import xarray as xr


class Combine:
    """Benchmark concatenating and merging large datasets"""

    def setup(self):
        """Create 4 datasets with two different variables"""

        t_size, x_size, y_size = 100, 900, 800
        t, x, y = np.arange(t_size), np.arange(x_size), np.arange(y_size)
        data = np.random.randn(t_size, x_size, y_size)

        self.dsA0 = xr.Dataset(
            {'A': xr.DataArray(data, coords={'T': t},
                               dims=('T', 'X', 'Y'))})
        self.dsA1 = xr.Dataset(
            {'A': xr.DataArray(data, coords={'T': t + t_size},
                               dims=('T', 'X', 'Y'))})
        self.dsB0 = xr.Dataset(
            {'B': xr.DataArray(data, coords={'T': t},
                               dims=('T', 'X', 'Y'))})
        self.dsB1 = xr.Dataset(
            {'B': xr.DataArray(data, coords={'T': t + t_size},
                               dims=('T', 'X', 'Y'))})

    def time_combine_manual(self):
        datasets = [[self.dsA0, self.dsA1], [self.dsB0, self.dsB1]]

        # Merge the outer level (different variables) and concatenate the
        # inner level along the existing 'T' dimension.
        xr.combine_manual(datasets, concat_dim=[None, 'T'])

    def time_auto_combine(self):
        """Also has to load and arrange T coordinate"""
        datasets = [self.dsA0, self.dsA1, self.dsB0, self.dsB1]

        xr.combine_auto(datasets)
3 changes: 3 additions & 0 deletions doc/api.rst
@@ -19,6 +19,9 @@ Top-level functions
broadcast
concat
merge
auto_combine
combine_auto
combine_manual
where
set_options
full_like
78 changes: 76 additions & 2 deletions doc/combining.rst
@@ -11,9 +11,10 @@ Combining data
import xarray as xr
np.random.seed(123456)

* For combining datasets or data arrays along a dimension, see concatenate_.
* For combining datasets or data arrays along a single dimension, see concatenate_.
* For combining datasets with different variables, see merge_.
* For combining datasets or data arrays with different indexes or missing values, see combine_.
* For combining datasets or data arrays along multiple dimensions, see combining.multi_.

.. _concatenate:

@@ -77,7 +78,7 @@ Merge
~~~~~

To combine variables and coordinates between multiple ``DataArray`` and/or
``Dataset`` object, use :py:func:`~xarray.merge`. It can merge a list of
``Dataset`` objects, use :py:func:`~xarray.merge`. It can merge a list of
``Dataset``, ``DataArray`` or dictionaries of objects convertible to
``DataArray`` objects:

@@ -237,3 +238,76 @@ coordinates as long as any non-missing values agree or are disjoint:
Note that due to the underlying representation of missing values as floating
point numbers (``NaN``), variable data type is not always preserved when merging
in this manner.

.. _combining.multi:

Combining along multiple dimensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

There are currently three combining functions with similar names:
:py:func:`~xarray.auto_combine`, :py:func:`~xarray.combine_auto`, and
:py:func:`~xarray.combine_manual`. This is because
``auto_combine`` is in the process of being deprecated in favour of the other
two functions, which are more general. If your code currently relies on
``auto_combine``, then you will be able to get similar functionality by using
``combine_manual``.

For combining many objects along multiple dimensions, xarray provides
:py:func:`~xarray.combine_manual` and :py:func:`~xarray.combine_auto`. These
functions use a combination of ``concat`` and ``merge`` across different
variables to combine many objects into one.

:py:func:`~xarray.combine_manual` requires specifying the order in which the
objects should be combined, while :py:func:`~xarray.combine_auto` attempts to
infer this ordering automatically from the coordinates in the data.

:py:func:`~xarray.combine_manual` is useful when you know the spatial
relationship between each object in advance. The datasets must be provided in
the form of a nested list, which specifies their relative position and
ordering. A common task is collecting data from a parallelized simulation where
each processor wrote out data to a separate file. A domain which was decomposed
into 4 parts, 2 each along both the x and y axes, requires organising the
datasets into a doubly-nested list, e.g.:

.. ipython:: python

arr = xr.DataArray(name='temperature', data=np.random.randint(5, size=(2, 2)), dims=['x', 'y'])
arr
ds_grid = [[arr, arr], [arr, arr]]
xr.combine_manual(ds_grid, concat_dim=['x', 'y'])
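
To make the role of the nested list concrete, here is a minimal pure-Python sketch of the idea. It is not xarray's actual implementation: each piece is assumed to be a plain list of rows, and the names ``infer_tile_ids``, ``combine_nd``, ``concat_x`` and ``concat_y`` are hypothetical. Every piece is labelled with its position in the nested list, and the pieces are then joined one nesting level at a time, innermost level first (an arbitrary choice made for this sketch).

```python
def infer_tile_ids(nested, prefix=()):
    """Yield (tile_id, piece) pairs, where tile_id is the index path
    of each piece inside an arbitrarily nested list."""
    if isinstance(nested, list) and nested and isinstance(nested[0], list) \
            and isinstance(nested[0][0], list):
        # Still a list of (lists of) pieces: descend one nesting level.
        for i, item in enumerate(nested):
            yield from infer_tile_ids(item, prefix + (i,))
    elif isinstance(nested, list) and prefix == () and not nested:
        return
    else:
        # A piece: here a 2D "array" stored as a list of rows.
        yield prefix, nested


def concat_x(pieces):
    """Join pieces along x by stacking their rows."""
    return [row for piece in pieces for row in piece]


def concat_y(pieces):
    """Join pieces along y by extending each row."""
    return [sum(rows, []) for rows in zip(*pieces)]


def combine_nd(tiles, combine_funcs):
    """Collapse the tiles one nesting level at a time, innermost first.
    combine_funcs[i] joins the pieces that differ only at level i."""
    tiles = dict(tiles)
    for func in reversed(combine_funcs):
        groups = {}
        for tile_id, piece in sorted(tiles.items()):
            # Pieces sharing every index except the last belong together.
            groups.setdefault(tile_id[:-1], []).append(piece)
        tiles = {key: func(pieces) for key, pieces in groups.items()}
    return tiles[()]


# A 2 x 2 grid of pieces, each of shape (1, 2); wrapped one level deeper
# per piece because each piece is itself a list of rows.
grid = [[[[1, 2]], [[3, 4]]],
        [[[5, 6]], [[7, 8]]]]

tile_ids = {}
for i, row in enumerate(grid):
    for j, piece in enumerate(row):
        tile_ids[(i, j)] = piece

combined = combine_nd(tile_ids, [concat_x, concat_y])
print(combined)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The order of the entries in ``combine_funcs`` plays the same role as the order of ``concat_dim`` in the example above: the first entry applies to the outermost list, the second to the next level down.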

:py:func:`~xarray.combine_manual` can also be used to explicitly merge datasets
with different variables. For example, if we have 4 datasets which cover two
time periods and contain two different variables, we can pass ``None`` as the
corresponding entry of ``concat_dim`` to mark the level of the nested list
over which we wish to use ``merge`` instead of ``concat``:

.. ipython:: python

temp = xr.DataArray(name='temperature', data=np.random.randn(2), dims=['t'])
precip = xr.DataArray(name='precipitation', data=np.random.randn(2), dims=['t'])
ds_grid = [[temp, precip], [temp, precip]]
xr.combine_manual(ds_grid, concat_dim=['t', None])
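
The difference between the two operations can be sketched without xarray. In this simplified illustration each dataset is assumed to be a plain dict mapping a variable name to a list of values along ``t``; the helpers ``merge`` and ``concat`` are hypothetical stand-ins, not the library functions. The inner level of the nested list (the ``None`` entry) is merged, and the outer level is concatenated along ``t``.

```python
def merge(pieces):
    """Union of variables; identical names must hold identical values."""
    merged = {}
    for piece in pieces:
        for name, values in piece.items():
            if name in merged and merged[name] != values:
                raise ValueError(f"conflicting values for variable {name!r}")
            merged[name] = values
    return merged


def concat(pieces):
    """Join the values of every variable end to end along 't'."""
    names = list(pieces[0])
    for piece in pieces:
        if set(piece) != set(names):
            raise ValueError("all pieces must contain the same variables")
    return {name: [v for piece in pieces for v in piece[name]]
            for name in names}


# Outer list: two time periods. Inner list: two different variables.
grid = [
    [{"temperature": [280.0, 281.0]}, {"precipitation": [0.1, 0.0]}],
    [{"temperature": [282.0, 283.0]}, {"precipitation": [0.3, 0.2]}],
]

# Equivalent in spirit to concat_dim=['t', None]:
# merge within each inner list, then concatenate the results along 't'.
rows = [merge(row) for row in grid]
combined = concat(rows)
print(combined)
```

Swapping the two steps would fail here, because pieces with different variables cannot be concatenated; this is why the position of ``None`` in ``concat_dim`` has to match the nesting level that separates the variables.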

:py:func:`~xarray.combine_auto` is for combining objects whose dimension
coordinates specify their position and order relative to one another, for
example a linearly-increasing 'time' dimension coordinate.

Here we combine two datasets using their common dimension coordinates. Notice
they are concatenated in order based on the values in their dimension
coordinates, not on their position in the list passed to ``combine_auto``.

.. ipython:: python
:okwarning:

x1 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [0, 1, 2])])
x2 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [3, 4, 5])])
xr.combine_auto([x2, x1])
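
The ordering step can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions rather than the library's algorithm: each object is a plain dict holding a coordinate list and data lists of the same length, only increasing coordinates are handled, and the function name ``combine_by_coord`` is hypothetical.

```python
def combine_by_coord(pieces, dim):
    """Order pieces by the first value of their `dim` coordinate,
    then join them, checking the result is strictly increasing."""
    ordered = sorted(pieces, key=lambda piece: piece[dim][0])

    coord = [c for piece in ordered for c in piece[dim]]
    if any(b <= a for a, b in zip(coord, coord[1:])):
        raise ValueError(
            f"combined coordinate {dim!r} is not monotonically increasing")

    combined = {dim: coord}
    for name in ordered[0]:
        if name != dim:
            combined[name] = [v for piece in ordered for v in piece[name]]
    return combined


x1 = {"x": [0, 1, 2], "foo": [10, 11, 12]}
x2 = {"x": [3, 4, 5], "foo": [13, 14, 15]}

# The list order is deliberately "wrong"; the coordinates decide the result.
result = combine_by_coord([x2, x1], dim="x")
print(result["x"])    # [0, 1, 2, 3, 4, 5]
print(result["foo"])  # [10, 11, 12, 13, 14, 15]
```

Pieces whose coordinates overlap or interleave cannot be ordered this way, so the sketch raises a ``ValueError`` in that case instead of guessing.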

These functions can be used by :py:func:`~xarray.open_mfdataset` to open many
files as one dataset. The particular function used is specified by setting the
argument ``'combine'`` to ``'auto'`` or ``'manual'``. This is useful for
situations where your data is split across many files in multiple locations,
which have some known relationship between one another.
8 changes: 6 additions & 2 deletions doc/io.rst
@@ -766,7 +766,10 @@ Combining multiple files

NetCDF files are often encountered in collections, e.g., with different files
corresponding to different model runs. xarray can straightforwardly combine such
files into a single Dataset by making use of :py:func:`~xarray.concat`.
files into a single Dataset by making use of :py:func:`~xarray.concat`,
:py:func:`~xarray.merge`, :py:func:`~xarray.combine_manual` and
:py:func:`~xarray.combine_auto`. For details on the difference between these
functions see :ref:`combining data`.

.. note::

@@ -779,7 +782,8 @@ files into a single Dataset by making use of :py:func:`~xarray.concat`.
This function automatically concatenates and merges multiple files into a
single xarray dataset.
It is the recommended way to open multiple files with xarray.
For more details, see :ref:`dask.io` and a `blog post`_ by Stephan Hoyer.
For more details, see :ref:`combining.multi`, :ref:`dask.io` and a
`blog post`_ by Stephan Hoyer.

.. _dask: http://dask.pydata.org
.. _blog post: http://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/
21 changes: 21 additions & 0 deletions doc/whats-new.rst
@@ -42,6 +42,23 @@ Enhancements
helpful for avoiding file-lock errors when trying to write to files opened
using ``open_dataset()`` or ``open_dataarray()``. (:issue:`2887`)
By `Dan Nowacki <https://github.com/dnowacki-usgs>`_.
- Combining datasets along N dimensions:
Datasets can now be combined along any number of dimensions,
instead of just a one-dimensional list of datasets.

The new ``combine_manual`` will accept the datasets as a nested
list-of-lists, and combine by applying a series of concat and merge
operations. The new ``combine_auto`` will instead use the dimension
coordinates of the datasets to order them.

``open_mfdataset`` can use either ``combine_manual`` or ``combine_auto`` to
combine datasets along multiple dimensions, by specifying the argument
``combine='manual'`` or ``combine='auto'``.

This means that the original function ``auto_combine`` is being deprecated.
To avoid FutureWarnings switch to using ``combine_manual`` or ``combine_auto``
(or set the ``combine`` argument in ``open_mfdataset``). (:issue:`2159`)
By `Tom Nicholas <http://github.com/TomNicholas>`_.

Bug fixes
~~~~~~~~~
@@ -158,6 +175,10 @@ Other enhancements
report showing what exactly differs between the two objects (dimensions /
coordinates / variables / attributes) (:issue:`1507`).
By `Benoit Bovy <https://github.com/benbovy>`_.
- Resampling of standard and non-standard calendars indexed by
:py:class:`~xarray.CFTimeIndex` is now possible. (:issue:`2191`).
By `Jwen Fai Low <https://github.com/jwenfai>`_ and
`Spencer Clark <https://github.com/spencerkclark>`_.
- Add ``tolerance`` option to ``resample()`` methods ``bfill``, ``pad``,
``nearest``. (:issue:`2695`)
By `Hauke Schulz <https://github.com/observingClouds>`_.
3 changes: 2 additions & 1 deletion xarray/__init__.py
@@ -6,7 +6,8 @@

from .core.alignment import align, broadcast, broadcast_arrays
from .core.common import full_like, zeros_like, ones_like
from .core.combine import concat, auto_combine
from .core.concat import concat
from .core.combine import combine_auto, combine_manual, auto_combine
from .core.computation import apply_ufunc, dot, where
from .core.extensions import (register_dataarray_accessor,
register_dataset_accessor)