Improved default behavior when concatenating DataArrays #2777
Conversation
Force-pushed from 280ce92 to a2df249
This seems like a nice usability improvement!
Thanks for the support and quick review @shoyer! Any idea when Xarray 0.12 might be out? I'm teaching some remote sensing workshops in mid-March and would love to have this merged, as a colleague's review of those notebooks prompted this PR 😄
Force-pushed from ab26245 to 562ffb8
The docs build failed due to a (transient) http error when loading tutorial data for the docs, so I've also finalised the planned conversion from …

Hmm, it looks like the failure to download the naturalearth coastlines.zip wasn't so transient after all - but it does work on my machine!
Force-pushed from d6e15cc to 1b51681
OK! @shoyer, I've got everything passing and it's ready for review. Even the accidental tutorial/docs fixes 😄
I'll check on the doc build failure, but I would rather not remove
load_dataset yet -- we didn't deprecate it that long ago.
…On Fri, Feb 22, 2019 at 5:28 AM, Zac Hatfield-Dodds commented on this pull request, in `xarray/core/combine.py` (#2777):

```diff
-        elif name != arr.name:
-            if compat == 'identical':
-                raise ValueError('array names not identical')
-            else:
-                arr = arr.rename(name)
-        datasets.append(arr._to_temp_dataset())
+    name = result_name(arrays)
+    names = [arr.name for arr in arrays]
+    if compat == 'identical' and len(set(names)) != 1:
+        raise ValueError(
+            "compat='identical', but array names {!r} are not identical"
+            .format(names if len(names) <= 10 else sorted(set(names)))
+        )
+    datasets = [arr.rename(name)._to_temp_dataset() for arr in arrays]
+
+    if isinstance(dim, str) and len(set(names) - {None}) == len(names) \
```

Not for this - we're checking that names is a list of unique strings.
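To see the checking logic in isolation, here is a minimal, hedged sketch in plain Python. `result_name` below is a stand-in for xarray's internal helper of the same name (the real code operates on DataArray objects rather than bare name lists), and `check_identical_names` mirrors the `ValueError` branch in the diff:

```python
def result_name(names):
    """Pick the common name if all inputs agree, else None.

    A stand-in for xarray's internal result_name() helper.
    """
    unique = set(names)
    return unique.pop() if len(unique) == 1 else None


def check_identical_names(names, compat):
    """The uniqueness check from the diff: under compat='identical',
    differing names are an error rather than being silently renamed."""
    if compat == 'identical' and len(set(names)) != 1:
        raise ValueError(
            "compat='identical', but array names {!r} are not identical"
            .format(names if len(names) <= 10 else sorted(set(names)))
        )


check_identical_names(['a', 'a', 'a'], 'identical')  # passes silently
print(result_name(['a', 'a']))  # a
print(result_name(['a', 'b']))  # None
```

Note the `.format(...)` trick: with more than ten inputs the error message shows the sorted set of distinct names rather than the full (possibly huge) list.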
Force-pushed from 1b51681 to 97ac4a6
@shoyer - don't worry about the docs build, I'm pretty sure that was just a flaky network from Travis and it's working now in any case. I've left …
This looks good to me now. I'll merge in a day or two unless anyone else has review comments to add.
Hello @Zac-HD! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 27, 2019 at 00:51 UTC
@pep8speaks seems to have gone haywire -- maybe you have a syntax error? Thinking about this a little more, one hazard of converting names into index labels is that we lose the invariant that you get the same result regardless of the order in which you call concat, e.g., something like these expressions could now give different results:
vs
I'm not entirely sure this is a deal-breaker, but it makes me a little nervous. In particular, it might break some of the invariants we're relying upon for the next version of …
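To make the order-dependence concern concrete, here is a toy model in plain Python (emphatically not xarray's implementation): each "array" is a dict, and `toy_concat` turns the input names into index labels on the new dimension whenever those names are unique non-None strings, roughly mirroring the condition in this PR. Nesting the same three arrays differently then yields different labels, because the intermediate result is nameless:

```python
def toy_concat(arrays, dim):
    """Toy model of the proposed behaviour: names become index labels
    on `dim` when they are all unique and non-None."""
    names = [a["name"] for a in arrays]
    coords = {}
    for a in arrays:
        coords.update(a["coords"])
    if None not in names and len(set(names)) == len(names):
        coords[dim] = tuple(names)  # names become index labels
    return {
        # like result_name(): keep the name only if all inputs agree
        "name": names[0] if len(set(names)) == 1 else None,
        "coords": coords,
        "data": [a["data"] for a in arrays],
    }


def da(name, value):
    """A minimal stand-in for a named DataArray with no coords."""
    return {"name": name, "coords": {}, "data": value}


# Left-nested vs right-nested concatenation of the same three arrays:
left = toy_concat([toy_concat([da("a", 0), da("b", 1)], "x"), da("c", 2)], "x")
right = toy_concat([da("a", 0), toy_concat([da("b", 1), da("c", 2)], "x")], "x")
print(left["coords"])   # {'x': ('a', 'b')}
print(right["coords"])  # {'x': ('b', 'c')}
```

In both cases the inner concat produces a nameless intermediate, so the outer concat skips relabelling and the `x` labels depend entirely on which pair was concatenated first.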
Force-pushed from e0ae758 to 4f3751e
...yep, an unmatched paren. 😥
I think it's impossible to avoid this when using inference in the general case. Two options I think would be decent-if-unsatisfying:
This is really nice to have when using concat to produce faceted plots of various kinds, and harmless when it's useless.
it's still deprecated, but we'll leave it for a bit longer before removal.
Force-pushed from 4f3751e to 63da214
@Zac-HD forgive me for this, but I think this PR is unnecessary because what you need basically already exists in the API. Going back to your original example, you could have got the same indexing by creating a DataArray to use as a coordinate to concatenate over:

```python
colors = "blue green red".split()
ds = xr.Dataset({
    k: xr.DataArray(np.random.random((2, 2)), dims="x y".split(), name=k)
    for k in colors
})
band = xr.DataArray(colors, name="band", dims=["band"])
xr.concat([ds.blue, ds.green, ds.red], dim=band).plot.imshow(col="band")
```

This still leaves the wrong label on the colorbar, but that could be fixed separately, and has to do with concat using the attrs of the first dataset in the list for the final dataset (a similar problem to #2382). I think it would be easier to change that behaviour instead (perhaps to …). Creating a new coordinate using a DataArray is in the docstring for …, but I think there should be an example there too. (Also I think this is relevant to #1646)
@shoyer I agree - although I like the idea, I think this could introduce all sorts of complex concatenation edge cases. At the very least the new API should have symmetry properties, something like:

```python
da1 = DataArray(name='a', data=[[0]], dims=['x', 'y'])
da2 = DataArray(name='b', data=[[1]], dims=['x', 'y'])
da3 = DataArray(name='a', data=[[2]], dims=['x', 'y'])
da4 = DataArray(name='b', data=[[3]], dims=['x', 'y'])

xr.manual_combine([[da1, da2], [da3, da4]], concat_dim=['x', 'y'])
# should give the same result as
xr.manual_combine([[da1, da3], [da2, da4]], concat_dim=['y', 'x'])
```

but with this PR I don't think it would. In the first case the x coord would be created with values …. I think my suggestion for naming would pass this test, because the result would be nameless and have no coords either way. I might have got that wrong, but I definitely think this kind of change should be carefully considered 😕 (EDIT: I just added this example as a test to #2616)
I've just submitted a PR which solves this issue in the way I just suggested: #2792.
@Zac-HD there's actually another way to get the indexing behaviour you wanted with the current API:

```python
colors = "blue green red".split()
das = [xr.DataArray(np.random.random((2, 2)), dims="x y".split(),
                    coords={"band": k})
       for k in colors]
xr.concat(das, dim="band").plot.imshow(col="band")
```

Here, instead of using the name attribute to label each band, I've used a scalar coordinate called `band`. This never touches the names, so it actually gives the desired output without the need for #2792:
I guess we should probably roll back the "name to scalar coordinates" part of this change. @Zac-HD do you want to do that here, or should we go with @TomNicholas's PR?
* concatenates along a single dimension
* Wrote function to find correct tile_IDs from nested list of datasets
* Wrote function to check that combined_tile_ids structure is valid
* Added test of 2d-concatenation
* Tests now check that dataset ordering is correct
* Test concatenation along a new dimension
* Started generalising auto_combine to N-D by integrating the N-D concatenation algorithm
* All unit tests now passing
* Fixed a failing test which I didn't notice because I don't have pseudoNetCDF
* Began updating open_mfdataset to handle N-D input
* Refactored to remove duplicate logic in open_mfdataset & auto_combine
* Implemented Shoyer's suggestion in #2553 to rewrite the recursive nested list traverser as an iterator
* --amend
* Now raises ValueError if input not ordered correctly before concatenation
* Added some more prototype tests defining desired behaviour more clearly
* Now raises informative errors on invalid forms of input
* Refactoring to also merge along each dimension
* Refactored to literally just apply the old auto_combine along each dimension
* Added unit tests for open_mfdataset
* Removed TODOs
* Removed format strings
* test_get_new_tile_ids now doesn't assume dicts are ordered
* Fixed failing tests on python3.5 caused by accidentally assuming dict was ordered
* Test for getting new tile id
* Fixed itertoolz import so that it's compatible with older versions
* Increased test coverage
* Added toolz as an explicit dependency to pass tests on python2.7
* Updated 'what's new'
* No longer attempts to shortcut all concatenation at once if concat_dims=None
* Rewrote using itertools.groupby instead of toolz.itertoolz.groupby to remove hidden dependency on toolz
* Fixed erroneous removal of utils import
* Updated docstrings to include an example of multidimensional concatenation
* Clarified auto_combine docstring for N-D behaviour
* Added unit test for nested list of Datasets with different variables
* Minor spelling and pep8 fixes
* Started working on a new api with both auto_combine and manual_combine
* Wrote basic function to infer concatenation order from coords. Needs better error handling though.
* Attempt at finalised version of public-facing API. All the internals still need to be redone to match though.
* No longer uses entire old auto_combine internally, only concat or merge
* Updated what's new
* Removed unneeded addition to what's new for old release
* Fixed incomplete merge in docstring for open_mfdataset
* Tests for manual combine passing
* Tests for auto_combine now passing
* xfailed weird behaviour with manual_combine trying to determine concat_dim
* Add auto_combine and manual_combine to API page of docs
* Tests now passing for open_mfdataset
* Completed merge so that #2648 is respected, and added tests. Also moved concat to its own file to avoid a circular dependency
* Separated the tests for concat and both combines
* Some PEP8 fixes
* Pre-empting a test which will fail with opening uamiv format
* Satisfy pep8speaks bot
* Python 3.5 compatible after changing some error string formatting
* Order coords using pandas.Index objects
* Fixed performance bug from GH #2662
* Removed ToDos about natural sorting of string coords
* Generalized auto_combine to handle monotonically-decreasing coords too
* Added more examples to docstring for manual_combine
* Added note about globbing aspect of open_mfdataset
* Removed auto-inferring of concatenation dimension in manual_combine
* Added example to docstring for auto_combine
* Minor correction to docstring
* Another very minor docstring correction
* Added test to guard against issue #2777
* Started deprecation cycle for auto_combine
* Fully reverted open_mfdataset tests
* Updated what's new to match deprecation cycle
* Reverted uamiv test
* Removed dependency on itertools
* Deprecation tests fixed
* Satisfy pycodestyle
* Started deprecation cycle of auto_combine
* Added specific error for edge case combine_manual can't handle
* Check that global coordinates are monotonic
* Highlighted weird behaviour when concatenating with no data variables
* Added test for impossible-to-auto-combine coordinates
* Removed unneeded test
* Satisfy linter
* Added airspeedvelocity benchmark for combining functions
* Benchmark will take longer now
* Updated version numbers in deprecation warnings to fit with recent release of 0.12
* Updated api docs for new function names
* Fixed docs build failure
* Revert "Fixed docs build failure" (this reverts commit ddfc6dd)
* Updated documentation with section explaining new functions
* Suppressed deprecation warnings in test suite
* Resolved ToDo by pointing to issue with concat, see #2975
* Various docs fixes
* Slightly renamed tests to match new name of tested function
* Included minor suggestions from shoyer
* Removed trailing whitespace
* Simplified error message for case combine_manual can't handle
* Removed filter for deprecation warnings, and added test for if user doesn't supply concat_dim
* Simple fixes suggested by shoyer
* Change deprecation warning behaviour
* linting
* whats-new.rst for all changes
* api.rst for new API

This is really nice to have when producing faceted plots of satellite observations in various bands, and should be somewhere between useful and harmless in other cases.
Example code:
Before - facets have an index, colorbar has misleading label:
After - facets have meaningful labels, colorbar has no label: