merge and align DataArrays/Datasets on different domains #742
Comments
This is actually closer to the functionality of
In cases where each array does not already have the dimension you want to concat along, this already works fine, because you can simply omit
I'm having a similar issue, with the added complexity that I want to concatenate across multiple dimensions. I'm not sure that's a cogent way to explain it, so here's an example. I have:

import xarray as xr

m = xr.DataArray(data=[[[1.1, 1.2, 1.3]]], dims=['Dim2', 'Dim3', 'Dim1'],
                 coords={'Dim1': ['A', 'B', 'C'], 'Dim2': ['D'], 'Dim3': ['F']})
n = xr.DataArray(data=[[[2.1, 2.2, 2.3]]], dims=['Dim2', 'Dim3', 'Dim1'],
                 coords={'Dim1': ['A', 'B', 'C'], 'Dim2': ['E'], 'Dim3': ['F']})
o = xr.DataArray(data=[[[3.1, 3.2, 3.3]]], dims=['Dim2', 'Dim3', 'Dim1'],
                 coords={'Dim1': ['A', 'B', 'C'], 'Dim2': ['D'], 'Dim3': ['G']})
p = xr.DataArray(data=[[[4.1, 4.2, 4.3]]], dims=['Dim2', 'Dim3', 'Dim1'],
                 coords={'Dim1': ['A', 'B', 'C'], 'Dim2': ['E'], 'Dim3': ['G']})

I want to merge these into a single, fully populated array, similar to what I'd get if I did:

data = [[[1.1, 1.2, 1.3],
         [3.1, 3.2, 3.3]],
        [[2.1, 2.2, 2.3],
         [4.1, 4.2, 4.3]]]
xr.DataArray(data=data, dims=['Dim2', 'Dim3', 'Dim1'],
             coords={'Dim1': ['A', 'B', 'C'], 'Dim2': ['D', 'E'], 'Dim3': ['F', 'G']})

i.e.

<xarray.DataArray (Dim2: 2, Dim3: 2, Dim1: 3)>
array([[[ 1.1, 1.2, 1.3],
[ 3.1, 3.2, 3.3]],
[[ 2.1, 2.2, 2.3],
[ 4.1, 4.2, 4.3]]])
Coordinates:
* Dim2 (Dim2) |S1 'D' 'E'
* Dim3 (Dim3) |S1 'F' 'G'
* Dim1 (Dim1) |S1 'A' 'B' 'C'

@jcmgray's function is pretty close, although the array indices are described slightly differently (I'm not sure whether this is a big deal). Note the 'object' dtype for Dim2 and Dim3:

<xarray.DataArray (Dim2: 2, Dim3: 2, Dim1: 3)>
array([[[ 1.1, 1.2, 1.3],
[ 3.1, 3.2, 3.3]],
[[ 2.1, 2.2, 2.3],
[ 4.1, 4.2, 4.3]]])
Coordinates:
* Dim2 (Dim2) object 'D' 'E'
* Dim3 (Dim3) object 'F' 'G'
* Dim1 (Dim1) |S1 'A' 'B' 'C'

It would be great to have a canonical way to do this. What should I try?
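A minimal sketch of one way to do this, assuming a recent xarray in which xr.merge accepts join='outer' and compat='no_conflicts'. xr.merge works on named variables, so each piece is renamed to a shared, hypothetical name 'value' first; the four arrays are redefined with explicit dims so the block runs on its own.

```python
import xarray as xr

dims = ['Dim2', 'Dim3', 'Dim1']
dim1 = ['A', 'B', 'C']

# The four single-cell pieces from the example, each covering one (Dim2, Dim3) cell
m = xr.DataArray([[[1.1, 1.2, 1.3]]], dims=dims,
                 coords={'Dim1': dim1, 'Dim2': ['D'], 'Dim3': ['F']})
n = xr.DataArray([[[2.1, 2.2, 2.3]]], dims=dims,
                 coords={'Dim1': dim1, 'Dim2': ['E'], 'Dim3': ['F']})
o = xr.DataArray([[[3.1, 3.2, 3.3]]], dims=dims,
                 coords={'Dim1': dim1, 'Dim2': ['D'], 'Dim3': ['G']})
p = xr.DataArray([[[4.1, 4.2, 4.3]]], dims=dims,
                 coords={'Dim1': dim1, 'Dim2': ['E'], 'Dim3': ['G']})

# merge needs named variables; 'value' is an arbitrary, hypothetical name.
# join='outer' takes the union of every index; compat='no_conflicts' keeps the
# non-null value in each cell and raises if two pieces disagree on a shared cell.
merged = xr.merge([a.rename('value') for a in (m, n, o, p)],
                  join='outer', compat='no_conflicts')['value']
print(merged)
```

Passing join and compat explicitly, rather than relying on defaults, keeps the behaviour stable if the defaults change between versions.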
Just a comment that the appearance of I still use the
I think this could make it into merge, which I am in the process of refactoring in #857. The key difference from @jcmgray's implementation that I would want is a check to make sure that the data is all on different domains when using

@JamesPHoughton I agree with @jcmgray that the dtype=object is what you should expect here. It's hard to create fixed-length strings in xarray/pandas because that precludes the possibility of missing values, so we tend to convert strings to object dtype when merged/concatenated.
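Something akin to the pandas DataFrame.update method:

import pandas as pd

# All-missing frame spanning the full index and column set
df = pd.DataFrame(index=range(5), columns=['a', 'b', 'c', 'd'])
# Smaller frame covering only part of that domain
df2 = pd.DataFrame(index=range(3), columns=['a'], data=range(3))
# Overwrite df in place with df2's non-NA values, aligned on index and columns
df.update(df2)

But I'm not sure whether empty array construction is supported.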
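For DataArrays, a rough analogue of the update pattern above is DataArray.combine_first, which aligns with an outer join and prefers the caller's non-null values. A minimal sketch, assuming a recent xarray; the dimension name 'x' and the values are made up for illustration.

```python
import numpy as np
import xarray as xr

# All-missing "template" over the full domain, like the empty DataFrame above
template = xr.DataArray(np.full(5, np.nan), dims='x', coords={'x': np.arange(5)})
# Partial data covering only the first three labels, like df2
partial = xr.DataArray([0.0, 1.0, 2.0], dims='x', coords={'x': np.arange(3)})

# Called on `partial`, its non-null values take precedence; the template fills the rest
updated = partial.combine_first(template)
print(updated.values)
```

Unlike DataFrame.update, combine_first returns a new object instead of modifying in place, and its coordinates are the union of both inputs rather than the caller's index alone.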
Yes, following a similar line of thought to you, I recently wrote an 'all missing' dataset constructor (rather than 'empty', which I think of as having no variables):

import numpy as np
import xarray as xr

def all_missing_ds(coords, var_names, var_dims, var_types):
    """
    Make a dataset whose data is all missing.
    """
    # Empty dataset with appropriate coordinates
    ds = xr.Dataset(coords=coords)
    for v_name, v_dims, v_type in zip(var_names, var_dims, var_types):
        shape = tuple(ds[d].size for d in v_dims)
        if v_type == int or v_type == float:
            # Warn about up-casting int to float?
            nodata = np.full(shape, np.nan)
        elif v_type == complex:
            # astype(complex) produces (nan + 0.0j), so build nan + nan*j directly
            nodata = np.full(shape, np.nan + np.nan * 1.0j)
        else:
            # Any other type is stored as an object array holding NaN
            nodata = np.full(shape, np.nan, dtype=object)
        ds[v_name] = (v_dims, nodata)
    return ds

To go with this (and this might be a separate issue), a

ds.sel(...).var = new_values
ds.sel(...)['var'] = new_values
ds.var.sel(...) = new_values
ds['var'].sel(...) = new_values

guarantees assigning a new value (currently only the last syntax does, I believe).
@JamesPHoughton @jcmgray For empty array creation, take a look at #277 and #878 -- this functionality would certainly be welcome.
@jcmgray Beware -- none of these are actually supported! See the big warning here in the docs. If you think a
Whoops, I actually meant to put ds['var'].loc[{...}] in there as the one that works... my understanding is that this is supported as long as the specified coordinates are 'nice' (according to And yes, default values for DataArray/Dataset would definitely fill the "create_all_missing" need.
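A minimal sketch of the .loc assignment pattern mentioned above, assuming a recent xarray: build an all-missing DataArray over the full domain (np.full is just one way to do this), then write a block of values in place by label with .loc and a dict of labels. Chained assignment through .sel(...) is not guaranteed to write back to the original, which is what the docs warn about. The dimension names and values are hypothetical.

```python
import numpy as np
import xarray as xr

coords = {'x': [10, 20, 30], 'y': ['a', 'b']}
# All-missing array over the full domain ("create_all_missing")
da = xr.DataArray(np.full((3, 2), np.nan), dims=['x', 'y'], coords=coords)

# Label-based, in-place assignment of a sub-block via .loc with a dict of labels
da.loc[{'x': [10, 20], 'y': 'b'}] = [1.0, 2.0]

print(da.values)
```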
@shoyer My 2 cents for how this might work after 0.8+ (auto-align during

def nonnull_compatible(first, second):
    """ Return True if two (aligned) datasets have no conflicting non-null values. """
    # mask for where both objects are not null
    both_not_null = first.notnull() & second.notnull()
    # check remaining values are equal
    return first.where(both_not_null).equals(second.where(both_not_null))

And then
@jcmgray Yes, that looks about right to me. The place to add this in would be the I would use
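A self-contained sketch of how a check like nonnull_compatible could be paired with an outer alignment and fillna to combine two partially overlapping arrays. The helper name merge_nonnull and the sample arrays are hypothetical, and a recent xarray is assumed.

```python
import xarray as xr

def nonnull_compatible(first, second):
    """Return True if two aligned objects agree wherever both are non-null."""
    both_not_null = first.notnull() & second.notnull()
    return bool(first.where(both_not_null).equals(second.where(both_not_null)))

def merge_nonnull(first, second):
    """Hypothetical helper: outer-align, check for conflicts, then fill gaps."""
    first, second = xr.align(first, second, join='outer')
    if not nonnull_compatible(first, second):
        raise ValueError('arrays have conflicting non-null values')
    # Keep first's values, filling its missing cells from second
    return first.fillna(second)

a = xr.DataArray([1.0, 2.0, 3.0], dims='x', coords={'x': [0, 1, 2]})
b = xr.DataArray([3.0, 4.0], dims='x', coords={'x': [2, 3]})   # agrees with a at x=2
c = xr.DataArray([9.0], dims='x', coords={'x': [0]})            # conflicts with a at x=0

print(merge_nonnull(a, b).values)
```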
Fixed by #996
Firstly, I think xarray is great, and for the type of physics simulations I run, n-dimensional labelled arrays are exactly what I need. But, and I may be missing something, is there a way to merge (or concatenate/update) DataArrays with different domains on the same coordinates? For example, consider this setup:
I would like to aggregate such DataArrays into a new, single DataArray with nan padding such that:
Here is a quick function I wrote to do this, but I would be worried about the performance of 'expanding' the new data to the old data's size on every iteration (i.e. supposing that the first argument is a large DataArray that you are adding to but that doesn't necessarily contain the dimensions already).
Might this be (or is this already!) possible in a simpler form in xarray? I know Datasets have merge and update methods, but I couldn't make them work as above. I also notice there are possible plans (#417) to introduce a merge function for DataArrays.
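One way to avoid repeatedly expanding an accumulator is to compute the union of all coordinates once with xr.align(..., join='outer') and then combine the already-aligned pieces. A minimal sketch, assuming all pieces share the same dimension names and do not overlap (or agree where they do); combine_all is a hypothetical name, and functools.reduce with fillna is just one way to do the combining step.

```python
import functools

import xarray as xr

def combine_all(arrays):
    """Hypothetical helper: NaN-pad every array to the union domain once, then combine."""
    # One outer alignment: every returned array shares the same (union) coordinates
    aligned = xr.align(*arrays, join='outer')
    # Fill each gap from later arrays; earlier arrays take precedence on overlaps
    return functools.reduce(lambda acc, nxt: acc.fillna(nxt), aligned)

pieces = [
    xr.DataArray([[1.0, 2.0]], dims=['x', 'y'], coords={'x': [0], 'y': ['a', 'b']}),
    xr.DataArray([[3.0]], dims=['x', 'y'], coords={'x': [1], 'y': ['a']}),
    xr.DataArray([[4.0]], dims=['x', 'y'], coords={'x': [2], 'y': ['c']}),
]
result = combine_all(pieces)
print(result)
```

Each fillna pass still runs over the full union shape, so the cost grows with the number of pieces, but the union index is built only once instead of on every iteration.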