open_mfdataset overwrites variables with different values but overlapping coordinates #4077
Comments
Yes, the magic happens here: Line 49 in 2542a63. In your case it just uses the rightmost array.

Got it, thanks!
Raising an error when the start time is equal is certainly a good idea. What I am less sure about is what to do when the end is equal to the start - maybe a warning? The second case would be the following:

```python
print(ds0)
print(ds1)
```

and `xr.combine_by_coords([ds0, ds1])`.
For the first case you can probably check if all elements of … are equal: Line 99 in 2542a63.

PS: Overlapping indices are not a problem - it is checked that the result is monotonic: Line 748 in 2542a63.
The second part could probably be tested just below this (Line 751 in 2542a63) using

```python
if not indexes.is_unique:
    raise ValueError("")
```

(or a warning).
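To illustrate why the existing monotonicity check lets overlaps through, here is a small pandas-only sketch (index values invented for illustration): a combined index can be monotonic while still containing duplicates, which is exactly the case the proposed `is_unique` check would catch.

```python
import pandas as pd

# A combined index that is monotonic but contains a duplicate value,
# e.g. the end of one dataset equals the start of the next.
combined = pd.Index([0, 1, 2, 2, 3])

print(combined.is_monotonic_increasing)  # True: passes the existing check
print(combined.is_unique)                # False: the proposed check would fire
```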
Thanks for reporting this @malmans2! There are actually two issues here:

The minor one is that it should never have been possible to specify ….

The more complex issue is that you can get the same overwriting problem in …. That was actually deliberate. @shoyer, we discussed that PR (#2616) extensively, but I can't see an explicit record of discussing that particular line? Since then @dcherian has done work on the options which vary the strictness of checking - should …?

EDIT: (sorry for repeating what was said above, I wrote this reply last night and sent it today)
What is the expected outcome here? An error? The only way I can think of to combine these two datasets without losing data is to do ….
We already have the coordinates loaded into memory at this point - each element of ….

Looking at the first values makes sense for determining the order, but doesn't guarantee that they are safe to concatenate. The contract of …. I think we are missing another safety check verifying ….

In my opinion, xarray's combine functions like ….
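The point above can be sketched with plain pandas indexes (values invented for illustration): comparing first values orders the pieces correctly, but says nothing about whether their ranges overlap.

```python
import pandas as pd

a = pd.Index([0, 1, 2, 3, 4])
b = pd.Index([2, 3, 4, 5])

# First values determine the order: a comes before b...
print(a[0] < b[0])              # True

# ...but the ranges still overlap, so concatenation is not safe.
print(a.intersection(b).empty)  # False
```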
If …, what about something like this? I think it would cover all possibilities, but maybe it is too expensive?

```python
if not indexes[0].append(indexes[1:]).is_unique:
    raise ValueError
```
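For example, with the two overlapping `y` indexes from the report (a pandas-only sketch), this check would flag the problem before any data is combined:

```python
import pandas as pd

indexes = [pd.Index(range(5)), pd.Index(range(6))]  # y = 0..4 and y = 0..5

# pandas Index.append accepts a list of indexes.
combined = indexes[0].append(indexes[1:])
print(combined.is_unique)  # False: overlapping coordinates detected
```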
Never mind, it looks like … if the check goes into ….
@malmans2 are you interested in submitting a pull request to add this? (If not then that's fine!)
Yup, happy to do it. Just one doubt: I think in cases where …

```python
xr.merge([datasets[i].isel(dim=-1), datasets[i + 1].isel(dim=0)], compat=compat)
```
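A sketch of how such a boundary check might behave (dataset contents invented for illustration; assumes `compat="no_conflicts"`): merging the last element of one dataset with the first element of the next raises a `MergeError` when the shared coordinate carries conflicting values.

```python
import numpy as np
import xarray as xr

# Two datasets whose y ranges share the point y=4 but disagree there.
ds0 = xr.Dataset({"var": ("y", np.zeros(5))}, coords={"y": np.arange(5)})    # y = 0..4
ds1 = xr.Dataset({"var": ("y", np.ones(5))}, coords={"y": np.arange(4, 9)})  # y = 4..8

try:
    xr.merge([ds0.isel(y=-1), ds1.isel(y=0)], compat="no_conflicts")
    print("no conflict")
except xr.MergeError:
    print("conflict detected at the shared boundary")
```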
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the `stale` label.
In the example below I'm opening and concatenating two datasets using `open_mfdataset`. These datasets have variables with different values but overlapping coordinates. I'm concatenating along `y`, which is `0...4` in one dataset and `0...5` in the other. The `y` dimension of the resulting dataset is `0...5`, which means that `open_mfdataset` has overwritten some values without showing any error/warning.

Is this the expected default behavior? I would expect to get at least a warning, but maybe I'm misunderstanding the default arguments. I tried to play with the arguments, but I couldn't figure out which argument I should change to get an error in these scenarios.
MCVE Code Sample
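The original code sample did not survive the page extraction. Below is a minimal reconstruction of the inputs described above (the variable name and values are invented for illustration): two datasets whose `y` coordinates overlap but whose data differ. Writing these to netCDF and opening the two files with `open_mfdataset` reproduces the reported behavior: the result has `y = 0...5` with some values silently overwritten.

```python
import numpy as np
import xarray as xr

# Dataset 0: y = 0..4, values all zero; Dataset 1: y = 0..5, values all one.
ds0 = xr.Dataset({"var": ("y", np.zeros(5))}, coords={"y": np.arange(5)})
ds1 = xr.Dataset({"var": ("y", np.ones(6))}, coords={"y": np.arange(6)})

print(ds0)
print(ds1)
```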
Versions
Output of xr.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.4
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.16.0
distributed: 2.16.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.4.0.post20200518
pip: 20.1
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
```