Description
When using open_mfdataset on files that 'should' share a grid, there is often a tiny floating-point mismatch in the coordinate values, and as a result the grids do not align. This happens frequently when reading output from large climate models that is split across multiple files: the same variable on the same lon/lat grid, but covering different time intervals. Because this happens silently, I always have to check the sizes of the lon/lat grids whenever I rely on open_mfdataset to concatenate the data in time.
Here is an example in which I create two 1-D DataArrays whose coordinates differ very slightly:
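import xarray as xr
import numpy as np
from glob import glob

tol = 1e-14  # size of the perturbation added to the coordinates
# Two copies of the grid 1..5, each perturbed by a different tiny random offset
x1 = np.arange(1, 6) + tol * np.random.rand(5)
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=['x'], coords={'x': x1})
x2 = np.arange(1, 6) + tol * np.random.rand(5)
da2 = da1.copy()
da2['x'] = x2  # same data, slightly different x coordinate
print(da1.x, '\n', da2.x)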
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
* x (x) float64 1.0 2.0 3.0 4.0 5.0
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
* x (x) float64 1.0 2.0 3.0 4.0 5.0
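The two coordinate vectors print identically, yet they are not equal. The sketch below reproduces this with plain Python lists. It assumes fixed offsets of 3e-15 and 7e-15 instead of random ones, so the result is reproducible. It then compares the vectors exactly (the way index labels are matched when aligning) and with an absolute tolerance via `math.isclose`. The tolerance of 1e-12 is an arbitrary illustrative value.

```python
import math

# Two coordinate vectors that should describe the same grid but carry tiny
# floating-point offsets (fixed here instead of random, for reproducibility).
x1 = [i + 3e-15 for i in range(1, 6)]
x2 = [i + 7e-15 for i in range(1, 6)]

# Printed with one decimal place they look identical.
print([f"{v:.1f}" for v in x1])
print([f"{v:.1f}" for v in x2])

# Exact comparison: every pair of labels differs, so the grids "differ".
exact_equal = x1 == x2

# Comparison with an absolute tolerance (assumed value): the grids match.
tol = 1e-12
close_equal = len(x1) == len(x2) and all(
    math.isclose(a, b, rel_tol=0.0, abs_tol=tol) for a, b in zip(x1, x2)
)

print(exact_equal, close_equal)  # False True
```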
First I save both DataArrays as netCDF files and then load them with open_mfdataset:
da1.to_netcdf('da1.nc',encoding={'x':{'dtype':'float64'}})
da2.to_netcdf('da2.nc',encoding={'x':{'dtype':'float64'}})
db = xr.open_mfdataset(glob('da?.nc'))
db
<xarray.Dataset>
Dimensions: (x: 10)
Coordinates:
* x (x) float64 1.0 2.0 3.0 4.0 5.0 1.0 2.0 ...
Data variables:
__xarray_dataarray_variable__ (x) int64 dask.array<shape=(10,), chunksize=(5,)>
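One workaround that needs no change to xarray is to snap every file's coordinate values onto a reference grid, such as the first file's, before the files are combined. open_mfdataset accepts a `preprocess` callable that is applied to each dataset, so such a step could go there, with the snapped values written back using something like `ds.assign_coords(...)`. The sketch below shows only the matching logic, in plain Python. `snap_to_reference` is a hypothetical helper, and the tolerance of 1e-12 is an assumed value.

```python
import bisect


def snap_to_reference(coords, reference, tol):
    """Hypothetical helper: replace each value in `coords` by the value in the
    sorted, non-empty list `reference` that lies within `tol` of it.

    Raises ValueError if a value has no reference point within `tol`, so
    genuinely different grids are not silently merged.
    """
    snapped = []
    for v in coords:
        i = bisect.bisect_left(reference, v)
        # The nearest reference point is either reference[i-1] or reference[i].
        candidates = [reference[j] for j in (i - 1, i) if 0 <= j < len(reference)]
        best = min(candidates, key=lambda r: abs(r - v))
        if abs(best - v) > tol:
            raise ValueError(f"{v!r} is not within {tol} of any reference value")
        snapped.append(best)
    return snapped


reference = [i + 3e-15 for i in range(1, 6)]  # grid of the first file
other = [i + 7e-15 for i in range(1, 6)]      # grid of a later file

snapped = snap_to_reference(other, reference, tol=1e-12)
print(snapped == reference)  # True: the later file now uses the first file's grid
```

Raising an error instead of silently keeping an unmatched value is deliberate. It turns the silent grid doubling into a loud failure when two files really are on different grids.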
The x dimension now has length 10 instead of 5. The same thing happens if I call align directly with join='outer':
xr.align(da1,da2,join='outer')
(<xarray.DataArray (x: 10)>
array([nan, 9., nan, 0., 2., nan, nan, 1., 0., nan])
Coordinates:
* x (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0,
<xarray.DataArray (x: 10)>
array([ 9., nan, 0., nan, nan, 2., 1., nan, nan, 0.])
Coordinates:
* x (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0)
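The doubling follows from matching index labels exactly. No label of da1 equals any label of da2, so the outer join keeps all ten. The sketch below shows how a tolerance-aware outer join, the behaviour requested in this issue, could merge labels that lie within `tol` of each other. It is plain Python. `outer_join` is a hypothetical function, not an xarray API, and keeping the first (smallest) value of each cluster is an arbitrary choice.

```python
def outer_join(a, b, tol=0.0):
    """Hypothetical sketch: sorted union of two coordinate lists, treating
    values within `tol` of the last kept label as the same label.

    With tol=0.0 only exactly equal values are merged, which roughly mirrors
    the exact-match outer join that produced the length-10 index.
    """
    joined = []
    for v in sorted(a + b):
        if joined and v - joined[-1] <= tol:
            continue  # within tolerance of the previous label: same grid point
        joined.append(v)
    return joined


x1 = [i + 3e-15 for i in range(1, 6)]
x2 = [i + 7e-15 for i in range(1, 6)]

print(len(outer_join(x1, x2)))             # 10: every point appears twice
print(len(outer_join(x1, x2, tol=1e-12)))  # 5: the grids are treated as one
```

This greedy merge only makes sense when `tol` is much smaller than the grid spacing. Otherwise genuinely distinct grid points would be collapsed into one.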
Request / suggestion
What is needed is a user-specified tolerance that can be given to open_mfdataset and passed on to align, so that coordinates differing by less than that tolerance are accepted as the same grid.
Possibly related to #2215
thanks, Naomi