tolerance for alignment #2217

Open
@naomi-henderson

When using open_mfdataset on files which 'should' share a grid, there is often a small mismatch in the coordinate values, so the grids do not align properly. This happens frequently when reading output from large climate models that is split across multiple files holding the same variable on the same lon,lat grid for different time intervals. Because the misalignment is silent (no error or warning is raised), I always have to check the sizes of the lon,lat grids whenever I rely on open_mfdataset to concatenate the data in time.

Here is an example in which I create two 1-D DataArrays whose coordinates differ only by random noise of order 1e-14:

import xarray as xr
import numpy as np
from glob import glob

tol=1e-14
x1 = np.arange(1,6)+ tol*np.random.rand(5)
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=['x'], coords={'x': x1})

x2 = np.arange(1,6) + tol*np.random.rand(5)
da2 = da1.copy()
da2['x'] = x2

print(da1.x,'\n', da2.x)
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0 
 <xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0
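
The printed coordinates look identical only because the display rounds them. A minimal sketch of a check that exposes the difference is given below; it uses a fixed offset of 1e-14 instead of random noise so the result is reproducible, and the tolerance of 1e-12 is an arbitrary choice for illustration, not a value taken from xarray.

```python
import numpy as np
import xarray as xr

# Same setup as above, but with a fixed offset so the check is reproducible.
x1 = np.arange(1.0, 6.0)
x2 = x1 + 1e-14
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=["x"], coords={"x": x1})
da2 = da1.assign_coords(x=("x", x2))

# Exact comparison: this is what label-based alignment effectively relies on.
exact = np.array_equal(da1.x.values, da2.x.values)

# Comparison within an absolute tolerance (1e-12 is an assumed value).
close = np.allclose(da1.x.values, da2.x.values, rtol=0.0, atol=1e-12)

# Largest absolute difference between the two coordinate arrays.
max_diff = float(np.abs(da1.x.values - da2.x.values).max())

print(exact, close, max_diff)
```

The coordinates are not exactly equal, yet they agree to well within the tolerance, which is the situation in which alignment treats every label as distinct.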

First I save both DataArrays as netcdf files and then use open_mfdataset to load them:

da1.to_netcdf('da1.nc',encoding={'x':{'dtype':'float64'}})
da2.to_netcdf('da2.nc',encoding={'x':{'dtype':'float64'}})

db = xr.open_mfdataset(glob('da?.nc'))

db
<xarray.Dataset>
Dimensions:                        (x: 10)
Coordinates:
  * x                              (x) float64 1.0 2.0 3.0 4.0 5.0 1.0 2.0 ...
Data variables:
    __xarray_dataarray_variable__  (x) int64 dask.array<shape=(10,), chunksize=(5,)>

So the x grid is now twice the size. The x dimension is likewise doubled if I use align with join='outer' directly:

xr.align(da1,da2,join='outer')
(<xarray.DataArray (x: 10)>
 array([nan,  9., nan,  0.,  2., nan, nan,  1.,  0., nan])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0,
 <xarray.DataArray (x: 10)>
 array([ 9., nan,  0., nan, nan,  2.,  1., nan, nan,  0.])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0)

Request / suggestion

What is needed is a user-specified tolerance that can be given to open_mfdataset and passed on to align, so that grids like these are accepted as the same.
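
Until something like that exists, the effect can be approximated by hand. The sketch below is only an illustration of the requested behavior: `snap_coord` is a hypothetical helper (not part of xarray), the tolerance of 1e-12 is an assumed value, and a fixed offset again stands in for the random noise.

```python
import numpy as np
import xarray as xr


def snap_coord(da, ref, dim, tol):
    """Hypothetical helper: if da[dim] matches ref[dim] within an absolute
    tolerance `tol`, return a copy of `da` that uses ref's coordinate values."""
    a = da[dim].values
    b = ref[dim].values
    if a.shape != b.shape or not np.allclose(a, b, rtol=0.0, atol=tol):
        raise ValueError(f"{dim!r} coordinates differ by more than {tol}")
    return da.assign_coords(**{dim: (dim, b)})


x1 = np.arange(1.0, 6.0)
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=["x"], coords={"x": x1})
da2 = da1.assign_coords(x=("x", x1 + 1e-14))

# Without snapping, the outer join keeps every slightly different label.
raw1, raw2 = xr.align(da1, da2, join="outer")
n_raw = raw1.x.size

# After snapping da2 onto da1's grid, the labels match exactly.
da2_snapped = snap_coord(da2, da1, "x", tol=1e-12)
fix1, fix2 = xr.align(da1, da2_snapped, join="outer")
n_fixed = fix1.x.size

print(n_raw, n_fixed)
```

For files on disk, the same snapping would have to be applied to each dataset before combining them. reindex_like with method='nearest' and a tolerance argument may be another way to get a similar result, though I have not checked how it behaves on a real lon,lat grid.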

Possibly related to #2215

xr.__version__ '0.10.4'

thanks, Naomi
