Indexing preserves outdated attrs which cause trouble downstream #2247

Closed
@leouieda

Code Sample, a copy-pastable example if possible

import xarray as xr

# Load a global grid with 1 degree spacing distributed by GMT
whole = xr.open_dataarray('earth_relief_60m.grd')
print("Global grid:\n", whole)
# The grid coordinates have metadata regarding the range (actual_range).
# This is used to detect if the grid is pixel or node registered.
print("\nMetadata for global coordinates:", whole.lat.attrs)

# Select only between latitudes -40 and 40
part = whole.sel(lat=slice(-40, 40))
print("\nSliced grid:\n", part)
# Slicing preserves the coordinate metadata and now the actual_range is incorrect.
# This is preserved when saving to netCDF and causes errors that are difficult to
# diagnose and fix when passing it along to GMT for plotting.
print("\nMetadata for sliced coordinates:", part.lat.attrs)

Output:

Global grid:
 <xarray.DataArray 'z' (lat: 181, lon: 361)>
array([[ 2762.,  2762.,  2762., ...,  2762.,  2762.,  2762.],
       [ 2983.,  2980.,  2977., ...,  2989.,  2986.,  2983.],
       [ 3074.,  3074.,  3074., ...,  3072.,  3073.,  3074.],
       ...,
       [-3727., -3715., -3706., ..., -3759., -3742., -3727.],
       [-2294., -2282., -2271., ..., -2322., -2308., -2294.],
       [-4181., -4181., -4181., ..., -4181., -4181., -4181.]], dtype=float32)
Coordinates:
  * lon      (lon) float64 -180.0 -179.0 -178.0 -177.0 -176.0 -175.0 -174.0 ...
  * lat      (lat) float64 -90.0 -89.0 -88.0 -87.0 -86.0 -85.0 -84.0 -83.0 ...
Attributes:
    long_name:     z
    actual_range:  [-8425.  5551.]

Metadata for global coordinates: OrderedDict([('long_name', 'latitude'), ('units', 'degrees_north'), ('actual_range', array([-90.,  90.]))])

Sliced grid:
 <xarray.DataArray 'z' (lat: 81, lon: 361)>
array([[-3062., -3451., -3695., ..., -1504., -3226., -3062.],
       [-3559., -3515., -3773., ...,  -128., -2991., -3559.],
       [-3552., -3498., -3459., ...,   519., -1149., -3552.],
       ...,
       [-5425., -5389., -5268., ..., -5162., -5399., -5425.],
       [-5385., -5499., -5580., ..., -5407., -5497., -5385.],
       [-5325., -5937., -5557., ..., -5602., -5572., -5325.]], dtype=float32)
Coordinates:
  * lon      (lon) float64 -180.0 -179.0 -178.0 -177.0 -176.0 -175.0 -174.0 ...
  * lat      (lat) float64 -40.0 -39.0 -38.0 -37.0 -36.0 -35.0 -34.0 -33.0 ...
Attributes:
    long_name:     z
    actual_range:  [-8425.  5551.]

Metadata for sliced coordinates: OrderedDict([('long_name', 'latitude'), ('units', 'degrees_north'), ('actual_range', array([-90.,  90.]))])
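
For readers who do not have earth_relief_60m.grd at hand, the same behaviour can likely be reproduced with synthetic data. The following is a minimal sketch, not the original grid: the data values are zeros, and the coordinate attrs are assumed to mirror what GMT writes (long_name, units, actual_range).

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the GMT grid (assumption: 1 degree spacing,
# node registered, same coordinate attrs as in the report).
lat = np.arange(-90.0, 91.0)
lon = np.arange(-180.0, 181.0)
lat_attrs = {
    "long_name": "latitude",
    "units": "degrees_north",
    "actual_range": np.array([-90.0, 90.0]),
}
lon_attrs = {
    "long_name": "longitude",
    "units": "degrees_east",
    "actual_range": np.array([-180.0, 180.0]),
}
whole = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    coords={"lat": ("lat", lat, lat_attrs), "lon": ("lon", lon, lon_attrs)},
    dims=("lat", "lon"),
    name="z",
)

# Select only between latitudes -40 and 40
part = whole.sel(lat=slice(-40, 40))

# The coordinate values changed, but the attrs were carried over unchanged.
print("Sliced latitude limits:", float(part.lat.min()), float(part.lat.max()))
print("Sliced latitude attrs:", part.lat.attrs)
```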

Problem description

Indexing seems to preserve the attrs. If they contain information about the values (such as actual_range), that information becomes outdated as soon as the values change. Some software, like GMT, relies on this information for certain operations. It can cope with missing metadata, but there is no way to guard against incorrect metadata.
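
Until indexing behaves differently, one possible workaround is to recompute the value-dependent attribute after slicing. The sketch below is only an illustration: `refresh_actual_range` is a hypothetical helper, not part of xarray, and it assumes `actual_range` is the only attribute that depends on the values.

```python
import numpy as np
import xarray as xr


def refresh_actual_range(da):
    """Return a copy of `da` whose actual_range attrs match the current values.

    Hypothetical helper for illustration; not an xarray function.
    """
    out = da.copy()
    # Coordinates: only touch those that already carry the attribute.
    for name in out.coords:
        coord = out.coords[name]
        if "actual_range" in coord.attrs and coord.size > 0:
            values = coord.values
            coord.attrs["actual_range"] = np.array([values.min(), values.max()])
    # The data itself may carry the same attribute.
    if "actual_range" in out.attrs and out.size > 0:
        out.attrs["actual_range"] = np.array([float(out.min()), float(out.max())])
    return out


# Small synthetic example (values are made up for the demonstration).
lat = np.arange(-90.0, 91.0)
lat_attrs = {"units": "degrees_north", "actual_range": np.array([-90.0, 90.0])}
whole = xr.DataArray(
    np.arange(lat.size, dtype="float32"),
    coords={"lat": ("lat", lat, lat_attrs)},
    dims="lat",
    name="z",
    attrs={"actual_range": np.array([0.0, 180.0])},
)
part = whole.sel(lat=slice(-40, 40))
fixed = refresh_actual_range(part)
print("Refreshed latitude range:", fixed.lat.attrs["actual_range"])
print("Refreshed data range:", fixed.attrs["actual_range"])
```

Other attrs such as units are left alone, since they do not depend on the values.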

Expected Output

I would expect indexing to drop attrs unless keep_attrs is specified. It's better to have no metadata than to have incorrect metadata.
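
The expected behaviour could be approximated in user code along these lines. This is a rough sketch of the proposal, not how xarray works: `sel_dropping_attrs` and its `keep_attrs` argument are hypothetical names introduced here for illustration.

```python
import numpy as np
import xarray as xr


def sel_dropping_attrs(da, keep_attrs=False, **indexers):
    """Label-based selection that drops attrs unless keep_attrs is True.

    Hypothetical wrapper around DataArray.sel; not an xarray function.
    """
    result = da.sel(**indexers)
    if keep_attrs:
        return result
    # Work on a copy so the attrs of the input are left untouched.
    result = result.copy()
    result.attrs = {}
    for name in result.coords:
        result.coords[name].attrs = {}
    return result


# Small synthetic example (values are made up for the demonstration).
lat = np.arange(-90.0, 91.0)
lat_attrs = {"units": "degrees_north", "actual_range": np.array([-90.0, 90.0])}
whole = xr.DataArray(
    np.zeros(lat.size, dtype="float32"),
    coords={"lat": ("lat", lat, lat_attrs)},
    dims="lat",
    name="z",
    attrs={"long_name": "z"},
)
sliced = sel_dropping_attrs(whole, lat=slice(-40, 40))
kept = sel_dropping_attrs(whole, keep_attrs=True, lat=slice(-40, 40))
print("Dropped:", sliced.attrs, sliced.lat.attrs)
print("Kept:", kept.attrs, kept.lat.attrs)
```

Dropping everything also discards attrs that are still valid after slicing, such as units, which is the trade-off this proposal accepts.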

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.18-3-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.4
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: 3.5.0
IPython: 6.2.1
sphinx: 1.7.2

Labels: topic-metadata (relating to the handling of metadata, i.e. attrs and encoding)