
ValueError when trying to encode time variable in a NetCDF file with CF conventions #3739

Closed
@avatar101

Description

# Imports
import numpy as np
import xarray as xr
import pandas as pd
from glob import glob

# `path` (not shown) is the prefix of the data directory, one sub-directory per year
yr = 1988

# files to be concatenated
files = sorted(glob(path + str(yr) + '/V250*'))
# corrected dates
dates = pd.date_range(start=str(yr), end=str(yr + 1), freq='6H', closed='left')

ds_test = xr.open_mfdataset(files[:10], combine='nested', concat_dim='time', decode_cf=False)
# correcting time
ds_test.time.values = dates[:10]
# fixing encoding
ds_test.time.attrs['units'] = "Seconds since 1970-01-01 00:00:00"

# preview of the time variable
print(ds_test.time)

> <xarray.DataArray 'time' (time: 10)>
array(['1988-01-01T00:00:00.000000000', '1988-01-01T06:00:00.000000000',
       '1988-01-01T12:00:00.000000000', '1988-01-01T18:00:00.000000000',
       '1988-01-02T00:00:00.000000000', '1988-01-02T06:00:00.000000000',
       '1988-01-02T12:00:00.000000000', '1988-01-02T18:00:00.000000000',
       '1988-01-03T00:00:00.000000000', '1988-01-03T06:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1988-01-01 ... 1988-01-03T06:00:00
Attributes:
    calendar:       proleptic_gregorian
    standard_name:  time
    units:          Seconds since 1970-01-01 00:00:00

ds_test.to_netcdf(path+'test.nc')

> ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'.
This is probably an encoding field used by xarray to describe how a variable is serialized.
To proceed, remove this key from the variable's attributes manually.




Expected Output

The time values should be encoded correctly, i.e. converted to numbers relative to the reference units, so that the file can be saved. I can drop the CF conventions as long as the time values are correct, but a solution that keeps the CF conventions intact would be preferable.
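To make "converting according to the reference units" concrete, here is a minimal sketch of the arithmetic using pandas only. The epoch, the unit of seconds, and the four sample dates are assumptions chosen for illustration; this is not how xarray implements the conversion internally. With decode_cf=False, numbers like these could in principle be assigned to the time variable while the units attribute is left in place.

```python
import numpy as np
import pandas as pd

# Reference epoch and unit chosen for this sketch (assumptions).
epoch = pd.Timestamp("1970-01-01 00:00:00")
dates = pd.date_range("1988-01-01", periods=4, freq=pd.Timedelta(hours=6))

# Offset from the epoch divided by one second gives plain numbers,
# i.e. "seconds since 1970-01-01 00:00:00".
seconds = np.asarray((dates - epoch) / pd.Timedelta(seconds=1), dtype="float64")

# Inverse conversion: numbers plus the epoch give the timestamps back.
roundtrip = epoch + pd.to_timedelta(seconds, unit="s")

print(seconds)
print(roundtrip)
```

The round trip back to timestamps is a cheap check that the chosen units and epoch describe the intended dates.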

Problem Description

I'm trying to concatenate NetCDF files that declare CF conventions in their global attributes. These files have an incorrect time dimension, which I try to fix with the code above. It seems that some existing encoding prevents the files from being written back, but when I print the encoding, it doesn't show any clashing units. I'm not sure whether this is a bug or incorrect usage. Any help on how to encode time so that the values are converted correctly according to the reference units is much appreciated.

# More diagnostics on the encoding
print(ds_test.encoding)
>{'unlimited_dims': {'time'},
 'source': '/file/to/path/V250_19880101_00'}

# checking any existing time
print(ds_test.time.encoding)
>{}

# another try on setting time encoding
ds_test.time.encoding['units'] = "Seconds since 1970-01-01 00:00:00"
# writing the file gives the same ValueError as above
ds_test.to_netcdf(path+'test.nc')

# ncdump output of one of the files
>netcdf V250_19880101_06 {
dimensions:
	lon = 720 ;
	lat = 361 ;
	lev = 1 ;
	time = UNLIMITED ; // (1 currently)
variables:
	float lon(lon) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:standard_name = "longitude" ;
		lon:axis = "X" ;
	float lat(lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:standard_name = "latitude" ;
		lat:axis = "Y" ;
	float lev(lev) ;
		lev:long_name = "hybrid level at layer midpoints" ;
		lev:units = "level" ;
		lev:standard_name = "hybrid_sigma_pressure" ;
		lev:positive = "down" ;
		lev:formula = "hyam hybm (mlev=hyam+hybm*aps)" ;
		lev:formula_terms = "ap: hyam b: hybm ps: aps" ;
	float time(time) ;
		time:units = "hours since 1988-01-01 06:00:00" ;
		time:calendar = "proleptic_gregorian" ;
		time:standard_name = "time" ;
	float V(time, lev, lat, lon) ;
		V:long_name = "unknown (please add with NCO)" ;
		V:units = "unknown (please add with NCO)" ;
		V:_FillValue = -999.99f ;

// global attributes:
		:Conventions = "CF" ;
		:constants_file_name = "P19880101_06" ;
		:institution = "IACETH" ;
		:lonmin = -180.f ;
		:lonmax = 179.5f ;
		:latmin = -90.f ;
		:latmax = 90.f ;
		:levmin = 250.f ;
		:levmax = 250.f ;
		:history = "Fri Sep  6 15:59:17 2019: ncatted -a units,time,o,c,hours since 1988-01-01 06:00:00 -a standard_name,time,o,c,time V250_19880101_06" ;
		:NCO = "4.7.2" ;
data:

 time = 6 ;
}
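The ncdump output shows units and calendar stored as plain attributes on time. Because the files are opened with decode_cf=False, those attributes stay in attrs, and the error message suggests the clash happens there rather than in encoding. The following is a sketch of one possible workaround, not a confirmed fix: move units and calendar from attrs to encoding before writing. The in-memory dataset is a hypothetical stand-in for ds_test, and try_encode is a helper invented for this example; it calls xarray.conventions.cf_encoder directly, which is assumed here to be the same encoding step that to_netcdf runs, so no file is written.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.conventions import cf_encoder

# Hypothetical stand-in for ds_test: datetime64 time values that still carry
# the raw CF attributes, as after decode_cf=False plus manual assignment.
times = pd.date_range("1988-01-01", periods=10, freq=pd.Timedelta(hours=6))
raw_attrs = {
    "units": "hours since 1988-01-01 06:00:00",
    "calendar": "proleptic_gregorian",
    "standard_name": "time",
}
ds = xr.Dataset(
    {"V": ("time", np.arange(10, dtype="float32"))},
    coords={"time": ("time", times, raw_attrs)},
)


def try_encode(dataset):
    """Run the CF encoding step; return (encoded variables, error message)."""
    try:
        encoded, _ = cf_encoder(dict(dataset.variables), dict(dataset.attrs))
        return encoded, None
    except ValueError as err:
        return None, str(err)


# 1. With `units` still in attrs, encoding the datetime64 values fails.
_, error_before = try_encode(ds)

# 2. Move the time metadata from attrs to encoding.
calendar = ds["time"].attrs.pop("calendar", "proleptic_gregorian")
ds["time"].attrs.pop("units", None)
ds["time"].encoding["units"] = "seconds since 1970-01-01 00:00:00"
ds["time"].encoding["calendar"] = calendar

# 3. Encoding now succeeds and yields numeric offsets from the epoch.
encoded_after, error_after = try_encode(ds)

print(error_before)
print(encoded_after["time"].attrs)
print(encoded_after["time"].values[:3])
```

On the real dataset, the equivalent would be to apply step 2 to ds_test.time and then call ds_test.to_netcdf(...) as before. The standard_name attribute stays in attrs untouched, so the written file should keep its CF metadata.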

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.13.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.3.2
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.2
distributed: 2.9.3
matplotlib: 3.1.0
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 44.0.0.post20200106
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None
