Closed
Description
This is a new feature proposal, not a bug. I'll open a PR against this issue momentarily; it consists of 4 lines of new code.
I've found it highly annoying that one cannot set the name of the character array dimension. Looking at the code, I basically found what I expected, except for what I added. Summary: when decoding, the name of the character dimension is stored in the variable's encoding as variable.encoding['char_dim_name']; alternatively, one can simply set it when creating data from scratch. The "char_dim_name" is then applied upon encoding. It's simple. All the new code sits in the code that already handles character arrays, so there may not be any nasty edge cases.
This shows how it works and the behavior it changes:
# # Using the proposed changes....
# user@machine-session-1[1]:~/Downloads> ipython
import xarray as xa
char_arr = ['abc', 'def', 'ghi']
ds = xa.Dataset(data_vars={'char_arr': char_arr})
ds.char_arr.encoding.update({"dtype": "S1"})
# Default/current behavior
ds.to_netcdf('char_arr_string.nc')
# New functionality - name the character dimension.
ds.char_arr.encoding.update({"char_dim_name": "char_dim"})
ds.to_netcdf('char_arr_named.nc')
# user@machine-session-2[1]:~/Downloads> ncdump -h char_arr_string.nc
# netcdf char_arr_string {
# dimensions:
# char_arr = 3 ;
# string3 = 3 ;
# variables:
# char char_arr(char_arr, string3) ;
# char_arr:_Encoding = "utf-8" ;
# }
#
# user@machine-session-2[2]:~/Downloads> ncdump -h char_arr_named.nc
# netcdf char_arr_named {
# dimensions:
# char_arr = 3 ;
# char_dim = 3 ;
# variables:
# char char_arr(char_arr, char_dim) ;
# char_arr:_Encoding = "utf-8" ;
# }
# New functionality - when decoding, preserve the character dimension name in the variable encoding so it is reused on encoding.
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]:
# {'_Encoding': 'utf-8',
# 'char_dim_name': 'char_dim',
# 'chunksizes': None,
# 'complevel': 0,
# 'contiguous': True,
# 'dtype': dtype('S1'),
# 'fletcher32': False,
# 'original_shape': (3, 3),
# 'shuffle': False,
# 'source': '/Users/james/Downloads/char_arr_named.nc',
# 'zlib': False}
ds_read.to_netcdf('char_arr_named_2.nc')
exit()
# user@machine-session-1[2]:~/Downloads> ncdump -h char_arr_named_2.nc
# netcdf char_arr_named_2 {
# dimensions:
# char_arr = 3 ;
# char_dim = 3 ;
# variables:
# char char_arr(char_arr, char_dim) ;
# char_arr:_Encoding = "utf-8" ;
# }
# user@machine-session-1[3]:~/Downloads> pip uninstall -y xarray
# user@machine-session-1[4]:~/Downloads> pip install xarray
# user@machine-session-1[5]:~/Downloads> ipython
# The old behavior... does not preserve the char dim name.
import xarray as xa
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]:
# {'_Encoding': 'utf-8',
# 'chunksizes': None,
# 'complevel': 0,
# 'contiguous': True,
# 'dtype': dtype('S1'),
# 'fletcher32': False,
# 'original_shape': (3, 3),
# 'shuffle': False,
# 'source': '/Users/james/Downloads/char_arr_named.nc',
# 'zlib': False}
ds_read.to_netcdf('char_arr_string_2.nc')
# user@machine-session-2[6]:~/Downloads> ncdump -h char_arr_string_2.nc
# netcdf char_arr_string_2 {
# dimensions:
# char_arr = 3 ;
# string3 = 3 ;
# variables:
# char char_arr(char_arr, string3) ;
# char_arr:_Encoding = "utf-8" ;
# }
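
For readers who want the idea without the netCDF machinery, here is a minimal sketch of the round trip in plain Python. It is not the code in the PR and not xarray's implementation: `encode_chars` and `decode_chars` are hypothetical helpers, the null-byte padding is a simplification, and the only things taken from the transcript above are the `char_dim_name` encoding key and the `string<N>` default dimension name.

```python
def encode_chars(dims, values, encoding):
    """Split strings into single characters, adding a trailing char dimension.

    Hypothetical helper: uses encoding['char_dim_name'] if present,
    otherwise falls back to the 'string<N>' default seen in the ncdump output.
    """
    width = max((len(v) for v in values), default=0)
    char_dim = encoding.get("char_dim_name", f"string{width}")
    # Pad shorter strings with null characters so every row has the same width.
    chars = [list(v.ljust(width, "\x00")) for v in values]
    return tuple(dims) + (char_dim,), chars


def decode_chars(dims, chars, encoding):
    """Join characters back into strings and remember the char dimension name."""
    *outer, char_dim = dims
    # Preserve the dimension name so a later encode can reuse it.
    new_encoding = dict(encoding, char_dim_name=char_dim)
    values = ["".join(row).rstrip("\x00") for row in chars]
    return tuple(outer), values, new_encoding


# Write with a named character dimension, read it back, write again.
dims, chars = encode_chars(("char_arr",), ["abc", "def", "ghi"], {"char_dim_name": "char_dim"})
out_dims, values, enc = decode_chars(dims, chars, {})
dims_again, _ = encode_chars(out_dims, values, enc)
print(dims, enc, dims_again)
```

Keeping the name in the encoding rather than as a real dimension on the decoded variable means the in-memory data stays one-dimensional, and the name only matters again at write time.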