Set/preserve the character array dimension name #2895

Closed
@jmccreight

This is a new feature proposal, not a bug. I'll open a PR against this issue momentarily; it consists of 4 lines of new code.

I've found it highly annoying that one cannot set the name of the character array dimension. Looking at the code, I found basically what I expected, apart from what I added. Summary: when decoding, the name of the character dimension is stored in variable.encoding['char_dim_name']; when creating data from scratch, one can simply set it there. The "char_dim_name" is then applied upon encoding. It's simple. All the new code lives in the same code path that already handles character arrays, so there may not be any nasty edge cases.

This shows how it works and the behavior it changes:

# Using the proposed changes...
# user@machine-session-1[1]:~/Downloads> ipython

import xarray as xa
char_arr = ['abc', 'def', 'ghi']
ds = xa.Dataset(data_vars={'char_arr': char_arr})
ds.char_arr.encoding.update({"dtype": "S1"})

# Default/current behavior
ds.to_netcdf('char_arr_string.nc')

# New functionality - name the character dimension.
ds.char_arr.encoding.update({"char_dim_name": "char_dim"})
ds.to_netcdf('char_arr_named.nc')

# user@machine-session-2[1]:~/Downloads> ncdump -h char_arr_string.nc
# netcdf char_arr_string {
# dimensions:
# 	char_arr = 3 ;
# 	string3 = 3 ;
# variables:
# 	char char_arr(char_arr, string3) ;
# 		char_arr:_Encoding = "utf-8" ;
# }
#
# user@machine-session-2[2]:~/Downloads> ncdump -h char_arr_named.nc 
# netcdf char_arr_named {
# dimensions:
# 	char_arr = 3 ;
# 	char_dim = 3 ;
# variables:
# 	char char_arr(char_arr, char_dim) ;
# 		char_arr:_Encoding = "utf-8" ;
# }


# New functionality - when decoding, preserve the character dimension name in the variable's encoding so it is reused when encoding again.
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]: 
# {'_Encoding': 'utf-8',
#  'char_dim_name': 'char_dim',
#  'chunksizes': None,
#  'complevel': 0,
#  'contiguous': True,
#  'dtype': dtype('S1'),
#  'fletcher32': False,
#  'original_shape': (3, 3),
#  'shuffle': False,
#  'source': '/Users/james/Downloads/char_arr_named.nc',
#  'zlib': False}

ds_read.to_netcdf('char_arr_named_2.nc')
exit()


# user@machine-session-1[2]:~/Downloads> ncdump -h char_arr_named_2.nc 
# netcdf char_arr_named_2 {
# dimensions:
# 	char_arr = 3 ;
# 	char_dim = 3 ;
# variables:
# 	char char_arr(char_arr, char_dim) ;
# 		char_arr:_Encoding = "utf-8" ;
# }

# user@machine-session-1[3]:~/Downloads> pip uninstall -y xarray
# user@machine-session-1[4]:~/Downloads> pip install xarray
# user@machine-session-1[5]:~/Downloads> ipython

# The old behavior does not preserve the char dim name.
import xarray as xa
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]: 
# {'_Encoding': 'utf-8',
#  'chunksizes': None,
#  'complevel': 0,
#  'contiguous': True,
#  'dtype': dtype('S1'),
#  'fletcher32': False,
#  'original_shape': (3, 3),
#  'shuffle': False,
#  'source': '/Users/james/Downloads/char_arr_named.nc',
#  'zlib': False}

ds_read.to_netcdf('char_arr_string_2.nc')

# user@machine-session-2[6]:~/Downloads> ncdump -h char_arr_string_2.nc
# netcdf char_arr_string_2 {
# dimensions:
# 	char_arr = 3 ;
# 	string3 = 3 ;
# variables:
# 	char char_arr(char_arr, string3) ;
# 		char_arr:_Encoding = "utf-8" ;
# }
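
For readers who want the idea without reading the xarray internals, here is a minimal, dependency-free sketch of the round trip described above. It is not the code from the PR: `encode_char_array` and `decode_char_array` are hypothetical helpers invented for illustration, and the only things taken from this issue are the `char_dim_name` encoding key and the `string<N>` default name visible in the ncdump output.

```python
def encode_char_array(strings, dim, encoding):
    """Pad strings to a common width, split them into characters,
    and name the extra (character) dimension.

    Hypothetical helper for illustration; not xarray's implementation.
    """
    width = max(len(s) for s in strings)
    # Use the requested name if given; otherwise fall back to the
    # "string<N>" default that appears in the ncdump output above.
    char_dim = encoding.get("char_dim_name", f"string{width}")
    chars = [list(s.ljust(width)) for s in strings]
    return (dim, char_dim), chars


def decode_char_array(dims, chars, encoding):
    """Join characters back into strings and remember the name of the
    character dimension in the encoding (hypothetical helper).

    Note: trailing padding is stripped, so strings that originally
    ended in spaces would not round-trip in this simplified sketch.
    """
    new_encoding = dict(encoding)
    new_encoding["char_dim_name"] = dims[-1]
    strings = ["".join(row).rstrip() for row in chars]
    return dims[:-1], strings, new_encoding


data = ["abc", "def", "ghi"]

# Default behavior: the character dimension gets a generated name.
dims, chars = encode_char_array(data, "char_arr", {})
print(dims)  # ('char_arr', 'string3')

# Proposed behavior: the name comes from the encoding.
dims, chars = encode_char_array(data, "char_arr", {"char_dim_name": "char_dim"})
print(dims)  # ('char_arr', 'char_dim')

# Decoding stores the name, so encoding again preserves it.
out_dims, strings, enc = decode_char_array(dims, chars, {})
dims_again, _ = encode_char_array(strings, out_dims[0], enc)
print(dims_again)  # ('char_arr', 'char_dim')
```

The point of keeping the name in the encoding rather than as a separate argument is that a read-then-write cycle preserves it without the user doing anything, which is the behavior shown in `char_arr_named_2.nc` above.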
