Set/preserve the character array dimension name

This is a new feature proposal, not a bug. I'll open a PR against this issue momentarily; it consists of 4 lines of new code.

I've found it highly annoying that one cannot set the name of the character array dimension. Looking at the code, I basically found what I expected, except for what I added. Summary: on decoding, the name of the character dimension is stored in `variable.encoding['char_dim_name']`; alternatively, one can simply set it there when creating data from scratch. On encoding, `char_dim_name` is applied as the name of the character dimension. It's simple. All the new code sits in the same code that already handled character arrays, so there may not be any nasty edge cases.

This shows how it works and the behavior it changes:

```
# # Using the proposed changes.... 
# user@machine-session-1[1]:~/Downloads> ipython

import xarray as xa
char_arr = ['abc', 'def', 'ghi']
ds = xa.Dataset(data_vars={'char_arr': char_arr})
ds.char_arr.encoding.update({"dtype": "S1"})

# Default/current behavior
ds.to_netcdf('char_arr_string.nc')

# New functionality - name the character dimension.
ds.char_arr.encoding.update({"char_dim_name": "char_dim"})
ds.to_netcdf('char_arr_named.nc')

# user@machine-session-2[1]:~/Downloads> ncdump -h char_arr_string.nc
# netcdf char_arr_string {
# dimensions:
# 	char_arr = 3 ;
# 	string3 = 3 ;
# variables:
# 	char char_arr(char_arr, string3) ;
# 		char_arr:_Encoding = "utf-8" ;
# }
#
# user@machine-session-2[2]:~/Downloads> ncdump -h char_arr_named.nc 
# netcdf char_arr_named {
# dimensions:
# 	char_arr = 3 ;
# 	char_dim = 3 ;
# variables:
# 	char char_arr(char_arr, char_dim) ;
# 		char_arr:_Encoding = "utf-8" ;
# }


# New functionality - when decoding, preserve the character dimension name in the variable encoding so it is reused on encoding.
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]: 
# {'_Encoding': 'utf-8',
#  'char_dim_name': 'char_dim',
#  'chunksizes': None,
#  'complevel': 0,
#  'contiguous': True,
#  'dtype': dtype('S1'),
#  'fletcher32': False,
#  'original_shape': (3, 3),
#  'shuffle': False,
#  'source': '/Users/james/Downloads/char_arr_named.nc',
#  'zlib': False}

ds_read.to_netcdf('char_arr_named_2.nc')
exit()


# user@machine-session-1[2]:~/Downloads> ncdump -h char_arr_named_2.nc 
# netcdf char_arr_named_2 {
# dimensions:
# 	char_arr = 3 ;
# 	char_dim = 3 ;
# variables:
# 	char char_arr(char_arr, char_dim) ;
# 		char_arr:_Encoding = "utf-8" ;
# }

# user@machine-session-1[3]:~/Downloads> pip uninstall -y xarray
# user@machine-session-1[4]:~/Downloads> pip install xarray
# user@machine-session-1[5]:~/Downloads> ipython

# The old behavior... does not preserve the char dim name.
import xarray as xa
ds_read = xa.open_dataset('char_arr_named.nc')
ds_read.char_arr.encoding
# Out[4]: 
# {'_Encoding': 'utf-8',
#  'chunksizes': None,
#  'complevel': 0,
#  'contiguous': True,
#  'dtype': dtype('S1'),
#  'fletcher32': False,
#  'original_shape': (3, 3),
#  'shuffle': False,
#  'source': '/Users/james/Downloads/char_arr_named.nc',
#  'zlib': False}

ds_read.to_netcdf('char_arr_string_2.nc')

# user@machine-session-2[6]:~/Downloads> ncdump -h char_arr_string_2.nc 
# netcdf char_arr_string_2 {
# dimensions:
# 	char_arr = 3 ;
# 	string3 = 3 ;
# variables:
# 	char char_arr(char_arr, string3) ;
# 		char_arr:_Encoding = "utf-8" ;
# }

```
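For anyone who wants to see the idea in isolation, here is a minimal sketch of the round trip in plain NumPy. It is not the actual xarray code or the PR diff: `encode_char_array` and `decode_char_array` are hypothetical helper names, and the only things taken from above are the `char_dim_name` encoding key and the `string<N>` default dimension name seen in the `ncdump` output.

```python
import numpy as np


def encode_char_array(values, dims, encoding):
    """Split fixed-width byte strings into single characters (hypothetical helper)."""
    arr = np.asarray(values, dtype="S")
    width = arr.dtype.itemsize
    # Use the user-supplied name if present, otherwise the "string<N>" default.
    char_dim = encoding.get("char_dim_name", f"string{width}")
    # Add a trailing axis of length 1, then view each string as `width` characters.
    chars = arr.reshape(arr.shape + (1,)).view("S1")
    return chars, tuple(dims) + (char_dim,)


def decode_char_array(chars, dims):
    """Join the trailing character axis back into strings (hypothetical helper)."""
    chars = np.ascontiguousarray(chars)
    width = chars.shape[-1]
    strings = chars.view(f"S{width}")[..., 0]
    # Remember the name of the dropped dimension so it can be reused on encoding.
    encoding = {"char_dim_name": dims[-1]}
    return strings, tuple(dims[:-1]), encoding


data = [b"abc", b"def", b"ghi"]

# Without the key, the dimension gets the default name.
_, default_dims = encode_char_array(data, ("char_arr",), {})

# With the key, the requested name is used and survives a decode/encode round trip.
chars, named_dims = encode_char_array(data, ("char_arr",), {"char_dim_name": "char_dim"})
strings, dims, enc = decode_char_array(chars, named_dims)
_, round_trip_dims = encode_char_array(strings, dims, enc)
```

The point of the sketch is only that the name has to travel through `encoding` on decode; otherwise the encoder has nothing to go on and falls back to the default.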


Set/preserve the character array dimension name #2895
