XArray does not support Latin characters in netCDF file names #9282

Open
5 tasks done
devos0024 opened this issue Jul 26, 2024 · 8 comments

Comments

@devos0024

devos0024 commented Jul 26, 2024

What happened?

When you try to open an existing netCDF file named "bépo.nc", for example, you get the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'E:\\temp\\bépo.nc'

What did you expect to happen?

Internally, netCDF converts the file path into an array of bytes.
This conversion can be configured by passing the appropriate encoding to the netCDF4.Dataset constructor.
What I expect is to be able to pass this encoding through to netCDF when the file is opened with xarray.

Perhaps the solution would be to pass the encoding via the backend_kwargs parameter:

dataset = xr.open_dataset(r"E:\temp\bépo.nc", mode="r", engine="netcdf4", backend_kwargs={'encoding': 'latin-1'})

Transmitting the encoding would also be necessary in the to_netcdf() function.
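
The path-to-bytes step described above can be illustrated in plain Python, independently of netCDF4. This is only a sketch: it assumes the mismatch is between UTF-8 and the Windows cp1252 code page, which the report does not confirm.

```python
# Illustration only: how one file name maps to different byte sequences.
name = "bépo.nc"

utf8_bytes = name.encode("utf-8")      # 'é' becomes two bytes: 0xC3 0xA9
latin1_bytes = name.encode("latin-1")  # 'é' becomes one byte: 0xE9
cp1252_bytes = name.encode("cp1252")   # identical to Latin-1 for this name

# Reading the UTF-8 bytes back with the cp1252 code page mangles the name.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes, latin1_bytes, misread)
```

If the bytes handed to the file system do not match the encoding it expects, the lookup targets a different name, which would be consistent with the FileNotFoundError above.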

Minimal Complete Verifiable Example

import os
import tempfile as tmp

import netCDF4 as nc
import xarray as xr

if __name__ == "__main__":
    with tmp.TemporaryDirectory() as temp_dir:
        # Creating a netCDF file
        tmp_folder = os.path.join(temp_dir, "bèpo")
        os.mkdir(tmp_folder)
        file_path = os.path.join(tmp_folder, "bépo.nc")
        print(f"Try to created {file_path}")
        with nc.Dataset(file_path, mode="w", encoding="Latin-1") as ds:
            print(f"{file_path} successfully created")

        # Open with netCDF
        with nc.Dataset(file_path, mode="r", encoding="Latin-1"):
            print(f"{file_path} successfully opened with netCDF")

        # Open with xarray
        with xr.open_dataset(file_path, mode="r", engine="netcdf4") as xr_ds:
            print(f"{file_path} successfully opened")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

E:\temp\bépo.nc successfully created
E:\temp\bépo.nc successfully opened with netCDF
Traceback (most recent call last):
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\file_manager.py", line 211, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('E:\\temp\\bépo.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '0916cd5f-58f9-49df-8e74-6c9b109a77cf']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\temp\test_open_dataset.py", line 19, in <module>
    with xr.open_dataset(file_path, mode="r", engine="netcdf4") as xr_ds:
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\api.py", line 571, in open_dataset
    backend_ds = backend.open_dataset(
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\netCDF4_.py", line 645, in open_dataset
    store = NetCDF4DataStore.open(
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\netCDF4_.py", line 408, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\netCDF4_.py", line 355, in __init__
    self.format = self.ds.data_model
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\netCDF4_.py", line 417, in ds
    return self._acquire()
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\netCDF4_.py", line 411, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "E:\Tools\Anaconda3\envs\myenv\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\file_manager.py", line 199, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "E:\Tools\Anaconda3\envs\myenv\lib\site-packages\xarray\backends\file_manager.py", line 217, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "src\\netCDF4\\_netCDF4.pyx", line 2469, in netCDF4._netCDF4.Dataset.__init__
  File "src\\netCDF4\\_netCDF4.pyx", line 2028, in netCDF4._netCDF4._ensure_nc_success
FileNotFoundError: [Errno 2] No such file or directory: 'E:\\temp\\bépo.nc'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:01:37) [MSC v.1935 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('fr_FR', 'cp1252')
libhdf5: 1.14.0
libnetcdf: 4.9.1

xarray: 2024.6.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.3
pydap: None
h5netcdf: None
h5py: 3.9.0
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.0
dask: 2024.7.1
distributed: 2024.7.1
matplotlib: 3.9.1
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2024.6.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 71.0.4
pip: 24.0
conda: 24.5.0
pytest: 8.3.2
mypy: None
IPython: None
sphinx: 7.4.7

devos0024 added the bug and needs triage labels Jul 26, 2024

welcome bot commented Jul 26, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@max-sixty
Collaborator

I get an error in netCDF — any ideas why yours succeeds in nc? Is it Windows vs Mac?

(also note I needed to change the MCVE path, worth updating the example)

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[3], line 11
      7 with tmp.TemporaryDirectory() as temp_dir:
      8
      9     # Creating a netCDF file
     10     file_path = Path(temp_dir) / "bépo.nc"
---> 11     with nc.Dataset(file_path, mode="w", encoding="Latin-1") as ds:
     12         print(f"{file_path} successfully created")
     14     # Open with netCDF

File src/netCDF4/_netCDF4.pyx:2469, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2027, in netCDF4._netCDF4._ensure_nc_success()

INSTALLED VERSIONS

commit: 42ed6d3
python: 3.11.9 (main, Apr 2 2024, 08:25:04) [Clang 15.0.0 (clang-1500.3.9.4)]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2024.3.1.dev31+gb9163a6f
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.17.2
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: 3.8.4
cartopy: None
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: 0.10.2
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: 8.1.1
mypy: 1.8.0
IPython: 8.24.0
sphinx: None

@devos0024
Author

Yes, there is an error in the path. Sorry about that.

What I see is that Windows can't manage without the encoding parameter; otherwise the file created on the file system is bépo.nc.

On Linux, you don't need to specify it; the file created is correct. Creating a netCDF file with Latin-1 encoding doesn't raise an error, but the file ends up as b?po.nc on the file system.

I don't have a Mac, so I can't do the test. But it's possible that encoding on Mac is managed differently from Windows and Linux...

@max-sixty
Collaborator

Yes, there is an error in the path. Sorry about that.

Is the example you posted correct? If not, could you update it?

@devos0024
Author

Example fixed (using temporary folder from context manager)

@max-sixty
Collaborator

Thanks.

FYI I still get this error — I'm guessing that's down to it being a Mac vs Linux issue...

Try to created /var/folders/wf/s6ycxvvs4ln8qsdbfx40hnc40000gn/T/tmp8rywv9ge/bèpo/bépo.nc
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[1], line 14
     12 file_path = os.path.join(tmp_folder, "bépo.nc")
     13 print(f"Try to created {file_path}")
---> 14 with nc.Dataset(file_path, mode="w", encoding="Latin-1") as ds:
     15     print(f"{file_path} successfully created")
     17 # Open with netCDF

File src/netCDF4/_netCDF4.pyx:2469, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2027, in netCDF4._netCDF4._ensure_nc_success()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 62: invalid continuation byte
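
The byte 0xe8 in that message is the Latin-1 encoding of the "è" in the folder name. The same kind of error can be reproduced in plain Python. This sketch assumes, without confirmation from the thread, that the Latin-1 encoded path is decoded as UTF-8 somewhere along the way:

```python
# Illustration only: Latin-1 bytes are not valid UTF-8 when they contain 'è'.
folder = "bèpo"
latin1_bytes = folder.encode("latin-1")  # 'è' becomes the single byte 0xE8

try:
    latin1_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # 0xE8 announces a multi-byte UTF-8 sequence, but the next byte is 'p'.
    error_message = str(exc)

print(error_message)
```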

@devos0024
Author

This test demonstrates the issue on a Windows platform configured as CP1252.
To reproduce it, a Windows machine configured that way is required.
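
To check whether a given machine matches that configuration, the standard library can report the relevant encodings. This is a small sketch; which of these values netCDF4 actually consults is not stated in this thread.

```python
import locale
import sys

# Encoding Python uses to convert file names to and from bytes.
fs_encoding = sys.getfilesystemencoding()

# Locale-preferred encoding; False avoids changing the process locale.
preferred_encoding = locale.getpreferredencoding(False)

print(f"file system encoding: {fs_encoding}")
print(f"preferred encoding:   {preferred_encoding}")
```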

dcherian added the contrib-help-wanted and topic-backends labels and removed the needs triage label Aug 26, 2024
@dcherian
Contributor

Forwarding encoding to netCDF4 seems like a good idea in general. Though since that clashes with an existing Xarray kwarg, perhaps rename it to filename_encoding at the Xarray level.
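
The renaming idea could look roughly like the sketch below. The helper `forward_filename_encoding` is hypothetical and not part of xarray; it only shows a `filename_encoding` option being translated into the `encoding` keyword that the netCDF4.Dataset calls in the example above accept.

```python
from typing import Any, Dict


def forward_filename_encoding(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename filename_encoding to netCDF4's encoding."""
    forwarded = dict(kwargs)  # copy so the caller's dict is left untouched
    filename_encoding = forwarded.pop("filename_encoding", None)
    if filename_encoding is not None:
        if "encoding" in forwarded:
            # Refuse to guess when both spellings are supplied.
            raise ValueError("pass filename_encoding or encoding, not both")
        forwarded["encoding"] = filename_encoding
    return forwarded


print(forward_filename_encoding({"filename_encoding": "latin-1", "clobber": True}))
```

Keeping the translation in one place would let the public name stay distinct from the existing `encoding` argument while the backend still receives the keyword it expects.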
