Check and fix chararray string dimension names #10395

Open: wants to merge 10 commits into base: main

kmuehlbauer (Contributor) commented Jun 4, 2025

kmuehlbauer changed the title from "Fix chararray strings" to "Check and fix chararray string dimension names" on Jun 4, 2025
kmuehlbauer (Contributor, Author) commented:

This is ready for review.

When encoding character arrays, this PR aims to:

  1. Check encoding["char_dim_name"] for a name like string10, which is split into string (prefix) and 10 (strlen).
  2. If the size of the last dimension of the character array isn't equal to strlen, issue a warning about a possible dimension naming clash and create a new dimension name from the actual array size.
  3. If no trailing number is found, check whether encoding["original_shape"] is set. If it doesn't correspond to the actual string length, issue the same warning and create a new dimension name from the actual array size.
  4. If encoding["char_dim_name"] isn't set, continue creating new dimension names with the stringN naming scheme.
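The four steps above can be sketched roughly as follows. This is only an illustration of the decision logic, not the code in this PR: `resolve_char_dim_name` is a hypothetical helper, the warning text is made up, and two details are assumptions (that the last entry of `original_shape` is the stored string length, and that an unnumbered name gets the actual size appended).

```python
import re
import warnings


def resolve_char_dim_name(encoding, actual_strlen):
    """Hypothetical helper: choose a name for the character dimension."""
    name = encoding.get("char_dim_name")
    if name is None:
        # Step 4: nothing requested, fall back to the stringN scheme.
        return f"string{actual_strlen}"

    # Step 1: split a name like "string10" into prefix and strlen.
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match:
        prefix, strlen = match.group(1), int(match.group(2))
        if strlen != actual_strlen:
            # Step 2: the number in the name no longer fits the data.
            warnings.warn(
                f"possible dimension naming clash for {name!r}", UserWarning
            )
            return f"{prefix}{actual_strlen}"
        return name

    # Step 3: no trailing number, compare against original_shape instead.
    # Assumption: the last entry of original_shape is the string length.
    original_shape = encoding.get("original_shape")
    if original_shape and original_shape[-1] != actual_strlen:
        warnings.warn(
            f"possible dimension naming clash for {name!r}", UserWarning
        )
        # Assumption: the new name appends the actual size to the old name.
        return f"{name}{actual_strlen}"
    return name
```

For example, `resolve_char_dim_name({"char_dim_name": "string10"}, 7)` warns and returns `"string7"`, while the same call with an actual length of 10 returns `"string10"` unchanged.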

At this point each variable is handled separately, so we do not have the full picture of all the character array dimensions that are needed. Since we now issue warnings and create fitting dimension names to prevent breakage, there is a chance of false positives (e.g. several variables of the same size that have all received the same change in size and have encoding["char_dim_name"] = "somestring" without numbering, with original_shape also set). I think the additional warnings in those cases might help users identify issues and are acceptable.
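To make the false-positive case concrete, here is a small sketch under the same caveats: `check_unnumbered` is a hypothetical stand-in for the per-variable check, the variable names and shapes are invented, and the renaming rule (append the actual size) is an assumption. Both variables changed size consistently, so they still agree on one dimension, yet each one triggers its own warning because it is checked in isolation.

```python
import warnings


def check_unnumbered(var_name, encoding, actual_strlen):
    """Hypothetical per-variable check for an unnumbered char_dim_name."""
    dim_name = encoding["char_dim_name"]
    original_shape = encoding.get("original_shape")
    if original_shape and original_shape[-1] != actual_strlen:
        warnings.warn(
            f"{var_name}: possible dimension naming clash for {dim_name!r}",
            UserWarning,
        )
        # Assumption: the new name appends the actual size.
        return f"{dim_name}{actual_strlen}"
    return dim_name


# Two invented variables that share a dimension and were both shortened
# from a string length of 10 to 8.
variables = {
    "station": {"char_dim_name": "somestring", "original_shape": (3, 10)},
    "region": {"char_dim_name": "somestring", "original_shape": (3, 10)},
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_names = {
        name: check_unnumbered(name, enc, 8) for name, enc in variables.items()
    }

# One warning per variable, although both end up on the same new name.
print(len(caught), new_names)
```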

kmuehlbauer (Contributor, Author) commented Jun 12, 2025

The Pydap test failure in the all-but-numba CI run might be real. Need to investigate.

Update: It fails in other PRs, too. The Pydap backend errors are most likely related to recent changes in NumPy (2.2.6 works, 2.3.0 fails).

Development

Successfully merging this pull request may close these issues.

to_netcdf from subsetted Dataset with strings loaded from char array netCDF can sometimes fail