Skip to content

Conversation

@abel-bzz
Copy link
Collaborator

@abel-bzz abel-bzz commented Jan 16, 2024

  • Include documentation when adding new features.
  • Include new tests or update existing tests when applicable.

This PR fixes #64

  • This adds an optional group argument to open_ncml
  • When present, only the given group of the ncml will be read.
  • When absent, the root group '/' is read.
    The above is similar to xarray's open_dataset.

In addition, using group='*', it flattens every group into a single datasets (somewhat preserving the current behavior).
The names conflicts between groups are solved by appending an incrementing __n to the variable names, where n is a number.
Plus, an attribute group_path is added to variables in order to retrieve their original path once they have been flatten.
This can be useful to recreate the original structure.

Abel Aoun added 5 commits January 11, 2024 20:01
When open_ncml is called with group="*", every group will be read
and they will be flatten in the resulting dataset.
If names are conflicting, the dimensions and varaibles names are
appended with __n where n is the number of existing simlar names.
@abel-bzz abel-bzz marked this pull request as draft January 16, 2024 15:22
@abel-bzz
Copy link
Collaborator Author

Converting PR to draft as I discovered flatten is not working properly at the moment.

xncml/parser.py Outdated
' to preserve the type.'
)
return xr.Variable(data=None, dims=())
default_value = nctype(var_type)()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed (again) the behavior of empty scalar parsing here.
For context, here the ncml describe a scalar with a certain type but without providing any value. We would ideally like to create a placeholder for a scalar variable with a numpy type.
But numpy doesn't allow this for scalar, only with an ndarray we can create a typed empty array of a certain shape.

I first though it would be better to loose the type and create a empty scalar with the value None but not having this type can mess with subsequent processing.
Now I think it's better to preserve the dtype and fill the scalar with the default value of this dtype.

I would appreciate comments/suggestions here.

Abel Aoun added 3 commits February 2, 2024 10:23
- improved performances by parsing ncml reprsentation only once
- fixed issues of rewritting content when read multiple times
@abel-bzz abel-bzz marked this pull request as ready for review February 5, 2024 12:37
@abel-bzz abel-bzz requested a review from huard February 5, 2024 12:45
abel-bzz and others added 2 commits February 5, 2024 18:10
@abel-bzz abel-bzz merged commit dc7ab52 into main Feb 6, 2024
@abel-bzz abel-bzz deleted the enh/manage_groups branch February 6, 2024 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Create a Datatree instead of a Dataset when there are groups

3 participants