open_dict_of_datasets function to open any file containing nested groups #9137
Comments
I'm not sure if we'd want to use exactly the same API, but an additional use case for opening a dict of Dataset objects is opening a glob/directory of netCDF files, i.e., the first part of |
just to throw out more ideas: |
I quite like |
I also like open_groups!
…On Wed, Jul 3, 2024 at 8:04 AM Tom Nicholas ***@***.***> wrote:
I quite like open_groups! It's succinct, communicates that if you don't
have groups you won't need this, doesn't use the word datatree, and is
plural to indicate that you will get back multiple objects. Only downside
is that it breaks the pattern that open_dataset opens a dataset, because
we don't have a "group" object.
|
Talked to Tom about this but thought I would post it here too. I am working on a PR for this! |
Okay so I have a question about getting |
I think guessing the engine would be useful, but remembering that if the implementation of [...]
Also note that the way to do this should presumably be to add an [...]
    def open_datatree(self, path, ...) -> DataTree:
        dict_of_datasets = self.open_groups(path, ...)
        # If the user tries to open a file whose groups can't be represented as a tree, this raises here.
        tree_root = DataTree.from_dict(dict_of_datasets)
        return tree_root
|
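The delegation described in the comment above can be sketched without xarray. Everything in this block is a hypothetical stand-in: ToyBackend is not xarray's BackendEntryPoint, ToyTree is not DataTree, and the alignment rule used here (a variable shared with the parent group must have the same length) is a simplification chosen only to show where the error would surface.

```python
class ToyTree:
    """Hypothetical stand-in for a tree type that enforces alignment."""

    def __init__(self, groups: dict[str, dict[str, list]]):
        self.groups = groups

    @classmethod
    def from_dict(cls, groups: dict[str, dict[str, list]]) -> "ToyTree":
        for path, variables in groups.items():
            if path == "/":
                continue
            # "/a" has parent "/", "/a/b" has parent "/a".
            parent = path.rsplit("/", 1)[0] or "/"
            for name, values in variables.items():
                inherited = groups.get(parent, {}).get(name)
                # Simplified rule: a variable shared with the parent
                # group must have the same length.
                if inherited is not None and len(inherited) != len(values):
                    raise ValueError(
                        f"group {path!r} is not aligned with {parent!r} on {name!r}"
                    )
        return cls(groups)


class ToyBackend:
    """Hypothetical backend; `store` stands in for files on disk."""

    def __init__(self, store: dict[str, dict[str, dict[str, list]]]):
        self.store = store

    def open_groups(self, path: str) -> dict[str, dict[str, list]]:
        # Always succeeds: a plain dict enforces nothing between groups.
        return self.store[path]

    def open_datatree(self, path: str) -> ToyTree:
        # Reuses open_groups; a misaligned file raises at tree construction.
        return ToyTree.from_dict(self.open_groups(path))


backend = ToyBackend(
    {
        "ok.nc": {"/": {"x": [0, 1]}, "/a": {"x": [5, 6]}},
        "bad.nc": {"/": {"x": [0, 1]}, "/a": {"x": [0, 1, 2]}},
    }
)

tree = backend.open_datatree("ok.nc")    # aligned, so a tree is built
groups = backend.open_groups("bad.nc")   # misaligned, but still opens as a dict
try:
    backend.open_datatree("bad.nc")
    error = None
except ValueError as err:
    error = str(err)
```

In this toy, the misaligned file is still reachable through open_groups, and the failure only appears once a tree is requested.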
Is your feature request related to a problem?
In #9077 (comment) I suggested the idea of a function which could open any netCDF file with groups as a dictionary mapping group path strings to xr.Dataset objects.
The motivation is as follows:
- [...] xarray.DataTree class to support inheriting coordinates from parent groups,
- [...] xr.align),
- [...] DataTree construction time,
- [...] open_datatree directly, as doing so would raise an alignment error,
- [...] xarray.Dataset objects doesn't enforce alignment, so can represent any file.
- [...] DataTree object via DataTree.from_dict if they like.
Describe the solution you'd like
Add a function like this:
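A minimal sketch of the idea, not the proposed xarray implementation: the nested dictionary below is a hypothetical stand-in for a file with groups, and plain dicts stand in for xr.Dataset objects. Only the return shape, a flat mapping from group path strings to per-group datasets, reflects the proposal; the function body and the "variables" and "groups" keys are assumptions made for illustration.

```python
from typing import Any

# Hypothetical stand-in for a netCDF/Zarr file: each group holds its own
# variables plus any number of named child groups.
FakeGroup = dict[str, Any]


def open_dict_of_datasets(root: FakeGroup) -> dict[str, dict[str, Any]]:
    """Return a flat mapping from group path to that group's variables.

    No alignment between parent and child groups is checked, so any
    hierarchy can be represented.
    """
    result: dict[str, dict[str, Any]] = {}

    def visit(group: FakeGroup, path: str) -> None:
        # A plain dict stands in for an xr.Dataset here.
        result[path] = dict(group.get("variables", {}))
        for name, child in group.get("groups", {}).items():
            child_path = f"{path.rstrip('/')}/{name}"
            visit(child, child_path)

    visit(root, "/")
    return result


fake_file = {
    "variables": {"time": [0, 1, 2]},
    "groups": {
        "coarse": {"variables": {"temperature": [280.0, 281.5, 279.9]}},
        "fine": {
            # Deliberately conflicts with the root group's "time".
            "variables": {"time": [0, 1]},
            "groups": {"qc": {"variables": {"flag": [0, 1]}}},
        },
    },
}

groups = open_dict_of_datasets(fake_file)
print(sorted(groups))
```

The "fine" group redefines "time" with a different length than the root, yet the flat dictionary holds both without complaint, which is the property the proposal relies on.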
This would live inside backends.api.py, and be exposed publicly as a top-level function along with the rest of open_datatree/DataTree etc. as part of #9033.
The actual implementation could re-use the code for opening many groups of the same file performantly from #9014. Indeed we could add an open_dict_of_datasets method to the BackendEntryPoint class, which uses pretty much the same code as the existing open_datatree method added in #9014 but just doesn't actually create a DataTree object.
Describe alternatives you've considered
Really the main alternative to this is not to have coordinate inheritance in DataTree at all (see #9077), in which case open_datatree would be sufficient to open any file.
The name of the function is up for debate. I prefer nothing with the word "datatree" in it since this doesn't actually create a DataTree object at any point. (In fact we could and perhaps should have implemented this function years ago, even without the new DataTree class.) The reason for not calling it "open_as_dict_of_datasets" is that we don't use "as" in the existing open_dataset/open_dataarray etc.
Additional context
cc @eni-awowale @flamingbear @owenlittlejohns @keewis @shoyer @autydp