BackendEntrypoint bug chunking data #8810
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Thanks for the issue, but we really need a more focused example to make progress here.
I agree that debugging the somewhat complex piece of code you linked to is a bit much to ask for an issue (especially given that we're already somewhat behind in answering some issues). However, glancing over the code, I would point you to the preferred chunksizes section in the docs, in particular this sentence:
In other words, your backend is supposed to use the lazy indexing classes mentioned in that document, and you will get the |
Of course, debugging that code is not what I wanted you to do; you have other things to do. Thanks for the hint about the lazy indexing classes. I am going to check that out. I wrote a shorter example to show that this is a more general problem.
In this case, chunking applied with xarray inside the backend is not forwarded.
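For contrast, the lazy-indexing route suggested earlier in the thread can be sketched roughly as follows. This is only a minimal sketch, not the linked backend: `ToyBackendArray` and `ToyEntrypoint` are hypothetical names, the data is generated on demand instead of being read from a file, and the dims `t`, `y`, `x` are made up. It assumes dask is installed and that the caller asks for chunks (here `chunks={}`), in which case xarray picks up the `preferred_chunks` stored in the variable's encoding.

```python
import tempfile

import numpy as np
import xarray as xr
from xarray.backends import BackendArray, BackendEntrypoint
from xarray.core import indexing


class ToyBackendArray(BackendArray):
    """Hypothetical lazy array; a real backend would read from disk here."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)

    def __getitem__(self, key):
        # Let xarray translate its indexer into a plain tuple of ints/slices.
        return indexing.explicit_indexing_adapter(
            key, self.shape, indexing.IndexingSupport.BASIC, self._raw_getitem
        )

    def _raw_getitem(self, key):
        # Stand-in for loading only the requested slab from a file.
        full = np.arange(np.prod(self.shape), dtype=self.dtype).reshape(self.shape)
        return full[key]


class ToyEntrypoint(BackendEntrypoint):
    """Hypothetical entrypoint that returns a lazy (not yet chunked) dataset."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        shape = (3, 4, 5)  # assumed layout: 3 time steps ("files") of 4 x 5 values
        data = indexing.LazilyIndexedArray(ToyBackendArray(shape, "float64"))
        # Advertise one chunk per time step instead of calling .chunk() here.
        encoding = {"preferred_chunks": {"t": 1, "y": 4, "x": 5}}
        var = xr.Variable(("t", "y", "x"), data, encoding=encoding)
        return xr.Dataset({"m": var})


# A real (empty) file is used only so that the path exists on disk.
with tempfile.NamedTemporaryFile(suffix=".toy") as tmp:
    ds = xr.open_dataset(tmp.name, engine=ToyEntrypoint, chunks={})
    chunksizes = dict(ds.chunksizes)

print(chunksizes)
```

With this layout the backend never chunks anything itself; it only states its preference, and the chunking is applied by `open_dataset` when the user requests dask-backed data.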
Thanks for the minimal example. That reduces this issue to just "Should we allow backends to return chunked datasets?", which is much easier to discuss. Back when we introduced the backends, the current behavior was intentional, but that doesn't mean we can't change it (we'll first have to figure out whether that's a good idea, though).
Why don't you use There's a nice tutorial of how to set up lazy handling of backends here: |
Yeah, I have already updated the code to make use of the lazy loading structure. I have also used the mf function in the past. It is just more convenient if the user doesn't have to pass arguments that are always the same (like parallel=True, concat_dim='t', etc.). There are also some other reasons for using the original function instead of mf, but explaining them would go too far.
What happened?
Hi all,
I have written a BackendEntrypoint class. Since the data can be quite large, I wrote large parts of it using dask.delayed. As one of the last steps, I redefine the chunks. The code loads single files and concatenates them into one xarray dataset, so it makes sense to use one chunk per file. This chunking is not forwarded to the main program, where the class is used:
dset = dset.chunk(dict(zip(dims, list(reversed([list(reversed(list(shape)))[i] if i < 4 else 1 for i in range(len(list(shape)))])))))
return dset
https://github.com/timvgl/mumaxXR/blob/daa4345b482197b65a7bd88df1552d9fa48b8bf7/src/mumaxXR/OvfEngine.py#L356-L357
Between dset = ... and return dset, dset.chunksizes gives the expected values. However, in the main program the chunking is no longer available and dset.chunksizes gives Frozen({}).
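The chunk mapping passed to dset.chunk in the snippet above is hard to read at a glance. As far as it can be read from that one line, it uses chunk size 1 along all leading dimensions and the full extent along the last four. Below is a minimal sketch of an equivalent, more explicit form. The helper name `per_file_chunks` and the example `dims` and `shape` are hypothetical, and the sketch assumes `dims` and `shape` have the same length.

```python
def per_file_chunks(dims, shape, keep=4):
    """Chunk size 1 along leading dims, full extent along the last `keep` dims."""
    first_kept = len(shape) - keep
    return {
        dim: size if i >= first_kept else 1
        for i, (dim, size) in enumerate(zip(dims, shape))
    }


# Hypothetical dims and shape, for illustration only.
dims = ("t", "comp", "z", "y", "x")
shape = (10, 3, 1, 64, 64)

chunks = per_file_chunks(dims, shape)
print(chunks)  # {'t': 1, 'comp': 3, 'z': 1, 'y': 64, 'x': 64}
```

The resulting dict could then be passed as dset.chunk(chunks), which would not by itself change the behavior reported here.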
Thanks for your help!
Tim
What did you expect to happen?
No response
Minimal Complete Verifiable Example
No response
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
xarray version: 2023.1.0
Linux