Description
What is your issue?
`xr.Dataset` implements a bunch of dask-specific methods, such as `__dask_tokenize__` and `__dask_graph__`. It also obviously has public methods that involve dask, such as `.compute()` and `.load()`.
In `DataTree`, on the other hand, I haven't yet implemented any methods like these, or even written any tests that involve dask! You can probably still use dask with datatree right now, but from dask's perspective the datatree is presumably merely a set of unconnected `Dataset` objects.
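We could choose to implement methods like `.load()` as just a mapping over the tree, i.e.

```python
def load(self):
    # Walk every node in the tree (including this one) and load its
    # dask-backed variables into memory in place.
    for node in self.subtree:
        if node.has_data:
            node.ds.load()
    # Return the tree itself, mirroring Dataset.load().
    return self
```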
Most of that should already work (or be very easy to get working) using `map_over_subtree`.
There are also special double-underscore methods defined on `Dataset` that implement dask's custom collections interface: https://docs.dask.org/en/stable/custom-collections.html
Xarray objects satisfy this collections protocol, so you can call `dask.tokenize(xarray_thing)`, `dask.compute(xarray_thing)`, etc. (also `visualize` and `persist`).
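To make the protocol concrete, here is a minimal sketch of how a tree could present itself to dask as one collection instead of a set of unconnected pieces. It assumes a hypothetical `TreeCollection` class holding a flat dict from node path to any dask collection (for a real tree, each node's `Dataset`); this is not `DataTree`'s API, and `dask.delayed` leaves stand in for per-node data. The idea is delegation: merge the per-node graphs in `__dask_graph__`, list one group of output keys per node in `__dask_keys__`, and in `__dask_postcompute__` hand each node's results back to that node's own finalizer.

```python
import dask
from dask.base import normalize_token, tokenize
from dask.threaded import get as threaded_get


class TreeCollection:
    """Hypothetical toy tree: a flat dict of node path -> dask collection.

    Sketches dask's custom-collection protocol only; not DataTree's API.
    """

    # Scheduler used by dask.compute() when none is given explicitly.
    __dask_scheduler__ = staticmethod(threaded_get)

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    def __dask_graph__(self):
        # Merge every node's task graph into one mapping, so dask sees a
        # single collection instead of unconnected pieces.
        graph = {}
        for node in self.nodes.values():
            graph.update(node.__dask_graph__())
        return graph

    def __dask_keys__(self):
        # One entry of output keys per node, in insertion order.
        return [node.__dask_keys__() for node in self.nodes.values()]

    def __dask_tokenize__(self):
        # Deterministic token built from the paths and each node's normalized token.
        return normalize_token((type(self).__name__, list(self.nodes.items())))

    @staticmethod
    def __dask_optimize__(dsk, keys, **kwargs):
        # No graph optimization in this sketch.
        return dsk

    def __dask_postcompute__(self):
        return self._finalize, ()

    def _finalize(self, results):
        # `results` mirrors __dask_keys__(): one entry per node, same order,
        # so each node's own finalizer turns its results into a value.
        out = {}
        for (path, node), node_results in zip(self.nodes.items(), results):
            finalize, args = node.__dask_postcompute__()
            out[path] = finalize(node_results, *args)
        return out


# Usage with two dask.delayed leaves standing in for per-node data.
def inc(x):
    return x + 1


def times_ten(x):
    return x * 10


tree = TreeCollection({"/a": dask.delayed(inc)(1), "/b": dask.delayed(times_ten)(3)})
(computed,) = dask.compute(tree)
print(computed)  # {'/a': 2, '/b': 30}
print(tokenize(tree) == tokenize(TreeCollection(tree.nodes)))  # True
```

Returning a plain dict keeps the sketch short; a real implementation would rebuild a tree in `_finalize`, and `dask.persist` would additionally need `__dask_postpersist__`, which is omitted here.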
We could add these, but it would be rather nice if someone who understands the double-underscore dask methods really well just took this on. @darothen helpfully started this in xarray-contrib/datatree#196 but it stalled.
@jrbourbeau, are you/Coiled interested in submitting a PR to get `xarray.DataTree` fully integrated with dask?