We should write a blog post advertising the (upcoming) release of xarray-datatree in xarray main. (We're getting very close! See pydata/xarray#8572 (comment).)
This doesn't block our release of datatree; we can publish the post after quietly adding datatree into xarray main (a pseudo-staged rollout might be better anyway).
Content Ideas:
- Motivate why users wanted a hierarchical structure (e.g. "Feature Request: Hierarchical storage and processing in xarray", pydata/xarray#4118, and "Dataset groups", pydata/xarray#1092)
- Very brief explanation of the solution we have ended up with
- Doesn't need to explain much about actually using datatree - that should be covered by pointing people to the docs.
- Emphasise that this is a big deal
- Arguably the single largest feature added to xarray in 10 years? (I think it is by LoC)
- For a decade there have been 2+1 (public + private) xarray data structures; now there are 3+1. (`DataArray`, `Dataset`, `DataTree`; the semi-private one is `Variable`.)
- Mention the prototype in xarray-contrib/datatree repo
- Explain that the old repository is now archived
- And link to the migration guide: "Migration guide for old users of xarray-contrib/datatree", pydata/xarray#8807 (comment)
- Story of development
- Originally applied for CZI funding to do this but didn't get it
- Prototyped by Tom in a separate repository
- Iterated there until it mostly solidified, people started using it quite a lot even though it was marked "experimental"
- Sat there for ~2 years until a NASA group came along
- They were already using the experimental version but wanted (a) more guarantees of support and (b) more representation/integration of their staff with the datatree project
- Amazingly they already had permission to allocate developer time
- Owen, Matt & Eni then worked on migrating datatree into xarray upstream, with supervision from Tom, Stephan, and Justus
- Allowed us to improve the bus factor and sanity-check the approach
- Also gave us a chance to make a big change to the design (especially coordinate inheritance)
- Took a bit longer than anticipated but otherwise worked out quite well
- Got 3 new xarray core developers now - so NASA has more explicit representation
- Was a lot easier for xarray team not to have to write a proposal to get developer time
- This approach could work again in future!
- Implore people to try datatree out, but also to report bugs / suggestions as it's still being built up to its full potential.
I'm happy to write this post, unless anyone else particularly wants to.
cc the datatree migration team, i.e. @shoyer, @keewis, @owenlittlejohns, @eni-awowale, @flamingbear
Also @briannapagan, in case you want to add any perspective on telling the story of the collaboration here