Track merging datatree into xarray #8572

Closed
@TomNicholas

What is your issue?

Master issue to track progress of merging xarray-datatree into xarray main. Completing this would close #4118 (and many similar issues), and would also fulfil one of the goals of our development roadmap.

Also see the project board for DataTree integration.


In the last few dev meetings, we decided to drop the idea of a temporary cross-repo import (from xarray import datatree), so this issue supersedes #7418, and to just begin merging datatree into xarray main directly.

Weekly meeting

See #8747

Task list:

To happen in order:

  • open_datatree in xarray. This doesn't need to be performant initially. The original plan was for it to return a datatree.DataTree object. EDIT: We decided it should return an xarray.DataTree object, or even an xarray.core.datatree.DataTree object. So we can start by copying the basic version currently in datatree/io.py, which just calls open_dataset many times. See: add open_datatree to xarray #8697

  • Triage and fix issues: figure out which of the issues on xarray-contrib/datatree need to be fixed before the merge (if any).

  • Merge in code for DataTree class. I suggest we do this by making one PR for each module, and ideally discussing and merging each before opening a PR for the next module. (Open to other workflow suggestions though.) The main aims here are to lower the bus factor on the code, confirm high-level design decisions, and improve details of the implementation as it goes in.

    Suggested order of modules to merge:

  • Expose datatree API publicly. Actually expose open_datatree and DataTree in xarray's public API as top-level imports. The full list of things to expose is:

    • open_datatree
    • DataTree
    • map_over_subtree
    • assert_isomorphic
    • register_datatree_accessor
  • Refactor class inheritance - Dataset/DataArray share some mixin classes (e.g. DataWithCoords), and we could probably refactor DataTree to use these too. This is low-priority but would reduce code duplication.
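
For the first task above, the "basic version" that just calls open_dataset many times could look roughly like the sketch below. This is only an illustration of the pattern, not the code in datatree/io.py: the names open_tree_naively and fake_open_dataset are hypothetical, the list of group paths is assumed to be known up front, and the result is a plain dict keyed by group path rather than a DataTree object. The opener is passed in as a parameter so the sketch runs without xarray or a real file; in practice one would pass xarray.open_dataset, which accepts a group keyword for backends such as netCDF4.

```python
from typing import Any, Callable, Dict, Iterable


def open_tree_naively(
    path: str,
    group_paths: Iterable[str],
    open_group: Callable[..., Any],
) -> Dict[str, Any]:
    """Open each group of a file separately and collect the results.

    Hypothetical helper: `open_group` is called once per group as
    open_group(path, group=group_path), which is why this approach is
    simple but not performant for files with many groups.
    """
    tree: Dict[str, Any] = {}
    for group_path in group_paths:
        # One independent open call per group (no shared file handle).
        tree[group_path] = open_group(path, group=group_path)
    return tree


# Stand-in opener (an assumption for illustration) that records its calls.
calls = []


def fake_open_dataset(path, group=None):
    calls.append((path, group))
    return {"source": path, "group": group}


tree = open_tree_naively(
    "example.nc",
    ["/", "/child", "/child/grandchild"],
    fake_open_dataset,
)
```

The design trade-off is the one the task description already accepts: every group triggers its own open call, so the cost grows with the number of groups, but the logic is easy to review and merge first.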

Can happen basically at any time or maybe in parallel with other efforts:


Anyone is welcome to help with any of this, including but not limited to @owenlittlejohns, @eni-awowale, @flamingbear (@etienneschalk maybe?).

cc also @shoyer @keewis for any thoughts as to the process.
