
Merge dask and distributed repos? #402

@fjetter

I frequently feel the pain of having two distinct repositories, dask/dask and dask/distributed. Lately, we've been working much more on changes that affect both repos, and synchronizing PRs across repos is painful and cumbersome. The addition of dask-expr brings this to a third repo, and there are occasionally changes that span all three (e.g. sending Expr classes to the scheduler without materializing them client side).

Documentation, maintenance, and release procedures also add extra work for every repo.

The code is currently hard-locked anyway, so we have already sacrificed almost all the flexibility of having multiple repos while still paying for the disadvantages.

I would like to propose merging the two (three) repos into a single one. We should still maintain separate Python packages, so nothing would change for end users other than having a single issue tracker to report issues to.

The problems I suspect we'll run into are:

  • CI runtime for distributed is relatively high. We could remove some redundant tests once both are in the same repo, but runtime would still be longer if everything is tested on every PR. We'd likely keep separate GitHub workflows that use appropriate paths/paths-ignore filters to decouple this somewhat.
  • The distributed CI is flaky and has been for a long time. However, if the tests only run selectively depending on which paths changed, merging the repos would not change this.
  • This would likely affect all open PRs, since we'd probably have to change the directory structure to support multiple pyproject.toml files.
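As a rough illustration of the path-based decoupling, here is a minimal Python sketch that picks which test suites to run from the files a PR changes. The top-level directory names (dask/, dask-expr/, distributed/), the shared .github/ trigger, the downstream mapping, and the suites_to_run helper are all hypothetical assumptions about a merged layout. It uses fnmatch, whose * also matches /, so it only approximates GitHub's paths filter syntax rather than reproducing it.

```python
from fnmatch import fnmatch

# Hypothetical merged layout: one top-level directory per package,
# each with its own pyproject.toml.
PACKAGE_PATHS = {
    "dask": ["dask/*"],
    "dask-expr": ["dask-expr/*"],
    "distributed": ["distributed/*"],
}

# Assumed dependency edges: a change in a package also triggers the
# suites of the packages built on top of it.
DOWNSTREAM = {
    "dask": ["dask-expr", "distributed"],
    "dask-expr": [],
    "distributed": [],
}

# Assumed shared files (e.g. CI config) whose changes trigger every suite.
SHARED_PATHS = [".github/*"]


def suites_to_run(changed_files):
    """Return the sorted names of the test suites affected by changed_files."""
    if any(fnmatch(f, p) for f in changed_files for p in SHARED_PATHS):
        return sorted(PACKAGE_PATHS)

    # Directly touched packages.
    selected = {
        package
        for package, patterns in PACKAGE_PATHS.items()
        if any(fnmatch(f, p) for f in changed_files for p in patterns)
    }

    # Add downstream packages transitively.
    pending = list(selected)
    while pending:
        for dependent in DOWNSTREAM[pending.pop()]:
            if dependent not in selected:
                selected.add(dependent)
                pending.append(dependent)
    return sorted(selected)


print(suites_to_run(["distributed/scheduler.py"]))  # ['distributed']
print(suites_to_run(["dask/array/core.py"]))  # ['dask', 'dask-expr', 'distributed']
```

Propagating to downstream suites reflects the hard lock: a change in dask could break the packages built on it, so their tests run too, while a distributed-only change leaves the dask suites untouched. In actual workflow files, the same effect would come from listing dask's directory in the paths filter of the downstream workflows.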

Are there problems I haven't thought of? Are there other reasons the two code bases should remain separate? I'm not very familiar with packaging; is there anything in that area that needs consideration?

cc @mrocklin @jacobtomlinson @quasiben @jrbourbeau @rjzamora @charlesbluca @hendrikmakait @phofl
