Separate scheduling API from dask implementation #30

Merged on Jul 27, 2020 (12 commits)

Commits on Jul 24, 2020

  1. Separate chunking API from scheduling

    xref pangeo-data#29
    
    This PR moves the dask-specific scheduling logic into a separate `dask.py`
    file, as a first step toward adding support for alternative schedulers. (I'm
    particularly interested in supporting Apache Beam.)
    
    The existing tests pass (with minor modifications), but the documentation still
    needs updating.
    
    Notes:
    
    - I put `staged_copy` into a single function, but perhaps there are other
      generic methods (`execute`?) that would justify using a class?
    - `Rechunked` no longer inherits from `dask.delayed.Delayed`, and no longer has
      any dask-specific logic at all. I think this is important for generic
      scheduler support, but it does make it a little less reusable in larger
      pipelines. `_delayed` is currently a private attribute, but we should
      probably expose the scheduler equivalent of "delayed" objects in some way.
      I guess this is a use case for the class-based interface from the previous
      bullet.
    - `Rechunked` now always contains zarr arrays/groups rather than dask arrays.
      This makes the repr a little less informative, e.g., it no longer shows
      chunk size. This should probably be fixed before merging.
    - Will "two stage" copying always suffice? The interface I wrote for
      `staged_copy` supports any number of stages (in theory). That might be useful
      in the future, or it might be unnecessary complexity.
    - To verify that adding a new scheduler is not too painful, I should probably
      write at least a second example. I'll start with a naive "reference"
      scheduler in pure Python (this could go in the docs) and think about adding
      a Beam implementation as well. Beam is perhaps a nice example because its
      execution model is so different from dask's (based on higher-level transforms
      like "map" rather than individual tasks).
    shoyer committed Jul 24, 2020
    Commit 7aa8308
  2. Commit 720fe00
  3. black format

    shoyer committed Jul 24, 2020
    Commit 8adf7b1
  4. make black more verbose

    shoyer committed Jul 24, 2020
    Commit 10cd406
  5. Commit 16d1fa8
  6. black format

    shoyer committed Jul 24, 2020
    Commit 25167fe
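The first commit message proposes a scheduler-agnostic `staged_copy` that (in theory) supports any number of stages, plus a naive pure-Python "reference" scheduler. Below is a minimal sketch of that separation under stated assumptions: a plan is a hypothetical plain-data format (a list of stages, each a list of independent zero-argument copy tasks, with every stage finishing before the next starts), NumPy arrays stand in for zarr arrays, and the `staged_copy` signature plus the helpers `execute_reference` and `execute_threaded` are illustrative only, not the API added in this PR.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, List, Sequence, Tuple

import numpy as np

# Hypothetical plan format (an assumption, not rechunker's API): a plan is a
# list of stages, each stage is a list of independent zero-argument tasks, and
# every task in a stage must finish before the next stage starts.
Task = Callable[[], None]
Stage = List[Task]
Plan = List[Stage]


def _block_slices(shape: Sequence[int], chunks: Sequence[int]):
    """Yield index tuples that tile `shape` with blocks of size `chunks`."""
    starts = [range(0, size, step) for size, step in zip(shape, chunks)]
    for corner in product(*starts):
        yield tuple(
            slice(start, min(start + step, size))
            for start, step, size in zip(corner, chunks, shape)
        )


def _copy_stage(source, target, chunks: Tuple[int, ...]) -> Stage:
    """Build one stage: block-by-block copies from `source` into `target`."""

    def make_task(key):
        def task():
            target[key] = source[key]

        return task

    return [make_task(key) for key in _block_slices(source.shape, chunks)]


def staged_copy(arrays: Sequence, chunks_per_stage: Sequence[Tuple[int, ...]]) -> Plan:
    """Hypothetical signature: stage i copies arrays[i] into arrays[i + 1].

    Nothing here knows about a scheduler; the returned plan is plain data.
    """
    if len(arrays) != len(chunks_per_stage) + 1:
        raise ValueError("need exactly one more array than stages")
    return [
        _copy_stage(src, dst, chunks)
        for src, dst, chunks in zip(arrays[:-1], arrays[1:], chunks_per_stage)
    ]


def execute_reference(plan: Plan) -> None:
    """Naive reference scheduler: stages in order, tasks one at a time."""
    for stage in plan:
        for task in stage:
            task()


def execute_threaded(plan: Plan, max_workers: int = 4) -> None:
    """A second scheduler for the same plan, using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in plan:
            # list() blocks until the whole stage is done (and re-raises any
            # task error), which enforces the barrier between stages.
            list(pool.map(lambda task: task(), stage))


# Usage: a two-stage copy through an intermediate array.
source = np.arange(24).reshape(4, 6)
intermediate = np.zeros_like(source)
target = np.zeros_like(source)
plan = staged_copy([source, intermediate, target], [(4, 1), (1, 6)])
print([len(stage) for stage in plan])  # → [6, 4]
execute_reference(plan)
assert (target == source).all()
```

Because the plan is just data, a new scheduler only has to respect the barrier between stages; `execute_threaded` runs the same plan without any changes to `staged_copy`. A map-based system like Beam could presumably treat each stage as one "map" over its tasks, though that correspondence is an assumption in this sketch.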

Commits on Jul 25, 2020

  1. Commit b6b4a52
  2. Fixup docstrings/comments

    shoyer committed Jul 25, 2020
    Commit ff37794

Commits on Jul 27, 2020

  1. Commit 34960c1
  2. Documentation

    shoyer committed Jul 27, 2020
    Commit 24e9ba5
  3. remove outdated comment

    shoyer committed Jul 27, 2020
    Commit 2c9c3bf
  4. update comment

    shoyer committed Jul 27, 2020
    Commit 6732366