
Locks and chunked storage #5083

Closed
@d-v-b


dask.array.store takes an optional argument, lock, which (to my understanding) avoids write contention by forcing workers to request access to the write target before writing. But for chunked storage like zarr arrays, write contention happens at the level of individual chunks, not the entire array. So perhaps a lock for chunked writes should have the granularity of the chunk structure of the storage target, thereby allowing the scheduler to control access to individual chunks. Does this make sense?
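To make the idea concrete, here is a minimal sketch of what chunk-granular locking could look like. It uses only the Python standard library and is not dask's or zarr's API: `ChunkLocks`, `hold`, and `overlapping_chunks` are hypothetical names invented for illustration, and it only covers threads within a single process. A write region takes the locks of exactly the storage chunks it overlaps, so writers touching disjoint chunks never wait on each other.

```python
import itertools
import threading
from contextlib import ExitStack, contextmanager


def overlapping_chunks(region, chunk_shape):
    """Return the sorted grid indices of storage chunks touched by `region`.

    `region` is a tuple of (start, stop) pairs, one per axis, stop exclusive.
    """
    per_axis = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(region, chunk_shape)
    ]
    return sorted(itertools.product(*per_axis))


class ChunkLocks:
    """Hypothetical helper: one lock per storage chunk, created on demand."""

    def __init__(self, chunk_shape):
        self.chunk_shape = chunk_shape
        self._locks = {}
        self._guard = threading.Lock()  # protects the dict of locks itself

    def _lock_for(self, index):
        with self._guard:
            return self._locks.setdefault(index, threading.Lock())

    @contextmanager
    def hold(self, region):
        """Hold the locks of every storage chunk that `region` overlaps."""
        with ExitStack() as stack:
            # Acquiring in sorted order keeps overlapping writers from
            # deadlocking on each other.
            for index in overlapping_chunks(region, self.chunk_shape):
                stack.enter_context(self._lock_for(index))
            yield


# Usage: a write spanning rows 0-4 and columns 4-7 of a store with
# 4x4 chunks only blocks other writers that touch chunks (0, 1) or (1, 1).
locks = ChunkLocks(chunk_shape=(4, 4))
with locks.hold(((0, 5), (4, 8))):
    pass  # write the block to the storage target here
```

When the dask chunks are aligned with the storage chunks, each write takes a single lock and nothing contends; the locks only matter for dask chunks that straddle storage chunk boundaries, which is the case that would otherwise need rechunking.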

The context for this question is my attempt to optimize storing multiple enormous dask arrays in zarr containers with a minimum amount of rechunking. That makes the locking mechanism attractive, as long as the lock is applied at the right granularity.

cc @jakirkham in case you have ideas about this.
