Scheduling based on labels

Scheduling could perhaps also work on custom labels. For example, I could start a worker and declare that it runs on a machine with `datasets_available`, and then submit a job that requires being scheduled only on a machine/worker with the `datasets_available` label.
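To make the request concrete, here is a minimal sketch of the matching rule I have in mind. It is a toy model in plain Python, not an existing scheduler API: `Worker` and `eligible_workers` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical worker description: a name plus the labels it was started with.
    name: str
    labels: set = field(default_factory=set)


def eligible_workers(workers, required_labels):
    """Return the workers that carry every label the job requires."""
    required = set(required_labels)
    return [w for w in workers if required <= w.labels]


workers = [
    Worker("worker-a", {"datasets_available"}),
    Worker("worker-b"),
]

# A job that must run where the datasets live.
print([w.name for w in eligible_workers(workers, ["datasets_available"])])
# A job with no label requirement can run on any worker.
print([w.name for w in eligible_workers(workers, [])])
```

A job requiring a label that no worker has would get an empty list here; whether a real scheduler should queue such a job or fail it is an open design choice.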

Initially we thought of mounting a distributed file system on every worker's machine and having workers access datasets as needed. But that would require the scheduler to know whether the distributed file system has already moved a dataset to a given worker, so that jobs can be scheduled close to the data where possible. I think it could be simpler if only one machine has the datasets: jobs on that machine read them as needed and store them in the object store, and Ray then transports those objects around as needed.

We are working with datasets that fit into memory anyway, so this seems like the best approach for us. And if we get to the point of handling larger datasets, serializing them from the object store to a cache on the local drive seems much better than parsing the whole dataset from scratch. (Although it depends on compression; uncompressed images can be much larger than compressed ones.)
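The local-cache idea for larger datasets could look roughly like the sketch below. It assumes the parsed dataset can be pickled; `load_with_cache` and `parse_fn` are hypothetical names, and the sketch deliberately leaves out the object store and any cache invalidation.

```python
import pickle
import tempfile
from pathlib import Path


def load_with_cache(name, parse_fn, cache_dir):
    """Return the parsed dataset, reusing a local pickle cache when present."""
    cache_path = Path(cache_dir) / f"{name}.pkl"
    if cache_path.exists():
        # Cache hit: skip the expensive parse and deserialize from local disk.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: parse from scratch, then store the result for next time.
    data = parse_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data


calls = []


def parse_fn():
    # Stand-in for expensive dataset parsing; records each invocation.
    calls.append(1)
    return {"rows": [1, 2, 3]}


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_with_cache("toy", parse_fn, cache_dir)
    second = load_with_cache("toy", parse_fn, cache_dir)  # served from the cache
```

The caveat about compression applies here too: a pickled, decoded dataset may take far more disk space than the original compressed files.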

Scheduling based on labels #695
