Scheduler.total_occupancy is significant runtime cost #7256

Closed
@gjoseph92

While profiling the scheduler during a benchmark workload on a 128-worker cluster (2 CPUs per worker), I noticed total_occupancy taking 32% of total scheduler runtime!

Profile (go to left-heavy view):

Subjectively, the dashboard was also extremely laggy. (This is real time, not an artifact of the screen recording or network delay. Go to the end to see that the dashboard becomes responsive once most tasks are done.)

amon-mean-128-no-queue-pyspy.mp4

cc @hendrikmakait @fjetter
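
To illustrate why an aggregate like total_occupancy can show up so prominently in a profile, here is a minimal toy sketch. It is not the distributed implementation, which this issue does not show. It assumes, purely for illustration, that the total is recomputed from per-worker values on every read, and contrasts that with keeping a running total that is adjusted on each update. The class names `RecomputedTotal` and `RunningTotal` and the `set_occupancy` method are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RecomputedTotal:
    """Hypothetical toy model: the aggregate is rebuilt on every read."""

    occupancy: dict[str, float] = field(default_factory=dict)

    def set_occupancy(self, worker: str, seconds: float) -> None:
        self.occupancy[worker] = seconds

    @property
    def total_occupancy(self) -> float:
        # Cost grows with the number of workers on *every* access.
        return sum(self.occupancy.values())


@dataclass
class RunningTotal:
    """Hypothetical toy model: the aggregate is maintained incrementally."""

    occupancy: dict[str, float] = field(default_factory=dict)
    total_occupancy: float = 0.0

    def set_occupancy(self, worker: str, seconds: float) -> None:
        # Constant-time update: apply only the delta for this worker.
        self.total_occupancy += seconds - self.occupancy.get(worker, 0.0)
        self.occupancy[worker] = seconds


recomputed = RecomputedTotal()
running = RunningTotal()
for i in range(128):  # 128 workers, as in the benchmark cluster above
    recomputed.set_occupancy(f"w{i}", float(i))
    running.set_occupancy(f"w{i}", float(i))

# Overwrite one worker's value; both models should still agree.
recomputed.set_occupancy("w0", 10.0)
running.set_occupancy("w0", 10.0)
```

If the property is read many times per scheduling decision, the first variant pays the full summation each time, while the second pays a small cost per update. A running total can accumulate floating-point drift with non-integer values, so a real implementation would likely need to resynchronize it occasionally.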
