Description
As part of the MCAD load test we performed, we observed a significant difference between how the default scheduler and MCAD schedule the workload's Pods.
This plot shows how MCAD scheduled 150 Pods with low CPU requirement (all the Pods could fit on the available nodes):
The test ran in 14.2 minutes.
This plot shows the result of the same test, but with Job resources instead of AppWrappers.
The test ran in 6.3 minutes.
Note that in both cases, the Pods ran for 5 minutes, so the default scheduler's result confirms the expectation that all the Pods fit simultaneously on the cluster.
This plot shows a similar result, with 200 Pods each requesting 1 GPU.
There is a total of 200 GPU resources available in the system (2 physical GPUs, each time-sliced into 100 GPU resources).
The test took 21.6 minutes to run.
This plot shows how the default scheduler performed.
The test took 13.9 minutes to run.
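
To make the gap easier to compare across the two tests, here is a minimal sketch that computes how much longer MCAD took than the default scheduler, using only the durations reported above. The dictionary keys and the `slowdown` helper are names made up for this illustration. The 5-minute Pod runtime is only stated for the CPU test, so the "time beyond Pod runtime" figure is computed for that test alone.

```python
# Wall-clock test durations reported in this issue, in minutes.
results = {
    "150 Pods, low CPU": {"mcad": 14.2, "default": 6.3},
    "200 Pods, 1 GPU each": {"mcad": 21.6, "default": 13.9},
}

# Pod runtime stated for the CPU test only (assumption: it does not
# necessarily apply to the GPU test, so it is not used there).
CPU_TEST_POD_RUNTIME_MIN = 5.0


def slowdown(mcad_minutes, default_minutes):
    """Return (extra minutes, ratio) of MCAD relative to the default scheduler."""
    extra = round(mcad_minutes - default_minutes, 1)
    ratio = round(mcad_minutes / default_minutes, 2)
    return extra, ratio


summary = {name: slowdown(r["mcad"], r["default"]) for name, r in results.items()}
for name, (extra, ratio) in summary.items():
    print(f"{name}: MCAD took {extra} min longer ({ratio}x the default scheduler)")

# Time not accounted for by the Pod runtime itself, CPU test only.
cpu = results["150 Pods, low CPU"]
beyond_runtime = {
    scheduler: round(minutes - CPU_TEST_POD_RUNTIME_MIN, 1)
    for scheduler, minutes in cpu.items()
}
print(f"CPU test, time beyond the 5 min Pod runtime: {beyond_runtime}")
```

With these numbers, MCAD is roughly 2.25x slower in the CPU test and 1.55x slower in the GPU test, and the absolute gap is close in both (7.9 and 7.7 minutes).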