[Perf] MCAD takes a significant time to process and schedule AppWrappers #510

Open
@kpouget

Description

As part of the MCAD load test that we performed, we observed a significant difference between how the default scheduler and MCAD schedule Pods.

This plot shows how MCAD scheduled 150 Pods with low CPU requirements (all the Pods could fit on the available nodes at once):

image

The test ran in 14.2 minutes.

This plot shows the result of the same test, but with Job resources instead of AppWrappers.

image

The test ran in 6.3 minutes.

Note that in both cases the Pods ran for 5 minutes, so the 6.3-minute run with the default scheduler confirms the expectation that all the Pods fit on the cluster simultaneously.


image

This plot shows a similar result, with 200 Pods each requesting 1 GPU.
There are 200 GPU resources available in the system in total (2 physical GPUs, each time-sliced into 100 GPU resources).
The test took 21.6 minutes to run.

image
This plot shows how the default scheduler performed.
The test took 13.9 minutes to run.
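
To put the four durations side by side, here is a minimal sketch that computes the extra wall-clock time and the slowdown ratio for each test. It assumes the 21.6-minute GPU run is the MCAD one, as the order of the plots implies. The `RESULTS` table and the `compare` helper are illustrative names, not part of MCAD or the load-test tooling.

```python
# Wall-clock durations (minutes) as reported in this issue.
# Assumption: the 21.6-minute GPU run is the MCAD run.
RESULTS = {
    "150 CPU Pods": {"mcad": 14.2, "default": 6.3},
    "200 GPU Pods": {"mcad": 21.6, "default": 13.9},
}


def compare(mcad_min, default_min):
    """Return (extra minutes, slowdown ratio) of MCAD vs. the default scheduler."""
    extra = round(mcad_min - default_min, 1)
    ratio = round(mcad_min / default_min, 2)
    return extra, ratio


for name, t in RESULTS.items():
    extra, ratio = compare(t["mcad"], t["default"])
    print(f"{name}: MCAD +{extra} min, {ratio}x the default scheduler")
```

With these numbers, MCAD adds roughly 8 minutes in both tests (7.9 and 7.7), even though the ratios differ (2.25x and 1.55x).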
