What you would like to be added?
I would like the coscheduling plugin to take the TrainJob's .spec.trainer.resourcePerNode
into account when it calculates the total resource requests for the PodGroup.
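To make the request concrete, here is a rough sketch of the calculation I have in mind. It is not the plugin's actual code: `ResourceList`, `totalPodGroupRequests`, and the plain integer quantities are hypothetical stand-ins, and it assumes that a value in resourcePerNode overrides the runtime template's request for the same resource name.

```go
package main

import "fmt"

// ResourceList is a hypothetical, simplified stand-in for a Kubernetes
// resource list: resource name -> quantity in its smallest unit
// (e.g. millicores for cpu, bytes for memory).
type ResourceList map[string]int64

// totalPodGroupRequests is a hypothetical helper. It starts from the
// per-node requests defined in the runtime template, applies the TrainJob's
// resourcePerNode on top (assumption: the TrainJob value wins per resource
// name), and multiplies the result by the number of nodes.
func totalPodGroupRequests(templateRequests, resourcePerNode ResourceList, numNodes int64) ResourceList {
	perNode := make(ResourceList, len(templateRequests)+len(resourcePerNode))
	for name, qty := range templateRequests {
		perNode[name] = qty
	}
	// Values set on the TrainJob replace the template values.
	for name, qty := range resourcePerNode {
		perNode[name] = qty
	}

	total := make(ResourceList, len(perNode))
	for name, qty := range perNode {
		total[name] = qty * numNodes
	}
	return total
}

func main() {
	template := ResourceList{"cpu": 1000, "memory": 2 << 30}
	override := ResourceList{"cpu": 4000, "nvidia.com/gpu": 1}

	// Two nodes: the cpu request comes from the TrainJob, memory from the template.
	fmt.Println(totalPodGroupRequests(template, override, 2))
}
```

With this approach the PodGroup's total would reflect what the trainer Pods actually request, rather than only what the runtime template declares.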
Why is this needed?
In the current coscheduling implementation, the PodGroup does not work well, or can even cause a deadlock, when users specify .spec.trainer.resourcePerNode
in the TrainJob.
Love this feature?
Give it a 👍 We prioritize the features with most 👍