
Support TrainJob ResourcePerNode in CoScheduling plugin #2525

Open
@tenzen-y

Description


What would you like to be added?

I would like the coscheduling plugin to take the TrainJob .spec.trainer.resourcePerNode into account when it calculates the total resource requests for the PodGroup.

https://github.com/tenzen-y/trainer/blob/b36fe46dd46fcac5754ab5ed4bbc552ac5bb8d5a/pkg/runtime/framework/plugins/coscheduling/coscheduling.go#L122-L132
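A minimal Go sketch of the requested calculation follows. It is not the plugin's actual code. The ResourceList, PodSet, effectiveRequests, and totalRequests names and the "initializer"/"node" PodSet names are hypothetical. Quantities are plain int64 values rather than resource.Quantity. The sketch also assumes that a non-nil resourcePerNode fully replaces the per-Pod requests of the trainer node PodSet, while every other PodSet keeps the requests from the runtime.

```go
package main

import "fmt"

// ResourceList is a hypothetical, simplified stand-in for corev1.ResourceList:
// resource name -> quantity in base units (millicores, bytes, device count).
type ResourceList map[string]int64

// PodSet is a hypothetical, simplified view of a group of identical Pods.
type PodSet struct {
	Name              string
	Count             int64
	SinglePodRequests ResourceList
}

// effectiveRequests returns the per-Pod requests used for a PodSet.
// Assumption: for the trainer node PodSet, a non-nil resourcePerNode
// override from the TrainJob replaces the runtime template's requests.
func effectiveRequests(ps PodSet, trainerPodSet string, resourcePerNode ResourceList) ResourceList {
	if ps.Name == trainerPodSet && resourcePerNode != nil {
		return resourcePerNode
	}
	return ps.SinglePodRequests
}

// totalRequests sums Count * per-Pod requests over all PodSets.
func totalRequests(podSets []PodSet, trainerPodSet string, resourcePerNode ResourceList) ResourceList {
	total := ResourceList{}
	for _, ps := range podSets {
		for name, q := range effectiveRequests(ps, trainerPodSet, resourcePerNode) {
			total[name] += q * ps.Count
		}
	}
	return total
}

func main() {
	podSets := []PodSet{
		{Name: "initializer", Count: 1, SinglePodRequests: ResourceList{"cpu": 500}},
		{Name: "node", Count: 4, SinglePodRequests: ResourceList{"cpu": 1000, "nvidia.com/gpu": 1}},
	}
	// Hypothetical TrainJob override: each training node requests 4 CPUs and 8 GPUs.
	override := ResourceList{"cpu": 4000, "nvidia.com/gpu": 8}

	fmt.Println(totalRequests(podSets, "node", nil))      // override ignored
	fmt.Println(totalRequests(podSets, "node", override)) // override applied
}
```

If the override is ignored, the group totals 4500 millicores of CPU and 4 GPUs. If it is applied, the total is 16500 millicores and 32 GPUs, which matches what the four training Pods would actually request under the stated assumption.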

Why is this needed?

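In the current coscheduling implementation, the PodGroup does not work correctly, and can even cause a deadlock, when users specify .spec.trainer.resourcePerNode in the TrainJob. The PodGroup's total resource requests are calculated without that field, so they can diverge from what the Pods actually request.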

