This repository was archived by the owner on Jun 6, 2024. It is now read-only.

Use Hash(podUid, portName, portIndex) to calculate the port number and avoid repeated retries #4384

@Binyang2014


Currently, the task port number is assigned by rest-server.

In some situations, the port number conflicts with that of another job.
The job then retries until the conflict is resolved. Because the retried job may be scheduled to the same node, the port number may conflict again.

Current solution:
Runtime/rest-server uses Hash(podUid, portName, portIndex) to generate the port number, with podUid as the seed. For a distributed job, each task can get the other tasks' podUid, portName, and portIndex from the framework, so it can calculate the other tasks' port numbers independently.

If a port number conflicts, the job fails. New pods are then created with different UIDs, and new port numbers are calculated from them.
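
A minimal sketch of the idea in Python, not the actual implementation. The hash function (SHA-256), the port range, and the names `calc_port` and `peer_ports` are all assumptions made for illustration; the issue does not specify any of them.

```python
import hashlib

# Assumed port range; the issue does not name one.
PORT_START = 20000
PORT_END = 40000  # exclusive


def calc_port(pod_uid: str, port_name: str, port_index: int) -> int:
    """Map (podUid, portName, portIndex) to a port in [PORT_START, PORT_END)."""
    key = f"{pod_uid}/{port_name}/{port_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    return PORT_START + value % (PORT_END - PORT_START)


def peer_ports(peers: dict, port_name: str, count: int) -> dict:
    """Compute every peer's ports locally from its podUid.

    `peers` maps a task name to its podUid (hypothetical shape of the
    information a task obtains from the framework).
    """
    return {
        task: [calc_port(uid, port_name, i) for i in range(count)]
        for task, uid in peers.items()
    }
```

Because the result depends only on the three inputs, any task that knows a peer's podUid can derive that peer's ports without asking rest-server. A retried pod gets a new UID, so its ports are recalculated and will usually differ from the ones that conflicted, although a hash gives no guarantee of that.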
