[xray] raylet scheduling mechanism with a simple spillback policy #2749

Merged
robertnishihara merged 44 commits into ray-project:master from raylet-scheduling-simple
Aug 28, 2018

atumanov
Contributor

What do these changes do?

  • distribute load and resource information on each heartbeat
  • for each raylet, maintain total and available resource capacity as well as a measure of current load
  • introduce a new notion of load, defined as the sum of the resource demand of all ready tasks queued on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's use of task count as a proxy for load.
  • modify the scheduling policy to perform capacity-based, load-aware, optimistically concurrent resource allocation
  • spill tasks over to the heartbeating node in response to its heartbeat, implementing heterogeneity-aware late binding/work stealing.
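
The load measure and heartbeat-triggered spillover described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ResourceMap`, `Task`, `ComputeLoad`, `Fits`, and `SelectSpillover` are hypothetical names, and the fit rule (demand must not exceed available capacity minus current load) is an assumed reading of "capacity-based, load-aware" allocation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a resource set: resource label -> quantity.
using ResourceMap = std::unordered_map<std::string, double>;

// Hypothetical task record; only the resource demand matters here.
struct Task {
  int id;
  ResourceMap demand;
};

// Add every quantity in `other` to `total`, creating labels as needed.
void AddResources(ResourceMap &total, const ResourceMap &other) {
  for (const auto &entry : other) {
    assert(entry.second >= 0);  // demands are assumed non-negative
    total[entry.first] += entry.second;
  }
}

// Load: the sum of the resource demand of all queued ready tasks.
ResourceMap ComputeLoad(const std::vector<Task> &ready_queue) {
  ResourceMap load;
  for (const auto &task : ready_queue) AddResources(load, task.demand);
  return load;
}

// Assumed fit rule: for every label, demand <= available - load.
bool Fits(const ResourceMap &demand, const ResourceMap &available,
          const ResourceMap &load) {
  for (const auto &entry : demand) {
    auto avail_it = available.find(entry.first);
    double avail = avail_it == available.end() ? 0.0 : avail_it->second;
    auto load_it = load.find(entry.first);
    double used = load_it == load.end() ? 0.0 : load_it->second;
    if (entry.second > avail - used) return false;
  }
  return true;
}

// On a heartbeat from a remote node, choose queued tasks to spill over to it.
// The local copy of the remote load is incremented as tasks are chosen, so a
// single heartbeat does not over-commit the remote node.
std::vector<int> SelectSpillover(const std::vector<Task> &ready_queue,
                                 const ResourceMap &remote_available,
                                 ResourceMap remote_load) {
  std::vector<int> spilled;
  for (const auto &task : ready_queue) {
    if (Fits(task.demand, remote_available, remote_load)) {
      spilled.push_back(task.id);
      AddResources(remote_load, task.demand);
    }
  }
  return spilled;
}
```

Because load is expressed in resource units rather than task count, a queue of two 2-CPU tasks and one 1-GPU task reports a load of 4 CPUs and 1 GPU, and a CPU-only remote node is never offered the GPU task.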

atumanov and others added 30 commits August 25, 2018 14:53
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7787/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7790/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7789/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7791/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7792/

/// resource labels to this set.
///
/// \param other: The other resource set to add.
void AddResources(const ResourceSet &other);
Collaborator

\return Void.

/// information for some subset of the cluster. For all client IDs in the returned
/// placement map, the corresponding SchedulingResources::resources_load_ is
/// incremented by the aggregate resource demand of the tasks assigned to it.
///
Collaborator

there shouldn't be an extra /// here
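
For illustration, the load update described in the doc comment under review could look roughly like this. It is a sketch only: it assumes the placement map maps task IDs to client IDs, and `ClientID`, `TaskID`, `SchedulingResourcesSketch`, and `ApplyPlacementLoad` are hypothetical stand-ins rather than the types and functions in this PR.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the real ID and resource types.
using ResourceMap = std::unordered_map<std::string, double>;
using ClientID = std::string;
using TaskID = int;

// Hypothetical reduction of SchedulingResources to its load member.
struct SchedulingResourcesSketch {
  ResourceMap resources_load_;
};

// For every client named in `placement`, add the demand of each task
// assigned to it to that client's resources_load_.
void ApplyPlacementLoad(
    const std::unordered_map<TaskID, ClientID> &placement,
    const std::unordered_map<TaskID, ResourceMap> &task_demands,
    std::unordered_map<ClientID, SchedulingResourcesSketch> &cluster_resources) {
  for (const auto &decision : placement) {
    auto demand_it = task_demands.find(decision.first);
    if (demand_it == task_demands.end()) continue;  // unknown task: skip it
    ResourceMap &load = cluster_resources[decision.second].resources_load_;
    for (const auto &entry : demand_it->second) {
      assert(entry.second >= 0);  // demands are assumed non-negative
      load[entry.first] += entry.second;
    }
  }
}
```

Updating the load at placement time means later decisions in the same scheduling round already see the demand committed to each node.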

const std::unordered_map<ClientID, SchedulingResources> &cluster_resources,
const ClientID &local_client_id, const std::vector<ClientID> &others);
std::unordered_map<ClientID, SchedulingResources> &cluster_resources,
const ClientID &local_client_id);
Collaborator

document local_client_id

std::unordered_map<ClientID, SchedulingResources> local_resource_map(
{{local_client_id, cluster_resource_map_[local_client_id]}});
// Invoke the scheduling policy only on local resources.
// ScheduleTasks(local_resource_map);
Collaborator

Remove this and the comment

@robertnishihara
Collaborator

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7806/

@robertnishihara robertnishihara force-pushed the raylet-scheduling-simple branch from bfa4406 to aec71d5 Compare August 28, 2018 04:52
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7812/

@atumanov
Contributor Author

Jenkins retest this please

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7814/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7816/

@robertnishihara robertnishihara merged commit de047da into ray-project:master Aug 28, 2018
@robertnishihara robertnishihara deleted the raylet-scheduling-simple branch August 28, 2018 07:03
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/7818/

3 participants