Skip to content

Conversation

AdamGleave
Copy link
Contributor

What do these changes do?

When launching a node, picks a subnet round-robin from the configured subnets (previously: only one subnet allowed). The subnets default to all those matching a specified availability zones (previously: a single subnet matching the single availability zone).

It might be cleaner to push some of this logic up into the root autoscaler, so other node providers can benefit. This looks to be a more substantial change and isn't needed for my use case so I haven't done it, but I'd welcome anyone else who wants to give it a stab :) (Note I expect it to be a little tricky to get right as the semantics of AZs can vary between providers.)

Related issue number

Fixes #2177.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6063/
Test FAILed.

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Could you add an example usage and a comment to the aws-full example yaml?


subnet_ids = [s.subnet_id for s in subnets]
subnet_descr = [(s.subnet_id, s.availability_zone) for s in subnets]
if "SubnetIds" not in config["head_node"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to keep the map_public_ip_on_launch filter, at least for the head node? I know we don't require it for worker nodes any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is already performed on line 149; the later checks I believe are redundant so I removed them.

"Key": k,
"Value": v,
})
subnet_ids = conf.pop('SubnetIds')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: double quotes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

"Key": k,
"Value": v,
})
subnet_ids = conf.pop('SubnetIds')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment here that SubnetIds is not a real config key, and we need to resolve it before handing it to AWS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, comment added

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6074/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6075/
Test PASSed.

@ericl ericl merged commit 3068444 into ray-project:master Jun 20, 2018
@AdamGleave AdamGleave deleted the multiple-azs branch June 20, 2018 04:05
royf added a commit to royf/ray that referenced this pull request Jun 22, 2018
* 'master' of https://github.com/ray-project/ray: (157 commits)
  Fix build failure while using make -j1. Issue 2257 (ray-project#2279)
  Cast locator with index type (ray-project#2274)
  fixing zero length partitions (ray-project#2237)
  Make actor handles work in Python mode. (ray-project#2283)
  [xray] Add error table and push error messages to driver through node manager. (ray-project#2256)
  addressing comments (ray-project#2210)
  Re-enable some actor tests. (ray-project#2276)
  Experimental: enable automatic GCS flushing with configurable policy. (ray-project#2266)
  [xray] Sets good object manager defaults. (ray-project#2255)
  [tune] Update Trainable doc to expose interface (ray-project#2272)
  [rllib] Add a simple REST policy server and client example (ray-project#2232)
  [asv] Pushing to s3 (ray-project#2246)
  [rllib] Remove need to pass around registry (ray-project#2250)
  Support multiple availability zones in AWS (fix ray-project#2177) (ray-project#2254)
  [rllib] Add squash_to_range model option (ray-project#2239)
  Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. (ray-project#2271)
  [rllib] Refactor Multi-GPU for PPO (ray-project#1646)
  [rllib] Envs for vectorized execution, async execution, and policy serving (ray-project#2170)
  [Dataframe] Change pandas and ray.dataframe imports (ray-project#1942)
  [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) (ray-project#2245)
  ...
@ijrsvt ijrsvt mentioned this pull request Oct 4, 2021
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants