-
Notifications
You must be signed in to change notification settings - Fork 6.9k
Support multiple availability zones in AWS (fix #2177) #2254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Could you add an example usage and a comment to the aws-full example yaml?
|
||
subnet_ids = [s.subnet_id for s in subnets] | ||
subnet_descr = [(s.subnet_id, s.availability_zone) for s in subnets] | ||
if "SubnetIds" not in config["head_node"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to keep the map_public_ip_on_launch filter, at least for the head node? I know we don't require it for worker nodes any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is already performed on line 149; the later checks I believe are redundant so I removed them.
"Key": k, | ||
"Value": v, | ||
}) | ||
subnet_ids = conf.pop('SubnetIds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: double quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
"Key": k, | ||
"Value": v, | ||
}) | ||
subnet_ids = conf.pop('SubnetIds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment here that SubnetIds is not a real config key, and we need to resolve it before handing it to AWS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, comment added
Test PASSed. |
Test PASSed. |
* 'master' of https://github.com/ray-project/ray: (157 commits) Fix build failure while using make -j1. Issue 2257 (ray-project#2279) Cast locator with index type (ray-project#2274) fixing zero length partitions (ray-project#2237) Make actor handles work in Python mode. (ray-project#2283) [xray] Add error table and push error messages to driver through node manager. (ray-project#2256) addressing comments (ray-project#2210) Re-enable some actor tests. (ray-project#2276) Experimental: enable automatic GCS flushing with configurable policy. (ray-project#2266) [xray] Sets good object manager defaults. (ray-project#2255) [tune] Update Trainable doc to expose interface (ray-project#2272) [rllib] Add a simple REST policy server and client example (ray-project#2232) [asv] Pushing to s3 (ray-project#2246) [rllib] Remove need to pass around registry (ray-project#2250) Support multiple availability zones in AWS (fix ray-project#2177) (ray-project#2254) [rllib] Add squash_to_range model option (ray-project#2239) Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. (ray-project#2271) [rllib] Refactor Multi-GPU for PPO (ray-project#1646) [rllib] Envs for vectorized execution, async execution, and policy serving (ray-project#2170) [Dataframe] Change pandas and ray.dataframe imports (ray-project#1942) [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) (ray-project#2245) ...
What do these changes do?
When launching a node, picks a subnet round-robin from the configured subnets (previously: only one subnet allowed). The subnets default to all those matching a specified availability zones (previously: a single subnet matching the single availability zone).
It might be cleaner to push some of this logic up into the root autoscaler, so other node providers can benefit. This looks to be a more substantial change and isn't needed for my use case so I haven't done it, but I'd welcome anyone else who wants to give it a stab :) (Note I expect it to be a little tricky to get right as the semantics of AZs can vary between providers.)
Related issue number
Fixes #2177.