Auto-scale ray clusters based on GCS load metrics #1348

Merged
merged 36 commits into from
Dec 31, 2017
Changes from 1 commit
Commits
a8eb626
wip gc metrics autoscaling
ericl Dec 17, 2017
4c09b68
add load metrics debug string
ericl Dec 17, 2017
c6a9c2c
add ray ip
ericl Dec 17, 2017
cc1d722
wire it end to end
ericl Dec 17, 2017
6732223
wip dev
ericl Dec 17, 2017
419ca59
fix bug
ericl Dec 17, 2017
a8240b8
update
ericl Dec 17, 2017
c49a35a
Sun Dec 17 14:19:03 PST 2017
ericl Dec 17, 2017
9a69baf
wip
ericl Dec 17, 2017
03f47de
wip
ericl Dec 19, 2017
97dfa7f
add update throttling; reorg init commands
ericl Dec 21, 2017
3ccb5fc
Wed Dec 20 23:44:56 PST 2017
ericl Dec 21, 2017
d805f90
Wed Dec 20 23:51:37 PST 2017
ericl Dec 21, 2017
13436dd
Wed Dec 20 23:58:18 PST 2017
ericl Dec 21, 2017
f6d96d1
Thu Dec 21 00:03:19 PST 2017
ericl Dec 21, 2017
7fe7ec7
Thu Dec 21 00:04:33 PST 2017
ericl Dec 21, 2017
60a487b
Thu Dec 21 00:05:32 PST 2017
ericl Dec 21, 2017
68df74a
Thu Dec 21 00:15:11 PST 2017
ericl Dec 21, 2017
ead1984
Thu Dec 21 00:30:54 PST 2017
ericl Dec 21, 2017
49433bf
Thu Dec 21 00:39:34 PST 2017
ericl Dec 21, 2017
ef435c9
Thu Dec 21 01:16:11 PST 2017
ericl Dec 21, 2017
c2d9efc
Thu Dec 21 01:25:24 PST 2017
ericl Dec 21, 2017
2f977a4
update
ericl Dec 21, 2017
b816480
update example
ericl Dec 21, 2017
4d4d9d1
Merge remote-tracking branch 'upstream/master' into load-metrics
ericl Dec 25, 2017
006ef46
fix tests
ericl Dec 26, 2017
f40253d
unit tests
ericl Dec 26, 2017
40ddf3c
Mon Dec 25 16:39:56 PST 2017
ericl Dec 26, 2017
c88df7d
fi xlint
ericl Dec 26, 2017
d9e7df6
fix np ceil
ericl Dec 27, 2017
4280e53
Fix path for development-example.yaml
robertnishihara Dec 28, 2017
1969bcf
Remove unnecessary line.
robertnishihara Dec 29, 2017
bef8e29
fix idempotent
ericl Dec 30, 2017
4093407
Update autoscaler.py
ericl Dec 30, 2017
b5df753
fix nondterministic test
ericl Dec 30, 2017
3237986
Update autoscaler.py
ericl Dec 31, 2017
update
ericl committed Dec 17, 2017
commit a8240b8fbf0d1669dcd4e5416c52562cf9212919
2 changes: 2 additions & 0 deletions python/ray/autoscaler/autoscaler.py
@@ -87,6 +87,7 @@ def __init__(self):
         self.last_used_time_by_ip = {}
         self.static_resources_by_ip = {}
         self.dynamic_resources_by_ip = {}
+        self.local_ip = services.get_node_ip_address()

     def update(self, ip, static_resources, dynamic_resources):
         self.static_resources_by_ip[ip] = static_resources
@@ -97,6 +98,7 @@ def update(self, ip, static_resources, dynamic_resources):

     def prune_active_ips(self, active_ips):
         active_ips = set(active_ips)
+        active_ips.add(self.local_ip)

         def prune(mapping):
             unwanted = set(mapping) - active_ips
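The effect of the two added lines can be sketched in isolation. This is a minimal approximation under stated assumptions, not the code from the PR: the class name `LoadMetricsSketch` is hypothetical, the local IP is passed to the constructor instead of coming from `services.get_node_ip_address()`, and everything in `prune` after the `unwanted = ...` line is a guess, since the diff cuts off there.

```python
class LoadMetricsSketch:
    """Hypothetical stand-in for the load-metrics container in the diff."""

    def __init__(self, local_ip):
        self.static_resources_by_ip = {}
        self.dynamic_resources_by_ip = {}
        # Assumption: injected here; the PR calls services.get_node_ip_address().
        self.local_ip = local_ip

    def update(self, ip, static_resources, dynamic_resources):
        self.static_resources_by_ip[ip] = static_resources
        self.dynamic_resources_by_ip[ip] = dynamic_resources

    def prune_active_ips(self, active_ips):
        active_ips = set(active_ips)
        # Always keep the local node's entries, even if the caller omits it.
        active_ips.add(self.local_ip)

        def prune(mapping):
            unwanted = set(mapping) - active_ips
            # Assumption: stale entries are simply deleted.
            for ip in unwanted:
                del mapping[ip]

        prune(self.static_resources_by_ip)
        prune(self.dynamic_resources_by_ip)


metrics = LoadMetricsSketch(local_ip="10.0.0.1")
metrics.update("10.0.0.1", {"CPU": 4}, {"CPU": 2})
metrics.update("10.0.0.2", {"CPU": 8}, {"CPU": 8})
metrics.update("10.0.0.3", {"CPU": 8}, {"CPU": 0})

# Only 10.0.0.2 is reported as active; the local IP survives anyway.
metrics.prune_active_ips(["10.0.0.2"])
remaining = sorted(metrics.static_resources_by_ip)
print(remaining)
```

In this sketch the entry for `10.0.0.3` is dropped while the local node's entry is kept, which is presumably the point of the change: the node running the autoscaler may not appear in the list of IPs handed to `prune_active_ips`.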
2 changes: 1 addition & 1 deletion python/ray/autoscaler/aws/example.yaml
@@ -73,7 +73,7 @@ head_init_commands:
     # - pip3 install --user -U https://s3-us-west-2.amazonaws.com/ray-wheels/f5ea44338eca392df3a868035df3901829cc2ca1/ray-0.3.0-cp35-cp35m-manylinux1_x86_64.whl
     # - pip3 install --user cython
     # - git clone https://github.com/ericl/ray.git || true
-    - cd ray && git fetch && git checkout db01d3e0
+    - cd ray && git fetch && git checkout 419ca59b
     # - cd ray/python && python setup.py develop --user
     - yes | ~/anaconda3/bin/conda install boto3=1.4.8 # 1.4.8 adds InstanceMarketOptions
     - ray stop
16 changes: 14 additions & 2 deletions python/ray/autoscaler/aws/node_provider.py
@@ -12,6 +12,8 @@ class AWSNodeProvider(NodeProvider):
     def __init__(self, provider_config, cluster_name):
         NodeProvider.__init__(self, provider_config, cluster_name)
         self.ec2 = boto3.resource("ec2", region_name=provider_config["region"])
+        self.internal_ip_cache = {}
+        self.external_ip_cache = {}

     def nodes(self, tag_filters):
         filters = [
@@ -49,12 +51,22 @@ def node_tags(self, node_id):
         return tags

     def external_ip(self, node_id):
+        if node_id in self.external_ip_cache:
+            return self.external_ip_cache[node_id]
         node = self._node(node_id)
-        return node.public_ip_address
+        ip = node.public_ip_address
+        if ip:
+            self.external_ip_cache[node_id] = ip
+        return ip

     def internal_ip(self, node_id):
+        if node_id in self.internal_ip_cache:
+            return self.internal_ip_cache[node_id]
         node = self._node(node_id)
-        return node.private_ip_address
+        ip = node.private_ip_address
+        if ip:
+            self.internal_ip_cache[node_id] = ip
+        return ip

     def set_node_tags(self, node_id, tags):
         node = self._node(node_id)
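The caching pattern in `external_ip` and `internal_ip` can be tried without AWS. The sketch below is illustrative only: `FakeNode`, `CachingProviderSketch`, and the `lookups` counter are hypothetical names that stand in for the boto3 instance object and the provider's `_node` lookup, and the IP addresses are made up.

```python
class FakeNode:
    """Hypothetical stand-in for a boto3 EC2 instance object."""

    def __init__(self, public_ip_address=None):
        self.public_ip_address = public_ip_address


class CachingProviderSketch:
    """Hypothetical provider that mimics the caching added in the diff."""

    def __init__(self, nodes):
        self._nodes = nodes  # node_id -> FakeNode
        self.external_ip_cache = {}
        self.lookups = 0  # counts simulated remote calls

    def _node(self, node_id):
        self.lookups += 1
        return self._nodes[node_id]

    def external_ip(self, node_id):
        if node_id in self.external_ip_cache:
            return self.external_ip_cache[node_id]
        ip = self._node(node_id).public_ip_address
        # Cache only a real address, so a node without one is retried later.
        if ip:
            self.external_ip_cache[node_id] = ip
        return ip


nodes = {"i-pending": FakeNode(), "i-ready": FakeNode("203.0.113.7")}
provider = CachingProviderSketch(nodes)

first = provider.external_ip("i-pending")   # no address yet, not cached
nodes["i-pending"].public_ip_address = "203.0.113.8"
second = provider.external_ip("i-pending")  # looked up again, now cached
third = provider.external_ip("i-pending")   # served from the cache
print(first, second, third, provider.lookups)
```

The `if ip:` guard is what keeps an empty result out of the cache, so a node that has no address yet is queried again on the next call. One trade-off of this design, as far as the sketch shows, is that a cached address is never refreshed, so it would go stale if a node's IP changed while the provider object is alive.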