
[SPARK-6193] [EC2] Push group filter up to EC2 #4922

Closed
nchammas wants to merge 3 commits into apache:master from nchammas:get-existing-cluster-faster

Conversation

@nchammas (Contributor) commented on Mar 6, 2015

When looking for a cluster, spark-ec2 currently pulls down info for all instances and filters locally. When working on an AWS account with hundreds of active instances, this step alone can take over 10 seconds.

This PR improves how spark-ec2 searches for clusters by pushing the filter up to EC2.

Basically, the problem (and solution) look like this:

>>> timeit.timeit('blah = conn.get_all_reservations()', setup='from __main__ import conn', number=10)
116.96390509605408
>>> timeit.timeit('blah = conn.get_all_reservations(filters={"instance.group-name": ["my-cluster-master"]})', setup='from __main__ import conn', number=10)
4.629754066467285

Translated to a user-visible action, this looks like (against an AWS account with ~200 active instances):

# master
$ python -m timeit -n 3 --setup 'import subprocess' 'subprocess.call("./spark-ec2 get-master my-cluster --region us-west-2", shell=True)'
...
3 loops, best of 3: 9.83 sec per loop

# this PR
$ python -m timeit -n 3 --setup 'import subprocess' 'subprocess.call("./spark-ec2 get-master my-cluster --region us-west-2", shell=True)'
...
3 loops, best of 3: 1.47 sec per loop

This PR also refactors get_existing_cluster() to make it, I hope, simpler.
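
For context, here is a minimal sketch of the kind of lookup helper described above, assuming boto 2 (the EC2 library spark-ec2 uses); the function name get_instances, the connection setup, and the region are illustrative rather than the PR's actual code:

import itertools

import boto.ec2

conn = boto.ec2.connect_to_region("us-west-2")

def get_instances(group_names):
    # Ask EC2 to filter by security group name server-side, rather than
    # pulling down every reservation in the account and filtering locally.
    reservations = conn.get_all_reservations(
        filters={"instance.group-name": group_names})
    instances = itertools.chain.from_iterable(r.instances for r in reservations)
    return [i for i in instances if i.state != "terminated"]

# spark-ec2 names a cluster's security groups "<cluster>-master" and "<cluster>-slaves".
master_instances = get_instances(["my-cluster-master"])
slave_instances = get_instances(["my-cluster-slaves"])

The key point is that the group-name filter is evaluated by EC2 itself, so only the matching reservations are returned over the wire.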

Finally, this PR fixes some minor grammar issues related to printing status to the user. 🎩 👏

@SparkQA commented on Mar 6, 2015

Test build #28321 has started for PR 4922 at commit f2a5b9f.

  • This patch merges cleanly.

@nchammas (Contributor, Author) commented on Mar 6, 2015

cc @shivaram

@SparkQA commented on Mar 6, 2015

Test build #28321 has finished for PR 4922 at commit f2a5b9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28321/

reservations = conn.get_all_reservations(
    filters={"instance.group-name": group_names})
instances = itertools.chain.from_iterable(r.instances for r in reservations)
return [i for i in instances if i.state != "terminated"]
shivaram (Contributor) commented:

Minor comment: this seems a little different from the old check we had. We were originally checking if it was one of 'pending', 'running', 'stopping', 'stopped', while right now we check if it's not terminated. There are other states, as shown in http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-lifecycle.html, but I don't think it should matter.

nchammas (Contributor, Author) commented:

The doc referenced in the comment notes these instance states:

0 : pending
16 : running
32 : shutting-down
48 : terminated
64 : stopping
80 : stopped

So it looks like the only state we're missing is shutting-down, which is the intermediate state right before terminated. I can add that in to be consistent with the previous behavior.

Alternatively, we can leave it as-is and re-terminate instances even if they are shutting-down. You know, zombies and stuff. 👹 🔫
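
Concretely, the two checks under discussion look roughly like this (a sketch of the idea, not the exact diff):

# Check in this PR: keep anything that is not fully terminated,
# which also picks up instances that are still shutting-down.
live = [i for i in instances if i.state != "terminated"]

# Previous behavior: an explicit allowlist of states, which excludes
# both shutting-down and terminated.
live = [i for i in instances
        if i.state in ("pending", "running", "stopping", "stopped")]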

shivaram (Contributor) commented:

Yeah, it doesn't look like shutting-down is a major issue, but it might just be safer to keep the existing behavior.

nchammas (Contributor, Author) commented:

OK, fixified.

@shivaram (Contributor) commented on Mar 6, 2015

This is a nice change @nchammas -- Code looks good to me, but I'd just like to try it out once on my machine.

@shivaram (Contributor) commented on Mar 6, 2015

Just tried this out and it works fine. LGTM pending the minor inline comment.

@SparkQA commented on Mar 6, 2015

Test build #28353 has started for PR 4922 at commit 18802f1.

  • This patch merges cleanly.

@SparkQA commented on Mar 6, 2015

Test build #28353 has finished for PR 4922 at commit 18802f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28353/

@nchammas (Contributor, Author) commented on Mar 7, 2015

Inline comment addressed.

@srowen (Member) commented on Mar 8, 2015

Sounds like a good improvement, changes look OK to my mildly informed eyes, and you have both reviewed and tested the change. LGTM.

asfgit closed this in 52ed7da on Mar 8, 2015
nchammas deleted the get-existing-cluster-faster branch on March 8, 2015 at 23:30