
[SPARK-6191] [EC2] Generalize ability to download libs #4919

Closed
wants to merge 4 commits into from

Conversation

nchammas
Contributor

@nchammas nchammas commented Mar 5, 2015

Right now spark-ec2 has a method specifically for downloading boto. This PR generalizes it so that additional libraries are easy to download if we want them.

For example, adding new external libraries for spark-ec2 is now as simple as:

external_libs = [
    {
        "name": "boto",
        "version": "2.34.0",
        "md5": "5556223d2d0cc4d06dd4829e671dcecd"
    },
    {
        "name": "PyYAML",
        "version": "3.11",
        "md5": "f50e08ef0fe55178479d3a618efe21db"
    },
    {
        "name": "argparse",
        "version": "1.3.0",
        "md5": "9bcf7f612190885c8c85e30ba41db3c7"
    }
]
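A minimal sketch of a generic downloader driven by a list like external_libs, assuming Python 3 and assuming each source tarball lives at a PyPI URL of the form .../packages/source/<first letter>/<name>/<name>-<version>.tar.gz that unpacks to a <name>-<version>/ directory. The names setup_external_libs, pypi_url, lib_root and fetch, and that URL layout, are hypothetical; this is not necessarily how the PR implements it.

```python
import hashlib
import io
import os
import sys
import tarfile
import urllib.request

# Assumed location of PyPI source tarballs (hypothetical for this sketch).
PYPI_URL_PREFIX = "https://pypi.python.org/packages/source"


def pypi_url(lib):
    """Build the assumed tarball URL, e.g. .../source/b/boto/boto-2.34.0.tar.gz."""
    return "{prefix}/{first}/{name}/{name}-{version}.tar.gz".format(
        prefix=PYPI_URL_PREFIX, first=lib["name"][0],
        name=lib["name"], version=lib["version"])


def default_fetch(url):
    """Download url over HTTP(S) and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def setup_external_libs(libs, lib_root, fetch=default_fetch):
    """Fetch, md5-check and unpack each lib into lib_root once, then add it to sys.path."""
    os.makedirs(lib_root, exist_ok=True)
    for lib in libs:
        versioned_name = "{0}-{1}".format(lib["name"], lib["version"])
        # Assumes the tarball unpacks to a top-level <name>-<version>/ directory.
        lib_dir = os.path.join(lib_root, versioned_name)

        # An existing <name>-<version> directory means an earlier run already
        # fetched this exact version, so the network is skipped entirely.
        if not os.path.isdir(lib_dir):
            print(" - Downloading {0}...".format(lib["name"]))
            data = fetch(pypi_url(lib))
            # Refuse to unpack anything that does not match the pinned checksum.
            if hashlib.md5(data).hexdigest() != lib["md5"]:
                raise ValueError("md5 mismatch for {0}".format(versioned_name))
            with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
                tar.extractall(path=lib_root)
            print(" - Finished downloading {0}.".format(lib["name"]))

        # Make the unpacked package importable by the rest of the script.
        if lib_dir not in sys.path:
            sys.path.insert(1, lib_dir)
```

Injecting fetch keeps the download step swappable, so the md5 check and the "skip if already unpacked" logic can be exercised without network access.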

Likely use cases:

  • Downloading PyYAML to allow spark-ec2 configs to be persisted as a YAML file. (SPARK-925)
  • Downloading argparse to clean up / modernize our option parsing.
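For the argparse use case, a minimal sketch of a --version flag using argparse's built-in "version" action. The positional arguments (an action and a cluster name) are an assumed, illustrative layout rather than spark-ec2's actual parser, and the version string 1.2.1 is taken from the sample output in this PR description.

```python
import argparse

# Hypothetical constant; the value mirrors the sample "spark-ec2 1.2.1" output.
SPARK_EC2_VERSION = "1.2.1"


def build_parser():
    """Build an illustrative argparse parser for a spark-ec2-style CLI."""
    parser = argparse.ArgumentParser(prog="spark-ec2")
    # The "version" action prints the formatted string and exits with status 0.
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + SPARK_EC2_VERSION)
    # Assumed positionals, for illustration only.
    parser.add_argument("action")
    parser.add_argument("cluster_name")
    return parser
```

Because the version action exits as soon as it is parsed, --version works even though the positionals are otherwise required.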

First run output, with PyYAML and argparse added just for demonstration purposes:

$ ./spark-ec2 --version
Downloading external libraries that spark-ec2 needs from PyPI to /path/to/spark/ec2/lib...
This should be a one-time operation.
 - Downloading boto...
 - Finished downloading boto.
 - Downloading PyYAML...
 - Finished downloading PyYAML.
 - Downloading argparse...
 - Finished downloading argparse.
spark-ec2 1.2.1

Output thereafter:

$ ./spark-ec2 --version
spark-ec2 1.2.1
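The difference between the two runs comes down to checking what is already on disk. A minimal sketch of that check, assuming each library is unpacked into a <name>-<version> directory under a lib root; missing_libs and announce_downloads are hypothetical helper names.

```python
import os


def missing_libs(libs, lib_root):
    """Return the libs whose unpacked <name>-<version> directory is absent."""
    return [
        lib for lib in libs
        if not os.path.isdir(
            os.path.join(lib_root, "{0}-{1}".format(lib["name"], lib["version"])))
    ]


def announce_downloads(libs, lib_root):
    """Print the one-time banner only when something actually needs downloading."""
    todo = missing_libs(libs, lib_root)
    if todo:
        print("Downloading external libraries that spark-ec2 needs "
              "from PyPI to {0}...".format(lib_root))
        print("This should be a one-time operation.")
    return todo
```

Keying the check on the versioned directory name means that bumping a version in the list (say, a newer boto) triggers a fresh download on the next run, while unchanged entries stay silent.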

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28309 has started for PR 4919 at commit 8eb9069.

  • This patch merges cleanly.

@nchammas nchammas changed the title [SPARK-6191] Generalize ability to download libs [SPARK-6191] [EC2] Generalize ability to download libs Mar 5, 2015
@SparkQA

SparkQA commented Mar 5, 2015

Test build #28310 has started for PR 4919 at commit 60d8c23.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28311 has started for PR 4919 at commit 5448845.

  • This patch merges cleanly.

@nchammas
Contributor Author

nchammas commented Mar 5, 2015

cc @JoshRosen

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28309 has finished for PR 4919 at commit 8eb9069.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28309/

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28310 has finished for PR 4919 at commit 60d8c23.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28310/

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28311 has finished for PR 4919 at commit 5448845.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28311/

@SparkQA

SparkQA commented Mar 6, 2015

Test build #28322 has started for PR 4919 at commit c95fb7d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 6, 2015

Test build #28322 has finished for PR 4919 at commit c95fb7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28322/

@SparkQA

SparkQA commented Mar 6, 2015

Test build #28343 has started for PR 4919 at commit a077955.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 6, 2015

Test build #28343 has finished for PR 4919 at commit a077955.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28343/

@srowen
Member

srowen commented Mar 9, 2015

Obviously I'd like to get another actual active EC2 user to review this, but the principle looks fine. This refactors the boto-specific mechanism into a general one and, for now, does not change behavior.

@nchammas
Contributor Author

nchammas commented Mar 9, 2015

Yeah, if @JoshRosen (who wrote the original setup_boto() function) can't take a look, maybe @shivaram can review this.

@JoshRosen
Contributor

This seems fine to me. I guess the alternatives would be:

  1. storing the libraries in our source tree, which is a bad option for several reasons, including licensing, file size, upgradability, etc.
  2. requiring the users to install the libraries themselves using a pip requirements file, but that adds another dependency on pip.
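For comparison with the second alternative, a small sketch that renders the same kind of library list as pinned name==version lines for a pip requirements file, which users would then install with pip install -r. The to_requirements helper is hypothetical. Note that pip's hash-checking mode does not accept md5, so the pinned md5 values would not carry over as-is.

```python
# Copy of the library list from the PR description, md5 values omitted.
external_libs = [
    {"name": "boto", "version": "2.34.0"},
    {"name": "PyYAML", "version": "3.11"},
    {"name": "argparse", "version": "1.3.0"},
]


def to_requirements(libs):
    """Render one pinned 'name==version' line per library, as pip -r expects."""
    return "".join("{0}=={1}\n".format(lib["name"], lib["version"]) for lib in libs)


if __name__ == "__main__":
    # Print rather than write a file; a user would save this as requirements.txt.
    print(to_requirements(external_libs), end="")
```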

I think that this is fine for now. As part of our binary release packaging scripts, we could download and include these archives so that only users who build from source will need to perform these downloads.

@asfgit asfgit closed this in d14df06 Mar 10, 2015
@nchammas nchammas deleted the setup-ec2-libs branch March 10, 2015 22:48

5 participants