This repository was archived by the owner on Jan 9, 2020. It is now read-only.
forked from apache/spark
Change the API contract for uploading local files #107
Merged: ash211 merged 5 commits into k8s-support-alternate-incremental from change-add-files-format on Feb 16, 2017.
Changes from all commits (5 commits):
* 9411194 Change the API contract for uploading local jars. (mccheah)
* c916ae2 Merge remote-tracking branch 'apache-spark-on-k8s/k8s-support-alterna…' (mccheah)
* 10f84e4 Address comments (mccheah)
* c4e06bb Merge remote-tracking branch 'apache-spark-on-k8s/k8s-support-alterna…' (mccheah)
* 720036c Fix test (mccheah)
@@ -51,87 +51,15 @@ connect without SSL on a different port, the master would be set to `k8s://http:
 
 Note that applications can currently only be executed in cluster mode, where the driver and its executors are running on
 the cluster.
 
-### Adding Other JARs
-
-Spark allows users to provide dependencies that are bundled into the driver's Docker image, or that are on the local
-disk of the submitter's machine. These two types of dependencies are specified via different configuration options to
-`spark-submit`:
-
-* Local jars provided by specifying the `--jars` command line argument to `spark-submit`, or by setting `spark.jars` in
-the application's configuration, will be treated as jars that are located on the *disk of the driver container*. This
-only applies to jar paths that do not specify a scheme or that have the scheme `file://`. Paths with other schemes are
-fetched from their appropriate locations.
-* Local jars provided by specifying the `--upload-jars` command line argument to `spark-submit`, or by setting
-`spark.kubernetes.driver.uploads.jars` in the application's configuration, will be treated as jars that are located on
-the *disk of the submitting machine*. These jars are uploaded to the driver docker container before executing the
-application.
-* A main application resource path that does not have a scheme or that has the scheme `file://` is assumed to be on the
-*disk of the submitting machine*. This resource is uploaded to the driver docker container before executing the
-application. A remote path can still be specified and the resource will be fetched from the appropriate location.
-* A main application resource path that has the scheme `container://` is assumed to be on the *disk of the driver
-container*.
-
-In all of these cases, the jars are placed on the driver's classpath, and are also sent to the executors. Below are some
-examples of providing application dependencies.
-
-To submit an application with both the main resource and two other jars living on the submitting user's machine:
-
-    bin/spark-submit \
-      --deploy-mode cluster \
-      --class com.example.applications.SampleApplication \
-      --master k8s://192.168.99.100 \
-      --upload-jars /home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
-      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
-      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
-      /home/exampleuser/exampleapplication/main.jar
-
-Note that since passing the jars through the `--upload-jars` command line argument is equivalent to setting the
-`spark.kubernetes.driver.uploads.jars` Spark property, the above will behave identically to this command:
-
-    bin/spark-submit \
-      --deploy-mode cluster \
-      --class com.example.applications.SampleApplication \
-      --master k8s://192.168.99.100 \
-      --conf spark.kubernetes.driver.uploads.jars=/home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
-      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
-      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
-      /home/exampleuser/exampleapplication/main.jar
-
-To specify a main application resource that can be downloaded from an HTTP service, and if a plugin for that application
-is located in the jar `/opt/spark-plugins/app-plugin.jar` on the docker image's disk:
-
-    bin/spark-submit \
-      --deploy-mode cluster \
-      --class com.example.applications.PluggableApplication \
-      --master k8s://192.168.99.100 \
-      --jars /opt/spark-plugins/app-plugin.jar \
-      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
-      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
-      http://example.com:8080/applications/sparkpluggable/app.jar
-
-Note that since passing the jars through the `--jars` command line argument is equivalent to setting the `spark.jars`
-Spark property, the above will behave identically to this command:
-
-    bin/spark-submit \
-      --deploy-mode cluster \
-      --class com.example.applications.PluggableApplication \
-      --master k8s://192.168.99.100 \
-      --conf spark.jars=file:///opt/spark-plugins/app-plugin.jar \
-      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
-      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
-      http://example.com:8080/applications/sparkpluggable/app.jar
-
-To specify a main application resource that is in the Docker image, and if it has no other dependencies:
-
-    bin/spark-submit \
-      --deploy-mode cluster \
-      --class com.example.applications.PluggableApplication \
-      --master k8s://192.168.99.100:8443 \
-      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
-      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
-      container:///home/applications/examples/example.jar
+### Dependency Management and Docker Containers
+
+Spark supports specifying JAR paths that are either on the submitting host's disk, or are located on the disk of the
+driver and executors. Refer to the [application submission](submitting-applications.html#advanced-dependency-management)
+section for details. Note that files specified with the `local` scheme should be added to the container image of both
+the driver and the executors. Files without a scheme or with the scheme `file://` are treated as being on the disk of
+the submitting machine, and are uploaded to the driver running in Kubernetes before launching the application.
 
 ### Setting Up SSL For Submitting the Driver
 
 When submitting to Kubernetes, a pod is started for the driver, and the pod starts an HTTP server. This HTTP server
Inline review comment: need to update the …
@@ -146,9 +74,9 @@ pod in starting the application, set `spark.ssl.kubernetes.submit.trustStore`.
 
 One note about the keyStore is that it can be specified as either a file on the client machine or a file in the
 container image's disk. Thus `spark.ssl.kubernetes.submit.keyStore` can be a URI with a scheme of either `file:`
-or `container:`. A scheme of `file:` corresponds to the keyStore being located on the client machine; it is mounted onto
+or `local:`. A scheme of `file:` corresponds to the keyStore being located on the client machine; it is mounted onto
 the driver container as a [secret volume](https://kubernetes.io/docs/user-guide/secrets/). When the URI has the scheme
-`container:`, the file is assumed to already be on the container's disk at the appropriate path.
+`local:`, the file is assumed to already be on the container's disk at the appropriate path.
 
 ### Kubernetes Clusters and the authenticated proxy endpoint
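As a sketch of the two keyStore options (the paths below are hypothetical), the property could be passed to `spark-submit` either way:

    # keyStore on the client machine; mounted into the driver pod as a secret volume
    --conf spark.ssl.kubernetes.submit.keyStore=file:///home/exampleuser/keystores/submission.jks

    # keyStore already present on the container's disk at this path
    --conf spark.ssl.kubernetes.submit.keyStore=local:///opt/spark/keystores/submission.jks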
@@ -241,24 +169,6 @@ from the other deployment modes. See the [configuration page](configuration.html
   executor pods from the API server.
   </td>
 </tr>
-<tr>
-  <td><code>spark.kubernetes.driver.uploads.jars</code></td>
-  <td>(none)</td>
-  <td>
-    Comma-separated list of jars to send to the driver and all executors when submitting the application in cluster
-    mode. Refer to <a href="running-on-kubernetes.html#adding-other-jars">adding other jars</a> for more information.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.driver.uploads.files</code></td>
-  <td>(none)</td>
-  <td>
-    Comma-separated list of files to send to the driver and all executors when submitting the application in cluster
-    mode. The files are added in a flat hierarchy to the current working directory of the driver, having the same
-    names as the names of the original files. Note that two files with the same name cannot be added, even if they
-    were in different source directories on the client disk.
-  </td>
-</tr>
 <tr>
   <td><code>spark.kubernetes.executor.memoryOverhead</code></td>
   <td>executorMemory * 0.10, with minimum of 384</td>
Review comment:

Why did you get rid of the examples? I think it would still be useful to have examples for: …

The link to advanced dependency management is great, though!
Reply:

The examples would merely be re-hashing what is already stated in the advanced dependency management section. The additional library jars on executor containers are just files specified with `local://`, and those files also have to be on the driver container... which is also covered in the advanced dependency management section. Similarly with added local files.

The SSL configuration is interesting, however - I don't think the docs have been updated to reflect that.
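A minimal sketch of what this reply describes (hypothetical paths, and it assumes `--files` follows the same scheme rules as jars under the new contract, which this diff does not explicitly confirm):

    # local:// paths must already exist inside both the driver and executor
    # images; schemeless paths are uploaded from the submitting machine.
    --jars local:///opt/spark-plugins/app-plugin.jar \
    --files /home/exampleuser/exampleapplication/app.properties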