
Commit 67fcffc

Added cluster mode + supervise example to submitting application guide.
1 parent: e45453b

File tree: 1 file changed (+26 −10 lines)


docs/submitting-applications.md

Lines changed: 26 additions & 10 deletions
@@ -43,17 +43,18 @@ Some of the commonly used options are:
 
 * `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
 * `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
-* `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`)*
+* `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`) <b> &#8224; </b>
 * `--conf`: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap "key=value" in quotes (as shown).
 * `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
 * `application-arguments`: Arguments passed to the main method of your main class, if any
 
-*A common deployment strategy is to submit your application from a gateway machine that is
+<b>&#8224;</b> A common deployment strategy is to submit your application from a gateway machine
+that is
 physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster).
 In this setup, `client` mode is appropriate. In `client` mode, the driver is launched directly
-within the client `spark-submit` process, with the input and output of the application attached
-to the console. Thus, this mode is especially suitable for applications that involve the REPL
-(e.g. Spark shell).
+within the `spark-submit` process which acts as a *client* to the cluster. The input and
+output of the application are attached to the console. Thus, this mode is especially suitable
+for applications that involve the REPL (e.g. Spark shell).
 
 Alternatively, if your application is submitted from a machine far from the worker machines (e.g.
 locally on your laptop), it is common to use `cluster` mode to minimize network latency between
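The quoting rule for `--conf` in the hunk above can be demonstrated with plain shell word splitting, independent of Spark. This is an illustrative sketch (the `count_args` helper and the example property value are assumptions, not from the commit):

```shell
#!/bin/sh
# A value containing spaces must be quoted so the shell passes it
# to spark-submit as a single "key=value" argument.
conf='spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps'

# Illustrative helper: report how many arguments it receives.
count_args() { echo $#; }

unquoted=$(count_args --conf $conf)    # the space splits the value in two
quoted=$(count_args --conf "$conf")    # the value survives as one argument
echo "unquoted=$unquoted quoted=$quoted"
```

Unquoted, the flag plus the split value arrive as three arguments; quoted, Spark sees exactly one `key=value` pair after `--conf`.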
@@ -63,8 +64,12 @@ clusters, Mesos clusters, or python applications.
 For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
 and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
 
-To enumerate all options available to `spark-submit` run it with `--help`. Here are a few
-examples of common options:
+There are a few options available that are specific to the
+[cluster manager](cluster-overview.html#cluster-manager-types) that is being used.
+For example, with a [Spark Standalone](spark-standalone.html) cluster with `cluster` deploy mode,
+you can also specify `--supervise` to make sure that the driver is automatically restarted if it
+fails with a non-zero exit code. To enumerate all such options available to `spark-submit`,
+run it with `--help`. Here are a few examples of common options:
 
 {% highlight bash %}
 # Run application locally on 8 cores
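The multi-line examples in the hunks below rely on trailing backslashes, which are easy to drop (the committed supervise example is missing two). A less error-prone way to build the same invocation, sketched here as an illustration rather than anything from the commit, is a shell array:

```shell
#!/usr/bin/env bash
# Collect the arguments in an array: no continuation backslashes to
# forget, and each element reaches spark-submit intact.
submit_args=(
  --class org.apache.spark.examples.SparkPi
  --master spark://207.184.161.138:7077
  --deploy-mode cluster
  --supervise
  --executor-memory 20G
  --total-executor-cores 100
  /path/to/examples.jar
  1000
)
# In a real run you would execute: ./bin/spark-submit "${submit_args[@]}"
echo "./bin/spark-submit ${submit_args[*]}"
```

Quoting the expansion as `"${submit_args[@]}"` preserves each element as one argument, which matters for values containing spaces.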
@@ -74,7 +79,7 @@ examples of common options:
   /path/to/examples.jar \
   100
 
-# Run on a Spark standalone cluster
+# Run on a Spark Standalone cluster in client deploy mode
 ./bin/spark-submit \
   --class org.apache.spark.examples.SparkPi \
   --master spark://207.184.161.138:7077 \
@@ -83,6 +88,17 @@ examples of common options:
   /path/to/examples.jar \
   1000
 
+# Run on a Spark Standalone cluster in cluster deploy mode with supervise
+./bin/spark-submit \
+  --class org.apache.spark.examples.SparkPi \
+  --master spark://207.184.161.138:7077 \
+  --deploy-mode cluster \
+  --supervise \
+  --executor-memory 20G \
+  --total-executor-cores 100 \
+  /path/to/examples.jar \
+  1000
+
 # Run on a YARN cluster
 export HADOOP_CONF_DIR=XXX
 ./bin/spark-submit \
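`--supervise` restarts the driver whenever it exits with a non-zero code. The restart policy can be sketched generically in shell; this is an illustration of the idea only, not Spark's implementation, and `max_restarts` plus the `flaky` demo command are invented for the example:

```shell
#!/usr/bin/env bash
# supervise: rerun a command until it exits 0, up to a bound.
# max_restarts is an illustrative safety bound, not a Spark option.
supervise() {
  local max_restarts=$1; shift
  local attempts=0
  until "$@"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$max_restarts" ]; then
      echo "giving up after $attempts restarts" >&2
      return 1
    fi
    echo "restarting (attempt $attempts)" >&2
  done
}

# Demo: a command that fails twice, then succeeds on the third try.
tries_file=$(mktemp)
flaky() {
  local n
  n=$(cat "$tries_file"); n=${n:-0}
  echo $((n + 1)) > "$tries_file"
  [ "$n" -ge 2 ]
}
supervise 5 flaky && echo "driver finished"
```

The loop only distinguishes zero from non-zero exit codes, which is the same condition the doc text describes for the restart.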
@@ -93,7 +109,7 @@ export HADOOP_CONF_DIR=XXX
   /path/to/examples.jar \
   1000
 
-# Run a Python application on a cluster
+# Run a Python application on a Spark Standalone cluster
 ./bin/spark-submit \
   --master spark://207.184.161.138:7077 \
   examples/src/main/python/pi.py \
@@ -163,5 +179,5 @@ to executors.
 
 # More Information
 
-Once you have deployed your application, the [cluster mode overview](cluster-overview.html) describes
+Once you have deployed your application, the [cluster mode overview](cluster-overview.html) describes
 the components involved in distributed execution, and how to monitor and debug applications.
