docs/submitting-applications.md (+26 −10)
@@ -43,17 +43,18 @@ Some of the commonly used options are:
* `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
* `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
* `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`) <b>†</b>
* `--conf`: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes (as shown).
* `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
* `application-arguments`: Arguments passed to the main method of your main class, if any
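The quoting rule for `--conf` can be sketched with a hypothetical invocation; the property value and jar path below are illustrative placeholders, assuming a local Spark installation:

```bash
# Sketch only: the JVM options and jar path are placeholders.
# Because the value contains a space, the whole key=value pair
# is wrapped in double quotes so the shell passes it as one argument.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[4] \
  --conf "spark.driver.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  /path/to/examples.jar \
  10
```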
<b>†</b> A common deployment strategy is to submit your application from a gateway machine that is
physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster).
In this setup, `client` mode is appropriate. In `client` mode, the driver is launched directly
within the `spark-submit` process, which acts as a *client* to the cluster. The input and
output of the application are attached to the console. Thus, this mode is especially suitable
for applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the worker machines (e.g.
locally on your laptop), it is common to use `cluster` mode to minimize network latency between
@@ -63,8 +64,12 @@ clusters, Mesos clusters, or python applications.
For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
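A hypothetical Python submission illustrating this (the archive name `deps.zip` is a placeholder for your bundled helper modules):

```bash
# Sketch only: deps.zip is an illustrative placeholder.
# The main script is passed where <application-jar> would go,
# and --py-files adds helper code to the Python search path.
./bin/spark-submit \
  --master local[4] \
  --py-files deps.zip \
  examples/src/main/python/pi.py \
  10
```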

There are a few options available that are specific to the
[cluster manager](cluster-overview.html#cluster-manager-types) that is being used.
For example, with a [Spark Standalone](spark-standalone.html) cluster with `cluster` deploy mode,
you can also specify `--supervise` to make sure that the driver is automatically restarted if it
fails with a non-zero exit code. To enumerate all such options available to `spark-submit`,
run it with `--help`. Here are a few examples of common options:

{% highlight bash %}
# Run application locally on 8 cores
@@ -74,7 +79,7 @@ examples of common options:
  /path/to/examples.jar \
  100

# Run on a Spark Standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
@@ -83,6 +88,17 @@ examples of common options:
  /path/to/examples.jar \
  1000

# Run on a Spark Standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
@@ -93,7 +109,7 @@ export HADOOP_CONF_DIR=XXX
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark Standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
@@ -163,5 +179,5 @@ to executors.
# More Information

Once you have deployed your application, the [cluster mode overview](cluster-overview.html) describes
the components involved in distributed execution, and how to monitor and debug applications.