SPARK-1680: use configs for specifying environment variables on YARN #1512

Closed · wants to merge 5 commits
8 changes: 8 additions & 0 deletions docs/configuration.md
@@ -206,6 +206,14 @@ Apart from these, the following properties are also available, and may be useful
used during aggregation goes above this amount, it will spill the data into disks.
</td>
</tr>
<tr>
<td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to the Executor
process. The user can specify multiple of these to set multiple environment variables
(see the sketch after this table).
</td>
</tr>
</table>
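
As an illustration of the new property (not part of this patch), here is a minimal sketch of setting an executor environment variable programmatically; the variable name `MY_LIB_PATH` and its value are hypothetical:

```scala
import org.apache.spark.SparkConf

// Hypothetical example: export MY_LIB_PATH to every executor process.
// Equivalent to passing --conf spark.executorEnv.MY_LIB_PATH=/opt/native/lib to spark-submit.
val conf = new SparkConf()
  .setAppName("executor-env-example")
  .set("spark.executorEnv.MY_LIB_PATH", "/opt/native/lib")
```

`SparkConf.setExecutorEnv("MY_LIB_PATH", "/opt/native/lib")` writes the same `spark.executorEnv.*` key, so either form works.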

#### Shuffle Behavior
22 changes: 17 additions & 5 deletions docs/running-on-yarn.md
@@ -17,10 +17,6 @@ To build Spark yourself, refer to the [building with Maven guide](building-with-

Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. These are configs that are specific to Spark on YARN.

#### Environment Variables

* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables, e.g. `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"`.

#### Spark Properties

<table class="table">
@@ -110,7 +106,23 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
<td><code>spark.yarn.access.namenodes</code></td>
<td>(none)</td>
<td>
A list of secure HDFS namenodes your Spark application is going to access. For
example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`.
The Spark application must have access to the namenodes listed and Kerberos must
be properly configured to be able to access them (either in the same realm or in
a trusted realm). Spark acquires security tokens for each of the namenodes so that
the Spark application can access those remote HDFS clusters.
</td>
</tr>
<tr>
<td><code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to the
Application Master process launched on YARN. The user can specify multiple of these to
set multiple environment variables. In yarn-cluster mode this controls the environment
of the Spark driver, and in yarn-client mode it only controls the environment of the
executor launcher (see the sketch after this table).
</td>
</tr>
</table>
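
For illustration, a minimal sketch of migrating the removed `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"` example to the new per-variable properties (the values are examples only, taken from the old documentation):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // JAVA_HOME for the application master (the driver in yarn-cluster mode).
  .set("spark.yarn.appMasterEnv.JAVA_HOME", "/jdk64")
  // FOO for every executor process.
  .set("spark.executorEnv.FOO", "bar")
```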
@@ -259,6 +259,14 @@ trait ClientBase extends Logging {
localResources
}

/** Get all application master environment variables set on this SparkConf */
def getAppMasterEnv: Seq[(String, String)] = {
val prefix = "spark.yarn.appMasterEnv."
sparkConf.getAll.filter { case (k, _) => k.startsWith(prefix) }
  .map { case (k, v) => (k.substring(prefix.length), v) }
}

def setupLaunchEnv(
localResources: HashMap[String, LocalResource],
stagingDir: String): HashMap[String, String] = {
@@ -276,6 +284,11 @@
distCacheMgr.setDistFilesEnv(env)
distCacheMgr.setDistArchivesEnv(env)

getAppMasterEnv.foreach { case (key, value) =>
YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
}

// Keep this for backwards compatibility but users should move to the config
sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
// Allow users to specify some environment variables.
YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs, File.pathSeparator)
@@ -171,7 +171,11 @@ trait ExecutorRunnableUtil extends Logging {
val extraCp = sparkConf.getOption("spark.executor.extraClassPath")
ClientBase.populateClasspath(null, yarnConf, sparkConf, env, extraCp)

// Allow users to specify some environment variables
sparkConf.getExecutorEnv.foreach { case (key, value) =>
YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
}

// Keep this for backwards compatibility but users should move to the config
YarnSparkHadoopUtil.setEnvFromInputString(env, System.getenv("SPARK_YARN_USER_ENV"),
File.pathSeparator)
