
Commit c29d132

jiale_tan and dongjoon-hyun committed
[SPARK-47495][CORE] Fix primary resource jar added to spark.jars twice under k8s cluster mode
### What changes were proposed in this pull request?

In `SparkSubmit`, for the `isKubernetesClusterModeDriver` code path, stop appending the primary resource to `spark.jars`, to avoid duplicating the primary resource jar in `spark.jars`.

### Why are the changes needed?

#### Context

To submit a Spark job to Kubernetes in cluster mode, spark-submit is invoked twice. The first invocation runs in k8s cluster mode: it appends the primary resource to `spark.jars` and calls `KubernetesClientApplication::start` to create a driver pod. The driver pod then runs spark-submit again with the updated configuration (the same application jar, which is now also listed in `spark.jars`). This second invocation runs in client mode with `spark.kubernetes.submitInDriver` set to `true`. In this mode, all jars in `spark.jars` are downloaded to the driver, and their URLs are replaced with driver-local paths. SparkSubmit then appends the primary resource to `spark.jars` again, so `spark.jars` ends up holding two paths to duplicate copies of the primary resource: the original URL the user submitted with, and the driver-local file path. Later, when the driver starts the `SparkContext`, it copies all of `spark.jars` into `spark.app.initial.jar.urls` and replaces the driver-local jar paths in `spark.app.initial.jar.urls` with driver file-service paths, from which executors can download those driver-local jars.

#### Issue

Each executor downloads two duplicate copies of the primary resource, one from the original URL the user submitted with and the other from the driver-local file path, which wastes resources.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45607 from leletan/fix_k8s_submit_jar_distribution.
Lead-authored-by: jiale_tan <jiale_tan@apple.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
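The double-submit flow described above can be sketched as a simplified model (hypothetical code for illustration only; `resolveJars` and `JarDedupSketch` are not part of Spark, and the real logic lives in `SparkSubmit.prepareSubmitEnvironment`):

```scala
// Hedged sketch: models why guarding on isKubernetesClusterModeDriver
// prevents the primary resource from appearing twice in spark.jars.
object JarDedupSketch {
  // Simplified stand-in for the append decision in SparkSubmit.
  def resolveJars(
      existingJars: Seq[String],
      primaryResource: String,
      isKubernetesClusterModeDriver: Boolean,
      isYarnCluster: Boolean): Seq[String] = {
    // After the fix: skip appending when the in-driver submit has already
    // downloaded the primary resource into spark.jars.
    if (!isKubernetesClusterModeDriver && !isYarnCluster) {
      existingJars :+ primaryResource
    } else {
      existingJars
    }
  }

  def main(args: Array[String]): Unit = {
    // First submit (k8s cluster mode, outside the driver): jar appended once.
    val afterFirstSubmit = resolveJars(Seq.empty, "app.jar",
      isKubernetesClusterModeDriver = false, isYarnCluster = false)
    assert(afterFirstSubmit == Seq("app.jar"))

    // Second submit (inside the driver pod): spark.jars already holds the
    // downloaded copy, so nothing is appended and no duplicate appears.
    val afterSecondSubmit = resolveJars(afterFirstSubmit, "/local/app.jar",
      isKubernetesClusterModeDriver = true, isYarnCluster = false)
    assert(afterSecondSubmit.size == 1)
  }
}
```

Before the fix, the second invocation would take the append branch as well, leaving both the original URL and the driver-local path in `spark.jars`.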
1 parent b9335b9 commit c29d132

File tree

2 files changed: +21 additions, −1 deletion


core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

Lines changed: 2 additions & 1 deletion
```diff
@@ -732,9 +732,10 @@ private[spark] class SparkSubmit extends Logging {
     }

     // Add the application jar automatically so the user doesn't have to call sc.addJar
+    // For isKubernetesClusterModeDriver, the jar is already added in the previous spark-submit
     // For YARN cluster mode, the jar is already distributed on each node as "app.jar"
     // For python and R files, the primary resource is already distributed as a regular file
-    if (!isYarnCluster && !args.isPython && !args.isR) {
+    if (!isKubernetesClusterModeDriver && !isYarnCluster && !args.isPython && !args.isR) {
       var jars = sparkConf.get(JARS)
       if (isUserJar(args.primaryResource)) {
         jars = jars ++ Seq(args.primaryResource)
```

core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala

Lines changed: 19 additions & 0 deletions
```diff
@@ -504,6 +504,25 @@ class SparkSubmitSuite
     }
   }

+  test("SPARK-47495: Not to add primary resource to jars again" +
+    " in k8s client mode & driver runs inside a POD") {
+    val clArgs = Seq(
+      "--deploy-mode", "client",
+      "--proxy-user", "test.user",
+      "--master", "k8s://host:port",
+      "--executor-memory", "1g",
+      "--class", "org.SomeClass",
+      "--driver-memory", "1g",
+      "--conf", "spark.kubernetes.submitInDriver=true",
+      "--jars", "src/test/resources/TestUDTF.jar",
+      "/home/jarToIgnore.jar",
+      "arg1")
+    val appArgs = new SparkSubmitArguments(clArgs)
+    val (_, _, sparkConf, _) = submit.prepareSubmitEnvironment(appArgs)
+    sparkConf.get("spark.jars").contains("jarToIgnore") shouldBe false
+    sparkConf.get("spark.jars").contains("TestUDTF") shouldBe true
+  }
+
   test("SPARK-33782: handles k8s files download to current directory") {
     val clArgs = Seq(
       "--deploy-mode", "client",
```
