Commit c29d132
[SPARK-47495][CORE] Fix primary resource jar added to spark.jars twice under k8s cluster mode
### What changes were proposed in this pull request?
In `SparkSubmit`, for the `isKubernetesClusterModeDriver` code path, stop appending the primary resource to `spark.jars`, so that the primary resource jar is no longer listed twice in `spark.jars`.
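The guard described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions, not the actual `SparkSubmit` code: the object `PrimaryResourceSketch`, the helper `resolveJars`, and the sample paths are hypothetical; only the flag name `isKubernetesClusterModeDriver` and the `spark.jars` behavior come from the description.

```scala
// Hypothetical, simplified model of the decision; not the real SparkSubmit code.
object PrimaryResourceSketch {

  /** Returns the jar list that would be stored in `spark.jars` after one spark-submit pass. */
  def resolveJars(
      jars: Seq[String],
      primaryResource: String,
      isKubernetesClusterModeDriver: Boolean): Seq[String] = {
    if (isKubernetesClusterModeDriver) {
      // The earlier spark-submit pass already added the primary resource,
      // so appending it again would create a duplicate entry.
      jars
    } else {
      jars :+ primaryResource
    }
  }

  def main(args: Array[String]): Unit = {
    val app = "s3a://bucket/app.jar" // illustrative URL only

    // First pass (cluster-mode submission): the primary resource is appended.
    println(resolveJars(Seq("s3a://bucket/dep.jar"), app, isKubernetesClusterModeDriver = false))

    // Second pass (inside the driver pod): the list is left untouched.
    println(resolveJars(Seq("/tmp/dep.jar", "/tmp/app.jar"), app, isKubernetesClusterModeDriver = true))
  }
}
```

The model leaves out the other conditions the real code path may check; it isolates only the behavior this change targets.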
### Why are the changes needed?
#### Context:
To submit a Spark job to Kubernetes in cluster mode, spark-submit is invoked twice.

The first time, `SparkSubmit` runs in k8s cluster mode: it appends the primary resource to `spark.jars` and calls `KubernetesClientApplication::start` to create a driver pod.

The driver pod then runs spark-submit again with the updated configuration (the same application jar, which is now also listed in `spark.jars`). This time `SparkSubmit` runs in client mode with `spark.kubernetes.submitInDriver` set to `true`. In this mode, all the jars in `spark.jars` are downloaded to the driver and their URLs are replaced by driver-local paths. `SparkSubmit` then appends the primary resource to `spark.jars` again, so `spark.jars` ends up with two entries for the primary resource: one with the original URL the user submitted, the other with the driver-local file path.

Later, when the driver starts the `SparkContext`, it copies all of `spark.jars` to `spark.app.initial.jar.urls` and replaces the driver-local jar paths in `spark.app.initial.jar.urls` with driver file service paths, from which the executors can download those driver-local jars.
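The two-pass flow can be illustrated with a toy simulation. It is a sketch under assumptions rather than Spark's implementation: `DuplicateJarSketch`, `localize`, `firstPass`, `secondPassBeforeFix`, `secondPassAfterFix`, the `/tmp/spark-driver/` directory, and the sample URL are all hypothetical names chosen for illustration.

```scala
// Hypothetical simulation of how `spark.jars` evolves across the two spark-submit passes.
object DuplicateJarSketch {

  // Stand-in for downloading a jar into the driver pod:
  // maps a URL to an assumed driver-local path with the same file name.
  def localize(url: String): String =
    "/tmp/spark-driver/" + url.substring(url.lastIndexOf('/') + 1)

  // First pass (k8s cluster mode): the primary resource is appended to the jar list.
  def firstPass(jars: Seq[String], primary: String): Seq[String] =
    jars :+ primary

  // Second pass in the driver pod, before the fix:
  // every listed jar is localized, then the primary resource is appended again.
  def secondPassBeforeFix(jars: Seq[String], primary: String): Seq[String] =
    jars.map(localize) :+ primary

  // Second pass in the driver pod, after the fix: no second append.
  def secondPassAfterFix(jars: Seq[String]): Seq[String] =
    jars.map(localize)

  def main(args: Array[String]): Unit = {
    val primary = "s3a://bucket/app.jar" // illustrative URL only
    val submitted = firstPass(Seq.empty, primary)

    // Before the fix: the same jar appears under a local path and under its original URL.
    println(secondPassBeforeFix(submitted, primary))

    // After the fix: only the localized copy remains.
    println(secondPassAfterFix(submitted))
  }
}
```

Comparing the two printed lists shows where the second entry comes from: the localized copy carried over from the first pass, plus the original URL appended again in the driver.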
#### Issues:
Each executor downloads two copies of the primary resource, one via the original URL the user submitted and one via the driver-local file path, which wastes resources.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test added.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #45607 from leletan/fix_k8s_submit_jar_distribution.
Lead-authored-by: jiale_tan <jiale_tan@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

1 parent b9335b9 commit c29d132
File tree

2 files changed (+21, -1) under core/src:
- main/scala/org/apache/spark/deploy
- test/scala/org/apache/spark/deploy

Lines changed: 2 additions & 1 deletion
Lines changed: 19 additions & 0 deletions