Commit 02ae154
[SPARK-52334][CORE][K8S] update all files, jars, and pyFiles to reference the working directory after they are downloaded
### What changes were proposed in this pull request?
This PR fixes a bug where submitting a Spark job using the --files option and also calling SparkContext.addFile() for a file with the same name causes Spark to throw an exception
`Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: File a.text was already registered with a different path (old path = /tmp/spark-6aa5129d-5bbb-464a-9e50-5b6ffe364ffb/a.text, new path = /opt/spark/work-dir/a.text`
### Why are the changes needed?
1. Submit a Spark application using spark-submit with the --files option:
`bin/spark-submit --files s3://bucket/a.text --class testDemo app.jar `
2. In the testDemo application code, call:
`sc.addFile("a.text", true)`
This works correctly in YARN mode, but throws an error in Kubernetes mode.
After [SPARK-33782](https://issues.apache.org/jira/browse/SPARK-33782), in Kubernetes mode, --files, --jars, --archiveFiles, and --pyFiles are all downloaded to the working directory.
However, in the code, args.files = filesLocalFiles, and filesLocalFiles refers to a temporary download path, not the working directory.
This causes issues when user code like testDemo calls sc.addFile("a.text", true), resulting in an error such as:
`Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: File a.text was already registered with a different path (old path = /tmp/spark-6aa5129d-5bbb-464a-9e50-5b6ffe364ffb/a.text, new path = /opt/spark/work-dir/a.text`
### Does this PR introduce _any_ user-facing change?
This issue can be resolved after this PR.
### How was this patch tested?
Existed UT
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #51037 from TongWei1105/SPARK-52334.
Authored-by: TongWei1105 <vvtwow@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dd418e3)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>1 parent cbbe7d4 commit 02ae154
File tree
2 files changed
+46
-2
lines changed- core/src
- main/scala/org/apache/spark/deploy
- test/scala/org/apache/spark/deploy
2 files changed
+46
-2
lines changedLines changed: 4 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
448 | 448 | | |
449 | 449 | | |
450 | 450 | | |
451 | | - | |
| 451 | + | |
452 | 452 | | |
| 453 | + | |
453 | 454 | | |
454 | 455 | | |
| 456 | + | |
455 | 457 | | |
456 | 458 | | |
457 | 459 | | |
458 | | - | |
| 460 | + | |
459 | 461 | | |
460 | 462 | | |
461 | 463 | | |
| |||
Lines changed: 42 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1845 | 1845 | | |
1846 | 1846 | | |
1847 | 1847 | | |
| 1848 | + | |
| 1849 | + | |
| 1850 | + | |
| 1851 | + | |
| 1852 | + | |
| 1853 | + | |
| 1854 | + | |
| 1855 | + | |
| 1856 | + | |
| 1857 | + | |
| 1858 | + | |
| 1859 | + | |
| 1860 | + | |
| 1861 | + | |
| 1862 | + | |
| 1863 | + | |
| 1864 | + | |
| 1865 | + | |
| 1866 | + | |
| 1867 | + | |
| 1868 | + | |
| 1869 | + | |
| 1870 | + | |
| 1871 | + | |
| 1872 | + | |
| 1873 | + | |
| 1874 | + | |
| 1875 | + | |
| 1876 | + | |
| 1877 | + | |
| 1878 | + | |
| 1879 | + | |
| 1880 | + | |
| 1881 | + | |
| 1882 | + | |
| 1883 | + | |
| 1884 | + | |
| 1885 | + | |
| 1886 | + | |
| 1887 | + | |
| 1888 | + | |
| 1889 | + | |
1848 | 1890 | | |
1849 | 1891 | | |
1850 | 1892 | | |
| |||
0 commit comments