[SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode #15669
Conversation
Test build #67699 has finished for PR 15669 at commit
@vanzin @JoshRosen @yanboliang @brkyvz Please help review.
I thought we had fixed this in 2.0 to make it consistent that anytime --files is specified with yarn, it goes through the distributed cache. It shouldn't be using the Spark addFiles method to distribute. Perhaps that broke? @vanzin, do you remember?
Quite strange
@jerryshao spark.files is always passed to the driver, so SparkContext.addFile is called in yarn-cluster.
Seems the issue is that spark.files doesn't need to be passed to the driver in yarn-cluster mode. In that case it can be fixed in SparkSubmit.scala. Another thing: I noticed some suspicious code in SparkContext.addJar. Is the following code still needed?
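The snippet being asked about is the try/catch fallback in SparkContext.addJar that this PR ends up deleting (shown in the diff further down). For background on why the driver-side replay breaks, here is a simplified reconstruction (not the exact Spark source) of the startup path under discussion:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Simplified reconstruction of the startup path under discussion: SparkContext
// re-adds every spark.files entry through Spark's own file server on the driver.
def replayFiles(sc: SparkContext, conf: SparkConf): Unit = {
  val files = conf.getOption("spark.files")
    .map(_.split(",").filter(_.nonEmpty).toSeq)
    .getOrElse(Seq.empty[String])
  // In yarn-cluster mode the driver runs on a cluster node, so a client-local
  // path here cannot be resolved and addFile fails with the error from the JIRA.
  files.foreach(f => sc.addFile(f))
}
```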
From the code, what I can see is that even with

```scala
OptionAssigner(args.files, LOCAL | STANDALONE | MESOS, ALL_DEPLOY_MODES,
  sysProp = "spark.files"),
```

spark.files would still be passed to the driver even in yarn-cluster if you check the following code.
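For readers unfamiliar with this part of SparkSubmit: an OptionAssigner maps a submit argument to a config key only when the cluster-manager and deploy-mode bitmasks both match. A minimal sketch of the pattern (a simplified reconstruction, not the verbatim SparkSubmit code; the constant values are illustrative):

```scala
// Bitmask constants, mirroring the style used in SparkSubmit (values illustrative).
object ClusterMgr { val LOCAL = 1; val STANDALONE = 2; val MESOS = 4; val YARN = 8 }
object DeployMode { val CLIENT = 1; val CLUSTER = 2; val ALL_DEPLOY_MODES = CLIENT | CLUSTER }

case class OptionAssigner(value: String, clusterManager: Int, deployMode: Int, sysProp: String)

// An argument becomes a config entry only when both masks match the current
// submission; excluding YARN from the mask keeps spark.files out of the conf.
def assign(opts: Seq[OptionAssigner], cm: Int, dm: Int): Map[String, String] =
  opts.collect {
    case OptionAssigner(v, mgrs, modes, key)
        if v != null && (mgrs & cm) != 0 && (modes & dm) != 0 =>
      key -> v
  }.toMap
```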
Yeah, so I think the fix here should be that we don't pass it through, or don't handle it in SparkContext. All the yarn stuff should go through the distributed cache.
I agree with Tom; for yarn cluster mode we should rely on the distributed cache.
If I understand the issue here, the problem is not [...]. I think the correct fix would be in either [...].
That's correct, it is due to [...].
…n yarn-cluster mode" This reverts commit 67a5ccf.
```scala
case exc: FileNotFoundException =>
  logError(s"Jar not found at $path")
  null
```
This is obsolete code IMO, so I removed it.
Test build #67868 has finished for PR 15669 at commit
Test build #67885 has finished for PR 15669 at commit
I think this issue also exists in yarn client mode: files will be added twice (once with Spark's file server, once with the distributed cache), because both the file server and yarn#client can find the file, so there's no exception. But it fails in yarn cluster mode, since the Spark file server cannot find the file at the specified path.
That's correct, this PR will also fix the yarn-client case. PR title is updated.
```diff
@@ -1202,7 +1202,9 @@ private object Client extends Logging {
     // Note that any env variable with the SPARK_ prefix gets propagated to all (remote) processes
     System.setProperty("SPARK_YARN_MODE", "true")
     val sparkConf = new SparkConf
+    // yarn would use dist cache to distribute files & jars, so remove them from sparkConf
```
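Concretely, the change being reviewed drops the two keys before yarn#client reads the conf. A sketch picking up right after `val sparkConf = new SparkConf` in the excerpt above, assuming `SparkConf.remove` is the mechanism (not the verbatim diff):

```scala
// yarn distributes files & jars through the distributed cache (spark.yarn.dist.*),
// so drop the generic keys before the driver-side SparkContext can replay them.
sparkConf.remove("spark.jars")
sparkConf.remove("spark.files")
```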
Can you update this comment to mention that SparkSubmit uses the YARN cache to distribute these in YARN mode? It looked weird at first that these were just being removed, but the code in SparkSubmit/SparkSubmitArguments takes care of setting the right config.
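To spell out what that SparkSubmit/SparkSubmitArguments handling amounts to, here is a simplified sketch, under the assumption (as the thread describes) that --files is routed to spark.yarn.dist.files in YARN mode; this is not the verbatim code:

```scala
import org.apache.spark.SparkConf

// In YARN mode, submit-time files go to the YARN dist-cache key rather than the
// generic spark.files, which is what makes removing the generic key safe.
def assignFiles(isYarn: Boolean, files: String, conf: SparkConf): SparkConf =
  if (isYarn) conf.set("spark.yarn.dist.files", files) // consumed by yarn#client
  else conf.set("spark.files", files)                  // replayed by SparkContext
```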
One minor comment update, otherwise LGTM.
I'm wondering whether this is enough to address the issue in yarn client mode, since there we will have two properties, "spark.files" and "spark.yarn.dist.files": the former is handled by SparkContext and the latter by yarn#client. yarn#client is created after SparkContext is launched, so is it too late to remove them from sparkConf in yarn#client? Also, yarn#client only honors "spark.yarn.dist.files", so this will still add the files twice.
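To make the concern concrete, this is the conf state being described for yarn-client mode, with a hypothetical file path (illustrative, not taken from the PR):

```scala
import org.apache.spark.SparkConf

// Hypothetical yarn-client conf before the fix: the same file is listed under
// both keys, so it gets distributed twice, once per mechanism.
val conf = new SparkConf()
  .set("spark.files", "/tmp/app.conf")                // replayed by SparkContext.addFile
  .set("spark.yarn.dist.files", "file:/tmp/app.conf") // shipped via the YARN dist cache
```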
Hmm, I noticed spark.files is still passed to SparkContext in yarn-client mode; it seems I need to do that in SparkSubmit.
I think it would be better to treat that as a separate issue. spark.files has always worked like this in client mode; the behavior being changed here fixes an actual bug in cluster mode. I'd consider using the cache for spark.files in client mode a separate enhancement.
Test build #67941 has finished for PR 15669 at commit
Merging to master / 2.0.
…ed to driver in yarn mode

## What changes were proposed in this pull request?

spark.files is still passed to the driver in yarn mode, so SparkContext will still handle it, which causes the error in the JIRA description.

## How was this patch tested?

Tested manually in a 5-node cluster. As this issue only happens in a multi-node cluster, I didn't write a test for it.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #15669 from zjffdu/SPARK-18160.

(cherry picked from commit 3c24299)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Looks like there's a branch-2.1 now, so I merged into that branch too.
What changes were proposed in this pull request?
spark.files is still passed to the driver in yarn mode, so SparkContext will still handle it, which causes the error in the JIRA description.
How was this patch tested?
Tested manually in a 5-node cluster. As this issue only happens in a multi-node cluster, I didn't write a test for it.