[SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode #15669


Closed
wants to merge 4 commits into from

Conversation

zjffdu
Contributor

@zjffdu zjffdu commented Oct 28, 2016

What changes were proposed in this pull request?

spark.files is still passed to the driver in yarn mode, so SparkContext will still handle it, which causes the error described in the JIRA.

How was this patch tested?

Tested manually in a 5-node cluster. Since this issue only happens in a multi-node cluster, I didn't write a test for it.

@SparkQA

SparkQA commented Oct 28, 2016

Test build #67699 has finished for PR 15669 at commit 67a5ccf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zjffdu
Contributor Author

zjffdu commented Oct 28, 2016

@vanzin @JoshRosen @yanboliang @brkyvz Please help review.

@tgravescs
Contributor

I thought we had fixed this in 2.0 to make it consistent: anytime --files is specified with YARN, it goes through the distributed cache. It shouldn't be using the Spark addFile method to distribute files. Perhaps that broke? @vanzin, do you remember?

@jerryshao
Contributor

jerryshao commented Oct 30, 2016

Quite strange that --files in yarn-cluster falls into SparkContext. AFAIK it is handled by yarn#Client, which adds the files to the distributed cache; it should not involve SparkContext's addFile.

@zjffdu
Contributor Author

zjffdu commented Oct 31, 2016

@jerryshao spark.files is always passed to the driver, so SparkContext.addFile is called even in yarn-cluster mode.

    // Load any properties specified through --conf and the default properties file
    for ((k, v) <- args.sparkProperties) {
      sysProps.getOrElseUpdate(k, v)
    }

It seems the issue is that spark.files doesn't need to be passed to the driver in yarn-cluster mode; in that case it can be fixed in SparkSubmit.scala. Another thing: I noticed some suspicious code in SparkContext.addJar. Is the following code still needed?

    if (master == "yarn" && deployMode == "cluster") {
      // In order for this to work in yarn cluster mode the user must specify the
      // --addJars option to the client to upload the file into the distributed cache
      // of the AM to make it show up in the current working directory.
      val fileName = new Path(uri.getPath).getName()
      try {
        env.rpcEnv.fileServer.addJar(new File(fileName))
      } catch {
        case e: Exception =>
          // For now just log an error but allow to go through so spark examples work.
          // The spark examples don't really need the jar distributed since its also
          // the app jar.
          logError("Error adding jar (" + e + "), was the --addJars option used?")
          null
      }
    } else {

@jerryshao
Contributor

jerryshao commented Oct 31, 2016

From the code, what I can see is that spark.files will not be set in yarn mode:

      OptionAssigner(args.files, LOCAL | STANDALONE | MESOS, ALL_DEPLOY_MODES,
        sysProp = "spark.files"),

@zjffdu
Contributor Author

zjffdu commented Oct 31, 2016

spark.files would still be passed to the driver even in yarn-cluster mode, if you check the following code:

    // Load any properties specified through --conf and the default properties file
    for ((k, v) <- args.sparkProperties) {
      sysProps.getOrElseUpdate(k, v)
    }

@tgravescs
Contributor

Yeah, so I think the fix here should be that we either don't pass it through or don't handle it in SparkContext. All the YARN stuff should go through the distributed cache.

@mridulm
Contributor

mridulm commented Oct 31, 2016

I agree with Tom; for yarn-cluster mode we should rely on the distributed cache.

@vanzin
Contributor

vanzin commented Oct 31, 2016

If I understand the issue here, the problem is not --files but setting spark.files through --conf (or the config file). The former is translated into spark.yarn.dist.files by SparkSubmit, but it seems the latter isn't, and to me that's the bug. So basically what Tom and Mridul have said.

I think the correct fix would be in either SparkSubmit.scala or YARN's Client.scala - probably the latter to keep the YARN-specific stuff in one place.
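The asymmetry described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the actual SparkSubmit code; `resolveProps`, `cliFiles`, and `confProps` are hypothetical names, and a plain map stands in for the real property plumbing:

```scala
import scala.collection.mutable

// Sketch of the asymmetry: --files is rewritten to spark.yarn.dist.files
// in YARN mode, but spark.files set via --conf (or the config file) is
// copied through verbatim and reaches the driver unchanged.
def resolveProps(cliFiles: Option[String],
                 confProps: Map[String, String],
                 isYarn: Boolean): mutable.Map[String, String] = {
  val sysProps = mutable.Map[String, String]()
  // The --files argument is translated per cluster manager
  cliFiles.foreach { f =>
    if (isYarn) sysProps("spark.yarn.dist.files") = f
    else sysProps("spark.files") = f
  }
  // --conf and defaults-file entries are copied as-is, so a user-set
  // spark.files still lands in the driver's properties
  for ((k, v) <- confProps) sysProps.getOrElseUpdate(k, v)
  sysProps
}
```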

@zjffdu
Contributor Author

zjffdu commented Oct 31, 2016

That's correct, it is due to spark.files; the JIRA has been updated. I will update the PR soon.

@zjffdu zjffdu changed the title [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't work in yarn-cluster mode [SPARK-18160][CORE][YARN] spark.files should not be passed to driver in yarn-cluster mode Nov 1, 2016
    case exc: FileNotFoundException =>
      logError(s"Jar not found at $path")
      null
    }
Contributor Author

This is obsolete code IMO, so I removed it.

@SparkQA

SparkQA commented Nov 1, 2016

Test build #67868 has finished for PR 15669 at commit 230d56c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zjffdu zjffdu closed this Nov 1, 2016
@zjffdu zjffdu reopened this Nov 1, 2016
@SparkQA

SparkQA commented Nov 1, 2016

Test build #67885 has finished for PR 15669 at commit b8aa1fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor

I think this issue also exists in yarn-client mode: files will be added twice (once by Spark's file server, once via the distributed cache), because both the file server and yarn#Client can find the file, so there's no exception. But this fails in yarn-cluster mode, since the Spark file server cannot find the file at the specified path.

@zjffdu
Contributor Author

zjffdu commented Nov 1, 2016

That's correct; this PR will also fix the yarn-client case. The PR title is updated.

@zjffdu zjffdu changed the title [SPARK-18160][CORE][YARN] spark.files should not be passed to driver in yarn-cluster mode [SPARK-18160][CORE][YARN] spark.files should not be passed to driver in yarn mode Nov 1, 2016
@zjffdu zjffdu changed the title [SPARK-18160][CORE][YARN] spark.files should not be passed to driver in yarn mode [SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode Nov 1, 2016
@@ -1202,7 +1202,9 @@ private object Client extends Logging {
// Note that any env variable with the SPARK_ prefix gets propagated to all (remote) processes
System.setProperty("SPARK_YARN_MODE", "true")
val sparkConf = new SparkConf

// yarn would use dist cache to distribute files & jars, so remove them from sparkConf
Contributor

Can you update this comment to mention that SparkSubmit uses the YARN cache to distribute these in YARN mode? It looked weird at first that these were just being removed, but the code in SparkSubmit/SparkSubmitArguments takes care of setting the right config.
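The idea behind the patch excerpt above can be sketched like this. This is a hedged sketch, assuming a plain mutable map in place of the real SparkConf, and `stripDistCacheProps` is a hypothetical name; the actual change lives in YARN's Client.scala:

```scala
import scala.collection.mutable

// Since SparkSubmit already routes these through the YARN distributed cache
// (as spark.yarn.dist.files / spark.yarn.dist.jars), the plain keys can be
// dropped so the driver-side SparkContext never tries to re-distribute them.
def stripDistCacheProps(conf: mutable.Map[String, String]): mutable.Map[String, String] = {
  conf -= "spark.files"
  conf -= "spark.jars"
  conf  // the dist-cache keys are left untouched
}
```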

@vanzin
Contributor

vanzin commented Nov 1, 2016

One minor comment update, otherwise LGTM.

@jerryshao
Contributor

I'm wondering whether this is enough to address the issue in yarn-client mode, since we will have two properties, "spark.files" and "spark.yarn.dist.files"; the former is handled by SparkContext and the latter by yarn#Client. Since yarn#Client is created after SparkContext is launched, is it too late to remove "spark.files" in yarn#Client? Also, yarn#Client only honors "spark.yarn.dist.files", so this will still add the files again.

@zjffdu
Contributor Author

zjffdu commented Nov 2, 2016

Hmm, I noticed spark.files is still passed to SparkContext in yarn-client mode; it seems I need to handle that in SparkSubmit.

@vanzin
Contributor

vanzin commented Nov 2, 2016

I think it would be better to treat that as a separate issue. spark.files has always worked like this in client mode; the behavior being changed here fixes an actual bug in cluster mode.

I'd consider using the cache for spark.files in client mode a separate enhancement.

@SparkQA

SparkQA commented Nov 2, 2016

Test build #67941 has finished for PR 15669 at commit 0dd5486.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Nov 2, 2016

Merging to master / 2.0.

asfgit pushed a commit that referenced this pull request Nov 2, 2016
[SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode

## What changes were proposed in this pull request?

spark.files is still passed to the driver in yarn mode, so SparkContext will still handle it, which causes the error described in the JIRA.

## How was this patch tested?

Tested manually in a 5-node cluster. Since this issue only happens in a multi-node cluster, I didn't write a test for it.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #15669 from zjffdu/SPARK-18160.

(cherry picked from commit 3c24299)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
asfgit pushed a commit that referenced this pull request Nov 2, 2016
@vanzin
Contributor

vanzin commented Nov 2, 2016

Looks like there's a branch-2.1 now, so I merged into that branch too.

@asfgit asfgit closed this in 3c24299 Nov 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
6 participants