[SPARK-25880][CORE] user set's hadoop conf should not overwrite by sparkcontext's conf #22887
Conversation
test this please

Hi @gjhkael,
User-set Hadoop conf can't override spark-defaults.conf: SparkHadoopUtil.get.appendS3AndSparkHadoopConfigurations overwrites the user-set value with the spark.hadoop.* entry from the default configuration (sparkSession.sparkContext.conf). cc @gengliangwang @cloud-fan @gatorsmile

Test setup: spark-defaults.conf sets spark.hadoop.mapreduce.input.fileinputformat.split.maxsize (to 2 in this run). Then in spark-shell:

val hadoopConfKey = "mapreduce.input.fileinputformat.split.maxsize"
spark.conf.get("spark.hadoop." + hadoopConfKey)  // 2
var hadoopConf = spark.sessionState.newHadoopConf
hadoopConf.get(hadoopConfKey)  // 2
spark.conf.set(hadoopConfKey, 1)  // user sets 1
hadoopConf = spark.sessionState.newHadoopConf
hadoopConf.get(hadoopConfKey)  // 1
// org.apache.spark.sql.hive.HadoopTableReader appends the SparkContext conf
// when building _broadcastedHadoopConf:
org.apache.spark.deploy.SparkHadoopUtil.get.appendS3AndSparkHadoopConfigurations(spark.sparkContext.getConf, hadoopConf)
hadoopConf.get(hadoopConfKey)  // 2 -- the user's setting of 1 is overwritten
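The overwrite in the last step can be modeled in a few lines. This is a simplified Python sketch of what the append step effectively does with spark.hadoop.* entries; the function name and dicts are illustrative, not Spark API:

```python
def append_spark_hadoop_conf(spark_conf, hadoop_conf):
    """Model of the append step: every spark.hadoop.<key> entry in the
    SparkContext conf is copied into the Hadoop conf, overwriting
    whatever value is already there."""
    prefix = "spark.hadoop."
    for key, value in spark_conf.items():
        if key.startswith(prefix):
            hadoop_conf[key[len(prefix):]] = value
    return hadoop_conf

# spark-defaults.conf sets the key to 2 via spark.hadoop.*
spark_conf = {"spark.hadoop.mapreduce.input.fileinputformat.split.maxsize": "2"}

# The user then sets 1 in the session, so the session-derived
# Hadoop conf carries 1 ...
hadoop_conf = {"mapreduce.input.fileinputformat.split.maxsize": "1"}

# ... but the append step clobbers it back to 2.
append_spark_hadoop_conf(spark_conf, hadoop_conf)
print(hadoop_conf["mapreduce.input.fileinputformat.split.maxsize"])  # 2
```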
looks reasonable, cc @gatorsmile
Sorry, this is a breaking change. It changes the behavior from "I can currently override any Hadoop configs, even final ones, using spark.hadoop.*" to "I can never do that". If there's an issue with the SQL SET command that needs to be addressed, this is the wrong place to do it. Basically, if my core-site.xml says "mapreduce.input.fileinputformat.split.maxsize" is 2, and my Spark conf says "spark.hadoop.mapreduce.input.fileinputformat.split.maxsize" is 3, then the value from the config generated by the method you're changing must be 3.
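The precedence @vanzin describes can be sketched as a layered merge. This is a simplified Python model (not Spark code): Hadoop's own config files are loaded first, then spark.hadoop.* entries from the Spark conf are applied on top, so the Spark conf value must win.

```python
def effective_hadoop_conf(site_conf, spark_conf):
    # Layer 1: values from core-site.xml and friends.
    conf = dict(site_conf)
    # Layer 2: spark.hadoop.* overrides from the Spark conf win.
    prefix = "spark.hadoop."
    for key, value in spark_conf.items():
        if key.startswith(prefix):
            conf[key[len(prefix):]] = value
    return conf

site = {"mapreduce.input.fileinputformat.split.maxsize": "2"}   # core-site.xml
spark = {"spark.hadoop.mapreduce.input.fileinputformat.split.maxsize": "3"}
result = effective_hadoop_conf(site, spark)
print(result["mapreduce.input.fileinputformat.split.maxsize"])  # 3
```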
@vanzin Thanks for your review. I added a new commit to let the user's SET command take effect. Let me know if you have an easier way. Thanks.
I think this is what this PR tries to fix? the …
To be fair I'm not sure I fully understand the PR description. But I know that the previous patch (which I commented on) broke the functionality I described - not in the context of SQL, but in the context of everything else in Spark that calls that code.

@vanzin @cloud-fan

ok, that makes sense as in I understand what you're saying, but not sure it's what you actually want? Why shouldn't "set spark.hadoop.*" override spark-defaults.conf? But, in any case, it seems like the patch for SPARK-26060 (#23031) should take care of this (by raising an error).
The Spark SQL SET command can't update any static config or Spark core configs, but I think Hadoop configs are different. They're not static, as users can update them via ….

So it's reasonable for users to expect that, if they set a Hadoop config via the SQL SET command, it should override the one in ….

Looking back at …: for the first case, there's nothing we need to worry about. For the second case, I think the Hadoop config should take priority, as it contains the configs specified by users at runtime.
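The priority @cloud-fan argues for can be sketched as a two-layer merge. This is a simplified Python model of the desired ordering (illustrative names, not Spark API): spark.hadoop.* entries from the SparkContext conf are applied first, and runtime SET values are layered last so they take priority.

```python
def session_hadoop_conf(spark_context_conf, session_overrides):
    # Layer 1: spark.hadoop.* entries from the SparkContext conf
    # (spark-defaults.conf, --conf flags).
    conf = {}
    prefix = "spark.hadoop."
    for key, value in spark_context_conf.items():
        if key.startswith(prefix):
            conf[key[len(prefix):]] = value
    # Layer 2: configs the user set at runtime via the SQL SET
    # command are applied last, so they win.
    conf.update(session_overrides)
    return conf

defaults = {"spark.hadoop.mapreduce.input.fileinputformat.split.maxsize": "2"}
runtime = {"mapreduce.input.fileinputformat.split.maxsize": "1"}  # SET command
merged = session_hadoop_conf(defaults, runtime)
print(merged["mapreduce.input.fileinputformat.split.maxsize"])  # 1
```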
I agree with that. But the previous explanation seemed to be saying that's the undesired behavior. Maybe I'm just having trouble understanding what @gjhkael wrote.

@gjhkael can you clarify further what the undesirable behavior is, and what behavior you are looking for?

Can one of the admins verify this patch?
@gjhkael can you please clarify this or close it if it is no longer relevant?
What changes were proposed in this pull request?
Hadoop configurations that the user sets via Spark SQL's SET command should not be overwritten by the SparkContext's conf, which is read from spark-defaults.conf.
How was this patch tested?
Manually verified with Spark 2.2.0.