-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-31692][SQL][2.4] Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory #28529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…treamHandlerfactory Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory **BEFORE** ``` ➜ spark git:(SPARK-31692) ✗ ./bin/spark-shell --conf spark.hadoop.fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem scala> spark.sharedState res0: org.apache.spark.sql.internal.SharedState = org.apache.spark.sql.internal.SharedState5793cd84 scala> new java.net.URL("file:///tmp/1.txt").openConnection.getInputStream res1: java.io.InputStream = org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream22846025 scala> import org.apache.hadoop.fs._ import org.apache.hadoop.fs._ scala> FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration) res2: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.LocalFileSystem5a930c03 ``` **AFTER** ``` ➜ spark git:(SPARK-31692) ✗ ./bin/spark-shell --conf spark.hadoop.fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem scala> spark.sharedState res0: org.apache.spark.sql.internal.SharedState = org.apache.spark.sql.internal.SharedState5c24a636 scala> new java.net.URL("file:///tmp/1.txt").openConnection.getInputStream res1: java.io.InputStream = org.apache.hadoop.fs.FSDataInputStream2ba8f528 scala> import org.apache.hadoop.fs._ import org.apache.hadoop.fs._ scala> FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration) res2: org.apache.hadoop.fs.FileSystem = LocalFS scala> FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration).getClass res3: Class[_ <: org.apache.hadoop.fs.FileSystem] = class org.apache.hadoop.fs.RawLocalFileSystem ``` The type of FileSystem object created(you can check the last statement in the above snippets) in the above two cases are different, which should not have been the case No Tested locally. Added Unit test Closes apache#28516 from karuppayya/SPARK-31692. Authored-by: Karuppayya Rajendran <karuppayya1990@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 7260146)
Backport commit for SPARK-31692 @dongjoon-hyun |
ok to test |
Thank you, @karuppayya ! I updated the PR description like the original PR. |
@@ -157,7 +157,13 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging { | |||
|
|||
object SharedState extends Logging { | |||
try { | |||
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) | |||
SparkSession.getActiveSession match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, is it the same with branch-3.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because we don't have a configuration patch in branch-2.4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we dont have the configuration patch
Test build #122625 has finished for PR 28529 at commit
|
Test build #122627 has started for PR 28529 at commit |
Hmm. In this case, I believe we had better have the conf patch because we can disable it when we have a regression on this PR. Let me take a look at that. |
Retest this please |
Test build #122748 has finished for PR 28529 at commit
|
Thank you for your patience, @karuppayya . |
@karuppayya . I cherry-picked your original commit from |
What changes were proposed in this pull request?
Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory
Why are the changes needed?
BEFORE
AFTER
The type of FileSystem object created(you can check the last statement in the above snippets) in the above two cases are different, which should not have been the case
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass the jenkins with newly added test cases.