[SPARK-31692][SQL][2.4] Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory #28529

karuppayya · 2020-05-14T17:06:48Z

What changes were proposed in this pull request?

Pass Hadoop confs specified via Spark confs to `URLStreamHandlerFactory`.

Why are the changes needed?

BEFORE

➜  spark git:(SPARK-31692) ✗ ./bin/spark-shell --conf spark.hadoop.fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem

scala> spark.sharedState
res0: org.apache.spark.sql.internal.SharedState = org.apache.spark.sql.internal.SharedState@5793cd84

scala> new java.net.URL("file:///tmp/1.txt").openConnection.getInputStream
res1: java.io.InputStream = org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@22846025

scala> import org.apache.hadoop.fs._
import org.apache.hadoop.fs._

scala>  FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration)
res2: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.LocalFileSystem@5a930c03

AFTER

➜  spark git:(SPARK-31692) ✗ ./bin/spark-shell --conf spark.hadoop.fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem

scala> spark.sharedState
res0: org.apache.spark.sql.internal.SharedState = org.apache.spark.sql.internal.SharedState@5c24a636

scala> new java.net.URL("file:///tmp/1.txt").openConnection.getInputStream
res1: java.io.InputStream = org.apache.hadoop.fs.FSDataInputStream@2ba8f528

scala> import org.apache.hadoop.fs._
import org.apache.hadoop.fs._

scala>  FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration)
res2: org.apache.hadoop.fs.FileSystem = LocalFS

scala>  FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration).getClass
res3: Class[_ <: org.apache.hadoop.fs.FileSystem] = class org.apache.hadoop.fs.RawLocalFileSystem

The type of the FileSystem object created differs between the two cases (compare the last statement in each snippet above), which should not be the case.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass Jenkins with the newly added test cases.

Tested locally. Added unit test.

Closes apache#28516 from karuppayya/SPARK-31692.

Authored-by: Karuppayya Rajendran <karuppayya1990@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 7260146)

karuppayya · 2020-05-14T17:07:20Z

Backport commit for SPARK-31692 @dongjoon-hyun

dongjoon-hyun · 2020-05-14T17:29:52Z

ok to test

dongjoon-hyun · 2020-05-14T17:30:02Z

Thank you, @karuppayya ! I updated the PR description like the original PR.

dongjoon-hyun · 2020-05-14T17:32:53Z

sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala

@@ -157,7 +157,13 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {

 object SharedState extends Logging {
  try {
-    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
+    SparkSession.getActiveSession match {


Ur, is it the same with branch-3.0?

Is this because we don't have a configuration patch in branch-2.4?

Yes, we don't have the configuration patch.
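For readers following the truncated hunk above, here is a minimal sketch of the general idea: construct `FsUrlStreamHandlerFactory` with the Hadoop configuration of the active session instead of the no-argument default. This is not the PR's actual code. The object name `UrlHandlerSketch` and both method names are hypothetical, and the fallback to a fresh `Configuration` when no session is active is an assumption.

```scala
import java.net.URL

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory
import org.apache.spark.sql.SparkSession

// Hypothetical illustration only; names are not taken from the PR.
object UrlHandlerSketch {

  // Prefer the active session's Hadoop configuration, which carries any
  // spark.hadoop.* settings; otherwise fall back to Hadoop defaults.
  def hadoopConfForUrlHandler(): Configuration =
    SparkSession.getActiveSession
      .map(_.sparkContext.hadoopConfiguration)
      .getOrElse(new Configuration())

  // Returns true if the factory was installed by this call.
  def install(): Boolean =
    try {
      URL.setURLStreamHandlerFactory(
        new FsUrlStreamHandlerFactory(hadoopConfForUrlHandler()))
      true
    } catch {
      // The JDK allows the factory to be set only once per JVM and
      // throws java.lang.Error on any later attempt.
      case _: Error => false
    }
}
```

Because the factory can be installed only once per JVM, the configuration passed at that moment is the one every later `java.net.URL` lookup will use, which is why the choice of configuration matters here.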

SparkQA · 2020-05-14T17:35:36Z

Test build #122625 has finished for PR 28529 at commit e7b378f.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-14T17:48:06Z

Test build #122627 has started for PR 28529 at commit c96d6b0.

dongjoon-hyun · 2020-05-14T18:01:35Z

Hmm. In this case, I believe we had better have the conf patch, because it lets us disable this behavior if this PR causes a regression. Let me take a look at that.
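As a rough illustration of the escape hatch being discussed, the sketch below sets the flag named in the referenced SPARK-25694 follow-up, `spark.sql.defaultUrlStreamHandlerFactory.enabled`, to `false` when building a session. It assumes the flag is available on the branch in use and that `false` skips installing the handler factory. The object name and app name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical illustration only; assumes the SPARK-25694 conf is present.
object DisableUrlHandlerSketch {
  val ConfKey = "spark.sql.defaultUrlStreamHandlerFactory.enabled"

  def main(args: Array[String]): Unit = {
    // Set the flag before the first session (and its shared state) exists;
    // static SQL confs are not meant to be changed afterwards.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("disable-url-handler-sketch")
      .config(ConfKey, "false")
      .getOrCreate()
    try {
      // Print the value the session reports for the flag.
      println(spark.conf.get(ConfKey))
    } finally {
      spark.stop()
    }
  }
}
```

The same setting can be passed on the command line with `--conf`, in the same way the `spark.hadoop.fs.file.impl` option is passed in the description.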

dongjoon-hyun · 2020-05-16T23:17:53Z

Retest this please

SparkQA · 2020-05-17T03:41:14Z

Test build #122748 has finished for PR 28529 at commit c96d6b0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-05-19T00:04:21Z

Thank you for your patience, @karuppayya .
For this PR, I additionally backported two commits as preparation.

dongjoon-hyun · 2020-05-19T00:15:30Z

@karuppayya . I cherry-picked your original commit from branch-3.0 to branch-2.4 and resolved the conflicts. Thanks!

probot-autolabeler bot added the SQL label May 14, 2020

Fix: Add back whitelines in imports

e7b378f

dongjoon-hyun reviewed May 14, 2020

Fix: Fix stylechecks

c96d6b0

dongjoon-hyun changed the title ~~[SPARK-31692][SQL] Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory~~ [SPARK-31692][SQL][2.4] Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory May 15, 2020

This was referenced May 18, 2020

[SPARK-25694][SQL] Add a config for URL.setURLStreamHandlerFactory #26530

Closed

[SPARK-25694][SQL][FOLLOW-UP] Move 'spark.sql.defaultUrlStreamHandlerFactory.enabled' into StaticSQLConf.scala #26570

Closed

dongjoon-hyun closed this May 19, 2020
