[SPARK-25694][SQL] Add a config for URL.setURLStreamHandlerFactory
#26530
Conversation
ok to test
Thank you for making your first PR, @jiangzho.
URL.setURLStreamHandlerFactory
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala (Outdated)
Retest this please.
Retest this please.
Adding @steveloughran for more feedback.
Force-pushed from 7eb9697 to 1970254.
Test build #113970 has finished for PR 26530.
Test build #113971 has finished for PR 26530.
Now,
LGTM.
OK by me. I think the check outside the synchronized block isn't that important here, since there won't be contention, but this is fine.
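The check-then-lock pattern being discussed can be sketched as follows. This is a hypothetical Java analogue of the guard in `SharedState`, not Spark's actual code: `FactoryGuard` and `registerOnce` are illustrative names. The cheap check outside the `synchronized` block avoids taking the lock on the common path; the re-check inside it makes the one-time registration safe even if two threads race past the first check.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

public class FactoryGuard {
    // volatile so the fast-path read outside the lock sees the latest value
    private static volatile boolean registered = false;

    /** Installs the JVM-global factory at most once; returns true on the first call only. */
    public static boolean registerOnce(URLStreamHandlerFactory factory) {
        if (registered) {          // cheap check outside the lock
            return false;
        }
        synchronized (FactoryGuard.class) {
            if (registered) {      // re-check under the lock
                return false;
            }
            URL.setURLStreamHandlerFactory(factory);
            registered = true;
            return true;
        }
    }
}
```

As Steve notes, contention is unlikely here (registration happens once at startup), so the outer check is an optimization rather than a necessity; the inner check is the one that matters for correctness.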
Test build #113974 has finished for PR 26530.
Test build #113976 has finished for PR 26530.
Merged into master. Thanks, all!
Test build #113979 has finished for PR 26530.
package object config {
  private[spark] val DEFAULT_URL_STREAM_HANDLER_FACTORY_ENABLED =
Why is it in a separate file, not in StaticSQLConf or SQLConf?
Made a follow-up: #26570
Thank you for merging, @dbtsai. Thanks for the follow-up, @HyukjinKwon. @jiangzho, you are added to the Apache Spark Contributor group.
Test build #113985 has finished for PR 26530.
…Factory.enabled' into StaticSQLConf.scala

### What changes were proposed in this pull request?
This PR is a follow-up of #26530 and proposes to move the configuration `spark.sql.defaultUrlStreamHandlerFactory.enabled` to `StaticSQLConf.scala` for consistency.

### Why are the changes needed?
To put similar configurations together and for readability.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Manually tested as described in #26530.

Closes #26570 from HyukjinKwon/SPARK-25694.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sorry, missed this. I don't think the Hadoop HTTP FS should be registering its classes as the handlers of HTTP/HTTPS requests at all, as it breaks all the code which expects to cast the connection to well-known types. HADOOP-14598 fixes it for HTTPS connections; really, people should be using a version of the Hadoop libraries with Spark that has this fixed. These are old, fixed bugs you are trying to work around here.
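For context, the conflict described throughout this thread stems from the JVM allowing exactly one process-wide stream-handler factory: once anyone (for example Hadoop's `FsUrlStreamHandlerFactory`) has registered, any later registration fails. A minimal sketch of that behavior, assuming a fresh JVM; the class and method names are hypothetical:

```java
import java.net.URL;

public class DoubleRegistration {
    /** Attempts to install a trivial factory; returns false if one is already set. */
    public static boolean trySet() {
        try {
            // A factory returning null defers to the JDK's built-in handlers.
            URL.setURLStreamHandlerFactory(protocol -> null);
            return true;
        } catch (Error e) {
            // Second and later calls throw java.lang.Error: factory already defined
            return false;
        }
    }
}
```

This is exactly the `java.lang.Error: factory already defined` shown in the PR description's BEFORE transcript, and why the new config lets users opt out of Spark's default registration.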
Thank you, @steveloughran. Sure.
Hi, @jiangzho, @dbtsai, @holdenk and all.
This PR is also helpful and safe. I'll backport this to
Add a property `spark.fsUrlStreamHandlerFactory.enabled` to allow users to turn off the default registration of `org.apache.hadoop.fs.FsUrlStreamHandlerFactory`.

[SPARK-25694](https://issues.apache.org/jira/browse/SPARK-25694) is a long-standing issue. Originally, [[SPARK-12868][SQL] Allow adding jars from hdfs](#17342) added this for better Hive support. However, this has a side effect when users run Apache Spark without `-Phive`: it causes exceptions when they try to use another custom factory or a third-party library that also sets the factory. This configuration unblocks those non-Hive users.

Yes, this provides a new user-configurable property. By default, the behavior is unchanged.

Manual testing.

**BEFORE**
```
$ build/sbt package
$ bin/spark-shell

scala> sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

scala> java.net.URL.setURLStreamHandlerFactory(new org.apache.hadoop.fs.FsUrlStreamHandlerFactory())
java.lang.Error: factory already defined
  at java.net.URL.setURLStreamHandlerFactory(URL.java:1134)
  ... 47 elided
```

**AFTER**
```
$ build/sbt package
$ bin/spark-shell --conf spark.sql.defaultUrlStreamHandlerFactory.enabled=false

scala> sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

scala> java.net.URL.setURLStreamHandlerFactory(new org.apache.hadoop.fs.FsUrlStreamHandlerFactory())
```

Closes #26530 from jiangzho/master.

Lead-authored-by: Zhou Jiang <zhou_jiang@apple.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: zhou-jiang <zhou_jiang@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
(cherry picked from commit ee3bd6d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Add a property `spark.fsUrlStreamHandlerFactory.enabled` to allow users to turn off the default registration of `org.apache.hadoop.fs.FsUrlStreamHandlerFactory`.

Why are the changes needed?
SPARK-25694 is a long-standing issue. Originally, [SPARK-12868][SQL] Allow adding jars from hdfs added this for better Hive support. However, this has a side effect when users run Apache Spark without `-Phive`: it causes exceptions when they try to use another custom factory or a third-party library that also sets the factory. This configuration unblocks those non-Hive users.

Does this PR introduce any user-facing change?
Yes. This provides a new user-configurable property.
By default, the behavior is unchanged.

How was this patch tested?
Manual testing.

BEFORE
AFTER
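For users who disable the default registration in order to install their own factory, a custom `URLStreamHandlerFactory` can be sketched like this. The `myfs` scheme, `MyFsFactory` name, and demo handler are hypothetical, shown only to illustrate the interface the config unblocks:

```java
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

public class MyFsFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"myfs".equals(protocol)) {
            // Returning null defers to the JDK's built-in handlers (http, file, ...)
            return null;
        }
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL url) {
                // A real handler would open a connection to the custom filesystem.
                throw new UnsupportedOperationException("demo handler for " + url);
            }
        };
    }
}
```

Because `URL.setURLStreamHandlerFactory` accepts only one factory per JVM, an application would register such a factory once at startup, which is exactly the call that previously collided with Spark's default `FsUrlStreamHandlerFactory` registration.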