-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-22282][SQL] Rename OrcRelation to OrcFileFormat and remove ORC_COMPRESSION #19502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -256,9 +257,6 @@ private[orc] class OrcOutputWriter( | |||
} | |||
|
|||
private[orc] object OrcRelation extends HiveInspectors { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun, mind changing this to OrcFileFormat
while we are here (see #14529)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. No problem. Thank you for review, @HyukjinKwon !
LGTM btw. |
Thank you, @HyukjinKwon ! |
@@ -42,7 +44,7 @@ private[orc] class OrcOptions( | |||
val compressionCodec: String = { | |||
// `compression`, `orc.compress`, and `spark.sql.orc.compression.codec` are | |||
// in order of precedence from highest to lowest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for review, @gatorsmile .
The name orc.compress and precedence order is not changed. Which part do you want to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
COMPRESS.getAttribute
is not in our code base. Please change it to
//
compression
,orc.compress
(i.e., OrcConf.COMPRESS), andspark.sql.orc.compression.codec
are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
Test build #82771 has finished for PR 19502 at commit
|
@@ -180,7 +181,7 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest { | |||
// Respect `orc.compress`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we change this comment too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the test name Respect orc.compress option when compression is unset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Thank you for review, @viirya .
Test build #82775 has finished for PR 19502 at commit
|
@@ -19,6 +19,7 @@ package org.apache.spark.sql.hive.orc | |||
|
|||
import java.io.File | |||
|
|||
import org.apache.orc.OrcConf.COMPRESS | |||
import org.scalatest.BeforeAndAfterAll | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another test in OrcSourceSuite
:
spark/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala
Line 153 in d8f4540
assert(new OrcOptions(Map("Orc.Compress" -> "NONE"), conf).compressionCodec == "NONE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is case sensitivity test case. We should not change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I was thinking something like COMPRESS.getAttribute.toUpperCase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, no problem at all. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like to be consistent. Thanks. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
@@ -255,10 +256,7 @@ private[orc] class OrcOutputWriter( | |||
} | |||
} | |||
|
|||
private[orc] object OrcRelation extends HiveInspectors { | |||
// The references of Hive's classes will be minimized. | |||
val ORC_COMPRESSION = "orc.compress" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have documented orc.compress
as option name explicitly in
* `orc.compress` and `spark.sql.parquet.compression.codec`. If `orc.compress` is given, |
Now we depends on the configuration name from an external library. But I think the configuration name should not be changed at all. So looks should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I agree. I don't think Apache ORC changes this in the future since this is a primitive configuration.
BTW, Thank you for pointing this doc. I'll fix some typos here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, the parquet
.
Test build #82778 has finished for PR 19502 at commit
|
Test build #82782 has finished for PR 19502 at commit
|
Test build #82783 has finished for PR 19502 at commit
|
Test build #82781 has finished for PR 19502 at commit
|
retest this please |
Test build #82788 has finished for PR 19502 at commit
|
Thank you for retesting, @HyukjinKwon ! |
@HyukjinKwon , @gatorsmile , @viirya . |
Thanks! Merged to master. |
Thank you, @gatorsmile , @HyukjinKwon , and @viirya ! |
What changes were proposed in this pull request?
This PR aims to
OrcRelation
toOrcFileFormat
object.OrcRelation.ORC_COMPRESSION
withorg.apache.orc.OrcConf.COMPRESS
. Since SPARK-21422, we can useOrcConf.COMPRESS
instead of Hive's.How was this patch tested?
Pass the Jenkins with the existing and updated test cases.