Skip to content

[SPARK-22282][SQL] Rename OrcRelation to OrcFileFormat and remove ORC_COMPRESSION #19502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Oct 16, 2017

What changes were proposed in this pull request?

This PR aims to

  • Rename OrcRelation to OrcFileFormat object.
  • Replace OrcRelation.ORC_COMPRESSION with org.apache.orc.OrcConf.COMPRESS. Since SPARK-21422, we can use OrcConf.COMPRESS instead of Hive's.
// The references of Hive's classes will be minimized.
val ORC_COMPRESSION = "orc.compress"

How was this patch tested?

Pass the Jenkins with the existing and updated test cases.

@@ -256,9 +257,6 @@ private[orc] class OrcOutputWriter(
}

private[orc] object OrcRelation extends HiveInspectors {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun, mind changing this to OrcFileFormat while we are here (see #14529)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. No problem. Thank you for review, @HyukjinKwon !

@HyukjinKwon
Copy link
Member

LGTM btw.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-22282][SQL] Remove OrcRelation.ORC_COMPRESSION [SPARK-22282][SQL] Rename OrcRelation to OrcFileFormat and remove ORC_COMPRESSION Oct 16, 2017
@dongjoon-hyun
Copy link
Member Author

Thank you, @HyukjinKwon !

@@ -42,7 +44,7 @@ private[orc] class OrcOptions(
val compressionCodec: String = {
// `compression`, `orc.compress`, and `spark.sql.orc.compression.codec` are
// in order of precedence from highest to lowest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for review, @gatorsmile .
The name orc.compress and precedence order is not changed. Which part do you want to change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COMPRESS.getAttribute is not in our code base. Please change it to

// compression, orc.compress(i.e., OrcConf.COMPRESS), and spark.sql.orc.compression.codec are

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

@SparkQA
Copy link

SparkQA commented Oct 16, 2017

Test build #82771 has finished for PR 19502 at commit 7416e4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -180,7 +181,7 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
// Respect `orc.compress`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we change this comment too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the test name Respect orc.compress option when compression is unset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thank you for review, @viirya .

@SparkQA
Copy link

SparkQA commented Oct 16, 2017

Test build #82775 has finished for PR 19502 at commit 3d18d75.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -19,6 +19,7 @@ package org.apache.spark.sql.hive.orc

import java.io.File

import org.apache.orc.OrcConf.COMPRESS
import org.scalatest.BeforeAndAfterAll

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another test in OrcSourceSuite:

assert(new OrcOptions(Map("Orc.Compress" -> "NONE"), conf).compressionCodec == "NONE")

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is case sensitivity test case. We should not change this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was thinking something like COMPRESS.getAttribute.toUpperCase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, no problem at all. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like to be consistent. Thanks. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

@@ -255,10 +256,7 @@ private[orc] class OrcOutputWriter(
}
}

private[orc] object OrcRelation extends HiveInspectors {
// The references of Hive's classes will be minimized.
val ORC_COMPRESSION = "orc.compress"
Copy link
Member

@viirya viirya Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have documented orc.compress as option name explicitly in

* `orc.compress` and `spark.sql.parquet.compression.codec`. If `orc.compress` is given,

Now we depends on the configuration name from an external library. But I think the configuration name should not be changed at all. So looks should be fine.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I agree. I don't think Apache ORC changes this in the future since this is a primitive configuration.
BTW, Thank you for pointing this doc. I'll fix some typos here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, the parquet.

@SparkQA
Copy link

SparkQA commented Oct 16, 2017

Test build #82778 has finished for PR 19502 at commit a554112.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2017

Test build #82782 has finished for PR 19502 at commit 2b0a092.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2017

Test build #82783 has finished for PR 19502 at commit 02792d9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2017

Test build #82781 has finished for PR 19502 at commit b7335ac.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Oct 16, 2017

Test build #82788 has finished for PR 19502 at commit 02792d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Thank you for retesting, @HyukjinKwon !

@dongjoon-hyun
Copy link
Member Author

@HyukjinKwon , @gatorsmile , @viirya .
This is ready for review again. Thanks!

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 561505e Oct 16, 2017
@dongjoon-hyun
Copy link
Member Author

Thank you, @gatorsmile , @HyukjinKwon , and @viirya !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-22282 branch October 16, 2017 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants