[SPARK-30492][SQL] Eliminate deprecation warnings in ORC datasource #27179

MaxGekk · 2020-01-12T20:45:29Z

What changes were proposed in this pull request?

In this PR, I propose to avoid using getTypes() in the SparkOrcNewRecordReader constructor and to replace it with getSchema().

Why are the changes needed?

To eliminate compiler warnings and make other warnings, which could indicate real problems, easier to spot:

Warning:(44, 13) java: getTypes() in org.apache.orc.Reader has been deprecated
Warning:(47, 24) java: getTypes() in org.apache.orc.Reader has been deprecated

Does this PR introduce any user-facing change?

No

How was this patch tested?

By existing tests from the org.apache.spark.sql.hive.orc package like HiveOrcQuerySuite.

MaxGekk · 2020-01-12T20:45:52Z

@dongjoon-hyun Please, take a look at this.

dongjoon-hyun · 2020-01-12T20:49:47Z

Thank you for pinging me, @MaxGekk . Sure!
BTW, could you finish #27078 first?
I've been waiting for you for two days because we need to discuss the real effect of NoOp there.

dongjoon-hyun

Unfortunately, this breaks our Hive 1.2 code. Can we have a fix for both Hive 1.2 and Hive 2.3?

[ERROR] [Error] /home/runner/work/spark/spark/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java:46: cannot find symbol
  symbol:   method getSchema()
  location: variable file of type org.apache.hadoop.hive.ql.io.orc.Reader

Since we cannot drop Hive 1.2 completely at least in 3.0 (or maybe until 3.1), we need to support it still.

cc @srowen , @wangyum and @gatorsmile
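
One way to serve both Hive versions from a single source file would be to look up getSchema() reflectively and fall back to getTypes() when it is absent. The sketch below only illustrates that idea under stated assumptions; it is not what Spark does. LegacyReader, ModernReader, and SchemaProbe are hypothetical stand-ins, not the real Hive or ORC Reader classes, and the returned values are made up for the example.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a reader that only offers the older accessor.
class LegacyReader {
    public List<String> getTypes() {
        return Arrays.asList("struct", "int", "string");
    }
}

// Hypothetical stand-in for a reader that also offers the newer accessor.
class ModernReader extends LegacyReader {
    public String getSchema() {
        return "struct<a:int,b:string>";
    }
}

class SchemaProbe {
    // Returns true if the reader's class exposes a public no-arg getSchema().
    static boolean hasGetSchema(Object reader) {
        try {
            reader.getClass().getMethod("getSchema");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Calls getSchema() when it exists, otherwise falls back to getTypes(),
    // and reports which accessor was used together with its result.
    static String describe(Object reader) {
        String name = hasGetSchema(reader) ? "getSchema" : "getTypes";
        try {
            Method accessor = reader.getClass().getMethod(name);
            return name + " -> " + accessor.invoke(reader);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + name, e);
        }
    }
}
```

Because the lookup happens at run time, the file compiles against either reader type. The cost is that the compiler no longer checks the call, so a typo in the method name would only surface when the code runs.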

MaxGekk · 2020-01-12T21:50:38Z

@dongjoon-hyun Just in case, do you know why there are 2 ORC implementations:

org.apache.spark.sql.hive.orc.OrcFileFormat
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat

Is it something specific for ORC?

dongjoon-hyun · 2020-01-12T22:00:30Z

Historically, org.apache.spark.sql.hive.orc.OrcFileFormat was using the built-in ORC of the Hive 1.2.1 module. And, we decided to keep it with Hive 1.2.1.

org.apache.spark.sql.execution.datasources.orc.OrcFileFormat is the one which is using Apache ORC library.

So, with the -Phive1.2 profile, the above situation is still unchanged. The two implementations can differ in many ways, but the main difference was the ORC Filter classes.

SparkQA · 2020-01-12T23:09:27Z

Test build #116568 has finished for PR 27179 at commit 3ced2ba.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2020-01-12T23:38:27Z

If it's much trouble here... I'd just leave it. We're not going to be able to resolve 100% of warnings just for reasons like this.

MaxGekk · 2020-01-13T10:19:53Z

Can we have a fix for both Hive 1.2 and Hive 2.3?

I would propose adding the @Deprecated annotation to org.apache.hadoop.hive.ql.io.orc.SparkOrcNewRecordReader and @deprecated to org.apache.spark.sql.hive.orc.OrcFileFormat (and other classes that depend on org.apache.spark.sql.hive.orc.OrcFileFormat). This will suppress compiler warnings for deprecated dependencies.

In any case, we are going to deprecate org.apache.spark.sql.hive.orc.OrcFileFormat. Maybe it is the right time to do that in Spark 3.0.
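
The suppression effect mentioned above follows from the Java language rules: javac does not report a deprecation warning when the use site is itself annotated with @Deprecated (or with @SuppressWarnings("deprecation")). A minimal sketch, assuming hypothetical stand-in classes OldApi, WarnsCaller, and QuietCaller rather than the real ORC classes:

```java
// Hypothetical stand-in for a library class with a deprecated method.
class OldApi {
    @Deprecated
    static int typeCount() {
        return 3;
    }
}

// Compiling this class with -Xlint:deprecation reports a warning
// for the call to typeCount().
class WarnsCaller {
    static int count() {
        return OldApi.typeCount();
    }
}

// The same call inside a class annotated with @Deprecated is not reported.
@Deprecated
class QuietCaller {
    static int count() {
        return OldApi.typeCount();
    }
}
```

Both callers behave identically at run time; only the compiler diagnostics differ. Note that the warning does not disappear entirely: code that uses QuietCaller from a non-deprecated class now gets a deprecation warning of its own.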

dongjoon-hyun · 2020-01-13T20:18:39Z

@MaxGekk . Sorry, but I'm technically -1 to prevent a feature regression.

I guess you are assuming that the new one supports all use cases of old one. However, it's not true. One simple long-standing JIRA is https://issues.apache.org/jira/browse/SPARK-21997 . Users are still using the old ones because the new ones (ORC and Parquet) don't provide the same features.

For me, this one is not worth your time. We had better move on from this part.

MaxGekk · 2020-01-13T20:28:25Z

I guess you are assuming that the new one supports all use cases of old one. However, it's not true.

I didn't know that. @dongjoon-hyun Thank you for the explanation. I am closing this PR.

Eliminate warnings in SparkOrcNewRecordReader

3ced2ba

dongjoon-hyun added the SQL label Jan 12, 2020

dongjoon-hyun requested changes Jan 12, 2020

MaxGekk mentioned this pull request Jan 13, 2020

[SPARK-30505][DOCS] Deprecate Avro option ignoreExtension in sql-data-sources-avro.md #27194

Closed

MaxGekk closed this Jan 13, 2020

MaxGekk deleted the orc-eliminate-warning branch June 5, 2020 19:42
