Skip to content

[SPARK-30492][SQL] Eliminate deprecation warnings in ORC datasource #27179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jan 12, 2020

What changes were proposed in this pull request?

In the PR, I propose to avoid usage of getTypes() in the SparkOrcNewRecordReader constructor, and replace it by getSchema().

Why are the changes needed?

To eliminate compiler warnings, and highlight other warnings that could indicate about real problems:

Warning:(44, 13) java: getTypes() in org.apache.orc.Reader has been deprecated
Warning:(47, 24) java: getTypes() in org.apache.orc.Reader has been deprecated

Does this PR introduce any user-facing change?

No

How was this patch tested?

By existing tests from the org.apache.spark.sql.hive.orc package like HiveOrcQuerySuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 12, 2020

@dongjoon-hyun Please, take a look at this.

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @MaxGekk . Sure!
BTW, could you finish #27078 first?
I've been waiting for you two days because we need to discuss the real effect of NoOp there.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this breaks our Hive 1.2 code. Can we have a fix for both Hive 1.2 and Hive 2.3?

[ERROR] [Error] /home/runner/work/spark/spark/sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java:46: cannot find symbol
1420
  symbol:   method getSchema()
1421
  location: variable file of type org.apache.hadoop.hive.ql.io.orc.Reader

Since we cannot drop Hive 1.2 completely at least in 3.0 (or maybe until 3.1), we need to support it still.

cc @srowen , @wangyum and @gatorsmile

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 12, 2020

@dongjoon-hyun Just in case, do you know why there are 2 ORC implementations:

  • org.apache.spark.sql.hive.orc.OrcFileFormat
  • org.apache.spark.sql.execution.datasources.orc.OrcFileFormat

Is it something specific for ORC?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 12, 2020

Historically, org.apache.spark.sql.hive.orc.OrcFileFormat was using the the built-in ORC of Hive 1.2.1 module. And, we decided to keep it with Hive 1.2.1.

org.apache.spark.sql.execution.datasources.orc.OrcFileFormat is the one which is using Apache ORC library.

So, with -Phive1.2 profile, the above situation is still unchanged. The difference can be everything, but the main difference was ORC Filter classes.

@SparkQA
Copy link

SparkQA commented Jan 12, 2020

Test build #116568 has finished for PR 27179 at commit 3ced2ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jan 12, 2020

If it's much trouble here... I'd just leave it. We're not going to be able to resolve 100% of warnings just for reasons like this.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 13, 2020

Can we have a fix for both Hive 1.2 and Hive 2.3?

I would propose to add the @Deprecated annotation to the org.apache.hadoop.hive.ql.io.orc.SparkOrcNewRecordReader and @deprecated to org.apache.spark.sql.hive.orc.OrcFileFormat (and other classes that depend from org.apache.spark.sql.hive.orc.OrcFileFormat). This will suppress compiler warnings for deprecated dependencies.

In any case, we are going to deprecate org.apache.spark.sql.hive.orc.OrcFileFormat. Maybe it is right time to make that in Spark 3.0.

@dongjoon-hyun
Copy link
Member

@MaxGekk . Sorry, but I'm technically -1 to prevent a feature regression.

I guess you are assuming that the new one supports all use cases of old one. However, it's not true. One simple long standing JIRA is https://issues.apache.org/jira/browse/SPARK-21997 . Users are still using the old ones because new ones (ORC and Parquet) don't provide the same feature.

For me, this one is not worth of your time. We had better move on from this part.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 13, 2020

I guess you are assuming that the new one supports all use cases of old one. However, it's not true.

I didn't know that. @dongjoon-hyun Thank you for the explanation. I am closing this PR.

@MaxGekk MaxGekk closed this Jan 13, 2020
@MaxGekk MaxGekk deleted the orc-eliminate-warning branch June 5, 2020 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants