[SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field resolution when reading from Parquet #22183

seancxmao · 2018-08-22T05:27:33Z

What changes were proposed in this pull request?

This is a backport of #22148

Spark SQL returns NULL for a column whose name differs in letter case between the Hive metastore schema and the Parquet schema, regardless of whether spark.sql.caseSensitive is set to true or false. This PR adds case-insensitive field resolution for ParquetFileFormat.

Do case-insensitive resolution only if Spark is in case-insensitive mode.
Field resolution should fail if there is ambiguity, i.e. more than one field is matched.
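
The two rules above can be illustrated with a minimal sketch. This is not the code from the patch: `resolve_field` and its parameters are hypothetical names chosen for illustration, and returning `None` for a missing field is an assumption that mirrors the NULL-column behavior described above.

```python
from typing import List, Optional


def resolve_field(requested: str, file_fields: List[str], case_sensitive: bool) -> Optional[str]:
    """Return the name of the file field that matches `requested`, or None.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    if case_sensitive:
        # Case-sensitive mode: only an exact name match counts.
        return requested if requested in file_fields else None

    # Case-insensitive mode: compare lower-cased names.
    matches = [f for f in file_fields if f.lower() == requested.lower()]
    if len(matches) > 1:
        # More than one candidate is ambiguous, so resolution fails.
        raise ValueError(
            f"Found duplicate field(s) for '{requested}': {matches} in case-insensitive mode"
        )
    return matches[0] if matches else None
```

For example, `resolve_field("id", ["ID", "name"], case_sensitive=False)` resolves to `"ID"`, while the same call with `case_sensitive=True` finds no match.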

How was this patch tested?

Unit tests added.


gatorsmile · 2018-08-22T15:44:28Z

See my comment: https://github.com/apache/spark/pull/22184/files#r212006137

cloud-fan · 2018-08-24T15:18:01Z

ok to test

cloud-fan · 2018-08-24T15:18:56Z

+1 to backport this. I think it's a bug that we don't respect case-sensitive config when resolving parquet fields.

yucai · 2018-08-24T15:28:42Z

We need to backport it. Without this PR, we cannot solve the data issue in [SPARK-25206] Wrong data may be returned when enable pushdown.

SparkQA · 2018-08-24T18:31:36Z

Test build #95215 has finished for PR 22183 at commit 2831588.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-08-27T22:30:04Z

For Hive tables, column resolution is always case insensitive. However, when spark.sql.hive.convertMetastoreParquet is true, users might face inconsistent behaviors when they use the native Parquet reader to resolve the columns in case-sensitive mode. We still introduce behavior changes. Better error messages sound good enough, instead of disabling spark.sql.hive.convertMetastoreParquet when the mode is case sensitive.

AmplabJenkins · 2018-08-30T22:17:52Z

Can one of the admins verify this patch?

HyukjinKwon · 2018-08-31T02:23:55Z

BTW, @gatorsmile and @cloud-fan, do you know who did this ^ and why?

cloud-fan · 2018-08-31T05:23:51Z

As discussed in the JIRA, this is a partial fix, and we need to backport another 2 PRs, which is risky. Can we close it?

HyukjinKwon · 2018-08-31T07:09:07Z

ohh.. no no .. I meant:

Can one of the admins verify this patch?

cloud-fan · 2018-08-31T07:18:49Z

ah, I think there is a service that will comment on stale PRs to ask people to review. I don't know who maintains this service though...

[SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field resolution when reading from Parquet

2831588

HyukjinKwon approved these changes Aug 26, 2018


seancxmao closed this Sep 5, 2018

dongjoon-hyun mentioned this pull request Sep 11, 2018

[SPARK-25391][SQL] Make behaviors consistent when converting parquet hive table to parquet data source #22343

Closed
