[SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field resolution when reading from Parquet #22183
Conversation
…en reading from Parquet
|
See my comment: https://github.com/apache/spark/pull/22184/files#r212006137 |
|
ok to test |
|
+1 to backport this. I think it's a bug that we don't respect case-sensitive config when resolving parquet fields. |
|
We need to backport it. Without this PR, we cannot solve the data issue in [SPARK-25206] Wrong data may be returned when enable pushdown. |
|
Test build #95215 has finished for PR 22183 at commit
|
|
For Hive tables, column resolution is always case insensitive. However, when |
|
Can one of the admins verify this patch? |
|
BTW, @gatorsmile and @cloud-fan, do you know who did this ^ and why? |
|
As discussed in the JIRA, this is a partial fix, and we need to backport another 2 PRs, which is risky. Can we close it? |
|
ohh.. no no .. I meant:
|
|
ah, I think there is a service that will comment on stale PRs to ask people to review. I don't know who maintains this service though... |
What changes were proposed in this pull request?
This is a backport of #22148
Spark SQL returns NULL for a column whose name differs in letter case between the Hive metastore schema and the Parquet schema, regardless of whether spark.sql.caseSensitive is set to true or false. This PR aims to add case-insensitive field resolution for ParquetFileFormat.
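To illustrate the idea only, here is a minimal sketch of case-insensitive field resolution in plain Python. It is not the code from this patch and does not use any Spark API: the function names `resolve_field` and `resolve_schema` are hypothetical, and raising an error on an ambiguous match is an assumption made for the sketch. A requested column name (as it would come from the metastore schema) is matched against the field names found in a file, and an unmatched column resolves to `None`, which corresponds to the NULL column described above.

```python
from typing import Dict, List, Optional


def resolve_field(requested: str, file_fields: List[str],
                  case_sensitive: bool) -> Optional[str]:
    """Return the file field name that matches `requested`, or None if absent.

    Hypothetical helper for illustration; not part of Spark.
    """
    if case_sensitive:
        # Exact match only: "id" does not match "ID".
        return requested if requested in file_fields else None

    # Case-insensitive: compare lower-cased names.
    matches = [f for f in file_fields if f.lower() == requested.lower()]
    if len(matches) > 1:
        # Assumption for this sketch: refuse to guess between candidates.
        raise ValueError(f"Ambiguous field {requested!r}: matches {matches}")
    return matches[0] if matches else None


def resolve_schema(requested_fields: List[str], file_fields: List[str],
                   case_sensitive: bool) -> Dict[str, Optional[str]]:
    """Map each requested column name to the file field it resolves to."""
    return {
        name: resolve_field(name, file_fields, case_sensitive)
        for name in requested_fields
    }
```

With requested columns `["id", "name"]` and file fields `["ID", "Name"]`, the case-insensitive mode maps each column to its differently cased file field, while the case-sensitive mode maps both to `None`.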
How was this patch tested?
Unit tests added.