Hive: Fix for missing table schema in map reduce job configurations #1557
Which version of Hive are you using? Or is this query-dependent? Thanks for spotting the issue!
@pvary We're using Hive 1.1, but I was not able to find any difference in the relevant code in Hive 2, which calls … Queries that can run on the driver and don't spawn MR jobs succeed; the problem only occurs with queries such as DESC that need MR jobs.
Makes sense. Thanks, Peter
Looks reasonable to me, but will this affect jobs that run multiple scans in a single MR stage? @massdosage, do we have HiveRunner tests for joins that run two table scans in a stage?
I think this does it: https://github.com/ExpediaGroup/iceberg/blob/master/mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandlerBaseTest.java#L145, as @pvary mentioned above. I know this caught an issue in the past with multiple tables not being configured properly for the scan, but it's possible it doesn't cover every case that can occur.
Okay, if we do have a test case that does a simple join, then I think this should be okay. It doesn't sound like we can reproduce the issue with the newer Hive versions, though. So I'll merge this without adding a test for it.
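One way to picture the concern about several scans sharing one MR stage: if table-level settings end up under a single, un-namespaced key in the shared job configuration, whichever table is configured last wins. The sketch below is a hypothetical illustration only. The key name, the table names, and both helper methods are made up, and a plain `Map` stands in for Hive's `JobConf`. It is not a claim about how Iceberg actually keys these properties.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: two table scans configured into one shared job config.
// The key name and helper methods are invented for this sketch.
public class SharedStageConfig {
    static final String TABLE_SCHEMA = "example.table.schema";

    /** Writes the schema under one global key; the table name is ignored. */
    static void configureGlobal(Map<String, String> jobConf, String table, String schema) {
        jobConf.put(TABLE_SCHEMA, schema);
    }

    /** Writes the schema under a key namespaced by the table name. */
    static void configureNamespaced(Map<String, String> jobConf, String table, String schema) {
        jobConf.put(TABLE_SCHEMA + "." + table, schema);
    }

    public static void main(String[] args) {
        Map<String, String> global = new HashMap<>();
        configureGlobal(global, "orders", "orders-schema");
        configureGlobal(global, "customers", "customers-schema");
        // Last writer wins: a mapper scanning "orders" would see the wrong schema.
        System.out.println(global.get(TABLE_SCHEMA)); // prints "customers-schema"

        Map<String, String> namespaced = new HashMap<>();
        configureNamespaced(namespaced, "orders", "orders-schema");
        configureNamespaced(namespaced, "customers", "customers-schema");
        // Each table keeps its own entry.
        System.out.println(namespaced.get(TABLE_SCHEMA + ".orders")); // prints "orders-schema"
    }
}
```

A join test that scans two tables in one stage, like the one linked above, is the kind of check that would surface the first behavior.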
This reverts commit 13d94bc.
Hive queries that spawn MapReduce jobs are currently failing on live YARN clusters with the following stack trace:
This error occurs in the mappers, because job configuration properties such as `TABLE_SCHEMA`, `TABLE_LOCATION` and `TABLE_IDENTIFIER` are not set correctly. The location of failure is here:

With this PR, the job configurations are set correctly and the MapReduce job gets past the previously failing stage.
I'm not sure why this error is not reproducible in HiveRunner unit tests.
Steps to reproduce:
cc: @shardulm94 @omalley