Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run all StorageHandler tests with both MR and Tez #1695

Merged
merged 2 commits into from
Nov 3, 2020

Conversation

marton-bod
Copy link
Collaborator

@marton-bod marton-bod commented Oct 30, 2020

Use the execution engine as a test parameter to run all hive storage handler tests with both MR and Tez.

Note: setting the table schema in the global job conf (see #1557) in HiveIcebergStorageHandler#configureJobConf() meant that whichever table called this function last, ended up having its schema stored in the global conf. Then in HiveIcebergSerde#initialize(), the schema was taken from the conf when present, but that led to failures since all tables were now using the schema of the table that was the "winner" (setting its schema last in the previous step). To keep the caching nonetheless, I'm now storing and retrieving the schemas of each table separately with no risk of overwriting.

I can't rule out that there are other issues with how job conf is set in #1557, but I haven't observed any other ones for now. The tests are working with this small tweak.

@rdblue @pvary @lcspinter - could you please review when you have the chance? Thanks!

@rdblue
Copy link
Contributor

rdblue commented Oct 30, 2020

It sounds like the fix for the issue from #1557 should be a separate PR because that could break other uses. Do you think we should separate that out?

@marton-bod
Copy link
Collaborator Author

@rdblue yes, I think it makes sense to separate it out. The logic is flawed so it can cause other issues later and I believe does not address the Hive1.1-related issue fully either. I've created an issue #1708 and a PR #1707 for that.

I'll rebase this PR to the latest master once #1707 has been merged.

@rdblue rdblue merged commit 4f2b82e into apache:master Nov 3, 2020
@rdblue
Copy link
Contributor

rdblue commented Nov 3, 2020

Merged. Thanks @marton-bod!

marton-bod added a commit to ExpediaGroup/iceberg that referenced this pull request Nov 18, 2020
Co-authored-by: Marton Bod <mbod@cloudera.com>
massdosage pushed a commit to ExpediaGroup/iceberg that referenced this pull request Nov 18, 2020
* Always check the selected columns when getting a record reader
* Hive: Run StorageHandler tests in both MR and Tez (apache#1695)
* Disable vectorized Tez executions temporarily
Co-authored-by: Marton Bod <mbod@cloudera.com>
massdosage added a commit to ExpediaGroup/iceberg that referenced this pull request Nov 18, 2020
* Always check the selected columns when getting a record reader
* Hive: Run StorageHandler tests in both MR and Tez (apache#1695)
* fix checkstyle problem

Co-authored-by: Marton Bod <mbod@cloudera.com>
Co-authored-by: Adrian Woodhead <awoodhead@expediagroup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants