Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions #1917

Merged
merged 4 commits into from
Dec 18, 2020

Conversation

pvary
Copy link
Contributor

@pvary pvary commented Dec 11, 2020

After consulting with the field folks they convinced me that it would be beneficial to have the first version of conversion in place for creating partitioned Iceberg tables from Hive. They suggested that even in this limited form this feature can boost adoption by allowing to try out Iceberg tables with partitions without changing the actual SQL commands.

@github-actions github-actions bot added the build label Dec 12, 2020
Copy link
Collaborator

@marton-bod marton-bod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@marton-bod
Copy link
Collaborator

@rdblue Could you please take a look at this as well, whenever suitable? :) Thank you!

@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
} else {
return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would isSetPartitionKeys() be true and getPartitionKeys() empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically it could be set to an empty list. Checked this way to keep on the safe side.

@@ -542,7 +542,7 @@ project(':iceberg-mr') {

test {
// testJoinTables / testScanTable
maxHeapSize '1500m'
maxHeapSize '2500m'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this needed? Additional tasks because of partitioning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The extra tasks eat more memory :(

@rdblue
Copy link
Contributor

rdblue commented Dec 17, 2020

Thanks @marton-bod for reviewing, and @pvary for working on this!

It looks good to me. I noted a few nits to fix, but I'm also fine merging this if you don't have time to fix them. I'll wait a day or two and then merge if I don't hear back.

@rdblue rdblue merged commit 0575296 into apache:master Dec 18, 2020
pvary pushed a commit to pvary/iceberg that referenced this pull request Jan 5, 2021
rdblue pushed a commit that referenced this pull request Jan 5, 2021
@pvary pvary deleted the identity branch January 7, 2021 08:26
XuQianJin-Stars pushed a commit to XuQianJin-Stars/iceberg that referenced this pull request Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants