Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions #1917

pvary · 2020-12-11T14:23:03Z

After consulting with the field folks, I was convinced that it would be beneficial to have a first version of the conversion in place for creating partitioned Iceberg tables from Hive. They suggested that even in this limited form, the feature can boost adoption by letting users try out partitioned Iceberg tables without changing their existing SQL commands.
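As a rough illustration of the idea (not the code in this PR), the sketch below folds Hive's separate PARTITIONED BY column list into one flat column list and records each partition column as an identity partition field. `Col`, `ConvertedTable`, and `PartitionedByConversionSketch` are hypothetical stand-ins invented for this example; they are not Hive or Iceberg classes, and the real change works on the metastore table object and Iceberg's own schema and partition spec types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hive column (name + type string); not a Hive or Iceberg class.
final class Col {
  final String name;
  final String type;

  Col(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

// Hypothetical result holder: the flat column list plus the identity partition field names.
final class ConvertedTable {
  final List<Col> schemaColumns;
  final List<String> identityPartitionFields;

  ConvertedTable(List<Col> schemaColumns, List<String> identityPartitionFields) {
    this.schemaColumns = schemaColumns;
    this.identityPartitionFields = identityPartitionFields;
  }
}

final class PartitionedByConversionSketch {
  private PartitionedByConversionSketch() {
  }

  // Appends the PARTITIONED BY columns after the regular columns and records
  // each partition column name as an identity partition field.
  static ConvertedTable convert(List<Col> columns, List<Col> partitionKeys) {
    List<Col> all = new ArrayList<>(columns);
    List<String> identityFields = new ArrayList<>();
    if (partitionKeys != null) {
      for (Col key : partitionKeys) {
        all.add(key);
        identityFields.add(key.name);
      }
    }
    return new ConvertedTable(all, identityFields);
  }
}
```

In this sketch a table with no partition keys simply yields the original columns and an empty list of partition fields, which corresponds to an unpartitioned table.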

marton-bod

LGTM, thanks

marton-bod · 2020-12-16T10:40:44Z

@rdblue Could you please take a look at this as well, whenever suitable? :) Thank you!

rdblue · 2020-12-17T01:33:28Z

mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java

@@ -197,20 +205,32 @@ private static Schema schema(Properties properties, org.apache.hadoop.hive.metas
    if (properties.getProperty(InputFormatConfig.TABLE_SCHEMA) != null) {
      return SchemaParser.fromJson(properties.getProperty(InputFormatConfig.TABLE_SCHEMA));
    } else {
-      return HiveSchemaUtil.convert(hmsTable.getSd().getCols());
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {


When would isSetPartitionKeys() be true and getPartitionKeys() empty?

Theoretically it could be set to an empty list. Checked this way to keep on the safe side.
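The case discussed here can be sketched with a small stand-in class. This assumes the generated `isSetPartitionKeys()` only reports whether the field is non-null, so a table whose partition keys were explicitly set to an empty list would still count as "set". `TableStub` and `PartitionKeyGuardSketch` are hypothetical names used only for illustration; they are not the metastore `Table` class.

```java
import java.util.List;

// Hypothetical stand-in for the metastore Table; it models only the partition keys field.
final class TableStub {
  private List<String> partitionKeys; // null means "not set"

  void setPartitionKeys(List<String> keys) {
    this.partitionKeys = keys;
  }

  // Assumption: "set" means non-null, so an empty list still counts as set.
  boolean isSetPartitionKeys() {
    return partitionKeys != null;
  }

  List<String> getPartitionKeys() {
    return partitionKeys;
  }
}

final class PartitionKeyGuardSketch {
  private PartitionKeyGuardSketch() {
  }

  // True only when partition keys are present and non-empty; the first check
  // prevents a NullPointerException, the second rejects an empty list.
  static boolean hasPartitionKeys(TableStub table) {
    return table.isSetPartitionKeys() && !table.getPartitionKeys().isEmpty();
  }
}
```

Under that assumption the combined check treats "never set" and "set to an empty list" the same way, so both end up on the unpartitioned path.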

rdblue · 2020-12-17T01:36:22Z

build.gradle

@@ -542,7 +542,7 @@ project(':iceberg-mr') {

  test {
    // testJoinTables / testScanTable
-    maxHeapSize '1500m'
+    maxHeapSize '2500m'


Why was this needed? Additional tasks because of partitioning?

Yeah. The extra tasks eat more memory :(

rdblue · 2020-12-17T01:37:52Z

Thanks @marton-bod for reviewing, and @pvary for working on this!

It looks good to me. I noted a few nits to fix, but I'm also fine merging this if you don't have time to fix them. I'll wait a day or two and then merge if I don't hear back.

Hive: Convert the CREATE TABLE ... PARTITIONED BY to Iceberg identity partitions

f9ccf40

github-actions bot added hive MR labels Dec 11, 2020

marton-bod reviewed Dec 12, 2020

marton-bod reviewed Dec 12, 2020

Review comments

04073e2

marton-bod reviewed Dec 12, 2020

Peter Vary added 2 commits December 12, 2020 18:01

Refactored test so we try out the insert path for multi level partitions as well

b362083

Inceased heap size

ff585b4

github-actions bot added the build label Dec 12, 2020

marton-bod approved these changes Dec 15, 2020

rdblue reviewed Dec 17, 2020

rdblue reviewed Dec 17, 2020

rdblue reviewed Dec 17, 2020

rdblue reviewed Dec 17, 2020

rdblue approved these changes Dec 17, 2020

rdblue merged commit 0575296 into apache:master Dec 18, 2020

pvary pushed a commit to pvary/iceberg that referenced this pull request Jan 5, 2021

Fixes for remaining issues in apache#1917

9edfb8e

pvary mentioned this pull request Jan 5, 2021

Fixes for remaining issues in #1917 #2029

Merged

rdblue pushed a commit that referenced this pull request Jan 5, 2021

Hive: Fix minor issues issues from #1917 (#2029)

7a1858d

pvary deleted the identity branch January 7, 2021 08:26

XuQianJin-Stars pushed a commit to XuQianJin-Stars/iceberg that referenced this pull request Mar 22, 2021

Hive: Fix minor issues issues from apache#1917 (apache#2029)

faee538
