Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive: Support case insensitive in hive query #2053

Merged
merged 7 commits into from
Jan 16, 2021

Conversation

dixingxing0
Copy link
Contributor

@dixingxing0 dixingxing0 commented Jan 8, 2021

Fixes #2045.
Since hive is case insensitive, we will make HiveIcebergSerDe always case insensitive by set iceberg.mr.case.sensitive to false.

@pvary, could you review this please?

Schema projectedSchema = selectedColumns.length > 0 ? tableSchema.select(selectedColumns) : tableSchema;
Schema projectedSchema = tableSchema;

boolean caseSensitive = configuration.getBoolean(InputFormatConfig.CASE_SENSITIVE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that we want to have a possibility to configure the IcebergInputFormat to be case sensitive or case insensitive since it can be used by other MR jobs as well. Do we want to allow the users of Hive to shot themselves on the foot and enable case sensitivity?
My first guess would be that we should not use the configuration here, just go with false, but if you have some specific use-case in your mind I can be easily convinced 😄

Copy link
Contributor Author

@dixingxing0 dixingxing0 Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you, i'll make here go with false, i don't have a specific use-case.
Since we make hive go with false, i think iceberg.mr.case.sensitive should use true as a default value, is that ok?


private final List<IcebergRecordStructField> structFields;
private final List<IcebergRecordStructField> structFieldsInLowercase;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we keep both of the fields? Shouldn't we just keep the currently requested version?

Copy link
Contributor Author

@dixingxing0 dixingxing0 Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I'll change it.

…as default value for iceberg.mr.case.sensitive
@dixingxing0
Copy link
Contributor Author

@pvary, i just pushed a commit, could you please take a look?

@@ -150,6 +150,27 @@ public void testScanTable() throws IOException {
Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2));
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these queries do not use the execution engine itself, so we do not have to put these tests to this class.
It is ok to have them only in the TestHiveIcebergStorageHandlerLocalScan.java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, i'll remove the test code in TestHiveIcebergStorageHandlerLocalScan.java

@pvary
Copy link
Contributor

pvary commented Jan 8, 2021

@pvary, i just pushed a commit, could you please take a look?

Just answered 😄
Since the queries in the new test cases do not use the execution engine itself (they are using only local tasks), we do need them in both classes. It is enough to have them in the TestHiveIcebergStorageHandlerLocalScan.java

@dixingxing0
Copy link
Contributor Author

@pvary, i just pushed a commit, could you please take a look?

Just answered 😄
Since the queries in the new test cases do not use the execution engine itself (they are using only local tasks), we do need them in both classes. It is enough to have them in the TestHiveIcebergStorageHandlerLocalScan.java

Thanks, i've removed the test code in TestHiveIcebergStorageHandlerWithEngine.

@@ -80,12 +80,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
tableSchema = hiveSchemaOrThrow(serDeProperties, e);
}
}
configuration.set(InputFormatConfig.CASE_SENSITIVE, "false");
Copy link
Contributor

@pvary pvary Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configuration.setBoolean(InputFormatConfig.CASE_SENSITIVE, false);

Types.NestedField fieldInLowercase = Types.NestedField.of(field.fieldId(), field.isOptional(),
field.name().toLowerCase(), field.type(), field.doc());
structField = new IcebergRecordStructField(fieldInLowercase, oi, position);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: After if blocks we tend to leave an extra line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

Comment on lines 192 to 193
this.caseSensitive = conf.getBoolean(InputFormatConfig.CASE_SENSITIVE,
InputFormatConfig.CASE_SENSITIVE_DEFAULT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This can be a single line

if (selectedColumns == null) {
return tableSchema;
}
return caseSensitive ? tableSchema.select(selectedColumns) : tableSchema.caseInsensitiveSelect(selectedColumns);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Extra line

@pvary
Copy link
Contributor

pvary commented Jan 8, 2021

@pvary, i just pushed a commit, could you please take a look?

Just answered 😄
Since the queries in the new test cases do not use the execution engine itself (they are using only local tasks), we do need them in both classes. It is enough to have them in the TestHiveIcebergStorageHandlerLocalScan.java

Thanks, i've removed the test code in TestHiveIcebergStorageHandlerWithEngine.

Thanks @dixingxing0 for the patch!

Few last nits and then I think it is good.
+1 non-binding

@rdblue: Could you please review?

Thanks,
Peter

@dixingxing0
Copy link
Contributor Author

@pvary, thank you for your patience, i will be more careful next time.

return create(schema, true);
}

public static ObjectInspector create(@Nullable Schema schema, boolean caseSensitive) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why accept a case sensitivity setting for something that is only used by Hive? I thought that the purpose of this PR was to allow case sensitive behavior for the InputFormat, but not Hive because Hive's behavior is case insensitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, comment addressed.

@rdblue
Copy link
Contributor

rdblue commented Jan 15, 2021

@dixingxing0, this looks good to me now. Could you resolve the conflicts? I can merge after that. Thanks!

@dixingxing0
Copy link
Contributor Author

dixingxing0 commented Jan 16, 2021

@dixingxing0, this looks good to me now. Could you resolve the conflicts? I can merge after that. Thanks!

Conflincts resolved.

@rdblue rdblue merged commit b1de539 into apache:master Jan 16, 2021
@rdblue
Copy link
Contributor

rdblue commented Jan 16, 2021

Thanks, @dixingxing0!

@dixingxing0
Copy link
Contributor Author

@pvary @rdblue Thanks for the review and all the help 😄 .

XuQianJin-Stars pushed a commit to XuQianJin-Stars/iceberg that referenced this pull request Mar 22, 2021
Co-authored-by: dixingxing <dixingxing@autohome.com.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hive: query failed when iceberg table has upper case column
3 participants