-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hive: Support case insensitive in hive query #2053
Conversation
Schema projectedSchema = selectedColumns.length > 0 ? tableSchema.select(selectedColumns) : tableSchema; | ||
Schema projectedSchema = tableSchema; | ||
|
||
boolean caseSensitive = configuration.getBoolean(InputFormatConfig.CASE_SENSITIVE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that we want to have a possibility to configure the IcebergInputFormat to be case sensitive or case insensitive since it can be used by other MR jobs as well. Do we want to allow the users of Hive to shot themselves on the foot and enable case sensitivity?
My first guess would be that we should not use the configuration here, just go with false
, but if you have some specific use-case in your mind I can be easily convinced 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you, i'll make here go with false
, i don't have a specific use-case.
Since we make hive go with false
, i think iceberg.mr.case.sensitive should use true
as a default value, is that ok?
|
||
private final List<IcebergRecordStructField> structFields; | ||
private final List<IcebergRecordStructField> structFieldsInLowercase; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we keep both of the fields? Shouldn't we just keep the currently requested version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I'll change it.
…as default value for iceberg.mr.case.sensitive
@pvary, i just pushed a commit, could you please take a look? |
@@ -150,6 +150,27 @@ public void testScanTable() throws IOException { | |||
Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2)); | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these queries do not use the execution engine itself, so we do not have to put these tests to this class.
It is ok to have them only in the TestHiveIcebergStorageHandlerLocalScan.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, i'll remove the test code in TestHiveIcebergStorageHandlerLocalScan.java
Just answered 😄 |
Thanks, i've removed the test code in TestHiveIcebergStorageHandlerWithEngine. |
@@ -80,12 +80,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr | |||
tableSchema = hiveSchemaOrThrow(serDeProperties, e); | |||
} | |||
} | |||
configuration.set(InputFormatConfig.CASE_SENSITIVE, "false"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
configuration.setBoolean(InputFormatConfig.CASE_SENSITIVE, false);
Types.NestedField fieldInLowercase = Types.NestedField.of(field.fieldId(), field.isOptional(), | ||
field.name().toLowerCase(), field.type(), field.doc()); | ||
structField = new IcebergRecordStructField(fieldInLowercase, oi, position); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: After if blocks we tend to leave an extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it!
this.caseSensitive = conf.getBoolean(InputFormatConfig.CASE_SENSITIVE, | ||
InputFormatConfig.CASE_SENSITIVE_DEFAULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This can be a single line
if (selectedColumns == null) { | ||
return tableSchema; | ||
} | ||
return caseSensitive ? tableSchema.select(selectedColumns) : tableSchema.caseInsensitiveSelect(selectedColumns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Extra line
Thanks @dixingxing0 for the patch! Few last nits and then I think it is good. @rdblue: Could you please review? Thanks, |
@pvary, thank you for your patience, i will be more careful next time. |
return create(schema, true); | ||
} | ||
|
||
public static ObjectInspector create(@Nullable Schema schema, boolean caseSensitive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why accept a case sensitivity setting for something that is only used by Hive? I thought that the purpose of this PR was to allow case sensitive behavior for the InputFormat, but not Hive because Hive's behavior is case insensitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, comment addressed.
@dixingxing0, this looks good to me now. Could you resolve the conflicts? I can merge after that. Thanks! |
Conflincts resolved. |
Thanks, @dixingxing0! |
Co-authored-by: dixingxing <dixingxing@autohome.com.cn>
Fixes #2045.
Since hive is case insensitive, we will make HiveIcebergSerDe always case insensitive by set
iceberg.mr.case.sensitive
tofalse
.@pvary, could you review this please?