HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT #5767
okumin wants to merge 2 commits into apache:master
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.



What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/HIVE-13748
I assume the STRUCT type of Hive derives from the ROW type of ANSI SQL. According to "4.10 Row types" of SQL:2023 Part 2, a row type is a sequence of (<field name>, <data type>) pairs, where <field name> is any identifier. That is consistent with our parser's definition. "6.2 <field definition>" and "5.4 Names and identifiers" give the syntax rules, and I don't see any restriction on the content of a field name.
The approach is still open to debate. If we follow the ANSI standard, we should accept any identifier. My first draft is slightly more defensive: it accepts only characters that type definitions themselves do not use.
To be fully correct, we would have to reimplement the type parser and ensure that all Hive code serializes and deserializes type definitions correctly.
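To illustrate the defensive rule, here is a minimal standalone sketch. It is not the TypeInfoParser change in this PR: the class name `StructFieldNameSketch`, the method `parseStruct`, and the reserved character set `<>(),:` are assumptions made for the example. It splits a top-level `struct<...>` type string into field names and type strings, and accepts any field name that avoids the reserved characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Hive's TypeInfoParser.
public class StructFieldNameSketch {
    // Assumption: these are the characters that type definitions use structurally.
    private static final String RESERVED = "<>(),:";

    /** Splits "struct<name:type,...>" into an ordered name -> type-string map. */
    public static Map<String, String> parseStruct(String typeString) {
        String prefix = "struct<";
        if (!typeString.startsWith(prefix) || !typeString.endsWith(">")) {
            throw new IllegalArgumentException("Not a struct type: " + typeString);
        }
        String body = typeString.substring(prefix.length(), typeString.length() - 1);
        Map<String, String> fields = new LinkedHashMap<>();
        int pos = 0;
        while (pos < body.length()) {
            // The field name runs up to the first ':' of the current field.
            int colon = body.indexOf(':', pos);
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' after position " + pos);
            }
            String name = body.substring(pos, colon);
            validateName(name);

            // The type runs up to the next ',' that is not nested in <...> or (...).
            int depth = 0;
            int end = colon + 1;
            while (end < body.length()) {
                char c = body.charAt(end);
                if (c == '<' || c == '(') {
                    depth++;
                } else if (c == '>' || c == ')') {
                    depth--;
                } else if (c == ',' && depth == 0) {
                    break;
                }
                end++;
            }
            if (depth != 0 || end == colon + 1) {
                throw new IllegalArgumentException("Malformed type for field: " + name);
            }
            fields.put(name, body.substring(colon + 1, end));
            pos = end + 1;
        }
        return fields;
    }

    // Accepts any non-empty name that contains no reserved character.
    private static void validateName(String name) {
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Empty field name");
        }
        for (int i = 0; i < name.length(); i++) {
            if (RESERVED.indexOf(name.charAt(i)) >= 0) {
                throw new IllegalArgumentException("Reserved character in field name: " + name);
            }
        }
    }
}
```

Under this rule, names such as `a-b`, `c$d`, or `e.f` are accepted, while a name such as `a<b` is rejected because `<` is needed to delimit nested types. Accepting every ANSI identifier, including ones that contain reserved characters, would require quoting or escaping in the serialized type string, which is why a full solution needs a reimplemented parser.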
Why are the changes needed?
Without this change, Hive may be unable to read Iceberg tables written by other engines when a STRUCT field name contains symbols.
Does this PR introduce any user-facing change?
Yes. Our STRUCT type becomes more permissive: field names may contain symbols that TypeInfoParser previously could not handle.
Is the change a dependency upgrade?
No.
How was this patch tested?
Added unit tests and integration tests.