[fix](parquet) return error if schema changed in complex types #31128
Thank you for your contribution to Apache Doris.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
LGTM
TPC-H: Total hot run time: 41832 ms
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TeamCity be ut coverage result:
TPC-DS: Total hot run time: 177438 ms
ClickBench: Total hot run time: 30.5 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Check the column type of complex types to prevent a core dump in BE. `ColumnReader` will throw a segmentation fault in the following case.

Change complex types in Hive:

hive> create table struct_test(id int, sf struct<f1: int, f2: map<string, string>>) stored as parquet;
hive> insert into struct_test values
    > (1, named_struct('f1', 1, 'f2', str_to_map('1:s2,2:s2'))),
    > (2, named_struct('f1', 2, 'f2', str_to_map('k1:s3,k2:s4'))),
    > (3, named_struct('f1', 3, 'f2', str_to_map('k1:s5,k2:s6')));
hive> alter table struct_test change sf sf struct<f1:int, f2: string>;
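The check described above can be illustrated with a small Python sketch. This is not the Doris BE code (which is C++); `TypeNode`, `check_type`, and `SchemaMismatch` are hypothetical names introduced only for illustration. It assumes the reader compares the table's column type against the type recorded in the Parquet file and returns an error as soon as one side is a complex type and the other side is a different kind, rather than continuing to read.

```python
from dataclasses import dataclass
from typing import Tuple

COMPLEX_KINDS = {"struct", "map", "array"}


class SchemaMismatch(Exception):
    """Raised when table and file disagree on a complex type (hypothetical)."""


@dataclass(frozen=True)
class TypeNode:
    # Hypothetical, simplified type description: a kind plus named children.
    kind: str
    children: Tuple[Tuple[str, "TypeNode"], ...] = ()


def primitive(kind: str) -> TypeNode:
    return TypeNode(kind)


def struct(**fields: TypeNode) -> TypeNode:
    return TypeNode("struct", tuple(fields.items()))


def map_of(key: TypeNode, value: TypeNode) -> TypeNode:
    return TypeNode("map", (("key", key), ("value", value)))


def check_type(table: TypeNode, file: TypeNode, path: str) -> None:
    """Recursively verify that complex types agree between table and file."""
    if table.kind in COMPLEX_KINDS or file.kind in COMPLEX_KINDS:
        if table.kind != file.kind:
            raise SchemaMismatch(
                f"{path}: table type is {table.kind} but file type is {file.kind}"
            )
    if table.kind == "struct":
        # Only fields present on both sides are compared here.
        file_fields = dict(file.children)
        for name, child in table.children:
            if name in file_fields:
                check_type(child, file_fields[name], f"{path}.{name}")
    elif table.kind in ("map", "array"):
        # Map and array children are compared by position.
        for (name, t_child), (_, f_child) in zip(table.children, file.children):
            check_type(t_child, f_child, f"{path}.{name}")


# The struct_test case: the file was written with f2 as a map,
# then the table was altered so that f2 is a string.
file_sf = struct(f1=primitive("int"),
                 f2=map_of(primitive("string"), primitive("string")))
table_sf = struct(f1=primitive("int"), f2=primitive("string"))

try:
    check_type(table_sf, file_sf, "sf")
except SchemaMismatch as err:
    print(err)
```

The sketch deliberately ignores primitive-to-primitive changes, since the PR title only concerns schema changes in complex types.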
Followup: #31128

This optimization allows Doris to correctly read struct type data after the schema is changed in Hive.

## Changing struct schema in hive:
```sql
hive> create table struct_test(id int,sf struct<f1: int, f2: string>) stored as parquet;

hive> insert into struct_test values
    > (1, named_struct('f1', 1, 'f2', 's1')),
    > (2, named_struct('f1', 2, 'f2', 's2')),
    > (3, named_struct('f1', 3, 'f2', 's3'));

hive> alter table struct_test change sf sf struct<f1:int, f3:string>;

hive> select * from struct_test;
OK
1	{"f1":1,"f3":null}
2	{"f1":2,"f3":null}
3	{"f1":3,"f3":null}
Time taken: 5.298 seconds, Fetched: 3 row(s)
```

The previous result of Doris was:
```sql
mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": "s1"} |
|    2 | {"f1": 2, "f3": "s2"} |
|    3 | {"f1": 3, "f3": "s3"} |
+------+-----------------------+
```

Now the result is the same as Hive:
```sql
mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": null} |
|    2 | {"f1": 2, "f3": null} |
|    3 | {"f1": 3, "f3": null} |
+------+-----------------------+
```
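The difference between the two Doris results can be sketched in Python. This is an illustration only: the function names are hypothetical, and the claim that the earlier behavior matched struct fields by position is an assumption inferred from the output above (the old `f3` values equal the file's `f2` values), not something the commit message states.

```python
from typing import Any, Dict, List


def read_struct_by_position(file_row: Dict[str, Any],
                            table_fields: List[str]) -> Dict[str, Any]:
    # Assumed old behavior: the i-th table field takes the i-th file value,
    # regardless of the field name stored in the file.
    return dict(zip(table_fields, file_row.values()))


def read_struct_by_name(file_row: Dict[str, Any],
                        table_fields: List[str]) -> Dict[str, Any]:
    # Behavior matching Hive: look each table field up by name and
    # fill fields that are absent from the file with null (None).
    return {name: file_row.get(name) for name in table_fields}


# Rows as written to the Parquet file before the ALTER TABLE.
file_rows = [
    {"f1": 1, "f2": "s1"},
    {"f1": 2, "f2": "s2"},
    {"f1": 3, "f2": "s3"},
]
# Struct fields after: alter table struct_test change sf sf struct<f1:int, f3:string>
table_fields = ["f1", "f3"]

for row in file_rows:
    print(read_struct_by_position(row, table_fields),
          read_struct_by_name(row, table_fields))
```

Matching by name keeps the renamed field from silently picking up data that was written under a different name.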
…type (apache#31245) Add regression test for apache#31128
Proposed changes
Check the column type of complex types to prevent a core dump in BE. `ColumnReader` throws a segmentation fault when a complex type is changed in Hive.