[update](hudi) update hudi version to 0.14.1 and compatible with flink hive catalog #31181

Merged
merged 3 commits on Feb 22, 2024

Conversation

AshinGau
Member

@AshinGau AshinGau commented Feb 21, 2024

Proposed changes

  1. Update the Hudi version from 0.13.1 to 0.14.1
  2. Add compatibility with Hudi tables created by the Flink Hive catalog

Flink Hive catalog

Hudi tables created by the Flink Hive catalog have the wrong schema and input format; see the related issue apache/hudi#10735.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@AshinGau
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41289 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1781c32cda9c765e3d444ea7affcc523b6ee660e, data reload: false

------ Round 1 ----------------------------------
q1	17615	4946	4849	4849
q2	2044	145	136	136
q3	10699	1012	1034	1012
q4	4722	957	958	957
q5	7712	3212	3257	3212
q6	197	134	134	134
q7	1261	770	761	761
q8	9598	2018	2095	2018
q9	7803	6662	6652	6652
q10	8356	2652	2631	2631
q11	409	222	225	222
q12	796	329	338	329
q13	18001	3628	3651	3628
q14	294	262	256	256
q15	620	540	501	501
q16	506	396	412	396
q17	933	852	844	844
q18	7226	6671	6625	6625
q19	1534	1481	1483	1481
q20	627	352	350	350
q21	6447	3956	3968	3956
q22	868	349	339	339
Total cold run time: 108268 ms
Total hot run time: 41289 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4818	4822	4791	4791
q2	293	182	178	178
q3	3580	3561	3552	3552
q4	2501	2488	2491	2488
q5	5711	5713	5735	5713
q6	210	126	128	126
q7	2226	1660	1625	1625
q8	2992	3044	3057	3044
q9	8686	8696	8661	8661
q10	6883	4231	4230	4230
q11	523	390	382	382
q12	770	540	540	540
q13	4144	3414	3409	3409
q14	258	242	242	242
q15	595	505	501	501
q16	476	462	437	437
q17	1652	1568	1579	1568
q18	8285	7524	7673	7524
q19	1628	1630	1627	1627
q20	2116	1834	1841	1834
q21	6582	6158	6105	6105
q22	562	502	500	500
Total cold run time: 65491 ms
Total hot run time: 59077 ms
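
The script that produced these tables is not shown in this thread, so the column meanings here are inferred from the numbers rather than confirmed: the first timing looks like the cold run, the next two like hot runs, and the last like the faster of the two hot runs. Under that assumption, the following minimal Python sketch reproduces how the two totals could be derived. The helper names `parse_rows` and `summarize` are hypothetical and not part of the Doris tooling.

```python
from typing import List, Tuple

Row = Tuple[str, List[int]]


def parse_rows(text: str) -> List[Row]:
    """Parse lines such as 'q1<TAB>17615<TAB>4946<TAB>4849<TAB>4849'."""
    rows: List[Row] = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip separators and "Total ..." lines; keep only query rows.
        if len(parts) < 4 or not parts[0].startswith("q"):
            continue
        rows.append((parts[0], [int(p) for p in parts[1:]]))
    return rows


def summarize(rows: List[Row]) -> Tuple[int, int]:
    """Return (total cold ms, total hot ms).

    Assumption: times[0] is the cold run, times[1] and times[2] are hot
    runs, and the hot total sums the faster hot run of each query.
    """
    cold = sum(times[0] for _, times in rows)
    hot = sum(min(times[1:3]) for _, times in rows)
    return cold, hot


# First three rows of Round 1 from the comment above.
SAMPLE = """
q1\t17615\t4946\t4849\t4849
q2\t2044\t145\t136\t136
q3\t10699\t1012\t1034\t1012
"""

if __name__ == "__main__":
    cold_ms, hot_ms = summarize(parse_rows(SAMPLE))
    print(f"cold={cold_ms} ms, hot={hot_ms} ms")
```

Applied to all 22 rows of Round 1, the same sums give 108268 ms cold and 41289 ms hot, which matches the totals the bot reported.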

Contributor

@morningman morningman left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Feb 22, 2024
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

run performance

@doris-robot

TPC-H: Total hot run time: 41123 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1781c32cda9c765e3d444ea7affcc523b6ee660e, data reload: false

------ Round 1 ----------------------------------
q1	17638	5104	4974	4974
q2	2043	138	134	134
q3	10586	1003	994	994
q4	4659	964	961	961
q5	7619	3147	3195	3147
q6	192	133	136	133
q7	1280	792	767	767
q8	9340	2050	2039	2039
q9	7396	6548	6593	6548
q10	8324	2642	2646	2642
q11	420	207	222	207
q12	789	334	340	334
q13	17942	3638	3606	3606
q14	292	256	264	256
q15	578	520	499	499
q16	473	405	415	405
q17	919	852	857	852
q18	7459	6546	6562	6546
q19	1526	1483	1494	1483
q20	534	277	270	270
q21	6872	4008	3989	3989
q22	881	349	337	337
Total cold run time: 107762 ms
Total hot run time: 41123 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4944	4992	4951	4951
q2	290	187	184	184
q3	3619	3600	3603	3600
q4	2574	2564	2556	2556
q5	5726	5734	5751	5734
q6	209	131	133	131
q7	2259	1651	1686	1651
q8	3033	3103	3122	3103
q9	8708	8744	8679	8679
q10	6750	4229	4240	4229
q11	528	381	393	381
q12	766	540	534	534
q13	6691	3395	3436	3395
q14	260	247	246	246
q15	626	539	492	492
q16	470	433	431	431
q17	1658	1635	1617	1617
q18	8343	7683	7603	7603
q19	1642	1638	1622	1622
q20	2138	1845	1845	1845
q21	6599	6218	6183	6183
q22	581	489	498	489
Total cold run time: 68414 ms
Total hot run time: 59656 ms

@doris-robot

TPC-DS: Total hot run time: 176899 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1781c32cda9c765e3d444ea7affcc523b6ee660e, data reload: false

query1	940	367	363	363
query2	6518	1833	1843	1833
query3	6695	205	199	199
query4	23469	21345	21324	21324
query5	4205	380	384	380
query6	264	180	172	172
query7	4601	298	287	287
query8	248	188	196	188
query9	8466	2854	2854	2854
query10	425	227	227	227
query11	15156	14692	14496	14496
query12	144	83	85	83
query13	1734	411	410	410
query14	9398	7565	7611	7565
query15	215	186	186	186
query16	7472	252	244	244
query17	1419	557	528	528
query18	1964	274	269	269
query19	201	146	149	146
query20	87	83	84	83
query21	187	132	126	126
query22	4956	4784	4665	4665
query23	32428	31511	31663	31511
query24	12300	3364	3497	3364
query25	645	373	365	365
query26	1865	157	164	157
query27	3042	328	318	318
query28	6854	1846	1837	1837
query29	1142	623	629	623
query30	279	144	152	144
query31	929	745	753	745
query32	95	58	61	58
query33	729	239	236	236
query34	1014	492	504	492
query35	963	859	841	841
query36	954	837	885	837
query37	140	60	60	60
query38	3309	3159	3176	3159
query39	1375	1364	1336	1336
query40	289	109	103	103
query41	40	36	35	35
query42	111	106	100	100
query43	473	446	461	446
query44	1085	685	703	685
query45	197	180	184	180
query46	1069	778	780	778
query47	1644	1551	1532	1532
query48	428	350	347	347
query49	1197	319	303	303
query50	771	385	383	383
query51	4491	4369	4301	4301
query52	106	96	97	96
query53	396	304	297	297
query54	306	231	232	231
query55	85	75	75	75
query56	221	212	202	202
query57	1055	981	923	923
query58	206	192	203	192
query59	2332	2216	2094	2094
query60	242	225	236	225
query61	83	82	82	82
query62	587	371	378	371
query63	332	280	286	280
query64	6489	3105	3134	3105
query65	3364	3298	3291	3291
query66	1363	331	332	331
query67	14582	14350	14033	14033
query68	5070	564	551	551
query69	526	364	355	355
query70	1299	1141	1182	1141
query71	432	259	261	259
query72	6297	2769	2595	2595
query73	706	307	308	307
query74	6862	6487	6401	6401
query75	3334	2644	2596	2596
query76	3296	1119	1222	1119
query77	367	242	240	240
query78	9754	8885	8936	8885
query79	984	502	500	500
query80	520	358	360	358
query81	445	216	214	214
query82	168	88	84	84
query83	143	133	120	120
query84	225	85	77	77
query85	1029	346	338	338
query86	307	299	299	299
query87	3468	3326	3302	3302
query88	2782	2293	2313	2293
query89	436	357	348	348
query90	1994	166	165	165
query91	155	128	130	128
query92	53	53	50	50
query93	1021	519	500	500
query94	1154	182	178	178
query95	436	343	341	341
query96	581	270	264	264
query97	4472	4295	4271	4271
query98	221	202	200	200
query99	1059	730	688	688
Total cold run time: 269967 ms
Total hot run time: 176899 ms

@doris-robot

ClickBench: Total hot run time: 31.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1781c32cda9c765e3d444ea7affcc523b6ee660e, data reload: false

query1	0.03	0.03	0.02
query2	0.07	0.03	0.03
query3	0.23	0.08	0.07
query4	1.61	0.08	0.09
query5	0.50	0.49	0.48
query6	1.36	0.62	0.62
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.52	0.45	0.44
query10	0.49	0.48	0.48
query11	0.13	0.10	0.10
query12	0.12	0.10	0.10
query13	0.61	0.58	0.58
query14	0.77	0.79	0.80
query15	0.82	0.79	0.79
query16	0.33	0.33	0.34
query17	0.93	0.91	0.94
query18	0.17	0.16	0.21
query19	1.80	1.70	1.71
query20	0.01	0.02	0.01
query21	15.41	0.63	0.55
query22	2.85	3.92	2.49
query23	17.64	1.16	1.05
query24	2.17	0.28	0.30
query25	0.62	0.06	0.05
query26	0.16	0.13	0.14
query27	0.06	0.05	0.06
query28	12.16	0.84	0.82
query29	12.53	3.48	3.39
query30	0.54	0.50	0.47
query31	2.78	0.35	0.36
query32	3.33	0.48	0.48
query33	3.11	3.14	3.12
query34	15.33	4.48	4.45
query35	4.51	4.51	4.49
query36	1.07	0.96	0.96
query37	0.07	0.05	0.05
query38	0.04	0.03	0.03
query39	0.02	0.02	0.01
query40	0.17	0.15	0.15
query41	0.07	0.02	0.01
query42	0.02	0.02	0.02
query43	0.03	0.02	0.02
Total cold run time: 105.25 s
Total hot run time: 31.15 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 1781c32cda9c765e3d444ea7affcc523b6ee660e with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s
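
The rates above can be checked from the byte counts and durations. The reported figures are consistent with dividing by 1024 * 1024 (MiB rather than decimal MB) and truncating instead of rounding. That is an inference from the numbers, not something the bot output states. A small Python sketch under that assumption, with hypothetical helper names:

```python
MIB = 1024 * 1024  # assumption: the report's "MB" is 2**20 bytes


def throughput_mib_per_s(total_bytes: int, seconds: float) -> int:
    """Bytes loaded per second, in MiB/s, truncated to an integer."""
    return int(total_bytes / seconds / MIB)


def thousand_rows_per_s(rows: int, seconds: float) -> int:
    """Rows inserted per second, in thousands, truncated to an integer."""
    return int(rows / seconds / 1000)


if __name__ == "__main__":
    # Figures copied from the load test result above.
    print("json:   ", throughput_mib_per_s(2358488459, 20), "MB/s")
    print("orc:    ", throughput_mib_per_s(1101869774, 59), "MB/s")
    print("parquet:", throughput_mib_per_s(861443392, 32), "MB/s")
    print("insert: ", thousand_rows_per_s(10000000, 13.7), "K ops/s")
```

With decimal megabytes the JSON load would come out near 117 MB/s, so the binary-unit reading is the one that agrees with the reported 112 MB/s.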

@morningman morningman merged commit f8202f9 into apache:master Feb 22, 2024
27 of 29 checks passed
yiguolei pushed a commit that referenced this pull request Feb 22, 2024
…k hive catalog (#31181)

1. Update hudi version from 0.13.1 to .14.1
2. Compatible with the hudi table created by flink hive catalog
feiniaofeiafei pushed a commit to feiniaofeiafei/doris that referenced this pull request Feb 23, 2024
…k hive catalog (apache#31181)

1. Update hudi version from 0.13.1 to .14.1
2. Compatible with the hudi table created by flink hive catalog