
[fix](parquet) fix parquet reader missing column and filter missing column #36189

Merged (3 commits) on Jun 19, 2024

Conversation

AshinGau
Member

Proposed changes

Follow-up to #35583: fixes the Parquet reader.

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has moved to doris-website.
See Doris Document.

@AshinGau
Member Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.41% (8986/24681)
Line Coverage: 27.95% (73484/262876)
Region Coverage: 27.40% (38110/139099)
Branch Coverage: 24.01% (19348/80580)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e4681f458b400a39458a5f76bf04ef672e99c205_e4681f458b400a39458a5f76bf04ef672e99c205/report/index.html

@doris-robot

TPC-H: Total hot run time: 40544 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e4681f458b400a39458a5f76bf04ef672e99c205, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	18118	4530	4469	4469
q2	2068	199	197	197
q3	10441	1092	1198	1092
q4	10191	768	858	768
q5	7490	2707	2632	2632
q6	224	138	137	137
q7	970	619	595	595
q8	9224	2050	2084	2050
q9	9064	6467	6437	6437
q10	8862	3805	3672	3672
q11	437	238	235	235
q12	418	228	229	228
q13	18696	3026	2969	2969
q14	256	227	222	222
q15	510	475	475	475
q16	508	396	378	378
q17	977	687	687	687
q18	7918	7571	7431	7431
q19	6873	1405	1525	1405
q20	660	308	320	308
q21	4906	3823	3932	3823
q22	400	335	334	334
Total cold run time: 119211 ms
Total hot run time: 40544 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4349	4255	4275	4255
q2	374	272	260	260
q3	2941	2769	2717	2717
q4	1864	1622	1655	1622
q5	5291	5292	5284	5284
q6	216	125	126	125
q7	2156	1738	1747	1738
q8	3175	3318	3337	3318
q9	8276	8342	8337	8337
q10	3918	3656	3676	3656
q11	597	484	487	484
q12	783	631	594	594
q13	16548	3003	2964	2964
q14	274	266	260	260
q15	508	477	485	477
q16	470	409	419	409
q17	1757	1495	1501	1495
q18	7518	7618	7324	7324
q19	1687	1597	1501	1501
q20	1987	1788	1753	1753
q21	4771	4751	4688	4688
q22	642	551	537	537
Total cold run time: 70102 ms
Total hot run time: 53798 ms
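The "Total" lines follow from the per-query rows: each row appears to list one cold run, two hot runs, and the faster of the two hot runs, all in milliseconds, with the cold total summing the first timing column and the hot total summing the last. As a hedged check under that reading (inferred from the numbers, not documented by the bot), the Python sketch below recomputes the Round 1 totals. `ROUND1`, `parse_rows`, and `totals` are illustrative names, not part of the Doris tpch-tools scripts.

```python
"""Recompute the TPC-H Round 1 totals from the per-query rows.

Assumed column layout (inferred from the reported totals, not documented
by the benchmark bot): query, cold run, hot run 1, hot run 2, best hot run,
all in milliseconds.
"""

# Round 1 rows copied from the report for commit e4681f4.
ROUND1 = """
q1 18118 4530 4469 4469
q2 2068 199 197 197
q3 10441 1092 1198 1092
q4 10191 768 858 768
q5 7490 2707 2632 2632
q6 224 138 137 137
q7 970 619 595 595
q8 9224 2050 2084 2050
q9 9064 6467 6437 6437
q10 8862 3805 3672 3672
q11 437 238 235 235
q12 418 228 229 228
q13 18696 3026 2969 2969
q14 256 227 222 222
q15 510 475 475 475
q16 508 396 378 378
q17 977 687 687 687
q18 7918 7571 7431 7431
q19 6873 1405 1525 1405
q20 660 308 320 308
q21 4906 3823 3932 3823
q22 400 335 334 334
"""


def parse_rows(report: str) -> list[tuple[str, int, int, int, int]]:
    """Split each non-empty line into a query name and four integer timings."""
    rows = []
    for line in report.strip().splitlines():
        name, cold, hot1, hot2, best = line.split()
        rows.append((name, int(cold), int(hot1), int(hot2), int(best)))
    return rows


def totals(report: str) -> tuple[int, int]:
    """Return (total cold run time, total hot run time) in milliseconds."""
    rows = parse_rows(report)
    # Sanity check: the last column should be the faster of the two hot runs.
    for name, _cold, hot1, hot2, best in rows:
        if best != min(hot1, hot2):
            raise ValueError(f"{name}: best {best} != min({hot1}, {hot2})")
    cold_total = sum(row[1] for row in rows)
    hot_total = sum(row[4] for row in rows)
    return cold_total, hot_total


if __name__ == "__main__":
    cold, hot = totals(ROUND1)
    print(f"Total cold run time: {cold} ms")
    print(f"Total hot run time: {hot} ms")
```

Running it reproduces the Round 1 totals reported above (119211 ms cold, 40544 ms hot). Pasting in the Round 2 rows instead gives 70102 ms and 53798 ms, matching that round's totals as well.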

@doris-robot

TPC-DS: Total hot run time: 172482 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e4681f458b400a39458a5f76bf04ef672e99c205, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	937	386	372	372
query2	6458	2442	2317	2317
query3	6664	216	212	212
query4	21972	17276	17234	17234
query5	4156	478	463	463
query6	260	164	159	159
query7	4587	311	297	297
query8	303	303	298	298
query9	8462	2362	2358	2358
query10	609	298	282	282
query11	10584	10162	9958	9958
query12	136	83	95	83
query13	1624	371	365	365
query14	10224	6878	7179	6878
query15	248	188	189	188
query16	7545	271	260	260
query17	1473	542	530	530
query18	1804	272	270	270
query19	193	157	157	157
query20	90	79	81	79
query21	208	129	134	129
query22	4220	4244	3989	3989
query23	33754	33012	33159	33012
query24	12222	2998	2793	2793
query25	671	386	383	383
query26	1834	156	201	156
query27	2934	324	331	324
query28	7572	2045	2038	2038
query29	1074	613	604	604
query30	270	150	148	148
query31	931	730	750	730
query32	87	55	57	55
query33	760	279	281	279
query34	1013	479	478	478
query35	733	631	624	624
query36	1096	896	956	896
query37	273	69	71	69
query38	2873	2742	2732	2732
query39	853	800	805	800
query40	277	124	123	123
query41	59	52	51	51
query42	124	94	108	94
query43	581	535	569	535
query44	1274	729	740	729
query45	195	166	168	166
query46	1089	729	709	709
query47	1827	1754	1783	1754
query48	384	301	295	295
query49	1221	422	402	402
query50	779	392	379	379
query51	6833	6674	6577	6577
query52	99	96	92	92
query53	367	293	292	292
query54	1003	445	439	439
query55	74	77	77	77
query56	283	252	260	252
query57	1163	1062	1127	1062
query58	257	229	257	229
query59	3311	3233	3171	3171
query60	281	268	268	268
query61	95	92	112	92
query62	636	437	463	437
query63	312	290	290	290
query64	9961	2261	1732	1732
query65	3165	3135	3134	3134
query66	1356	343	352	343
query67	15210	15014	14952	14952
query68	4522	546	554	546
query69	542	417	358	358
query70	1138	1135	1149	1135
query71	406	267	280	267
query72	7122	5398	5371	5371
query73	770	330	328	328
query74	5926	5509	5490	5490
query75	3387	2673	2667	2667
query76	2816	893	954	893
query77	641	293	296	293
query78	10169	9786	9894	9786
query79	2645	521	525	521
query80	1331	460	516	460
query81	550	225	222	222
query82	760	107	105	105
query83	202	169	169	169
query84	276	86	88	86
query85	1415	288	269	269
query86	461	328	321	321
query87	3237	3086	3060	3060
query88	4375	2459	2441	2441
query89	471	381	387	381
query90	1879	193	193	193
query91	128	102	99	99
query92	61	49	49	49
query93	3280	520	497	497
query94	1211	192	188	188
query95	399	314	309	309
query96	602	274	273	273
query97	3271	3044	3009	3009
query98	224	206	194	194
query99	1315	827	834	827
Total cold run time: 279455 ms
Total hot run time: 172482 ms

@doris-robot

ClickBench: Total hot run time: 30.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e4681f458b400a39458a5f76bf04ef672e99c205, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.67	0.08	0.08
query5	0.51	0.50	0.50
query6	1.14	0.75	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.47	0.49
query10	0.54	0.56	0.54
query11	0.15	0.11	0.11
query12	0.15	0.11	0.11
query13	0.60	0.60	0.60
query14	0.78	0.79	0.79
query15	0.83	0.80	0.81
query16	0.36	0.36	0.36
query17	1.01	0.96	0.95
query18	0.23	0.23	0.22
query19	1.82	1.68	1.70
query20	0.02	0.01	0.01
query21	15.43	0.67	0.66
query22	4.39	7.31	1.71
query23	18.30	1.42	1.30
query24	2.14	0.22	0.23
query25	0.16	0.08	0.08
query26	0.27	0.17	0.16
query27	0.08	0.07	0.09
query28	13.16	1.01	1.00
query29	12.58	3.38	3.34
query30	0.26	0.07	0.06
query31	2.87	0.39	0.38
query32	3.27	0.48	0.47
query33	2.87	2.95	2.90
query34	17.14	4.46	4.42
query35	4.51	4.51	4.49
query36	0.65	0.49	0.46
query37	0.18	0.16	0.15
query38	0.16	0.15	0.14
query39	0.05	0.03	0.04
query40	0.18	0.14	0.14
query41	0.10	0.05	0.04
query42	0.05	0.04	0.05
query43	0.04	0.03	0.04
Total cold run time: 109.62 s
Total hot run time: 30.4 s

Contributor

@morningman morningman left a comment


Need to add test cases.

@AshinGau AshinGau force-pushed the parquet_missing_m branch from e4681f4 to 2031437 on June 18, 2024 07:09
@AshinGau
Member Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.47% (9001/24679)
Line Coverage: 28.01% (73732/263191)
Region Coverage: 27.48% (38294/139332)
Branch Coverage: 24.19% (19519/80706)
Coverage Report: http://coverage.selectdb-in.cc/coverage/203143757498371927e6ec99be9b1c1f16e7aec4_203143757498371927e6ec99be9b1c1f16e7aec4/report/index.html

@AshinGau
Member Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.48% (9002/24679)
Line Coverage: 28.01% (73732/263194)
Region Coverage: 27.48% (38293/139334)
Branch Coverage: 24.18% (19516/80708)
Coverage Report: http://coverage.selectdb-in.cc/coverage/25ae4146bb2e5bb18171a657df46411ce64c3594_25ae4146bb2e5bb18171a657df46411ce64c3594/report/index.html

@doris-robot

TPC-H: Total hot run time: 40494 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 25ae4146bb2e5bb18171a657df46411ce64c3594, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17626	4328	4277	4277
q2	2029	192	191	191
q3	10455	1107	1085	1085
q4	10185	767	792	767
q5	7459	2633	2682	2633
q6	217	140	135	135
q7	947	617	603	603
q8	9221	2062	2042	2042
q9	8771	6488	6479	6479
q10	8983	3738	3725	3725
q11	450	234	234	234
q12	465	228	224	224
q13	17781	2993	2985	2985
q14	266	217	230	217
q15	519	484	486	484
q16	524	372	378	372
q17	957	668	663	663
q18	8015	7400	7434	7400
q19	6661	1495	1529	1495
q20	646	330	317	317
q21	4899	3830	3871	3830
q22	385	336	340	336
Total cold run time: 117461 ms
Total hot run time: 40494 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4344	4211	4262	4211
q2	365	265	265	265
q3	2985	2928	2933	2928
q4	2008	1659	1723	1659
q5	5597	5531	5448	5448
q6	233	135	130	130
q7	2238	1872	1842	1842
q8	3253	3440	3431	3431
q9	8693	8747	8737	8737
q10	4152	3732	3793	3732
q11	593	530	506	506
q12	808	642	618	618
q13	16101	3156	3157	3156
q14	310	295	275	275
q15	541	466	494	466
q16	507	431	440	431
q17	1807	1537	1484	1484
q18	8025	7976	7728	7728
q19	1854	1593	1679	1593
q20	3160	1927	1859	1859
q21	5052	5046	4736	4736
q22	649	556	608	556
Total cold run time: 73275 ms
Total hot run time: 55791 ms

@doris-robot

TPC-DS: Total hot run time: 173787 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 25ae4146bb2e5bb18171a657df46411ce64c3594, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	924	387	381	381
query2	6428	2504	2473	2473
query3	6640	207	208	207
query4	19151	17402	17295	17295
query5	3618	465	503	465
query6	234	155	163	155
query7	4574	305	290	290
query8	322	311	299	299
query9	8567	2375	2364	2364
query10	561	316	287	287
query11	10498	10119	10012	10012
query12	119	92	84	84
query13	1643	370	368	368
query14	9737	6966	7493	6966
query15	235	194	191	191
query16	7752	277	263	263
query17	1829	582	512	512
query18	1941	272	281	272
query19	206	149	151	149
query20	97	83	82	82
query21	209	127	127	127
query22	4370	4000	4032	4000
query23	33883	33544	33745	33544
query24	11103	2917	2974	2917
query25	604	377	377	377
query26	745	157	157	157
query27	2339	338	314	314
query28	6090	2081	2085	2081
query29	898	657	631	631
query30	253	159	154	154
query31	961	742	771	742
query32	90	53	52	52
query33	745	283	276	276
query34	981	479	466	466
query35	770	620	626	620
query36	1150	975	1016	975
query37	136	76	71	71
query38	2920	2836	2803	2803
query39	924	860	834	834
query40	212	128	129	128
query41	56	54	59	54
query42	110	112	104	104
query43	622	562	537	537
query44	1179	724	743	724
query45	189	165	166	165
query46	1056	737	707	707
query47	1877	1769	1789	1769
query48	365	296	291	291
query49	845	397	396	396
query50	765	381	379	379
query51	6778	6737	6638	6638
query52	100	91	93	91
query53	351	292	287	287
query54	901	493	430	430
query55	74	72	73	72
query56	287	246	260	246
query57	1103	1054	1037	1037
query58	251	262	256	256
query59	3478	3407	3250	3250
query60	284	271	292	271
query61	96	92	91	91
query62	619	437	437	437
query63	315	288	286	286
query64	8506	2291	1773	1773
query65	3175	3116	3082	3082
query66	731	322	328	322
query67	15534	15102	14908	14908
query68	4470	526	536	526
query69	550	414	389	389
query70	1117	1130	1147	1130
query71	397	271	276	271
query72	6945	6085	5502	5502
query73	759	319	323	319
query74	5836	5481	5549	5481
query75	3473	2639	2715	2639
query76	2726	978	951	951
query77	430	298	303	298
query78	10259	9779	9815	9779
query79	2529	507	502	502
query80	980	463	458	458
query81	552	218	218	218
query82	996	101	99	99
query83	262	165	230	165
query84	228	89	87	87
query85	1164	284	263	263
query86	450	315	326	315
query87	3248	3113	3075	3075
query88	3987	2317	2303	2303
query89	457	389	383	383
query90	1733	191	188	188
query91	124	97	99	97
query92	56	49	51	49
query93	1673	499	493	493
query94	1111	187	187	187
query95	402	307	313	307
query96	591	271	258	258
query97	3195	3104	3053	3053
query98	222	195	198	195
query99	1131	840	820	820
Total cold run time: 266159 ms
Total hot run time: 173787 ms

@doris-robot

ClickBench: Total hot run time: 30.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 25ae4146bb2e5bb18171a657df46411ce64c3594, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.50	0.48	0.48
query6	1.13	0.74	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.55	0.48	0.49
query10	0.54	0.55	0.54
query11	0.17	0.12	0.12
query12	0.15	0.12	0.13
query13	0.59	0.59	0.59
query14	0.76	0.78	0.79
query15	0.82	0.83	0.82
query16	0.36	0.37	0.36
query17	1.01	1.01	1.00
query18	0.22	0.25	0.24
query19	1.75	1.70	1.85
query20	0.01	0.01	0.01
query21	15.41	0.66	0.65
query22	4.12	7.85	1.83
query23	18.26	1.30	1.25
query24	2.16	0.22	0.22
query25	0.15	0.08	0.09
query26	0.26	0.18	0.18
query27	0.07	0.07	0.08
query28	13.14	1.02	0.99
query29	12.65	3.24	3.18
query30	0.26	0.06	0.05
query31	2.88	0.38	0.38
query32	3.29	0.47	0.47
query33	2.90	2.87	2.84
query34	17.07	4.44	4.41
query35	4.50	4.50	4.50
query36	0.65	0.46	0.45
query37	0.17	0.16	0.15
query38	0.16	0.14	0.14
query39	0.05	0.04	0.03
query40	0.17	0.14	0.14
query41	0.09	0.05	0.05
query42	0.05	0.05	0.05
query43	0.04	0.03	0.05
Total cold run time: 109.14 s
Total hot run time: 30.35 s

Contributor

@morningman morningman left a comment


LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Jun 19, 2024
Contributor

PR approved by anyone and no changes requested.

@AshinGau AshinGau merged commit 6212357 into apache:master Jun 19, 2024
25 of 29 checks passed
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
…olumn (#36189)

## Proposed changes

follow #35583, fix parquet reader.
Labels
approved, dev/2.1.4-merged, dev/3.0.0-merged, reviewed

5 participants