[fix](inverted index)support merge null_bitmap during index compaction #30326

Merged
merged 1 commit into from
Jan 25, 2024

@qidaye qidaye commented Jan 24, 2024

Proposed changes

The null_bitmap file is not taken into account during index compaction. This leads to wrong query results when documents contain NULL values.
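
To illustrate the kind of merge this change is about, here is a minimal Python sketch. It is not the actual Doris implementation. It assumes each source segment's null bitmap can be modelled as a set of row ids, and that compaction provides a mapping from every source row to its destination segment and row. The names `merge_null_bitmaps` and `row_map` are hypothetical and do not come from the Doris code base.

```python
from typing import List, Optional, Set, Tuple

# Assumption: a null bitmap is modelled as a plain set of row ids.
NullBitmap = Set[int]
# Assumption: row_map[src_segment][src_row] is (dest_segment, dest_row),
# or None when the row does not survive compaction (for example, deleted).
RowMap = List[List[Optional[Tuple[int, int]]]]


def merge_null_bitmaps(
    src_nulls: List[NullBitmap], row_map: RowMap, dest_count: int
) -> List[NullBitmap]:
    """Carry NULL markers from source segments over to destination segments."""
    dest_nulls: List[NullBitmap] = [set() for _ in range(dest_count)]
    for src_seg, nulls in enumerate(src_nulls):
        for src_row in nulls:
            target = row_map[src_seg][src_row]
            if target is None:
                continue  # the row was dropped, so there is nothing to mark
            dest_seg, dest_row = target
            dest_nulls[dest_seg].add(dest_row)
    return dest_nulls


# Two source segments with three rows each, compacted into one segment.
src_nulls = [{1}, {0, 2}]
row_map = [
    [(0, 0), (0, 1), (0, 2)],  # segment 0 keeps all rows
    [(0, 3), None, (0, 4)],    # segment 1 loses its middle row
]
merged = merge_null_bitmaps(src_nulls, row_map, dest_count=1)
```

If this step is skipped, the destination segment has no record of which rows are NULL, so a predicate such as `IS NULL` evaluated through the index can return the wrong rows.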

qidaye commented Jan 24, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39397 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a28e55bea653c66fbb7825d10a0b8f2fea6a3d97, data reload: false

------ Round 1 ----------------------------------
q1	18497	5637	5535	5535
q2	2271	149	144	144
q3	11004	1162	1230	1162
q4	10603	765	859	765
q5	7967	3278	3205	3205
q6	203	126	123	123
q7	897	527	497	497
q8	9619	2045	2011	2011
q9	7363	6382	6347	6347
q10	8213	3056	3081	3056
q11	406	210	209	209
q12	357	190	186	186
q13	17994	3372	3385	3372
q14	255	208	220	208
q15	550	511	498	498
q16	448	376	385	376
q17	948	543	503	503
q18	7597	7016	6855	6855
q19	2353	1351	1416	1351
q20	623	307	290	290
q21	2789	2410	2469	2410
q22	367	294	317	294
Total cold run time: 111324 ms
Total hot run time: 39397 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5494	5343	5481	5343
q2	328	224	212	212
q3	3406	3222	3216	3216
q4	2105	2069	2064	2064
q5	6071	5948	5973	5948
q6	199	118	116	116
q7	2286	1922	1860	1860
q8	3225	3415	3375	3375
q9	8947	8904	8855	8855
q10	3905	3845	3796	3796
q11	571	457	441	441
q12	826	656	619	619
q13	16934	3171	3188	3171
q14	278	265	275	265
q15	550	509	511	509
q16	525	461	468	461
q17	1905	1810	1842	1810
q18	9509	17820	9620	9620
q19	24304	1570	1530	1530
q20	4624	1927	1927	1927
q21	14626	5402	5484	5402
q22	1011	539	533	533
Total cold run time: 111629 ms
Total hot run time: 61073 ms

@doris-robot

TPC-DS: Total hot run time: 186365 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a28e55bea653c66fbb7825d10a0b8f2fea6a3d97, data reload: false

query1	932	340	334	334
query2	6559	2110	1891	1891
query3	6710	204	201	201
query4	30241	22176	22272	22176
query5	4490	371	379	371
query6	254	154	159	154
query7	4611	261	262	261
query8	222	172	173	172
query9	8319	2500	2498	2498
query10	418	231	220	220
query11	17089	15563	15589	15563
query12	124	71	69	69
query13	1680	382	383	382
query14	10562	6933	7042	6933
query15	214	187	184	184
query16	5786	259	246	246
query17	947	469	471	469
query18	1781	251	251	251
query19	187	130	133	130
query20	66	76	73	73
query21	196	122	137	122
query22	4992	4751	4785	4751
query23	31695	30987	30877	30877
query24	12477	2783	2833	2783
query25	569	309	322	309
query26	1802	141	140	140
query27	3224	286	284	284
query28	7348	1830	1808	1808
query29	2005	632	628	628
query30	281	138	139	138
query31	973	738	761	738
query32	89	48	50	48
query33	707	226	211	211
query34	1132	456	461	456
query35	870	761	740	740
query36	1333	1201	1180	1180
query37	94	61	60	60
query38	3337	3211	3211	3211
query39	1309	1265	1251	1251
query40	346	88	87	87
query41	39	35	34	34
query42	90	81	88	81
query43	508	474	489	474
query44	1137	692	681	681
query45	198	179	174	174
query46	1063	642	649	642
query47	1650	1586	1543	1543
query48	387	306	317	306
query49	1199	287	277	277
query50	682	316	307	307
query51	5367	5177	5218	5177
query52	94	79	77	77
query53	327	262	260	260
query54	244	185	199	185
query55	82	74	81	74
query56	182	168	178	168
query57	982	927	943	927
query58	188	159	155	155
query59	2729	2700	2602	2602
query60	208	184	189	184
query61	84	82	85	82
query62	614	356	346	346
query63	285	261	259	259
query64	6159	1784	1749	1749
query65	3358	3246	3248	3246
query66	1382	322	312	312
query67	15422	15151	14917	14917
query68	12494	550	518	518
query69	607	306	312	306
query70	1700	1422	1519	1422
query71	10477	10198	10210	10198
query72	4915	2830	2817	2817
query73	2578	319	313	313
query74	7028	6404	6410	6404
query75	4958	2324	2310	2310
query76	6687	1042	1028	1028
query77	809	232	226	226
query78	9069	8854	8628	8628
query79	1123	512	490	490
query80	687	328	317	317
query81	452	209	204	204
query82	201	85	82	82
query83	146	121	122	121
query84	277	71	68	68
query85	1084	343	322	322
query86	399	385	380	380
query87	3524	3330	3328	3328
query88	3183	2199	2193	2193
query89	440	365	352	352
query90	2006	186	183	183
query91	157	131	127	127
query92	51	43	44	43
query93	1968	426	424	424
query94	1274	159	162	159
query95	495	451	446	446
query96	608	319	320	319
query97	4265	4118	4144	4118
query98	211	192	180	180
query99	1056	718	700	700
Total cold run time: 305560 ms
Total hot run time: 186365 ms

@doris-robot

ClickBench: Total hot run time: 30.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a28e55bea653c66fbb7825d10a0b8f2fea6a3d97, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.22	0.06	0.05
query4	1.68	0.08	0.07
query5	0.54	0.52	0.52
query6	1.26	0.64	0.65
query7	0.01	0.01	0.01
query8	0.04	0.03	0.03
query9	0.54	0.50	0.51
query10	0.55	0.55	0.56
query11	0.13	0.09	0.08
query12	0.11	0.09	0.09
query13	0.61	0.61	0.60
query14	0.79	0.80	0.79
query15	0.80	0.78	0.78
query16	0.38	0.39	0.40
query17	1.01	0.98	1.02
query18	0.24	0.24	0.25
query19	1.86	1.70	1.74
query20	0.01	0.01	0.02
query21	15.41	0.59	0.59
query22	2.58	2.27	1.19
query23	17.25	0.72	0.73
query24	2.62	1.69	1.02
query25	0.42	0.31	0.16
query26	0.50	0.14	0.14
query27	0.05	0.06	0.04
query28	10.69	0.77	0.77
query29	12.54	3.16	3.09
query30	0.51	0.49	0.47
query31	2.79	0.35	0.35
query32	3.36	0.47	0.48
query33	3.22	3.25	3.21
query34	16.10	4.21	4.27
query35	4.32	4.36	4.34
query36	1.14	1.08	1.08
query37	0.07	0.05	0.05
query38	0.03	0.02	0.02
query39	0.02	0.01	0.02
query40	0.16	0.13	0.13
query41	0.06	0.01	0.02
query42	0.02	0.01	0.02
query43	0.02	0.02	0.02
Total cold run time: 104.75 s
Total hot run time: 30.19 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit a28e55bea653c66fbb7825d10a0b8f2fea6a3d97 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       14.8 seconds inserted 10000000 Rows, about 675K ops/s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.27% (8609/23733)
Line Coverage: 28.38% (70474/248315)
Region Coverage: 27.36% (36366/132933)
Branch Coverage: 24.16% (18642/77174)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a28e55bea653c66fbb7825d10a0b8f2fea6a3d97_a28e55bea653c66fbb7825d10a0b8f2fea6a3d97/report/index.html


@zzzxl1993 zzzxl1993 left a comment

LGTM


PR approved by anyone and no changes requested.


@airborne12 airborne12 left a comment

LGTM


@eldenmoon eldenmoon left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Jan 25, 2024

PR approved by at least one committer and no changes requested.

@qidaye qidaye merged commit 0800fdc into apache:master Jan 25, 2024
26 of 28 checks passed
@qidaye qidaye deleted the index_compaction_null_bitmap branch January 25, 2024 04:01
yiguolei pushed a commit that referenced this pull request Jan 25, 2024
#30326)

The `null_bitmap` file is not taken into account during index compaction. This leads to wrong query results when documents contain `NULL` values.
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.0.5-merged, dev/3.0.0-merged, reviewed

7 participants