[fix](es catalog) only es_query function can push down to ES #29320

Merged Dec 30, 2023 (2 commits)

Conversation

@qidaye (Contributor) commented Dec 29, 2023

Proposed changes

Issue Number: close #29318

  1. Only push down the `es_query` function to ES.
  2. Add a null check for ES query results that lack the `_source` or `fields` field.
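
The two changes above can be illustrated with a minimal sketch. This is not the code from this PR: the real change lives in the Doris code base, and `FunctionCall`, `split_function_calls`, and `extract_value` are hypothetical names introduced here for illustration only. The sketch assumes a search hit is a plain dictionary that may or may not carry `_source` and `fields`, and that function-call predicates are identified by name. The order in which `_source` and `fields` are consulted is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class FunctionCall:
    """Hypothetical stand-in for a function-call predicate in a WHERE clause."""
    name: str
    args: tuple = ()


def can_push_down(call: FunctionCall) -> bool:
    # Only es_query is handed to ES; any other function stays with the engine.
    return call.name == "es_query"


def split_function_calls(
    calls: List[FunctionCall],
) -> Tuple[List[FunctionCall], List[FunctionCall]]:
    """Split function calls into (pushed down to ES, evaluated locally)."""
    pushed = [c for c in calls if can_push_down(c)]
    local = [c for c in calls if not can_push_down(c)]
    return pushed, local


def extract_value(hit: dict, column: str) -> Optional[Any]:
    """Read one column from a hit, tolerating a missing `_source` or `fields`."""
    source = hit.get("_source")
    if isinstance(source, dict):
        return source.get(column)
    fields = hit.get("fields")
    if isinstance(fields, dict):
        return fields.get(column)
    # Neither container is present: report NULL instead of dereferencing nothing.
    return None
```

With this shape, a function such as `not_null_or_empty` (the one named in the linked issue) lands in the locally evaluated list, and a hit that carries only metadata yields `None` for every column rather than raising an error.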

@qidaye (Contributor Author) commented Dec 29, 2023

run buildall

@airborne12 (Member) left a comment

LGTM

PR approved by anyone and no changes requested.

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 8f6c4b5a0c643b57da5c4957afa647a77bba32ed, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5419	5167	5147	5147
q2	406	168	159	159
q3	1487	1218	1238	1218
q4	1091	813	892	813
q5	3130	3089	3165	3089
q6	247	150	140	140
q7	953	583	539	539
q8	2146	2256	2222	2222
q9	6740	6685	6668	6668
q10	3159	3173	3135	3135
q11	340	235	225	225
q12	397	249	245	245
q13	4398	3638	3654	3638
q14	254	216	238	216
q15	632	579	591	579
q16	473	399	396	396
q17	1038	624	509	509
q18	7085	6818	6881	6818
q19	1672	1596	1491	1491
q20	581	347	485	347
q21	2952	2519	2512	2512
q22	385	330	326	326
Total cold run time: 44985 ms
Total hot run time: 40432 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5148	5118	5124	5118
q2	342	253	275	253
q3	3364	3323	3307	3307
q4	2156	2038	2028	2028
q5	5918	5923	5949	5923
q6	234	131	136	131
q7	2394	1963	1936	1936
q8	3539	3672	3687	3672
q9	9113	9038	9008	9008
q10	3887	3935	3963	3935
q11	614	496	500	496
q12	800	627	640	627
q13	3884	3210	3215	3210
q14	303	264	268	264
q15	630	578	582	578
q16	545	524	520	520
q17	2023	1834	1813	1813
q18	8731	8413	8352	8352
q19	1760	1753	1707	1707
q20	2286	2023	1985	1985
q21	5785	5344	5502	5344
q22	560	473	496	473
Total cold run time: 64016 ms
Total hot run time: 60680 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.60% (8603/23504)
Line Coverage: 28.67% (69944/243999)
Region Coverage: 27.66% (36191/130830)
Branch Coverage: 24.38% (18498/75886)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8f6c4b5a0c643b57da5c4957afa647a77bba32ed_8f6c4b5a0c643b57da5c4957afa647a77bba32ed/report/index.html

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 47.09 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 27.5 seconds inserted 10000000 Rows, about 363K ops/s
storage size: 17184037303 Bytes

@qidaye (Contributor Author) commented Dec 29, 2023

run buildall

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit afd7ee0332759be9840f723374a2ddac7cba08f7, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5466	5105	5121	5105
q2	418	175	160	160
q3	1483	1170	1151	1151
q4	1091	819	833	819
q5	3155	3152	3145	3145
q6	233	138	138	138
q7	983	576	541	541
q8	2163	2248	2278	2248
q9	6706	6681	6664	6664
q10	3193	3131	3122	3122
q11	348	240	222	222
q12	390	248	244	244
q13	4402	3623	3622	3622
q14	256	223	216	216
q15	613	580	548	548
q16	456	390	409	390
q17	1055	597	623	597
q18	7098	6706	8153	6706
q19	1655	1581	1485	1485
q20	579	377	361	361
q21	2894	2488	2500	2488
q22	394	307	330	307
Total cold run time: 45031 ms
Total hot run time: 40279 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5120	5052	5094	5052
q2	339	263	244	244
q3	3371	3340	3318	3318
q4	2171	2047	2025	2025
q5	5950	5919	5940	5919
q6	234	130	136	130
q7	2387	1864	1923	1864
q8	3564	3644	3651	3644
q9	9041	8971	8959	8959
q10	3856	3919	3945	3919
q11	605	482	492	482
q12	820	647	637	637
q13	3881	3178	3187	3178
q14	288	273	259	259
q15	603	577	575	575
q16	595	535	514	514
q17	2051	1847	1768	1768
q18	8709	8426	8457	8426
q19	1766	1714	1735	1714
q20	2282	2008	1989	1989
q21	5725	5386	5374	5374
q22	567	508	486	486
Total cold run time: 63925 ms
Total hot run time: 60476 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 47.68 seconds
stream load tsv: 564 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17188096521 Bytes

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.62% (8608/23505)
Line Coverage: 28.69% (70004/244008)
Region Coverage: 27.69% (36234/130856)
Branch Coverage: 24.40% (18522/75918)
Coverage Report: http://coverage.selectdb-in.cc/coverage/afd7ee0332759be9840f723374a2ddac7cba08f7_afd7ee0332759be9840f723374a2ddac7cba08f7/report/index.html

@qidaye (Contributor Author) commented Dec 29, 2023

run p1

@morningman (Contributor) left a comment

LGTM

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 30, 2023
@morningman morningman merged commit 2c4e52e into apache:master Dec 30, 2023
16 checks passed
qidaye added a commit to qidaye/incubator-doris that referenced this pull request Dec 31, 2023
…29320)

Issue Number: close apache#29318 
1. Only push down `es_query` function to ES
2. Add null check where ES query result not have `_source` or `fields` fields.
@qidaye qidaye deleted the es_not_null branch December 31, 2023 01:08
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
…29320)

Issue Number: close apache#29318 
1. Only push down `es_query` function to ES
2. Add null check where ES query result not have `_source` or `fields` fields.
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.0.4-merged, reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] es catalog not_null_or_empty not working
4 participants