[fix](Nereids): NullSafeEqual should be in HashJoinCondition #27127

Merged
jackwener merged 2 commits into apache:master on Nov 21, 2023

Conversation

jackwener
Member

@jackwener jackwener commented Nov 16, 2023

Proposed changes

Originally, only EqualTo was put in HashJoinCondition. NullSafeEqual also needs to be allowed.
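
To illustrate the difference, here is a minimal Python model of the two predicates used as hash join keys. It is a sketch, not Doris code: `equal_to`, `null_safe_equal`, and `hash_join` are hypothetical names, and `None` stands in for SQL NULL. The point is only that both predicates can be answered with a hash lookup; they differ in how NULL keys are treated.

```python
from collections import defaultdict


def equal_to(a, b):
    """SQL `=`: the result is NULL (None here) when either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_equal(a, b):
    """SQL `<=>`: never returns NULL, and NULL <=> NULL is true."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def hash_join(left, right, null_safe=False):
    """Inner hash join on single-column keys (hypothetical helper).

    null_safe=False models EqualTo: NULL keys never match.
    null_safe=True models NullSafeEqual: NULL keys match each other.
    """
    table = defaultdict(list)
    for key in right:
        if key is None and not null_safe:
            continue  # a NULL build key can never satisfy `=`
        table[key].append(key)
    out = []
    for key in left:
        if key is None and not null_safe:
            continue  # a NULL probe key can never satisfy `=`
        for match in table.get(key, []):
            out.append((key, match))
    return out
```

With `left = [1, None]` and `right = [1, None]`, the EqualTo variant pairs only the `1` keys, while the null-safe variant also pairs the two NULL keys.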

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@jackwener
Member Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 280c002c3fee41cc40516ef301a47ea7d20dd8a3, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4948	4696	4661	4661
q2	354	182	158	158
q3	2037	1909	1896	1896
q4	1391	1257	1232	1232
q5	3995	3950	4093	3950
q6	246	126	128	126
q7	1409	886	899	886
q8	2772	2806	2781	2781
q9	9923	9610	9556	9556
q10	3480	3518	3528	3518
q11	382	242	252	242
q12	437	296	299	296
q13	4566	3795	3778	3778
q14	327	290	279	279
q15	581	540	527	527
q16	665	581	585	581
q17	1146	993	951	951
q18	7850	7318	7445	7318
q19	1702	1707	1693	1693
q20	555	296	311	296
q21	4417	3945	4018	3945
q22	480	383	370	370
Total cold run time: 53663 ms
Total hot run time: 49040 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4569	4585	4572	4572
q2	327	226	247	226
q3	4022	4020	4001	4001
q4	2717	2703	2700	2700
q5	9622	9679	9675	9675
q6	248	126	124	124
q7	2619	2276	2295	2276
q8	4432	4437	4459	4437
q9	13277	13132	13191	13132
q10	4056	4172	4184	4172
q11	791	694	642	642
q12	977	824	825	824
q13	4303	3554	3627	3554
q14	381	338	334	334
q15	578	517	525	517
q16	741	676	670	670
q17	3931	3819	3828	3819
q18	9526	9086	9114	9086
q19	1825	1784	1785	1784
q20	2415	2064	2069	2064
q21	8830	8698	8514	8514
q22	877	818	802	802
Total cold run time: 81064 ms
Total hot run time: 77925 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.62 seconds
stream load tsv: 566 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17099288597 Bytes

@wm1581066 wm1581066 requested a review from morrySnow November 16, 2023 16:14
@yiguolei yiguolei added usercase Important user case type label dev/2.0.3 labels Nov 17, 2023
@jackwener
Member Author

run buildall

@wm1581066 wm1581066 requested a review from englefly November 17, 2023 04:03
@jackwener
Member Author

run buildall

@jackwener
Member Author

run buildall

@jackwener
Member Author

run buildall

@yiguolei
Contributor

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 0a06e653958fc0ada9e12ca84ee45823020109bd, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4949	4682	4678	4678
q2	357	158	159	158
q3	2041	1924	1927	1924
q4	1379	1244	1245	1244
q5	3976	3958	4060	3958
q6	251	129	131	129
q7	1409	872	881	872
q8	2768	2792	2788	2788
q9	9752	9739	9619	9619
q10	3465	3562	3528	3528
q11	380	254	244	244
q12	441	288	300	288
q13	4581	3799	3811	3799
q14	320	292	283	283
q15	594	516	518	516
q16	665	586	590	586
q17	1133	958	973	958
q18	7858	7445	7364	7364
q19	1681	1670	1698	1670
q20	528	301	303	301
q21	4372	3948	4002	3948
q22	479	387	375	375
Total cold run time: 53379 ms
Total hot run time: 49230 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4594	4614	4570	4570
q2	350	233	253	233
q3	4029	4016	4013	4013
q4	2709	2694	2699	2694
q5	9719	9668	9647	9647
q6	247	123	124	123
q7	2582	2304	2285	2285
q8	4429	4417	4462	4417
q9	13275	13159	13259	13159
q10	4120	4136	4177	4136
q11	824	700	700	700
q12	974	814	814	814
q13	4285	3526	3582	3526
q14	379	340	345	340
q15	575	532	524	524
q16	737	654	667	654
q17	3889	3900	3863	3863
q18	9689	9156	9114	9114
q19	1819	1787	1771	1771
q20	2404	2072	2061	2061
q21	8792	8897	8546	8546
q22	866	798	807	798
Total cold run time: 81287 ms
Total hot run time: 77988 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.25 seconds
stream load tsv: 580 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17100173515 Bytes

@jackwener
Member Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 31b5d04dde6be5e28977c80e3da289f3ecafa98d, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4942	4652	4699	4652
q2	354	161	160	160
q3	2044	1919	1900	1900
q4	1380	1273	1246	1246
q5	3986	3979	4032	3979
q6	247	135	134	134
q7	1414	887	897	887
q8	2763	2798	2792	2792
q9	9803	9733	9588	9588
q10	3497	3537	3549	3537
q11	381	247	245	245
q12	436	289	299	289
q13	4571	3785	3839	3785
q14	327	291	283	283
q15	581	541	515	515
q16	668	584	580	580
q17	1144	971	944	944
q18	7850	7405	7414	7405
q19	1711	1708	1660	1660
q20	571	323	288	288
q21	4363	4008	3982	3982
q22	473	375	367	367
Total cold run time: 53506 ms
Total hot run time: 49218 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4591	4574	4560	4560
q2	343	246	264	246
q3	4028	4011	3994	3994
q4	2702	2697	2692	2692
q5	9656	9671	9602	9602
q6	244	127	127	127
q7	3049	2503	2486	2486
q8	4460	4443	4476	4443
q9	13257	13113	13062	13062
q10	4125	4205	4191	4191
q11	787	675	691	675
q12	973	816	816	816
q13	4287	3556	3588	3556
q14	386	345	367	345
q15	579	526	512	512
q16	745	688	687	687
q17	3892	3925	3921	3921
q18	9570	8895	9089	8895
q19	1837	1800	1766	1766
q20	2377	2062	2048	2048
q21	8718	8779	8539	8539
q22	876	823	831	823
Total cold run time: 81482 ms
Total hot run time: 77986 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.19 seconds
stream load tsv: 570 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17099751921 Bytes

@keanji-x
Contributor

Maybe a test to check correctness is necessary.

Contributor

PR approved by anyone and no changes requested.

@jackwener
Member Author

run buildall

@jackwener
Member Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.99 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17163618468 Bytes

@hello-stephen
Contributor

run buildall

@jackwener
Member Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 7c61195cc842c091a1909b74bb423a0f2730fe1b, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4918	4654	4633	4633
q2	365	150	158	150
q3	2035	1866	1908	1866
q4	1378	1255	1241	1241
q5	3995	3989	4045	3989
q6	249	129	132	129
q7	1441	879	896	879
q8	2761	2796	2777	2777
q9	9800	9842	9567	9567
q10	3469	3557	3547	3547
q11	379	241	239	239
q12	439	297	297	297
q13	4591	3868	3801	3801
q14	310	286	287	286
q15	598	537	522	522
q16	665	579	575	575
q17	1141	954	960	954
q18	7809	7535	7320	7320
q19	1693	1690	1696	1690
q20	567	300	293	293
q21	4391	3972	3957	3957
q22	482	368	374	368
Total cold run time: 53476 ms
Total hot run time: 49080 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4591	4565	4579	4565
q2	340	216	272	216
q3	4050	4016	4002	4002
q4	2703	2690	2680	2680
q5	9850	9856	9913	9856
q6	241	120	127	120
q7	3006	2476	2467	2467
q8	4527	4501	4533	4501
q9	13272	13131	13182	13131
q10	4142	4231	4250	4231
q11	798	625	632	625
q12	988	804	833	804
q13	4282	3599	3563	3563
q14	372	363	343	343
q15	589	529	515	515
q16	767	662	664	662
q17	3855	3908	3921	3908
q18	9525	9100	9199	9100
q19	1810	1818	1777	1777
q20	2405	2055	2039	2039
q21	8749	8607	8631	8607
q22	873	794	799	794
Total cold run time: 81735 ms
Total hot run time: 78506 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.42 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17100724922 Bytes

@jackwener
Member Author

run buildall

@yiguolei
Contributor

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 4af5845122afba29961abf6d440cdccc44e96426, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4928	4662	4688	4662
q2	369	146	157	146
q3	2080	1988	1920	1920
q4	1394	1265	1274	1265
q5	4021	4000	4040	4000
q6	244	131	127	127
q7	1412	888	894	888
q8	2763	2808	2781	2781
q9	9728	9721	9565	9565
q10	3493	3519	3567	3519
q11	381	255	254	254
q12	440	296	299	296
q13	4604	3856	3787	3787
q14	321	294	283	283
q15	601	539	531	531
q16	669	579	584	579
q17	1149	967	919	919
q18	7877	7365	7317	7317
q19	1682	1689	1691	1689
q20	575	307	300	300
q21	4402	3961	4006	3961
q22	475	375	367	367
Total cold run time: 53608 ms
Total hot run time: 49156 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4594	4581	4586	4581
q2	336	232	264	232
q3	4018	4002	4011	4002
q4	2702	2689	2706	2689
q5	9573	9607	9645	9607
q6	242	124	126	124
q7	3020	2455	2444	2444
q8	4452	4478	4464	4464
q9	13170	13104	13128	13104
q10	4120	4185	4243	4185
q11	785	725	666	666
q12	977	810	814	810
q13	4305	3602	3557	3557
q14	379	350	357	350
q15	588	523	536	523
q16	740	671	671	671
q17	3871	3865	3890	3865
q18	9554	9019	9234	9019
q19	1818	1783	1782	1782
q20	2396	2088	2058	2058
q21	8783	8651	8431	8431
q22	915	802	795	795
Total cold run time: 81338 ms
Total hot run time: 77959 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.87 seconds
stream load tsv: 570 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17098798073 Bytes

@jackwener
Member Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.45 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17098981771 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 90736869f16689b82c78a990c55fb5e6d8e9a053, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4938	4682	4681	4681
q2	377	180	159	159
q3	2023	1922	1915	1915
q4	1381	1269	1229	1229
q5	3927	3956	4009	3956
q6	245	132	133	132
q7	1406	888	874	874
q8	2713	2770	2760	2760
q9	21082	13618	9650	9650
q10	13195	3537	3539	3537
q11	398	236	243	236
q12	459	300	297	297
q13	21848	3823	3812	3812
q14	325	288	285	285
q15	583	547	518	518
q16	669	590	578	578
q17	1146	946	917	917
q18	7761	7261	7429	7261
q19	1674	1660	1671	1660
q20	564	302	304	302
q21	4377	3923	3931	3923
q22	477	371	373	371
Total cold run time: 91568 ms
Total hot run time: 49053 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4603	4587	4585	4585
q2	338	247	273	247
q3	3994	3977	3981	3977
q4	2689	2669	2675	2669
q5	9665	9674	9589	9589
q6	234	123	122	122
q7	2984	2489	2504	2489
q8	4427	4416	4432	4416
q9	13279	13096	13116	13096
q10	4084	4179	4185	4179
q11	771	723	673	673
q12	962	818	825	818
q13	4296	3576	3581	3576
q14	392	342	341	341
q15	585	522	538	522
q16	732	665	679	665
q17	3888	3861	3878	3861
q18	9594	8762	9059	8762
q19	1834	1765	1765	1765
q20	2378	2060	2076	2060
q21	8822	8351	8501	8351
q22	900	819	832	819
Total cold run time: 81451 ms
Total hot run time: 77582 ms

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 21, 2023
@jackwener jackwener merged commit dea40e7 into apache:master Nov 21, 2023
@jackwener jackwener deleted the equal branch November 21, 2023 11:08
eldenmoon added a commit that referenced this pull request Nov 23, 2023
* [keyword](decimalv2) Add DecimalV2 keyword #26283 (#26319)

* [fix](planner) Fix sample partition table #25912 (#26399)

In the past, sampling a partitioned table relied on two conditions: 1. Data is evenly distributed between partitions; 2. Data is evenly distributed between buckets. As a result, the same number of rows was sampled from each partition and each bucket.

Now, the number of rows sampled from each partition and bucket is proportional to the number of rows it contains.
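
The proportional rule can be sketched as below. This is an illustrative Python model, not the Doris planner code: `allocate_sample_rows` and its arguments are hypothetical, and rounding down is an assumption made to keep the example short.

```python
def allocate_sample_rows(row_counts, total_sample_rows):
    """Split a sample budget across partitions (or buckets) by row count.

    row_counts maps a partition/bucket name to the number of rows it holds.
    Each share is rounded down, so the shares may sum to slightly less
    than total_sample_rows.
    """
    total = sum(row_counts.values())
    if total == 0:
        return {name: 0 for name in row_counts}
    return {
        name: total_sample_rows * count // total
        for name, count in row_counts.items()
    }
```

For a table with 900 rows in `p1` and 100 rows in `p2`, a budget of 100 sampled rows is split 90/10, whereas an even split would take 50 from each and over-represent `p2`.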

* [fix](spark-load)fix-Unique-key-with-MOR-by-sparkload #26383 (#26414)

* [fix](nereids)fix bug of select mv in nereids #26235 (#26415)

* [improvement](show trash) Fix be restart slow when too many trash files #26147 (#26417)

* [fix](planner)should keep at least one slot materialized in agg node #26116 (#26419)

* [fix](multi-catalog)add the FAQ for Aliyun DLF and add the fs.xx.impl check #25594 (#26422)

* [coverage](pipeline) Remove unless code and add call method for coverage #25552 (#26423)

Remove useless code and add call method for coverage

* [Fix](statistics)Fix analyze min max sql syntax error. #26240 (#26443)

backport #26240

* [fix](auditlog) fix without lock in QueryStatisticsRecvr find  (#26441)

* [fix](invert index) Fix the timing error when opening the searcher #26401 (#26472)

* [fix](nereids)only enable colocate scan for one phase global parttion topn in some condition #26473 (#26481)

* [branch-2.0](cherry-pick) Add more indexed column reader be unit test #25652 (#26430)

* [enhancement](regression) fault injection for segcompaction test (#25709) (#26305)

1. generalized debug point facilities from docker suites for
   fault-injection/stubbing cases
2. add segcompaction fault-injection cases for demonstration
3. add -238 TOO_MANY_SEGMENTS fault-injection case for good

Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>

* [fix](case) rm non-visiable charactor null in out file (#26540)

* [fix](load) fix merged row number miscounting because of race condition (#26516)

Row numbers were miscounted because of a race condition, which sometimes
caused the load to fail with the warning 'the rows number written doesn't match'.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* [test](regression) Add more regression test for FE (#26539)

* [test](coverage) Improve test coverage for runtime filter (#26314) (#26547)

* [fix](Nereids) RewriteCteChildren not work with cost based rewritter (#26326) (#26530)

We use a map to record the results of rewriting CTE children, to avoid rewriting
twice in the cost-based rewriter. However, we record both the CTE outer and
inner results in one map, using null as the key of the outer result and the CTE id
as the key of the inner result. This is wrong, because every anchor has an outer
and only one outer can be recorded. So when the cache is used in the cost-based
rewriter, we get the wrong outer plan from the cache. The error is then
thrown as below:

```
Caused by: java.lang.IllegalArgumentException: Stats for CTE: CTEId#1 not found
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143) ~[guava-32.1.2-jre.jar:?]
    at org.apache.doris.nereids.stats.StatsCalculator.visitLogicalCTEConsumer(StatsCalculator.java:1049) ~[classes/:?]
    at org.apache.doris.nereids.stats.StatsCalculator.visitLogicalCTEConsumer(StatsCalculator.java:147) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.logical.LogicalCTEConsumer.accept(LogicalCTEConsumer.java:111) ~[classes/:?]
    at org.apache.doris.nereids.stats.StatsCalculator.estimate(StatsCalculator.java:222) ~[classes/:?]
    at org.apache.doris.nereids.stats.StatsCalculator.estimate(StatsCalculator.java:200) ~[classes/:?]
    at org.apache.doris.nereids.jobs.cascades.DeriveStatsJob.execute(DeriveStatsJob.java:108) ~[classes/:?]
    at org.apache.doris.nereids.jobs.scheduler.SimpleJobScheduler.executeJobPool(SimpleJobScheduler.java:39) ~[classes/:?]
    at org.apache.doris.nereids.jobs.executor.Optimizer.execute(Optimizer.java:51) ~[classes/:?]
    at org.apache.doris.nereids.jobs.rewrite.CostBasedRewriteJob.getCost(CostBasedRewriteJob.java:98) ~[classes/:?]
    at org.apache.doris.nereids.jobs.rewrite.CostBasedRewriteJob.execute(CostBasedRewriteJob.java:64) ~[classes/:?]
    at org.apache.doris.nereids.jobs.executor.AbstractBatchJobExecutor.execute(AbstractBatchJobExecutor.java:119) ~[classes/:?]
    at org.apache.doris.nereids.rules.rewrite.RewriteCteChildren.visit(RewriteCteChildren.java:72) ~[classes/:?]
    at org.apache.doris.nereids.rules.rewrite.RewriteCteChildren.visit(RewriteCteChildren.java:56) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.visitor.PlanVisitor.visitLogicalSink(PlanVisitor.java:118) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.visitor.SinkVisitor.visitLogicalResultSink(SinkVisitor.java:72) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.logical.LogicalResultSink.accept(LogicalResultSink.java:58) ~[classes/:?]
    at org.apache.doris.nereids.rules.rewrite.RewriteCteChildren.visitLogicalCTEAnchor(RewriteCteChildren.java:86) ~[classes/:?]
    at org.apache.doris.nereids.rules.rewrite.RewriteCteChildren.visitLogicalCTEAnchor(RewriteCteChildren.java:56) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.logical.LogicalCTEAnchor.accept(LogicalCTEAnchor.java:60) ~[classes/:?]
    at org.apache.doris.nereids.rules.rewrite.RewriteCteChildren.visitLogicalCTEAnchor(RewriteCteChildren.java:86) ~[classes/:?]
    at org.apache.doris.nereids.rules.rewrite.RewriteCteChildren.visitLogicalCTEAnchor(RewriteCteChildren.java:56) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.logical.LogicalCTEAnchor.accept(LogicalCTEAnchor.java:60) ~[classes/:?]
    at org.apache.doris.nereids.rules.rewrite.RewriteCteChildren.rewriteRoot(RewriteCteChildren.java:67) ~[classes/:?]
    at org.apache.doris.nereids.jobs.rewrite.CustomRewriteJob.execute(CustomRewriteJob.java:58) ~[classes/:?]
    at org.apache.doris.nereids.jobs.executor.AbstractBatchJobExecutor.execute(AbstractBatchJobExecutor.java:119) ~[classes/:?]
    at org.apache.doris.nereids.NereidsPlanner.rewrite(NereidsPlanner.java:275) ~[classes/:?]
    at org.apache.doris.nereids.NereidsPlanner.plan(NereidsPlanner.java:218) ~[classes/:?]
    at org.apache.doris.nereids.NereidsPlanner.plan(NereidsPlanner.java:118) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.commands.ExplainCommand.run(ExplainCommand.java:81) ~[classes/:?]
    at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:550) ~[classes/:?]
```
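
The cache problem described before the stack trace can be reproduced with a small Python model. This is only a sketch: the function names and the string "plans" are hypothetical, and the second function shows one possible way to avoid the collision, not necessarily the fix applied in the PR.

```python
def build_cache_buggy(anchors):
    """One shared map: None keys the outer result, the CTE id keys the inner one."""
    cache = {}
    for cte_id, outer_plan, inner_plan in anchors:
        cache[None] = outer_plan  # every anchor overwrites the same slot
        cache[cte_id] = inner_plan
    return cache


def build_cache_separate(anchors):
    """Hypothetical alternative: keep outer and inner results in separate maps,
    both keyed by CTE id, so each anchor keeps its own outer plan."""
    outer, inner = {}, {}
    for cte_id, outer_plan, inner_plan in anchors:
        outer[cte_id] = outer_plan
        inner[cte_id] = inner_plan
    return outer, inner
```

With two anchors, the buggy cache retains only the last outer plan under the `None` key, so a later lookup for the first anchor's outer returns the wrong plan.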

* [fix](Nereids) could not run query with repeat node in cte (#26330) (#26531)

pick from master
PR: #26330 
commit id: a89477e

ExpressionDeepCopier does not process VirtualReference, so the inline
plan is generated incorrectly.

* [opt](Nereids) remove Nondeterministic trait from date related functions (#26444) (#26568)

* change version to 2.0.3-rc03dev

* [fix](regression-test)  Fix regiressin test syncer suit use master fe directly (#26456) (#26583)

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>

* Revert "[improvement](scanner_schedule) reduce memory consumption of scanner #24199 (#25547)" (#26613)

This reverts commit 9a19581 to investigate ANALYZE TABLE WITH SYNC problem

* [enhancement](Nereids): add LOG info to show the phase of NereidsPlanner. (#26542)

* [opt](regression test) Add string-like column order by test #26379 (#26533)

* [Feature](auditloader) Plugin auditloader use auth token to avoid using cleartext passwords in config (#26278) (#26532)

Doris FE checks whether a stream load HTTP request carries an auth token after the password check fails.
The audit-log loader plugin can use the auth token if the plugin config sets use_auth_token to true.

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>

* [branch-2.0](JdbcCatalog) fix that the predicate column name does not have back quote when querying the JDBC appearance (#26479) (#26560)

master pr: #26479

* [fix](prepare statement) Not supported such prepared statement if prepare a forward master sql (#26512) (#26638)

* [Pick-2.0](regression) add failure injection in inverted index writer #26121 (#26376)

* [fix](regression) fix regression framework bug: if real test result is negative, it will miss check test result #25734 (#25734) (#26551)

* [Branch-2.0](regression-test) Add tvf regression tests #26322 #26455 (#26566)

* [fix](BE)Branch-2.0 unknown runtime filter when get filter from _consumer_map (#26570)

* [regression-test](framework) support Non concurrent mode #26487  (#26574)

* [regression-test](fix) fix case bug #26561  (#26578)

* [fix](backup) Add repo id to local meta/info files to avoid overwriting #26536 (#26622)

The local meta/info files generated during backup are not distinguished
by repo names. If two backup jobs with the same name are submitted to
different repos at the same time, meta/info may be overwritten by another
backup job.

* [cases](regression-test) Add backup & restore test case #26490 #26491 (#26623)

* [case](regression) Adapt show create table and views to 2.0 (#26624)

* [fix](regression-test) add more check to address flaky test_partial_update_with_delete_stmt #26474 (#26628)

* [feature](Nereids): push down topN through join #24720 (#26634)

Push TopN through Join.

The join type can only be a left/right outer join or a cross join, because the rows of one of their children cannot be filtered out by the join.

The new TopN uses (original limit + original offset) as its limit and 0 as its offset.

(cherry picked from commit 3c9ff7a)
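
Why the pushed-down TopN needs `limit + offset` rows can be checked with a small Python model of a left outer join ordered by a left-side column. This is a sketch under assumptions: the helper names are hypothetical, the sort key is a left-side column, and the original TopN is assumed to stay above the join.

```python
def top_n(rows, sort_key, limit, offset=0):
    """ORDER BY sort_key LIMIT limit OFFSET offset."""
    return sorted(rows, key=sort_key)[offset:offset + limit]


def left_outer_join(left, right_by_key):
    """Every left row survives; unmatched rows are padded with None."""
    return [(l, r) for l in left for r in right_by_key.get(l, [None])]


def query_plain(left, right_by_key, limit, offset):
    joined = left_outer_join(left, right_by_key)
    return top_n(joined, lambda row: row[0], limit, offset)


def query_pushed(left, right_by_key, limit, offset):
    # Pushed-down TopN: keep limit + offset left rows, with offset 0.
    reduced = top_n(left, lambda l: l, limit + offset, 0)
    joined = left_outer_join(reduced, right_by_key)
    # The original TopN is still applied above the join.
    return top_n(joined, lambda row: row[0], limit, offset)
```

Because each left row yields at least one output row, the first `limit + offset` output rows can only come from the first `limit + offset` left rows, so both plans return the same result.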

* [Test](statistics) Add test cases for external table statistics #26511 (#26636)

1. Test disabling and enabling auto collection for an external catalog.
2. Test analyze table table_name (column) and analyzing the whole table.

* [fix](runtime filter) append late arrival runtime filters in vfilecanner #25996 (#26640)

`VFileScanner` will try to append late arrival runtime filters in each loop of `ScannerScheduler::_scanner_scan`.  However, `VFileScanner::_get_next_reader` only generates the `_push_down_conjuncts` in the first loop, so the late arrival runtime filters are ignored.

* [fix](information_schema)fix bug that metadata_name_ids error tableid and append information_schema case #26238 (#26646)

Fix the bug related to #24059.
Added some information_schema scanner tests for:
files
schema_privileges
table_privileges
partitions
rowsets
statistics
table_constraints

Based on infodb_support_ext_catalog=false, it currently includes tests for all tables under the information_schema database.

* [Improve](map)Map impli cast #26126 (#26654)

* [chore](regression) Do stale resource reclaim before executing cold heat separation p2 case #26596 (#26660)

* fix shrink in topN for complext type #26609 (#26661)

* [fix](planner) Fix decimal precision and scale wrong when create table like #25802 (#26666)

When a field with a data type such as decimal(10, 0) is used in CREATE TABLE LIKE, the precision and scale are lost once the statement completes, because the scale is 0. This change fixes the bug.

**Before fix, create table with following SQL**:
CREATE TABLE IF NOT EXISTS db_test.table_test
(
    `name` varchar COMMENT "1m size",
    `id` SMALLINT COMMENT "[-32768, 32767]",
    `timestamp0` decimal null comment "c0",
    `timestamp1` decimal(38, 0) null comment "c1"
)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES ('replication_num' = '1');

**and then run**
CREATE TABLE db_test.table_test_like LIKE db_test.table_test;
SHOW CREATE TABLE db_test.table_test_like;

The field `timestamp1` becomes decimal(9, 0), which is wrong. This change fixes it.

Co-authored-by: JingDas <114388747+JingDas@users.noreply.github.com>

* [fix](test) fix sql block rule test (#26671)

* [Coverage](BE) Delete vinfo_func in BE #26562 (#26674)

* [Fix](partial update) Fix core when successfully schema change and load during a partial update #26210 (#26518)

* [typo] copy branch master docs to branch-2.0 (#26703)

* [typo] update sql-functions to upper-case (#26706)

* [Bug](cherry-pick) Add status dispose in branch 2.0 beta rowset reader (#26684)

* (selectdb-cloud) Reduce FE db lock range for ShowDataStmt #26588 (#26621)

Reduce read lock critical sections and avoid execution timeouts

* [brach-2.0](pick)use 2 phase agg above union all #26245 (#26664)

* [bug](bitmap) fix bitmap value copy operator not call reset #26451 (#26681)

When an empty bitmap is assigned to another bitmap,
the other bitmap should first reset itself and then set its type to empty.

* [fix](planner)isnull predicate can't be safely constant folded in inlineview #25377 (#26685)

* [fix](nereids)unnest in-subquery with agg node in proper condition #25800 (#26687)

* [fix](nereids)add visitMarkJoinReference method in ExpressionDeepCopier #25874 (#26688)

* [fix](nereids)don't normalize column name for base index #26476 (#26690)

* [fix](planner)cast floating point type to bigint for bit functions #26598 (#26691)

* [fix](Nereids) storage later agg rule process agg children by mistake #26101 (#26698)

pick from master
PR #26101
commit id c0ed5f7

Update Project#findProject.
An agg function's children can be any expression rather than only slots.
We use Project#findProject to process them, but this util could only
process slots. This PR updates the util so that it can process all types of
expressions.

* [fix](Nereids) time extract function constant folding core (#26292) (#26699)

pick from master
PR: #26292
commit id: 74fd5da

Some time extract functions changed their return type in the previous PR #18369,
but the FE constant folding function signatures were not changed.
This PR gives them the same signature to avoid a BE core.

* [fix](Nereids) only search internal funcftion when dbName is empty (#26296) (#26700)

pick from master
PR: #26296
commit id: 6892fc9

If a function is called with a database name, we should only search UDFs.

* [fix](Nereids) ban right outer, right anti, full outer with bucket shuffle (#26529) (#26702)

pick from master
PR: #26529
commit id: f80495d

If the left bucket has no data, we do not generate a left bucket instance.
These joins should preserve all right-side data. But because the left instance
does not exist, the right data is discarded since no destination is set.

We ban these joins temporarily until we can generate all instances
for the left side in the Coordinator.

* [test](statistics)Add hive statistics all data type p0 test (#26676) (#26715)

* [test](serialisation) Serialise some cases and enable str_to_date tests #26651 (#26716)

1. Enable the cases about str_to_date, which had been muted because they were affected by some parallel config.
2. Serialise some cases that call admin set config.

* Revert "[Coverage](BE) Delete vinfo_func in BE #26562 (#26674)" (#26724)

This reverts commit 22eafa4.

* [fix](regression-test) add tests for jdbc catalog (#26608) (#26719)

* [fix](nereids)SimplifyRange rule may mess up and/or predicate #26304 (#26693)

* [Fix](fs_benchmark_tools) Fix `run_fs_benchmark.sh` classpath issue. (#26183) (#26704)

Backport from #26183.

* [Fix](partial update) Fix core when doing partial update on tables with row column after schema change #26632 (#26695)

* [Opt](orc-reader) Optimize orc string dict filter in not_single_conjunct case. (#26386) (#26696)

Optimize the orc/parquet string dict filter in the not_single_conjunct case. We can filter the block by dict code first, then by the not_single_conjunct. Because a dict code is an int, filtering on it is faster than filtering on strings.

For example:
```
select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
```
`l_receiptdate` and `l_shipmode` use string dict filtering, and `l_commitdate < l_receiptdate` is a not_single_conjunct that contains a dict filter field.

Before:
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (6.87 sec)

After:
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (4.85 sec)
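
The two-stage filtering idea behind this optimization can be sketched in Python. This is an illustrative model only, not the BE reader code: `filter_with_dict` and its parameters are hypothetical, and the column is assumed to be stored as a dictionary of distinct strings plus one integer code per row.

```python
def filter_with_dict(dictionary, codes, other_col, dict_pred, multi_col_pred):
    """Return the indices of rows that pass both predicates.

    dictionary     -- distinct string values of the dict-encoded column
    codes          -- one integer dictionary code per row
    other_col      -- a second column, one value per row
    dict_pred      -- single-column predicate on the dict-encoded column
    multi_col_pred -- predicate over (decoded value, other_col value)
    """
    # Evaluate the string predicate once per distinct dictionary entry.
    matching_codes = {
        code for code, value in enumerate(dictionary) if dict_pred(value)
    }
    # First pass: cheap integer membership test per row.
    survivors = [i for i, code in enumerate(codes) if code in matching_codes]
    # Second pass: the multi-column conjunct runs only on surviving rows.
    return [
        i for i in survivors
        if multi_col_pred(dictionary[codes[i]], other_col[i])
    ]
```

The string comparison runs once per dictionary entry instead of once per row, and the multi-column predicate sees fewer rows, which is the effect the timings above measure.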

* [docs](docs) Update Files of Branch-2.0 (#26737)

* [date](parser) Support DateV1 keyword (#25414) (#26746)

* [Fix](orc-reader) Fix orc complex types when late materialization was turned on by disabling late materialization in this case. (#26548) (#26743)

Fix orc complex types when late materialization was turned on in orc reader by disabling late materialization in this case.

* [fix](udf)java udf does not support overloaded evaluate method (#22681) (#26768)

Co-authored-by: HB <hubiao01@corp.netease.com>

* [fix](show_proc) fix show statistic proc dir to ensure that result only contains dbs in internal catalog (#26254) (#26763)

backport #26254
Co-authored-by: caiconghui <55968745+caiconghui@users.noreply.github.com>

* [Enhancement](sql-cache) Use update time of hive to avoid cache miss through multi fe nodes. (#26424) (#26762)

backport #26424

* [Fix](partial update) Fix partial update info loss when the delete bitmaps of the committed transactions are calculated by the compaction #26556 (#26735)

* [hotfix](editlog) Fix upsert replay on follower not contains loadedTableIndexIds (#26597) (#26756)

* [chore](regression-test) Fix error add partition operation due to duplicate partition range #26742 (#26758)

* [Bug](materialized-view) fix some bugs on create mv with percentile_approx (#26528) (#26764)

1. percentile_approx has the wrong symbol
2. fnCall.getParams() returns obsolete children

* [Bug](agg-state) fix file load insert wrong data to agg_state (#26581) (#26765)

* [Bug](decimalv2)  getCmpType return decimalv2 when lhs/rhs type both is decimalv2  (#26705) (#26767)

* [fix](Nereids) fix plan shape of query64 unstable  (#26012) (#26775)

don't remove the physical plan after optimizing the plan in dphyper.

* [FIX](complextype) fxi array nested struct literal #26270 (#26778)

* [improvement](disk balance) Prevent duplicate disk balance tasks afte… (#25990) (#26745)

* [branch-2.0](transaction) Fix publish txn wait too long when not meet quorum #26659 (#26759)

* [bugfix](clickhouse) fix datetime convert error. (#26128) (#26766)

Co-authored-by: Guangdong Liu <liugddx@gmail.com>

* [Fix](row store) cache invalidate key should not include sequence column #26771 (#26780)

* [branch-2.0](pick) support HTTP request with chunked transfer (#26520) (#26785)

* [feature](nestedType) add nested data type to create table tool (#26787)

* [fix](hudi) fix wrong schema when query hudi table on obs #26789 (#26791)

* [fix](decimal) fix undefined behaviour of divide by zero when cast string to decimal (#26792)

* [fix](refresh) fix priv issue of refresh database and table operation #26793 (#26794)

* [minor] add disable swap command tip (#26798)

* [fix](information_schema) fix test_query_sys_tables schema_privileges  regression case #26753 (#26800)

* [branch-2.0] fix test result (#26801)

Fix output error from #26743.
On the master branch, the value in a struct field is wrapped in quotes,
but on branch 2.0, the value in a struct field is NOT wrapped in quotes.

* fix: restore load job progress before retry load task (#26802)

Co-authored-by: chenboyang.922 <chenboyang.922@bytedance.com>

* [fix](thrift)limit be and fe thrift server max pkg size,avoid accepting error or too large package causing OOM #26179 (#26805)

* [fix](Planner): don't push down isNull predicate into view (#26288) (#26773)

* [opt](scanner) increase the connection num of s3 client #26795 (#26796)

* [enhancement](metrics)  enhance visibility of flush thread pool (#26544) (#26819)

* [fix](regression) move fault-injection data to the right place (#26825)

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* [feature](binlog) Add ingest_binlog/http_get_snapshot limit download speed && Add async ingest_binlog (#26323) (#26733)

* [fix](jdbc catalog) fix mysql zero date (#26569) (#26837)

* [ci](pipeline) add tpch sff100 test on branch-2.0 (#26824)

* [pick](nerieds) make AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION rewrite rule #25969 (#26852)

* [enhancement](230) print max version and spec version when -230 happens (#26643) (#26854)

* [chore](fs) Don't print the stack for file system and it's derived class #26814 (#26838)

* [compile](gcc) fix gcc compile error #26863

* [test](jdbc) pick some jdbc test from branch master (#26860)

* [pipeline](exec) disable shared scan in default and disable shared scan in limit with where scan (#25952) (#26815)

* [regression](partial update) Add cases when the deleted rows have non nullable columns without default value #26776 (#26848)

* [feature](fe) Add coverage tool for FE UT (#26203) (#26857)

* [fix](map) the implementation of ColumnMap::replicate was incorrect (#26647) (#26868)

* [fix](broker load) pass loadToSingleTablet to olapTableSink (#26680) (#26869)

* [regression-test](framework) Support running tests multiple times and reporting correctly to TeamCity (#26606) (#26871)

* [refactor](stats) refactor collection logic and opt some config #26163 (#26858)

picked from #26163

* [bug](user login)fix PASSWORD_LOCK_TIME setting UNBOUNDED does not take effect #26585 (#26859)

* [Improvement](statistics)Improve stats sample strategy (#26435) (#26890)

backport #26435
Improve the accuracy of sample stats collection. For non distribution columns, use
`n*d / (n - f1 + f1*n/N)`

where `f1` is the number of distinct values that occurred exactly once in our sample of n rows (from a total of N),
and `d` is the total number of distinct values in the sample.

For distribution columns, use `ndv(n) * fraction of tablets sampled` for NDV.

When a tablet is too large to sample in full, use limit to control the total number of rows scanned (for non-key columns only, because key columns are sorted and sampling them with limit would be inaccurate).
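As a worked illustration of the estimator above for non-distribution columns, here is a minimal Python sketch. The function name `estimate_ndv` and the sample data are assumptions made for the example; the real collection logic runs as SQL inside Doris and is not shown here.

```python
from collections import Counter


def estimate_ndv(sample, total_rows):
    """Estimate the number of distinct values from a row sample.

    Implements n*d / (n - f1 + f1*n/N), where n is the sample size,
    N is total_rows, d is the number of distinct values in the sample,
    and f1 is the number of values seen exactly once in the sample.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)


sample = ["a", "a", "b", "c", "c", "d"]  # n = 6, d = 4, f1 = 2
estimate = estimate_ndv(sample, total_rows=60)
print(estimate)
```

When the sample covers the whole table (`n == N`), the denominator reduces to `n` and the estimate equals `d`, the exact distinct count.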

* [fix](partial update) Fix NPE when the query statement of an update statement is a point query in OriginPlanner #26881 (#26900)

* [bug](function) add signature for precentile function (#26867) (#26926)

Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>

* enable pipeline and nereids in test-pipeline (#26918)

* [Fix](Planner) fix varchar does not show real length #25171 (#26850)

* [improvement](statistics)Multi bucket columns using DUJ1 to collect ndv #26950 (#26976)

backport #26950

* [fix](statistics)Fix external table show column stats type bug #26910  (#26921)

backport: #26910

* [minor](stats) rename stats related session variable name #26936 (#26928)

* [nereids](datetime) fix wrong result type of datetime add with interval as first arg (#26957) (#26987)

* [fix](Nereids) column pruning under union broken unexpectedly (#26884) (#26985)

* [fix](catalog) Fix ClickHouse DataTime64 precision parsing (#26980)

* [opt](MergeIO) use equivalent merge size to measure merge effectiveness (#26741) (#26923)

backport #26741

* add defensive code in runtime predicate to avoid crash due to column not in tablet schema #26990 (#26991)

* [fix](stats) fix auto collector always create sample job no matter the table size #26968 (#26972)

* [Enhance](regression)enhance docker network by add docker network subnet (#26872)

* [fix](case) regression-test/suites/show_p0/test_show_statistic_proc.groovy (#26925)

Co-authored-by: stephen <hello-stephen@qq.com>

* [fix](auth) fix overwrite logic of user with domain (#27003)

backport #27002

* [Branch-2.0](Serde) Fix content displayed by complex types in MySQL Client (#26880)

backport #25946 and #26301

* [test](tvf) append tvf read hive_text file  regression case. (#26790) (#26989)

backport #26790

* [test](information_schema)append information_schema external_table_p0 case. (#27029)

backport : #26846

* [fix](parquet) compressed_page_size has the same meaning in page v1 and v2 (#26783) (#26922)

backport #26783

* [BugFix](JDBC Catalog) fix jdbc catalog query bitmap may cause be core sometimes (#26933) (#27018)

* [Enhance](regression) skip test_information_schema_external (#27058)

* [improvement](pipeline) task group scan entity (#19924) (#27040)

Co-authored-by: Lijia Liu <liutang123@yeah.net>

* [opt](pipeline) Return InternalError to FE instead of doing a useless DCHECK in ExecNode #27035 (#27057)

Effect: the client will see an error message like the one below when BE hits a logical error in the plan.

ERROR 1105 (HY000): errCode = 2, detailMessage = ([xxx]())[CANCELLED]Logical error during processing VNewOlapScanNode(dr_case_tag), output of projections 2 mismatches with exec node output 3

* [fix](nereids)Fix nereids fail to parse tablesample bug (#26982)

backport #26981

* [branch2.0](test) fix external table test case with nested type display (#27092)

* [fix](load) skip cancel already cancelled channels (#27109)

* [fix](Nereids) store user variable in connect context (#26655) (#26920)

pick from master #26655

1. user variable should be case insensitive
2. user variable should be cleared after the connection reset

* [test](parquet)append parquet reader byte_array_decimal and rle_bool case (#26751) (#27026)

backport #26751

* [fix](nereids) support uncorrelated subquery in join condition (#26893)

pick from master #26672 
commit id: 17b1108

* [Bug](pipeline) try fix the exchange sink buffer result error (#27087)

* [fix](function)return NULL rather than 'null' if path not found #25880 (#26823)

* [enhancement](nereids)make error message more readable when bind logicalRepeat node #26744 (#26895)

* [regression](delete) add delete case for every type (#26961)

* [branch-2.0](paimon)disable paimon decimal case (#26971)

* [regression](partial update) Add row store cases for all existing partial update cases #26924 (#27017)

* [fix](statistics) fix updated rows incorrect due to typo in code #26979 (#27034)

* [fix](typo) Use minutes as auto analyze schedule interval #26968 (#27041)

* [Improvement](function) opt for case when #23068 (#27054)

* [fix](planner)scan node should project all required expr from parent node #26886 (#27096)

* [fix](nereids)count in correlated subquery shoud not output null value #27064 (#27097)

* [fix](load) add lock in active_memtable_mem_consumption #25207 (#27100)

* [branch-2.0](suites) Enable test_cast_with_scale_type since Nereids is ON (#26986)

* [Fix](multi-catalog) Fix NPE when replaying hms events #26803 (#26997)

Co-authored-by: wangxiangyu <wangxiangyu@360shuke.com>

* [Opt](scanner-scheduler) Optimize `BlockingQueue`, `BlockingPriorityQueue` and change remote scan thread pool #26784 (#27053)

- Optimize `BlockingQueue`, `BlockingPriorityQueue` by swapping `notify` and `unlock` to reduce lock competition. Ref: https://www.boost.org/doc/libs/1_54_0/boost/thread/sync_bounded_queue.hpp
- Change remote scan thread pool to `PriorityQueue`.

* [fix](errmsg) fix multiple FE processes start err msg (#27009) (#27080)

* [FIX](regresstest) fix test_load_with_map_nested_array csv for id #27105 (#27107)

* [FIX](map)fix map nested decimal with element at #27030 (#27110)

* [feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236) (#26731)

* [fix](nereids)fix bug that query infomation_schema.rowsets fe send fragment to one of muilti be. (#27025) (#27090)

Fixed incomplete query results when querying information_schema.rowsets with multiple BEs.

The cause is that the schema scanner sends the scan fragment to only one of the BEs, and that BE queries FE for information through RPC. Since the rowsets information must cover all BEs, the scan fragment needs to be sent to all BEs.

* [Config](statistics)Set enable_auto_analyze default value to true. #27146

* [branch2.0](test) fix doris jdbc catalog test case (#27150)

1. Fix doris_jdbc_catalog test case out file
2. Add log to debug 2 unstable test cases: pg_jdbc_catalog and oracle_jdbc_catalog

* [bugfix](tablet)fix the tablet will be deleted when clone due to concurrency #25784 (#26777)

* [fix](sink) crash caused by wild pointer of counter in VDataStreamSender (#26947) (#27148)

If preparation fails, the counter _peak_memory_usage_counter will be a wild pointer.

*** SIGSEGV address not mapped to object (@0x454d49545f) received by PID 16992 (TID 18856 OR 0x7f4d05444700) from PID 1296651359; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:417
 1# os::Linux::chained_handler(int, siginfo*, void*) in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
 4# 0x00007F55C85B9400 in /lib64/libc.so.6
 5# doris::vectorized::VDataStreamSender::close(doris::RuntimeState*, doris::Status) at /root/doris/be/src/vec/sink/vdata_stream_sender.cpp:734
 6# doris::PlanFragmentExecutor::close() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:543
 7# doris::PlanFragmentExecutor::~PlanFragmentExecutor() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:95
 8# doris::FragmentExecState::~FragmentExecState() at /root/doris/be/src/runtime/fragment_mgr.cpp:112
 9# std::_Sp_counted_ptr<doris::FragmentExecState*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() at /root/ldb/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:348
10# doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:855
11# doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:592
12# doris::PInternalServiceImpl::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PFragmentRequestVersion, bool) at /root/doris/be/src/service/internal_service.cpp:463
13# doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) at /root/doris/be/src/service/internal_service.cpp:305
14# doris::WorkThreadPool<false>::work_thread(int) at /root/doris/be/src/util/work_thread_pool.hpp:160
15# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
16# start_thread in /lib64/libpthread.so.0
17# clone in /lib64/libc.so.6

* [improvement](log) log desensitization without displaying user info (#26912) (#27167)

* [branch-2.0](cherry-pick)  add chunked transfer json test (#26902) (#27164)

* [fix](statistics)Fix alter column stats bug (#27093) (#27189)

backport #27093

* (fix)[schema change] fix incorrect setting of schema change jobstate when replay editlog (#26992) (#27139)

* [fix](jni) avoid BE crash and NPE when close paimon reader #27129 (#27204)

bp #27129

* [enhancement](jdbc catalog) Add lowercase column name mapping to Jdbc data source & optimize database and table mapping #27124 (#27130)

* [case] Load json data with enable_simdjson_reader=false (#26601) (#27158)

Co-authored-by: HowardQin <hao.qin@esgyn.cn>

* [fix](function) fix error when use negative number in explode_numbers #27020 (#27180)

* [fix](iceberg) iceberg use customer method to encode special characters of field name (#27108) (#27205)

Fix two bugs:
1. Missing-column detection is case sensitive; change the column names to lower case in FE for hive/iceberg/hudi.
2. Iceberg uses a custom method to encode special characters in column names. Decode the column name so that it matches the right column in the parquet reader.

* [enhancement](binlog)  Add dbName && tableName in CreateTableRecord (#26901) (#27208)

* [Branch2.0](Export) add show export regression testes #27140 (#27160)

* [log](tablet invert)  add preconditition check failed log (#26770) (#27171)

* [branch-2.0](publish version) publish version task no need return VERSION_NOT_EXIST #27005 (#27174)

* [minor](stats) Add start/end time for analyze job, precise to seconds of TableStats update time #27123 (#27185)

* [test](regression) Add more alter stmt regression case (#26988) (#27193)

* [test](external_table_p0)append log in external_table_p0 for debug unknown table case #27212 (#27213)

* [Improve](txn) Add some fuzzy test stub in txn (#26712) (#27144)

* [branch-2.0](fe ut) fix decommission test #27082 (#27175)

* [Fix](multi-catalog) Fix complex type crash when using dict filter facility in the parquet-reader. (#27151) (#27187)

- Fix complex type crash when using the dict filter facility in the parquet-reader by turning off the dict filter facility in this case.
- Add orc complex types regression test.

* [Optimize](point query) clear names to reduce mem consumption and cpu cost related to block column name (#26931) (#27157)

* [fix](fe) Fix `enable_nereids_planner` forward not take effect (#26782) (#27159)

* The Java reflection method `getFields()` only returns public fields,
  but `enable_nereids_planner` is private

* [fix](fe ut) Fix borrow oject throw npe (#27072) (#27207)

Occasional FE UT failure: borrowObject throws an NPE
```
get agent task request. type: CREATE, signature: 10008, fe addr: null
java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.register(GenericKeyedObjectPool.java:1079)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:347)
get agent task request. type: CREATE, signature: 10012, fe addr: TNetworkAddress(hostname:127.0.0.1, port:56072)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:277)
	at org.apache.doris.common.GenericPool.borrowObject(GenericPool.java:99)
	at org.apache.doris.utframe.MockedBackendFactory$DefaultBeThriftServiceImpl$1.run(MockedBackendFactory.java:219)
	at java.lang.Thread.run(Thread.java:750)
```

* [regression](conf) Make checkpoint/clean thread trigger more frequent (#26883) (#27194)

* When running p0, we want some checkpoint/clean threads in FE to run more
  frequently

* [hotfix](priv) Fix restore snapshot user priv with add cluster in UserIdentity (#26969) (#27210)

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>

* [branch-2.0](fe ut) fix unstable test DecommissionBackendTest  (#27173)

* [fix](disk migrate) migrate ignore not exists tablet (#26779) (#27172)

* [fix](build)macos clang 15 version compilation error (#25457)

* [fix](tablet sched) fix sched delete stale remain replica (#27050) (#27179)

* Revert "[Branch2.0](Export) add show export regression testes #27140 (#27160)" (#27217)

This reverts commit d76581d, since it caused the test_show_export test case to fail.

* Revert "[test](regression) Add more alter stmt regression case (#26988) (#27193)" (#27216)

This reverts commit 42d4806, since it caused the test_alter_table_drop_column and test_alter_table_modify_column test cases to fail.

* [fix](nereids)remove literal partition by and order by expression in window function #26899 (#27214)

* [fix](agg) fix coredump of multi distinct of decimal128I (#27014) (#27228)

* [fix](agg) fix coredump of multi distinct of decimal128

* fix

* Revert "[enhancement](jdbc catalog) Add lowercase column name mapping to Jdbc data source & optimize database and table mapping #27124 (#27130)" (#27230)

This reverts commit 087fccd.

* [feature](Nereids): eliminate sort under subquery (#26993) (#27218)

* [fix](ccr) Mark getBinlog,getBinlogLag,getMeta,getBackendMeta as from master (#27211) (#27227)

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>

* [fix](Nereids): NullSafeEqual should be in HashJoinCondition #27127 (#27232)

* [fix](planner)the data type should be the same between input slot and sort slot #27137 (#27215)

* [branch2.0](nereids)Pick #26873 #25769: partition prune fix (#27222)

* [improvement](fe and broker) support specify broker to getSplits, check isSplitable, file scan for HMS Multi-catalog (#24830) (#27236)

bp #24830

* [fix](fe ut) fix unstable ut TabletRepairAndBalanceTest (#27044) (#27239)

* [minor](stats) Report error with more friendly meesage when timeout #27197 (#27240)

* [fix](build index) Fix inverted index hardlink leak and missing problem #26903 (#27244)

* [fix](multi-catalog)add the max compute fe ut and fix download expired  #27007 (#27220)

bp #27007

* [cherry-pick](regression) add hms catalog broker scan case (#25453) (#27253)

* [cherry-pick](fe) select BE local broker to scan Hive table when 'broker.name' in hms catalog is specified (#27122) (#27252)

Since #24830 introduced `broker.name` in the hms catalog, data scans run on the specified brokers.
[doris operator](https://github.com/selectdb/doris-operator) supports deploying BE and broker in the same pod, and having BE access its local broker is the fastest way to access data.
In the previous logic, each inputSplit selected one BE to execute and then randomly selected one broker for the actual data access, so the BE and its broker were always located on separate K8S pods.
This PR optimizes the broker selection strategy to prioritize the BE-local broker when `broker.name` is specified in the hms catalog.
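The selection strategy described above might look roughly like the following Python sketch. The function `select_broker`, the dictionary-based broker records, and the use of host equality to decide what counts as "local" are all assumptions made for illustration; the actual FE code is Java and may match brokers differently.

```python
import random


def select_broker(be_host, brokers):
    """Prefer a broker on the same host as the BE; otherwise pick one at random."""
    local = [b for b in brokers if b["host"] == be_host]
    if local:
        return local[0]
    # No BE-local broker: fall back to the previous random choice.
    return random.choice(brokers)


brokers = [
    {"name": "hdfs_broker", "host": "10.0.0.1", "port": 8000},
    {"name": "hdfs_broker", "host": "10.0.0.2", "port": 8000},
]

chosen = select_broker("10.0.0.2", brokers)
print(chosen["host"])
```

Keeping the random fallback means that deployments without co-located brokers behave as before.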

* [improvement](statistics)Use count as ndv for unique/agg olap table single key column (#27186) (#27275)

For a unique/agg olap table with a single key column, count and ndv of that column are the same,
so there is no need to calculate ndv; simply use count as ndv.
backport #27186

* [minor](stats) Fix potential npe when loading stats #27200 (#27241)

* [fix](tablesample) Fix computeSampleTabletIds NullPointerException (#27165) (#27258)

* [fix](partial update) keep case insensitivity and use the columns' origin names in partialUpdateCols in origin planner #27223 (#27255)

* [chore](fix) sync check-pr-if-need-run-build.sh with master branch (#27250)

* [fix](compile) fix BE compile failure on Mac (#27206) (#27281)

* [chore](clucene) coverage compilation option added #27162 (#27284)

* [FIX]Fix complex type meta schema in information database  #27203 (#27286)

* [feature](Nereids): Pushdown LimitDistinct Through Join (#25113) (#27288)

Push down limit-distinct through left/right outer join or cross join.
such as select t1.c1 from t1 left join t2 on t1.c1 = t2.c1 order by t1.c1 limit 1;

* [fix](inverted index) reset fs_writer to nullptr before throw exception (#27202) (#27289)

* [fix](planner)output slot should be materialized as intermediate slot in agg node #27282 (#27285)

* [FIX](complextype)Fix complex nested and add regress test #26973 (#27293)

* [fix](test) disable forbid_unknown_col_stats (#27303)

* [fix](stats) Release analyze tasks once job finished #27310 (#27309)

* [doc](fix) a new docs for k8s deploy by operator to 2.0 (#26927)

* [doc](fix) fix date trunc doc (#27320)

* [Fix](statistics)Fix analyze sql including key word bug  #27321 (#27322)

backport #27321

* [cherry-pick](function) improve compoundPred optimization work with children is nullable #26160 (#27354)

* Revert "[improvement](routine-load) add routine load rows check (#25818)" (#27336)

* [refactor](planner) filter empty partitions in a unified location (#27190) (#27256)

* [fix](hms) fix compatibility issue of hive metastore client #27327 (#27328)

* [Branch2.0](Export) add show export regression testes (#27330)

* [fix](stats) Fix thread leaks when doing checkpoint #27334 #27335

* [fix](stats) Fix creating too many tasks on new env (#27362)

* [fix](build index) fix core when build index for a new column which without data (#27276)

* change version to 2.0.3-rc04 (#27392)

* fix merge

* update clucene

---------

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: deardeng <565620795@qq.com>
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zzzxl <33418555+zzzxl1993@users.noreply.github.com>
Co-authored-by: abmdocrt <Yukang.Lian2022@gmail.com>
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: walter <w41ter.l@gmail.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
Co-authored-by: Jack Drogon <jack.xsuperman@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: meiyi <myimeiyi@gmail.com>
Co-authored-by: airborne12 <airborne08@gmail.com>
Co-authored-by: TengJianPing <18241664+jacktengg@users.noreply.github.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: shuke <37901441+shuke987@users.noreply.github.com>
Co-authored-by: walter <patricknicholas@foxmail.com>
Co-authored-by: zhannngchen <48427519+zhannngchen@users.noreply.github.com>
Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
Co-authored-by: daidai <2017501503@qq.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: JingDas <114388747+JingDas@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: bobhan1 <bh2444151092@outlook.com>
Co-authored-by: Jeffrey <color.dove@gmail.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: KassieZ <139741991+KassieZ@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: HB <hubiao01@corp.netease.com>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: 谢健 <jianxie0@gmail.com>
Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Guangdong Liu <liugddx@gmail.com>
Co-authored-by: zfr95 <87513668+zfr9527@users.noreply.github.com>
Co-authored-by: chen <czjourney@163.com>
Co-authored-by: TsukiokaKogane <cby141994@gmail.com>
Co-authored-by: chenboyang.922 <chenboyang.922@bytedance.com>
Co-authored-by: ryanzryu <143597717+ryanzryu@users.noreply.github.com>
Co-authored-by: Siyang Tang <82279870+TangSiyang2001@users.noreply.github.com>
Co-authored-by: zy-kkk <zhongyk10@gmail.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: Lei Zhang <27994433+SWJTU-ZhangLei@users.noreply.github.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: qiye <jianliang5669@gmail.com>
Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: Liqf <109049295+LemonLiTree@users.noreply.github.com>
Co-authored-by: LiBinfeng <46676950+LiBinfeng-01@users.noreply.github.com>
Co-authored-by: zhangguoqiang <18372634969@163.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: GoGoWen <82132356+GoGoWen@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Lijia Liu <liutang123@yeah.net>
Co-authored-by: Kaijie Chen <ckj@apache.org>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: lsy3993 <110876560+lsy3993@users.noreply.github.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Xiangyu Wang <dut.xiangyu@gmail.com>
Co-authored-by: wangxiangyu <wangxiangyu@360shuke.com>
Co-authored-by: wudongliang <46414265+DongLiang-0@users.noreply.github.com>
Co-authored-by: Houliang Qi <neuyilan@163.com>
Co-authored-by: Luwei <814383175@qq.com>
Co-authored-by: HowardQin <hao.qin@esgyn.cn>
Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>
Co-authored-by: DuRipeng <453243496@qq.com>
Co-authored-by: catpineapple <42031973+catpineapple@users.noreply.github.com>
Co-authored-by: YueW <45946325+Tanya-W@users.noreply.github.com>
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
…27127)

Originally, we only put `EqualTo` in `HashJoinCondition`; we also need to allow `NullSafeEqual`.
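To illustrate why a null-safe equality can serve as a hash join condition just like plain equality, here is a small Python sketch. It is not the Nereids implementation: the function `hash_join`, the `null_safe` flag, and the `(key, payload)` row shape are hypothetical, with `None` standing in for SQL NULL and `null_safe=True` modelling the `<=>` operator.

```python
from collections import defaultdict


def hash_join(left, right, null_safe=False):
    """Inner-join two lists of (key, payload) rows on key.

    null_safe=False models `=`  : a NULL (None) key never matches anything.
    null_safe=True  models `<=>`: a NULL key matches another NULL key.
    """
    # Build phase: hash the right side by key.
    table = defaultdict(list)
    for key, payload in right:
        if key is None and not null_safe:
            continue  # under `=`, a NULL build key can never match
        table[key].append(payload)

    # Probe phase: look up each left key in the hash table.
    out = []
    for key, payload in left:
        if key is None and not null_safe:
            continue  # under `=`, a NULL probe key can never match
        for other in table.get(key, []):
            out.append((key, payload, other))
    return out


left = [(1, "l1"), (None, "l2")]
right = [(1, "r1"), (None, "r2"), (2, "r3")]

equal_rows = hash_join(left, right)
null_safe_rows = hash_join(left, right, null_safe=True)
print(equal_rows)
print(null_safe_rows)
```

Both variants use the same build and probe structure; the only difference is whether NULL keys take part in the lookup.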
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…27127)

Originally, we only put `EqualTo` in `HashJoinCondition`; we also need to allow `NullSafeEqual`.
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.3-merged reviewed usercase Important user case type label
7 participants