Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys #32878

Merged
merged 1 commit into from
Mar 27, 2024

Conversation

starocean999
Copy link
Contributor

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@starocean999
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 38138 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1fbbee4c56d6aa27b28270483a00c885c30d2458, data reload: false

------ Round 1 ----------------------------------
q1	17641	6571	4293	4293
q2	2410	170	160	160
q3	11397	1140	1227	1140
q4	10591	735	831	735
q5	7718	3055	3006	3006
q6	212	130	127	127
q7	1055	627	589	589
q8	9583	2065	1999	1999
q9	7365	6712	6648	6648
q10	8448	3465	3564	3465
q11	441	227	219	219
q12	372	197	195	195
q13	18133	2877	2876	2876
q14	239	206	208	206
q15	511	461	477	461
q16	485	375	376	375
q17	956	517	603	517
q18	7151	6452	6399	6399
q19	2284	1390	1464	1390
q20	541	262	230	230
q21	3515	2858	2824	2824
q22	348	304	284	284
Total cold run time: 111396 ms
Total hot run time: 38138 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4215	4082	4032	4032
q2	327	222	233	222
q3	2972	2815	2851	2815
q4	1844	1526	1543	1526
q5	5316	5346	5281	5281
q6	196	116	117	116
q7	2262	1892	1887	1887
q8	3146	3301	3286	3286
q9	8712	8752	8689	8689
q10	3777	3768	3745	3745
q11	550	444	447	444
q12	720	553	538	538
q13	16901	2857	2849	2849
q14	284	251	254	251
q15	512	464	457	457
q16	490	425	446	425
q17	1719	1496	1453	1453
q18	7462	7275	7080	7080
q19	1610	1496	1544	1496
q20	1909	1708	1734	1708
q21	4662	4722	4726	4722
q22	519	446	462	446
Total cold run time: 70105 ms
Total hot run time: 53468 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 181931 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1fbbee4c56d6aa27b28270483a00c885c30d2458, data reload: false

query1	935	370	350	350
query2	6537	1974	1868	1868
query3	6750	225	224	224
query4	31885	21342	21320	21320
query5	4259	397	394	394
query6	271	179	180	179
query7	4635	297	294	294
query8	225	164	170	164
query9	9389	2280	2283	2280
query10	569	252	266	252
query11	17201	14239	14256	14239
query12	139	91	91	91
query13	1632	449	424	424
query14	9409	8044	7841	7841
query15	253	201	194	194
query16	8180	261	270	261
query17	1942	577	557	557
query18	2114	292	297	292
query19	350	162	166	162
query20	92	89	89	89
query21	202	134	136	134
query22	5030	4829	4825	4825
query23	33581	32961	32774	32774
query24	11043	2914	2872	2872
query25	632	363	372	363
query26	1410	152	156	152
query27	3002	352	362	352
query28	7386	1872	1886	1872
query29	909	617	607	607
query30	303	148	144	144
query31	970	721	751	721
query32	94	58	54	54
query33	756	255	238	238
query34	1070	488	484	484
query35	826	608	602	602
query36	997	918	910	910
query37	126	63	66	63
query38	3636	3411	3439	3411
query39	1464	1455	1438	1438
query40	209	111	110	110
query41	52	44	51	44
query42	105	104	94	94
query43	495	428	434	428
query44	1181	722	739	722
query45	274	246	245	245
query46	1113	687	693	687
query47	1921	1870	1855	1855
query48	441	362	355	355
query49	1125	339	322	322
query50	765	374	368	368
query51	6798	6647	6479	6479
query52	111	89	93	89
query53	339	271	281	271
query54	292	243	238	238
query55	94	89	79	79
query56	265	237	223	223
query57	1209	1130	1140	1130
query58	237	210	245	210
query59	2821	2611	2482	2482
query60	272	239	250	239
query61	104	113	111	111
query62	663	449	451	449
query63	306	276	279	276
query64	5655	4139	4098	4098
query65	3112	3051	3026	3026
query66	1446	388	374	374
query67	15244	15083	14974	14974
query68	5859	547	530	530
query69	589	391	380	380
query70	1262	1183	1143	1143
query71	449	276	275	275
query72	6834	2855	2662	2662
query73	719	317	325	317
query74	6838	6358	6338	6338
query75	3158	2203	2173	2173
query76	3766	885	890	885
query77	613	258	263	258
query78	10894	10130	10118	10118
query79	9308	530	536	530
query80	1498	365	372	365
query81	552	225	217	217
query82	865	85	84	84
query83	220	144	146	144
query84	284	82	76	76
query85	1513	327	307	307
query86	463	316	289	289
query87	3745	3501	3547	3501
query88	4911	2369	2361	2361
query89	513	365	361	361
query90	1973	176	176	176
query91	173	135	131	131
query92	71	47	48	47
query93	7157	504	490	490
query94	1153	174	175	174
query95	431	328	325	325
query96	616	271	276	271
query97	2717	2467	2459	2459
query98	224	218	206	206
query99	1285	928	869	869
Total cold run time: 307506 ms
Total hot run time: 181931 ms

@doris-robot
Copy link

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 1fbbee4c56d6aa27b28270483a00c885c30d2458 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.8 seconds inserted 10000000 Rows, about 724K ops/s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 27, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 4303934 into apache:master Mar 27, 2024
28 of 31 checks passed
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <hello-stephen@qq.com>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This pattern of rewriting is supported for multi-table joins and supported join types is as following:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for varint type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a hive on cosn table, doris return empty result, but the table has data.
iceberg on cosn is ok.
The reason is misuse of cosn's file sytem. according to cosn's doc, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    This PR #30198 change the logic of `IcebergHMSExternalCatalog.java`,
    to get locationUrl by calling hive metastore's `getCatalog()` method.
    But this method only exists in hive 3+. So it will fail if we using hive 2.x.

    I temporary remove this logic, because this logic is only used from iceberg table writing.
    Which is still under development. We will rethink this logic later.

2. Fix test cases

    Some of P2 test cases missed `order_qt`. And because the output format of the floating point
    type is changed, some result in `out` files need to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix Reach limit of connections error
in fe.conf , arrow_flight_token_cache_size is mandatory less than qe_max_connection/2. arrow flight sql is a stateless protocol, connection is usually not actively disconnected, bearer token is evict from the cache will unregister ConnectContext.

Fix ConnectContext.command not be reset to COM_SLEEP in time, this will result in frequent kill connection after query timeout.

Fix bearer token evict log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH

* [bugfix](cloud) few variable not initialized (#32868)

../../cloud/src/recycler/meta_checker.cpp
can cause uninitialised memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
groovy use flight sql connection to execute query SUM(MAX(c1) OVER (PARTITION BY)) report error: AGGREGATE clause must not contain analytic expressions, but no problem in Java execute it with jdbc::arrow-flight-sql.
groovy not support print arrow array type, throw IndexOutOfBoundsException.
"arrow_flight_sql" not support two phase read
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer maybe may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support to get tables in materialized view when collecting table in plan

table scehma as fllowing:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

if get table from the plan as follwoing, we can get [table1, table3, table2], the mv1 is expanded to get base tables;

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In Fe:
Get: If Fe is cloud mode, get table version from meta service.
Update: Op drop/replace temp partition, commit transaction.

In meta service:
Add: create Index. init value is 1.
Remove: by recycler.
Update: commit/drop partition rpc, commit txn rpc. Atomic++.

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == for type comparing
2. null bitmap size is reisze by size of ref column.

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy)fix row policy has been exist (#32880)

* [fix](pipeline) fix use error row desc when origin block clear (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support show of plsql procedure using select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change of this PR #32455 , the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so the exception NoclassDeffounderror will appear.

We need to write a separate Utils class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column name in different index. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
-  `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
after change
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: yongjinhou <109586248+yongjinhou@users.noreply.github.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: lihangyu <15605149486@163.com>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Vallish Pai <vallishpai@gmail.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.8-merged dev/2.1.1-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants