[fix](schemachange) schema change from not null to null #32913

Merged (1 commit) on Mar 28, 2024

@dataroaring (Contributor) commented Mar 27, 2024

  1. Use equals instead of == when comparing types.
  2. The null bitmap is resized to the size of the ref column.
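
The first item can be illustrated with a minimal Java sketch. It assumes the comparison happens in Java code, which the description does not state, and `ColType` is a hypothetical stand-in rather than the Doris type class. It shows why `==` can report two equivalent type objects as different while `equals` does not.

```java
import java.util.Objects;

public class TypeCompareSketch {
    // Hypothetical column type descriptor (an assumption, not the Doris class).
    static final class ColType {
        final String name;
        final int length;

        ColType(String name, int length) {
            this.name = name;
            this.length = length;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ColType)) {
                return false;
            }
            ColType other = (ColType) o;
            return length == other.length && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, length);
        }
    }

    // Reference comparison: true only when both variables point at the same object.
    static boolean sameTypeByReference(ColType a, ColType b) {
        return a == b;
    }

    // Value comparison: true when the two descriptors describe the same type.
    static boolean sameTypeByValue(ColType a, ColType b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        ColType before = new ColType("VARCHAR", 32);
        ColType after = new ColType("VARCHAR", 32);
        System.out.println(sameTypeByReference(before, after)); // false
        System.out.println(sameTypeByValue(before, after));     // true
    }
}
```

With `==`, a change that only flips nullability could look like a type change whenever the two type objects are separate instances; whether that is the exact failure fixed here is not stated in the description.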

Proposed changes

Issue Number: close #xxx


1. Use equals instead of == when comparing types.
2. The null bitmap is resized to the size of the ref column.
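
The second item can be sketched as follows. This is an illustration in Java under stated assumptions, not the actual change: `NullableColumn` and `toNullable` are hypothetical names, and the description does not say what the bitmap size was before the fix. The idea shown is that when a NOT NULL column becomes nullable, its null map needs one entry per row of the referenced column, all set to "not null".

```java
import java.util.Arrays;

public class NullMapSketch {
    // Hypothetical nullable column: the values plus one null flag per row (1 = NULL).
    static final class NullableColumn {
        final int[] values;
        final byte[] nullMap;

        NullableColumn(int[] values, byte[] nullMap) {
            if (values.length != nullMap.length) {
                throw new IllegalArgumentException("null map must have one entry per row");
            }
            this.values = values;
            this.nullMap = nullMap;
        }
    }

    // Wrap a NOT NULL column as nullable: no row is NULL, so every flag is 0.
    static NullableColumn toNullable(int[] refColumn) {
        // Size the null map by the ref column; Java zero-fills new arrays.
        byte[] nullMap = new byte[refColumn.length];
        return new NullableColumn(Arrays.copyOf(refColumn, refColumn.length), nullMap);
    }

    public static void main(String[] args) {
        NullableColumn col = toNullable(new int[] {7, 8, 9});
        System.out.println(col.nullMap.length); // 3
    }
}
```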
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@dataroaring
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.25% (8741/24796)
Line Coverage: 27.03% (71544/264685)
Region Coverage: 26.28% (37129/141293)
Branch Coverage: 23.17% (18981/81920)
Coverage Report: http://coverage.selectdb-in.cc/coverage/86b95794fe42c62af8cca85bb75e743a32fa4020_86b95794fe42c62af8cca85bb75e743a32fa4020/report/index.html

@doris-robot

TPC-H: Total hot run time: 39537 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 86b95794fe42c62af8cca85bb75e743a32fa4020, data reload: false

------ Round 1 ----------------------------------
q1	17610	4593	4508	4508
q2	2104	150	149	149
q3	10601	1209	1286	1209
q4	10229	849	860	849
q5	7478	3223	3183	3183
q6	220	125	120	120
q7	1079	567	554	554
q8	9344	2172	2144	2144
q9	7633	6974	6868	6868
q10	8515	3571	3709	3571
q11	436	221	220	220
q12	440	203	188	188
q13	17827	2842	2845	2842
q14	246	209	199	199
q15	519	472	458	458
q16	495	376	376	376
q17	979	671	557	557
q18	7061	6428	6423	6423
q19	2368	1528	1619	1528
q20	556	247	253	247
q21	3812	3086	3061	3061
q22	347	283	286	283
Total cold run time: 109899 ms
Total hot run time: 39537 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4480	4469	4435	4435
q2	349	227	227	227
q3	3058	2944	2906	2906
q4	1926	1626	1554	1554
q5	5393	5401	5442	5401
q6	211	115	115	115
q7	2296	1852	1856	1852
q8	3377	3564	3526	3526
q9	8751	8792	8779	8779
q10	3903	3880	3882	3880
q11	559	434	433	433
q12	727	558	558	558
q13	16919	2824	2875	2824
q14	282	252	245	245
q15	507	451	466	451
q16	476	423	424	423
q17	1872	1571	1548	1548
q18	7443	7153	7092	7092
q19	1754	1649	1695	1649
q20	1971	1696	1697	1696
q21	4997	4897	4774	4774
q22	535	456	443	443
Total cold run time: 71786 ms
Total hot run time: 54811 ms

@doris-robot

TPC-DS: Total hot run time: 181550 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 86b95794fe42c62af8cca85bb75e743a32fa4020, data reload: false

query1	929	351	354	351
query2	6565	1909	1838	1838
query3	6707	217	221	217
query4	31657	21255	21487	21255
query5	4313	396	397	396
query6	274	181	181	181
query7	4622	305	291	291
query8	231	163	175	163
query9	9219	2284	2271	2271
query10	566	240	242	240
query11	17177	14106	14155	14106
query12	142	92	92	92
query13	1620	425	418	418
query14	9840	7961	7800	7800
query15	253	197	202	197
query16	8102	257	270	257
query17	1938	601	567	567
query18	1932	291	286	286
query19	355	163	157	157
query20	101	90	91	90
query21	210	130	124	124
query22	5050	4861	4906	4861
query23	33619	32573	33195	32573
query24	10836	2897	2877	2877
query25	625	389	392	389
query26	1151	154	161	154
query27	2904	345	359	345
query28	7296	1898	1847	1847
query29	933	643	632	632
query30	302	159	151	151
query31	982	729	750	729
query32	104	62	55	55
query33	776	251	254	251
query34	1056	486	494	486
query35	810	618	610	610
query36	1023	866	887	866
query37	134	65	64	64
query38	3612	3483	3419	3419
query39	1456	1444	1448	1444
query40	203	119	114	114
query41	51	45	48	45
query42	103	99	95	95
query43	492	445	445	445
query44	1185	743	740	740
query45	276	268	261	261
query46	1130	695	721	695
query47	1930	1837	1869	1837
query48	449	395	377	377
query49	1141	348	344	344
query50	774	378	379	378
query51	6810	6709	6725	6709
query52	116	92	93	92
query53	353	280	281	280
query54	304	249	232	232
query55	83	81	80	80
query56	251	233	229	229
query57	1231	1164	1132	1132
query58	234	212	199	199
query59	2739	2482	2462	2462
query60	266	236	240	236
query61	95	92	108	92
query62	690	442	453	442
query63	308	278	271	271
query64	5630	4061	4101	4061
query65	3069	3024	3037	3024
query66	869	365	364	364
query67	15323	14991	14735	14735
query68	7152	534	560	534
query69	619	382	382	382
query70	1259	1143	1165	1143
query71	507	270	266	266
query72	6528	2723	2528	2528
query73	735	327	324	324
query74	7994	6428	6399	6399
query75	3501	2228	2260	2228
query76	4903	928	908	908
query77	649	272	256	256
query78	10663	10127	10013	10013
query79	7169	527	535	527
query80	1186	390	394	390
query81	525	219	218	218
query82	865	89	90	89
query83	211	153	150	150
query84	285	84	89	84
query85	1507	385	366	366
query86	409	309	291	291
query87	3737	3468	3580	3468
query88	4763	2425	2432	2425
query89	474	370	366	366
query90	1998	180	177	177
query91	178	150	140	140
query92	63	50	47	47
query93	5390	502	501	501
query94	1115	174	171	171
query95	431	318	323	318
query96	605	271	276	271
query97	2697	2519	2467	2467
query98	227	217	212	212
query99	1190	940	870	870
Total cold run time: 305062 ms
Total hot run time: 181550 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 86b95794fe42c62af8cca85bb75e743a32fa4020 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s

Member

@airborne12 airborne12 left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

@dataroaring changed the title from "[fix](cloud) schema change from not null to null" to "[fix](schemachange) schema change from not null to null" on Mar 28, 2024
@github-actions bot added the `approved` label (indicates a PR has been approved by one committer) on Mar 28, 2024
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit 49875c3 into apache:master Mar 28, 2024
31 of 35 checks passed
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <hello-stephen@qq.com>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This rewriting pattern is supported for multi-table joins. The supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for varint type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a hive on cosn table, doris returns an empty result even though the table has data.
iceberg on cosn is ok.
The reason is misuse of cosn's file system. According to cosn's doc, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the hive metastore's `getCatalog()` method.
    But this method only exists in hive 3+, so it fails when using hive 2.x.

    I temporarily removed this logic, because it is only used for iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some P2 test cases were missing `order_qt`. And because the output format of the floating point
    type has changed, some results in `out` files need to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol, so connections are usually not actively disconnected; when a bearer token is evicted from the cache, its ConnectContext is unregistered.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
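
The size constraint between the two fe.conf settings can be sketched as a simple check. This is only one reading of "less than qe_max_connection/2" (strict, without integer rounding); the class and method names are hypothetical, and how Doris actually enforces the constraint is not described here.

```java
public class TokenCacheSizeSketch {
    // Returns true when tokenCacheSize is strictly less than qeMaxConnection / 2.
    // The comparison is done as 2 * tokenCacheSize < qeMaxConnection, in long
    // arithmetic, to avoid integer-division rounding and int overflow.
    static boolean isValid(int tokenCacheSize, int qeMaxConnection) {
        return 2L * tokenCacheSize < qeMaxConnection;
    }

    public static void main(String[] args) {
        System.out.println(isValid(511, 1024)); // true
        System.out.println(isValid(512, 1024)); // false
    }
}
```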

* [bugfix](cloud) few variable not initialized (#32868)

../../cloud/src/recycler/meta_checker.cpp
can cause uninitialised memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
When groovy uses a flight sql connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error: AGGREGATE clause must not contain analytic expressions. Executing the same query from Java with jdbc::arrow-flight-sql works.
groovy does not support printing the arrow array type and throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two phase read
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer maybe may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support to get tables in materialized view when collecting table in plan

The table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we collect the tables from the plan of the following query, we get [table1, table3, table2]; mv1 is expanded to get its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10
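
The expansion described above can be sketched in Java. This is a simplified illustration, not the Nereids implementation: it works on table names instead of plan nodes, `collectBaseTables` and `mvDefs` are hypothetical names, and it assumes materialized view definitions are not cyclic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TableCollectSketch {
    // Collect base tables in first-seen order, expanding materialized views.
    // mvDefs maps a materialized view name to the relations its definition reads.
    static List<String> collectBaseTables(List<String> scanned, Map<String, List<String>> mvDefs) {
        Set<String> out = new LinkedHashSet<>();
        for (String relation : scanned) {
            collect(relation, mvDefs, out);
        }
        return new ArrayList<>(out);
    }

    private static void collect(String relation, Map<String, List<String>> mvDefs, Set<String> out) {
        List<String> bases = mvDefs.get(relation);
        if (bases == null) {
            out.add(relation); // a plain table: record it once
            return;
        }
        for (String base : bases) {
            collect(base, mvDefs, out); // a materialized view: expand recursively
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> mvDefs = new HashMap<>();
        mvDefs.put("mv1", Arrays.asList("table1", "table3"));
        // Relations in the order the query above mentions them.
        List<String> scanned = Arrays.asList("mv1", "table2", "table2");
        System.out.println(collectBaseTables(scanned, mvDefs)); // [table1, table3, table2]
    }
}
```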

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In Fe:
Get: If Fe is cloud mode, get table version from meta service.
Update: Op drop/replace temp partition, commit transaction.

In meta service:
Add: create Index. init value is 1.
Remove: by recycler.
Update: commit/drop partition rpc, commit txn rpc. Atomic++.
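
The meta service lifecycle listed above (add with initial value 1, atomic increment on update, removal by the recycler) can be sketched in Java with an in-memory map. This is a sketch under stated assumptions: the real meta service persists versions in its own store, and every name below is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class TableVersionSketch {
    // In-memory stand-in for the meta service's table version storage.
    private final ConcurrentHashMap<Long, AtomicLong> versions = new ConcurrentHashMap<>();

    // Add: creating an index initializes the table version to 1 (only once).
    void onCreateIndex(long tableId) {
        versions.putIfAbsent(tableId, new AtomicLong(1));
    }

    // Update: commit/drop partition or commit txn bumps the version atomically.
    long onUpdate(long tableId) {
        return require(tableId).incrementAndGet();
    }

    // Get: what an FE in cloud mode would ask the meta service for.
    long get(long tableId) {
        return require(tableId).get();
    }

    // Remove: done by the recycler.
    void recycle(long tableId) {
        versions.remove(tableId);
    }

    boolean contains(long tableId) {
        return versions.containsKey(tableId);
    }

    private AtomicLong require(long tableId) {
        AtomicLong version = versions.get(tableId);
        if (version == null) {
            throw new IllegalStateException("no version for table " + tableId);
        }
        return version;
    }

    public static void main(String[] args) {
        TableVersionSketch meta = new TableVersionSketch();
        meta.onCreateIndex(42L);
        meta.onUpdate(42L);
        System.out.println(meta.get(42L)); // 2
    }
}
```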

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == for type comparing
2. The null bitmap is resized to the size of the ref column.

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy)fix row policy has been exist (#32880)

* [fix](pipeline) fix use error row desc when origin block clear (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support show of plsql procedure using select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` is thrown.

We need to write a separate Utils class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column name in different index. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
-  `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
after change
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: yongjinhou <109586248+yongjinhou@users.noreply.github.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: lihangyu <15605149486@163.com>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Vallish Pai <vallishpai@gmail.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.13-merged dev/2.1.5-merged reviewed
6 participants