Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix](statistics)Need to recalculate health value when table row count become 0. #27673

Merged
merged 1 commit into from
Nov 28, 2023

Conversation

Jibing-Li
Copy link
Contributor

Need to recalculate health value when table row count become 0. Otherwise, when user truncate a table, the old statistics will not be updated.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@Jibing-Li Jibing-Li marked this pull request as ready for review November 28, 2023 03:07
@Jibing-Li
Copy link
Contributor Author

run buildall

Copy link
Contributor

PR approved by anyone and no changes requested.

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.22 seconds
stream load tsv: 580 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17100867117 Bytes

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 23a58a85dc88785e2946adb8d6d4c77c7aec6a9e, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4886	4654	4638	4638
q2	358	157	159	157
q3	2043	1940	1840	1840
q4	1388	1246	1270	1246
q5	3943	3949	4002	3949
q6	259	132	129	129
q7	1372	848	852	848
q8	2784	2813	2768	2768
q9	9824	10749	9626	9626
q10	3453	3536	3520	3520
q11	372	256	237	237
q12	440	298	297	297
q13	4552	3803	3813	3803
q14	326	304	296	296
q15	585	531	527	527
q16	487	456	463	456
q17	1130	965	915	915
q18	7875	7426	7445	7426
q19	1694	1694	1679	1679
q20	546	303	280	280
q21	4392	3989	4008	3989
q22	476	371	385	371
Total cold run time: 53185 ms
Total hot run time: 48997 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4592	4567	4569	4567
q2	337	244	279	244
q3	4034	4027	4008	4008
q4	2729	2716	2869	2716
q5	9635	9691	9600	9600
q6	247	126	125	125
q7	2985	2487	2519	2487
q8	4483	4470	4450	4450
q9	13197	13105	13079	13079
q10	4039	4133	4165	4133
q11	778	630	649	630
q12	991	805	810	805
q13	4296	3562	3534	3534
q14	370	345	363	345
q15	584	517	512	512
q16	602	574	553	553
q17	3848	3889	3896	3889
q18	9548	9192	9105	9105
q19	1881	1801	1784	1784
q20	2390	2071	2033	2033
q21	8829	8630	8666	8630
q22	861	828	801	801
Total cold run time: 81256 ms
Total hot run time: 78030 ms

@Jibing-Li
Copy link
Contributor Author

run buildall

@Jibing-Li
Copy link
Contributor Author

run buildall

morningman
morningman previously approved these changes Nov 28, 2023
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 28, 2023
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 28, 2023
@Jibing-Li
Copy link
Contributor Author

run buildall

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 28, 2023
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.62 seconds
stream load tsv: 563 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 29 seconds loaded 2358488459 Bytes, about 77 MB/s
stream load orc: 68 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17099157324 Bytes

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 727361664f722f866d93871e6b9bb7d005292f3e, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4891	4631	4638	4631
q2	360	163	158	158
q3	2053	1932	1999	1932
q4	1419	1355	1281	1281
q5	3970	3926	4047	3926
q6	255	131	132	131
q7	1365	833	846	833
q8	2815	2807	2802	2802
q9	9720	9913	9809	9809
q10	3478	3517	3504	3504
q11	385	243	251	243
q12	439	298	301	298
q13	4544	3836	3793	3793
q14	321	286	292	286
q15	586	524	532	524
q16	500	471	448	448
q17	1138	995	947	947
q18	7854	7519	7383	7383
q19	1686	1696	1687	1687
q20	581	314	288	288
q21	4431	4014	4063	4014
q22	467	369	377	369
Total cold run time: 53258 ms
Total hot run time: 49287 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4575	4568	4560	4560
q2	338	242	238	238
q3	4032	4018	4145	4018
q4	2901	2709	2714	2709
q5	9579	9536	9520	9520
q6	250	123	126	123
q7	2979	2511	2516	2511
q8	4433	4420	4436	4420
q9	12939	12883	12838	12838
q10	4068	4134	4132	4132
q11	797	610	654	610
q12	974	808	806	806
q13	4317	3592	3572	3572
q14	374	356	365	356
q15	582	521	524	521
q16	591	591	580	580
q17	3883	3939	3950	3939
q18	9708	9162	9119	9119
q19	1818	1792	1796	1792
q20	2428	2093	2028	2028
q21	8749	8622	8554	8554
q22	915	795	898	795
Total cold run time: 81230 ms
Total hot run time: 77741 ms

@morningman morningman merged commit 1b509ab into apache:master Nov 28, 2023
morningman pushed a commit that referenced this pull request Nov 28, 2023
@Jibing-Li Jibing-Li deleted the row branch November 28, 2023 15:22
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon added a commit that referenced this pull request Dec 4, 2023
* [chore](case) Use correct insert stmt for cold heat separation case #27546 (#27585)

Co-authored-by: AlexYue <yj976240184@gmail.com>

* [enhance](S3) Print the error detail for every s3 operation (#27572) (#27615)

* [nereids] fix stats error when using dateTime type filter #27571 (#27577)

* [fix](planner)sort node should materialized required slots for itself #27605 (#27620)

* [fix](Nereids) non-deterministic expression should not be constant (#27606) (#27631)

* [enhancement](stats) Add process for aggstate type #27640 (#27642)

* [Fix](statistics)Fix bug and improve auto analyze. (#27626) (#27657)

1. Implement needReAnalyzeTable for ExternalTable. For now, external table will not be reanalyzed in 10 days.
2. For HiveMetastoreCache.loadPartitions, handle the empty iterator case to avoid Index out of boundary exception.
3. Wrap handle show analyze loop with try catch, so that when one table failed (for example, catalog dropped so the table couldn't be found anymore), we can still show the other tables.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze, throw exception for other types of table.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid NPE.

* [profile](bugfix) should not cache profile content because the profile may not be a full profile (#27635)

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>

* [Enhance](fe) Support setting initial root password when FE firstly launch (#27438) (#27603)

* [opt](plan) only lock olap table when query plan #27639 (#27656)

bp #27639

* select coordinator node from user's tag when exec streaming load (#27106) (#27677)

* [fix](statistics)Need to recalculate health value when table row count become 0  #27673 (#27674)

backport #27673

* [fix](statistics)Fix sample min max npe bug  #27702 (#27707)

backport #27702

* [Bug](join) try fix wrong _has_null_in_build_side setted (#27684) (#27710)

* [Fix](show-load)Show load npe(userinfo is null) (#27698) (#27719)

* [pick](nereids)temporary partition is always pruned #27636 (#27722)

* [enhancement](stats) limit bq cap size for analyze task #27685 (#27687)

* [improvement](statistics) Add config for the threshold of column count for auto analyze #27713 (#27723)

* [doc](fix) k8s operator docs fix to 2.0 (#27476)

* [Improvement](planner)support select tablets with nereids optimize #23164 #23365 (#27740)

#23164
#23365

* [FIX](complextype)fix complex type hash equals (#27743)

* [fix](statistics) Fix show auto analyze missing jobs bug (#27761)

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch

return RuntimeError if copy_column_data_to_block nullable mismatch to avoid coredump in input_col_ptr->filter_by_selector(sel_rowid_idx, select_size, raw_res_ptr) .

The problem is reported by a doris user but I can not reproduce it, so there is no testcase added currently.

* [opt](stats) Use escape rather than base64 for min/max value #27746 (#27748)

* [refactor](http) disable snapshot and get_log_file api (#27724) (#27770)

* [branch-2.0](pick 27738) Warning log to trace send fragment #27738 (#27760)

* [branch-2.0](pick #27771) Add more detail msg for waitRPC exception (#27773)

* [Bug](pipeline) prevent PipelineFragmentContext destruct early (#27790)

* [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib.  (#27542) (#27711)

Backport from #27542.

* [FIX](case)fix case truncate table first #27792

* [doc](stats) add auto_analyze_table_width_threshold description. (#27818) (#27832)

* [fix](bdbje) Fix bdbje logging level not work (#27597) (#27788)

* `EnvironmentConfig.FILE_LOGGING_LEVEL` only set FileHandlerLevel, we should
   set logger level firstly, otherwise it will not take effect.

* [Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669) (#27801)

Backport from #27669.

* [branch-2.0](fix) Fix broken exception message #27836

* [Bug](func) coredump in equal for null in function (#27843)

* [minor](stats) Update olap table row count after analyze (#27858)

pick from master #27814

* [fix](stats)min and max return NaN when table is empty (#27863)

fix analyze empty table and min/max null value bug:
1. Skip empty analyze task for sample analyze task. (Full analyze task already skipped).
2. Check sample rows is not 0 before calculate the scale factor.
3. Remove ' in sql template after remove base64 encoding for min/max value.

backport #27862

* [minor](stats) Throw error when sync analyze failed (#27846)

pick from master #27845

* [fix](stats) Don't save colToPartitions anymore to save mem (#27880)

pick from master #27879

* [fix](nereids) set operation's result type is wrong if decimal overflows (#27872)

pick from master #27870

* [Config] Modify the default value of tablet_schema_cache_recycle_interval (#27877)

* [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode(#27842) (#27851)

* [fix](fe) Fix show frontends npt in some situations (#27295) (#27789)

```
java.lang.NullPointerException: null
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getMasterSocket(ReplicationGroupAdmin.java:191)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.doMessageExchange(ReplicationGroupAdmin.java:607)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getGroup(ReplicationGroupAdmin.java:406)
    at org.apache.doris.ha.BDBHA.getElectableNodes(BDBHA.java:132)
    at org.apache.doris.common.proc.FrontendsProcNode.getFrontendsInfo(FrontendsProcNode.java:84)
    at org.apache.doris.qe.ShowExecutor.handleShowFrontends(ShowExecutor.java:1923)
    at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:355)
    at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2113)
    ...
```

* [branch-2.0](fix) Fix extremely high CPU usage caused by rf merge #27894 (#27895)

* [fix](stacktrace) ignore stacktrace for error code INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED (#27898)

* ignore stacktrace for error INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED

* AndBlockColumnPredicate::evaluate

* [opt](nereids) Branch-2.0: remove partition & histogram from col stats to reduce memory usage #27885 (#27896)

* [pick](Nereids) temporary partition is selected only if user manually specified: Branch-2.0 #27893 (#27905)

* [fix](multi-catalog)support the max compute partition prune (#27154) (#27902)

backport #27154

* [fix](Nereids) should not push down project to the nullable side of outer join #27912 (#27913)

* fix compile

---------

Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: xzj7019 <131111794+xzj7019@users.noreply.github.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: yiguolei <676222867@qq.com>
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: DuRipeng <453243496@qq.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: Calvin Kirs <acm_master@163.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: catpineapple <42031973+catpineapple@users.noreply.github.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: Lei Zhang <27994433+SWJTU-ZhangLei@users.noreply.github.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…t become 0 (apache#27673)

Need to recalculate health value when table row count become 0. Otherwise, when user truncate a table, the old statistics will not be updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.3-merged p0_b reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants