Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](statistics) correct update rows when doing multi-table load #35474

Merged
merged 1 commit into from
May 28, 2024

Conversation

freemandealer
Copy link
Contributor

rows of only one table is updated correctly, need to merge all table commit infos.

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

rows of only one table is updated correctly, need to merge all table commit infos.
@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@freemandealer freemandealer marked this pull request as ready for review May 27, 2024 14:04
@freemandealer
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 41574 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aa2556bbb66aaaf8889c35924565e429d28370a3, data reload: false

------ Round 1 ----------------------------------
q1	17615	4282	4437	4282
q2	2006	193	195	193
q3	10476	1229	1164	1164
q4	10192	793	798	793
q5	7507	2654	2654	2654
q6	230	133	132	132
q7	971	617	606	606
q8	9228	2136	2076	2076
q9	9460	6676	6786	6676
q10	9355	3841	3950	3841
q11	448	242	236	236
q12	456	234	231	231
q13	17555	3220	3123	3123
q14	261	221	218	218
q15	511	480	506	480
q16	493	403	402	402
q17	966	649	695	649
q18	8426	7809	7694	7694
q19	6280	1578	1546	1546
q20	648	311	315	311
q21	5187	4009	3991	3991
q22	360	276	283	276
Total cold run time: 118631 ms
Total hot run time: 41574 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4579	4439	4398	4398
q2	372	266	258	258
q3	3189	2867	2810	2810
q4	1886	1657	1661	1657
q5	5475	5516	5492	5492
q6	215	123	127	123
q7	2226	1876	1824	1824
q8	3251	3346	3414	3346
q9	8709	8737	8533	8533
q10	3953	3896	3843	3843
q11	601	479	492	479
q12	786	618	638	618
q13	15835	3176	3198	3176
q14	296	277	281	277
q15	521	496	482	482
q16	479	431	425	425
q17	1772	1473	1475	1473
q18	7685	7681	7358	7358
q19	1666	1588	1518	1518
q20	2016	1765	1815	1765
q21	10005	4677	4716	4677
q22	591	486	494	486
Total cold run time: 76108 ms
Total hot run time: 55018 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 171552 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aa2556bbb66aaaf8889c35924565e429d28370a3, data reload: false

query1	939	392	380	380
query2	6431	2466	2326	2326
query3	6643	208	209	208
query4	19009	17388	17358	17358
query5	4133	422	400	400
query6	248	154	182	154
query7	4588	295	287	287
query8	243	181	177	177
query9	8433	2311	2316	2311
query10	445	273	256	256
query11	10421	10089	10030	10030
query12	140	90	84	84
query13	1643	364	359	359
query14	10087	6691	7505	6691
query15	214	168	167	167
query16	7713	261	261	261
query17	1317	554	510	510
query18	1958	266	263	263
query19	190	149	149	149
query20	92	81	80	80
query21	194	131	140	131
query22	4085	3931	4080	3931
query23	33970	33172	33008	33008
query24	6906	2811	2923	2811
query25	473	350	360	350
query26	698	154	157	154
query27	1908	319	327	319
query28	3756	2040	2024	2024
query29	822	607	595	595
query30	223	148	155	148
query31	933	746	755	746
query32	59	53	55	53
query33	513	297	256	256
query34	832	465	469	465
query35	704	612	576	576
query36	1035	900	913	900
query37	102	65	67	65
query38	2846	2781	2782	2781
query39	858	803	792	792
query40	194	123	122	122
query41	45	47	43	43
query42	105	93	93	93
query43	567	548	578	548
query44	1077	721	742	721
query45	180	158	164	158
query46	1053	703	726	703
query47	1849	1771	1804	1771
query48	371	303	297	297
query49	764	373	382	373
query50	771	377	382	377
query51	6870	6747	6748	6747
query52	105	91	98	91
query53	350	279	283	279
query54	546	443	435	435
query55	74	72	77	72
query56	286	257	258	257
query57	1133	1029	1051	1029
query58	242	227	217	217
query59	3542	3219	3016	3016
query60	286	271	271	271
query61	108	104	105	104
query62	548	461	451	451
query63	314	282	290	282
query64	2623	1867	1742	1742
query65	3146	3074	3140	3074
query66	784	333	318	318
query67	15256	14663	14744	14663
query68	4604	521	533	521
query69	454	265	271	265
query70	1126	1119	1136	1119
query71	398	279	261	261
query72	7513	5958	5444	5444
query73	726	324	315	315
query74	6085	5706	5577	5577
query75	3304	2619	2599	2599
query76	2333	1076	1026	1026
query77	400	260	256	256
query78	10356	9947	9757	9757
query79	2160	510	508	508
query80	1321	436	422	422
query81	520	215	225	215
query82	662	90	94	90
query83	225	169	178	169
query84	250	95	83	83
query85	1605	275	264	264
query86	522	304	301	301
query87	3300	3113	3118	3113
query88	4099	2409	2418	2409
query89	466	389	379	379
query90	2034	182	181	181
query91	123	100	99	99
query92	63	49	50	49
query93	3044	509	484	484
query94	1223	187	242	187
query95	401	308	303	303
query96	606	273	264	264
query97	3175	3043	3017	3017
query98	243	213	209	209
query99	1131	856	868	856
Total cold run time: 253465 ms
Total hot run time: 171552 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.63 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit aa2556bbb66aaaf8889c35924565e429d28370a3, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.03	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.08
query5	0.50	0.50	0.50
query6	1.12	0.72	0.72
query7	0.01	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.49	0.49
query10	0.55	0.55	0.54
query11	0.15	0.12	0.11
query12	0.16	0.12	0.13
query13	0.59	0.59	0.60
query14	0.75	0.76	0.78
query15	0.83	0.83	0.81
query16	0.36	0.37	0.36
query17	1.03	1.02	1.04
query18	0.21	0.24	0.26
query19	1.81	1.71	1.73
query20	0.01	0.01	0.02
query21	15.61	0.66	0.63
query22	4.31	7.71	1.92
query23	18.24	1.32	1.28
query24	2.05	0.22	0.20
query25	0.15	0.08	0.07
query26	0.28	0.17	0.17
query27	0.08	0.07	0.08
query28	13.33	1.01	0.98
query29	13.50	3.33	3.27
query30	0.25	0.06	0.05
query31	2.88	0.38	0.38
query32	3.28	0.49	0.47
query33	2.89	2.92	2.93
query34	17.12	4.42	4.42
query35	4.54	4.54	4.61
query36	0.67	0.48	0.49
query37	0.18	0.14	0.16
query38	0.16	0.15	0.14
query39	0.04	0.04	0.03
query40	0.16	0.15	0.16
query41	0.09	0.04	0.05
query42	0.06	0.04	0.05
query43	0.04	0.03	0.04
Total cold run time: 110.61 s
Total hot run time: 30.63 s

orgSet.addAll(newSet);
return orgSet;
});
beIdToBaseTabletIds.putIfAbsent(beId, newSet);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is line 311 necessary?
The lib says that for computeIfPresent

If the specified key is not already associated with a value (or is mapped to null), attempts to compute its value using the given mapping function and enters it into this map unless null.
If the function returns null no mapping is recorded. If the function itself throws an (unchecked) exception, the exception is rethrown, and no mapping is recorded. The most common usage is to construct a new object serving as an initial mapped value or memoized result, as in:
map.computeIfAbsent(key, k -> new Value(f(k)));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    default V computeIfPresent(K key,
            BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        Objects.requireNonNull(remappingFunction);
        V oldValue;
        if ((oldValue = get(key)) != null) {
            V newValue = remappingFunction.apply(key, oldValue);
            if (newValue != null) {
                put(key, newValue);
                return newValue;
            } else {
                remove(key);
                return null;
            }
        } else {
            return null;
        }
    }

Look at the outer else branch, it seems like we need to keep line 311. Because it doesn't put new value to the map when the key doesn't exist.

Copy link
Contributor

@Jibing-Li Jibing-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

orgSet.addAll(newSet);
return orgSet;
});
beIdToBaseTabletIds.putIfAbsent(beId, newSet);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    default V computeIfPresent(K key,
            BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        Objects.requireNonNull(remappingFunction);
        V oldValue;
        if ((oldValue = get(key)) != null) {
            V newValue = remappingFunction.apply(key, oldValue);
            if (newValue != null) {
                put(key, newValue);
                return newValue;
            } else {
                remove(key);
                return null;
            }
        } else {
            return null;
        }
    }

Look at the outer else branch, it seems like we need to keep line 311. Because it doesn't put new value to the map when the key doesn't exist.

Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 28, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

1 similar comment
Copy link
Contributor

PR approved by anyone and no changes requested.

@Jibing-Li Jibing-Li merged commit 972df19 into apache:master May 28, 2024
30 of 33 checks passed
dataroaring pushed a commit that referenced this pull request May 31, 2024
…5474)

rows of only one table is updated correctly, need to merge all table
commit infos.

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

## Further comments

If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.1.0-conflict approved Indicates a PR has been approved by one committer. dev/3.0.0-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants