[fix](load) fix multi-table load repeatedly failing and retrying when it meets data quality errors #49938

Open: wants to merge 1 commit into master


@sollhui (Contributor) commented Apr 10, 2025

What problem does this PR solve?

Multi-table load repeatedly fails and retries when it meets data quality errors, so the aborted task count keeps increasing:

Statistic: {"receivedBytes":0,"runningTxns":[],"errorRows":0,"committedTaskNum":0,"loadedRows":0,"loadRowsRate":0,"abortedTaskNum":2830,"errorRowsAfterResumed":0,"totalRows":0,"unselectedRows":0,"receivedBytesRate":0,"taskExecuteTimeMs":1}

Expected behavior: the load should tolerate erroneous data instead of aborting and retrying the task over and over.
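The symptom in the Statistic above can be spotted mechanically. The following is a minimal sketch, assuming the JSON is the Statistic value reported for the load job (for example by SHOW ROUTINE LOAD); the function `looks_like_abort_loop` and its `min_aborted` threshold are hypothetical and not part of Doris. It only inspects the fields shown in this PR.

```python
import json

# The Statistic payload quoted in this PR, copied verbatim.
STATISTIC = (
    '{"receivedBytes":0,"runningTxns":[],"errorRows":0,"committedTaskNum":0,'
    '"loadedRows":0,"loadRowsRate":0,"abortedTaskNum":2830,'
    '"errorRowsAfterResumed":0,"totalRows":0,"unselectedRows":0,'
    '"receivedBytesRate":0,"taskExecuteTimeMs":1}'
)


def looks_like_abort_loop(statistic_json: str, min_aborted: int = 100) -> bool:
    """Hypothetical check for the abort/retry loop described in this PR.

    Returns True when abortedTaskNum has reached min_aborted while no task
    has committed, no rows have loaded, and errorRows is still 0.
    """
    stats = json.loads(statistic_json)
    return (
        stats["abortedTaskNum"] >= min_aborted
        and stats["committedTaskNum"] == 0
        and stats["loadedRows"] == 0
        and stats["errorRows"] == 0
    )


print(looks_like_abort_loop(STATISTIC))  # True
```

A job that tolerates its bad rows would instead show committed tasks and a non-zero errorRows, so the check returns False once either field moves.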

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui (Contributor, Author) commented Apr 10, 2025

run buildall

@sollhui force-pushed the fix_multi_table_load branch from d7f3b26 to 5936ad7 on April 10, 2025 03:43
@sollhui (Contributor, Author) commented Apr 10, 2025

run buildall

@liaoxin01 (Contributor) left a comment


LGTM

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Apr 10, 2025
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 33990 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5936ad789b0c35f2a1495ec4d48f2d0bd92384d0, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	25829	4991	4977	4977
q2	2073	289	181	181
q3	10393	1235	676	676
q4	10218	1046	522	522
q5	7518	2248	2362	2248
q6	178	161	130	130
q7	891	747	602	602
q8	9306	1223	1182	1182
q9	6807	5059	5070	5059
q10	6847	2285	1872	1872
q11	471	282	266	266
q12	349	347	218	218
q13	17794	3653	3104	3104
q14	241	224	217	217
q15	534	474	499	474
q16	618	629	581	581
q17	592	857	364	364
q18	7610	7287	7133	7133
q19	1229	950	554	554
q20	331	334	236	236
q21	4123	3362	2429	2429
q22	1015	994	965	965
Total cold run time: 114967 ms
Total hot run time: 33990 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5087	5082	5095	5082
q2	240	332	240	240
q3	2170	2630	2239	2239
q4	1439	1854	1381	1381
q5	4410	4389	4424	4389
q6	224	171	126	126
q7	1969	1909	1778	1778
q8	2608	2492	2539	2492
q9	7318	7264	7165	7165
q10	2978	3224	2769	2769
q11	573	507	505	505
q12	708	779	663	663
q13	3618	3997	3444	3444
q14	314	307	284	284
q15	525	480	465	465
q16	676	690	653	653
q17	1164	1535	1401	1401
q18	7792	7482	7343	7343
q19	816	870	906	870
q20	1921	1940	1810	1810
q21	5276	4848	4802	4802
q22	1111	1043	1030	1030
Total cold run time: 52937 ms
Total hot run time: 50931 ms

@doris-robot

TPC-DS: Total hot run time: 192848 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5936ad789b0c35f2a1495ec4d48f2d0bd92384d0, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1402	1058	1021	1021
query2	6303	1960	1939	1939
query3	11002	4450	4482	4450
query4	52508	23973	22922	22922
query5	4942	607	442	442
query6	320	194	180	180
query7	4860	500	277	277
query8	318	243	222	222
query9	5350	2574	2577	2574
query10	445	331	268	268
query11	15026	15013	14782	14782
query12	156	114	106	106
query13	1022	515	400	400
query14	10070	6360	6457	6360
query15	212	197	184	184
query16	7085	624	486	486
query17	1076	754	601	601
query18	1544	417	321	321
query19	204	195	165	165
query20	122	122	123	122
query21	222	135	105	105
query22	4575	4458	4275	4275
query23	34073	33269	33276	33269
query24	6760	2452	2420	2420
query25	455	456	405	405
query26	721	282	148	148
query27	2570	506	328	328
query28	3281	2454	2439	2439
query29	580	569	451	451
query30	270	224	191	191
query31	872	889	800	800
query32	72	61	64	61
query33	443	364	310	310
query34	765	853	532	532
query35	783	849	767	767
query36	954	1009	891	891
query37	121	105	77	77
query38	4129	4179	4103	4103
query39	1548	1570	1443	1443
query40	210	119	108	108
query41	52	56	53	53
query42	129	110	115	110
query43	499	496	480	480
query44	1371	833	829	829
query45	184	173	168	168
query46	863	1030	628	628
query47	1841	1898	1818	1818
query48	380	417	334	334
query49	691	515	419	419
query50	668	701	405	405
query51	4289	4305	4316	4305
query52	110	111	95	95
query53	237	265	192	192
query54	577	590	526	526
query55	92	84	83	83
query56	290	294	284	284
query57	1159	1231	1122	1122
query58	268	267	262	262
query59	2768	2860	2884	2860
query60	346	324	325	324
query61	130	138	133	133
query62	751	728	667	667
query63	221	181	188	181
query64	1939	1157	703	703
query65	4417	4364	4325	4325
query66	739	403	303	303
query67	15763	15665	15301	15301
query68	7102	889	518	518
query69	533	293	304	293
query70	1191	1100	1137	1100
query71	505	313	284	284
query72	5771	4733	4714	4714
query73	1381	613	344	344
query74	9032	9103	9044	9044
query75	4178	3185	2699	2699
query76	4200	1203	756	756
query77	771	344	273	273
query78	9999	10119	9413	9413
query79	4334	791	554	554
query80	653	513	435	435
query81	481	258	221	221
query82	486	123	97	97
query83	292	251	236	236
query84	296	102	90	90
query85	784	443	304	304
query86	346	326	298	298
query87	4389	4585	4372	4372
query88	3587	2248	2239	2239
query89	431	311	284	284
query90	1911	287	219	219
query91	143	139	114	114
query92	79	59	57	57
query93	2734	968	590	590
query94	682	419	296	296
query95	359	293	281	281
query96	497	568	272	272
query97	3191	3233	3118	3118
query98	217	190	195	190
query99	1436	1385	1288	1288
Total cold run time: 300136 ms
Total hot run time: 192848 ms

@doris-robot

ClickBench: Total hot run time: 31.91 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5936ad789b0c35f2a1495ec4d48f2d0bd92384d0, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.04	0.03
query2	0.12	0.10	0.10
query3	0.24	0.19	0.19
query4	1.59	0.19	0.18
query5	0.57	0.57	0.59
query6	1.17	0.72	0.72
query7	0.02	0.02	0.02
query8	0.03	0.04	0.03
query9	0.58	0.51	0.51
query10	0.58	0.57	0.56
query11	0.14	0.11	0.11
query12	0.15	0.10	0.11
query13	0.61	0.60	0.61
query14	2.69	2.70	2.69
query15	0.91	0.86	0.85
query16	0.41	0.40	0.38
query17	1.05	1.03	1.02
query18	0.22	0.19	0.20
query19	1.90	1.84	1.89
query20	0.02	0.01	0.01
query21	15.37	0.93	0.56
query22	0.75	1.19	0.91
query23	14.65	1.40	0.67
query24	7.10	1.68	1.30
query25	0.52	0.18	0.13
query26	0.51	0.16	0.14
query27	0.05	0.05	0.05
query28	9.48	0.77	0.43
query29	12.54	3.98	3.26
query30	0.24	0.08	0.06
query31	2.84	0.60	0.38
query32	3.22	0.54	0.46
query33	3.05	3.09	3.13
query34	15.67	5.12	4.48
query35	4.53	4.51	4.48
query36	0.65	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.03
query40	0.18	0.13	0.13
query41	0.08	0.03	0.03
query42	0.03	0.03	0.02
query43	0.03	0.04	0.03
Total cold run time: 104.69 s
Total hot run time: 31.91 s

@dataroaring (Contributor) left a comment


LGTM

Labels: approved, dev/2.1.x, dev/3.0.x, p0_b, reviewed

5 participants