Skip to content

Fix busywait on adding to full async input buffer #14522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 11, 2025

Conversation

yumkam
Copy link
Collaborator

@yumkam yumkam commented Feb 13, 2025

DoExecute->PollAsyncInput->source.PollAsyncInput->ContinueExecution->DoExecute...

Changelog entry

...

Changelog category

  • Not for changelog (changelog entry is not required)

Additional information

There are two scenario where busywait was triggered:

  1. when both input and output is full, CA does not progress and busywaits;
  2. when input is full, cpu quota enabled, ratelimiter was asked for cpu time, and have not answered yet; again, busywait, and worse: this one is self-inducing (busywait spends cpu, it is not bounded by rate-limiter, so rate-limiter delays actual processing more and more)
    ...

Checks passed:

  • gh/arc CI;
  • dqrun, tpc h s10G/s100G sync/async CA, saves ~4-10%cpu, no/minor changes in wallclock
  • dqrun, tpc ds s10G sync/async ca, about same
  • Modified ("huge") streaming integration test, with slow yds output topic and ratelimiter enabled (used to be extremely slow), amount of polling cycles reduced from 80M to 400k (for processing of 1.5M rows)
  • dqrun/kqprun with grace join fixed (7354680 reverted): 10%-30% improvement in cpu time (comparing main + revert, patched + revert)

DoExecute->PollAsyncInput->source.PollAsyncInput->ContinueExecution->DoExecute...
Copy link

github-actions bot commented Feb 13, 2025

2025-02-13 10:06:06 UTC Pre-commit check linux-x86_64-release-asan for abd1799 has started.
2025-02-13 10:06:17 UTC Artifacts will be uploaded here
2025-02-13 10:09:06 UTC ya make is running...
🟡 2025-02-13 11:43:52 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
11645 11562 0 34 13 36

2025-02-13 11:44:56 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-02-13 12:06:25 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
130 (only retried tests) 92 0 1 4 33

2025-02-13 12:06:34 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-02-13 12:18:35 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
68 (only retried tests) 31 0 1 2 34

🟢 2025-02-13 12:18:42 UTC Build successful.
🟢 2025-02-13 12:19:11 UTC ydbd size 3.6 GiB changed* by -176 Bytes, which is <= 0 Bytes vs main: OK

ydbd size dash main: 749f7cd merge: abd1799 diff diff %
ydbd size 3 878 463 792 Bytes 3 878 463 616 Bytes -176 Bytes -0.000%
ydbd stripped size 1 358 247 648 Bytes 1 358 247 584 Bytes -64 Bytes -0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Feb 13, 2025

2025-02-13 10:06:30 UTC Pre-commit check linux-x86_64-relwithdebinfo for abd1799 has started.
2025-02-13 10:06:42 UTC Artifacts will be uploaded here
2025-02-13 10:09:34 UTC ya make is running...
🟡 2025-02-13 11:08:57 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
25801 23197 0 6 2467 131

2025-02-13 11:11:19 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-02-13 11:28:37 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
218 (only retried tests) 85 0 3 2 128

2025-02-13 11:28:45 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-02-13 11:43:46 UTC Some tests failed, follow the links below.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
197 (only retried tests) 70 0 2 0 125

🟢 2025-02-13 11:43:52 UTC Build successful.
🟢 2025-02-13 11:44:15 UTC ydbd size 2.1 GiB changed* by -2.2 KiB, which is <= 0 Bytes vs main: OK

ydbd size dash main: 749f7cd merge: abd1799 diff diff %
ydbd size 2 230 113 376 Bytes 2 230 111 120 Bytes -2.2 KiB -0.000%
ydbd stripped size 473 416 696 Bytes 473 416 568 Bytes -128 Bytes -0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@yumkam yumkam changed the title Fix busywait on adding to full async input buffer [WIP] Fix busywait on adding to full async input buffer Feb 14, 2025
@github-actions github-actions bot added bugfix and removed bugfix labels Feb 14, 2025
Copy link

github-actions bot commented Feb 14, 2025

2025-02-14 17:10:34 UTC Pre-commit check linux-x86_64-relwithdebinfo for 5ad7d5a has started.
2025-02-14 17:11:21 UTC Artifacts will be uploaded here
2025-02-14 17:14:55 UTC ya make is running...
🟡 2025-02-14 18:07:55 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
26112 23519 0 3 2456 134

2025-02-14 18:12:40 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-02-14 18:32:03 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
206 (only retried tests) 75 0 1 2 128

2025-02-14 18:32:12 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-02-14 18:49:50 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
193 (only retried tests) 67 0 0 1 125

🟢 2025-02-14 18:49:58 UTC Build successful.
🟢 2025-02-14 18:50:22 UTC ydbd size 2.1 GiB changed* by +272 Bytes, which is < 100.0 KiB vs main: OK

ydbd size dash main: c15923e merge: 5ad7d5a diff diff %
ydbd size 2 234 037 192 Bytes 2 234 037 464 Bytes +272 Bytes +0.000%
ydbd stripped size 474 107 800 Bytes 474 107 992 Bytes +192 Bytes +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Feb 14, 2025

2025-02-14 17:11:34 UTC Pre-commit check linux-x86_64-release-asan for 5ad7d5a has started.
2025-02-14 17:12:11 UTC Artifacts will be uploaded here
2025-02-14 17:14:59 UTC ya make is running...
🟡 2025-02-14 18:23:40 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
11947 11853 0 41 16 37

2025-02-14 18:25:15 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-02-14 18:37:15 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
139 (only retried tests) 98 0 0 10 31

🟢 2025-02-14 18:37:27 UTC Build successful.
🟢 2025-02-14 18:37:57 UTC ydbd size 3.6 GiB changed* by +592 Bytes, which is < 100.0 KiB vs main: OK

ydbd size dash main: c15923e merge: 5ad7d5a diff diff %
ydbd size 3 885 196 560 Bytes 3 885 197 152 Bytes +592 Bytes +0.000%
ydbd stripped size 1 360 360 112 Bytes 1 360 360 880 Bytes +768 Bytes +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

(cherry picked from commit 7cb4eb1)
Copy link

github-actions bot commented Feb 18, 2025

2025-02-18 13:56:59 UTC Pre-commit check linux-x86_64-relwithdebinfo for 6f20883 has started.
2025-02-18 13:57:27 UTC Artifacts will be uploaded here
2025-02-18 14:01:05 UTC ya make is running...
2025-02-18 14:13:18 UTC Check cancelled

Copy link

github-actions bot commented Feb 18, 2025

2025-02-18 13:59:10 UTC Pre-commit check linux-x86_64-release-asan for 6f20883 has started.
2025-02-18 13:59:24 UTC Artifacts will be uploaded here
2025-02-18 14:02:21 UTC ya make is running...
2025-02-18 14:13:42 UTC Check cancelled

Copy link

github-actions bot commented Feb 18, 2025

🟢 2025-02-27 16:57:16 UTC The validation of the Pull Request description is successful.

@yumkam yumkam force-pushed the fix-busywait-full-input-buffers branch from 4e6644d to fd18923 Compare February 18, 2025 14:12
Copy link

github-actions bot commented Feb 18, 2025

2025-02-18 14:17:14 UTC Pre-commit check linux-x86_64-release-asan for ffdea0a has started.
2025-02-18 14:17:27 UTC Artifacts will be uploaded here
2025-02-18 14:20:18 UTC ya make is running...
🟡 2025-02-18 15:28:02 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
11959 11872 0 35 16 36

2025-02-18 15:29:12 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-02-18 15:41:42 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
131 (only retried tests) 90 0 0 8 33

🟢 2025-02-18 15:41:49 UTC Build successful.
🟢 2025-02-18 15:42:19 UTC ydbd size 3.6 GiB changed* by +7.9 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: d1f5b91 merge: ffdea0a diff diff %
ydbd size 3 891 711 472 Bytes 3 891 719 608 Bytes +7.9 KiB +0.000%
ydbd stripped size 1 362 428 448 Bytes 1 362 430 688 Bytes +2.2 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

2025-02-18 14:18:57 UTC Pre-commit check linux-x86_64-relwithdebinfo for ffdea0a has started.

@yumkam yumkam added the rebase-and-check Rebase PR with the current base branch and check label Feb 18, 2025
@github-actions github-actions bot removed the rebase-and-check Rebase PR with the current base branch and check label Feb 18, 2025
Copy link

github-actions bot commented Feb 18, 2025

2025-02-18 16:38:38 UTC Pre-commit check linux-x86_64-relwithdebinfo for 646395b has started.
2025-02-18 16:39:13 UTC Artifacts will be uploaded here
2025-02-18 16:42:02 UTC ya make is running...
🟢 2025-02-18 18:07:43 UTC Tests successful.

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
26153 23557 0 0 2460 136

🟢 2025-02-18 18:10:11 UTC Build successful.
🟢 2025-02-18 18:10:30 UTC ydbd size 2.1 GiB changed* by +5.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: e31ed43 merge: 646395b diff diff %
ydbd size 2 237 549 376 Bytes 2 237 554 896 Bytes +5.4 KiB +0.000%
ydbd stripped size 474 150 104 Bytes 474 151 192 Bytes +1.1 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Feb 18, 2025

2025-02-18 16:38:38 UTC Pre-commit check linux-x86_64-release-asan for 646395b has started.
2025-02-18 16:42:26 UTC Artifacts will be uploaded here
2025-02-18 16:45:17 UTC ya make is running...
🟡 2025-02-18 18:18:36 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
11965 11821 0 90 16 38

2025-02-18 18:19:53 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-02-18 18:39:38 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
202 (only retried tests) 157 0 3 7 35

2025-02-18 18:39:51 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-02-18 18:57:10 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
75 (only retried tests) 36 0 0 6 33

🟢 2025-02-18 18:57:18 UTC Build successful.
🟢 2025-02-18 18:57:49 UTC ydbd size 3.6 GiB changed* by +7.9 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: e31ed43 merge: 646395b diff diff %
ydbd size 3 892 841 272 Bytes 3 892 849 328 Bytes +7.9 KiB +0.000%
ydbd stripped size 1 362 901 728 Bytes 1 362 903 904 Bytes +2.1 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

LastPollResult = PollAsyncInput();

if (LastPollResult && *LastPollResult != EResumeSource::CAPollAsyncNoSpace) {
ContinueExecute(*std::exchange(LastPollResult, {}));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Зачем еще раз запрашивать новый цикл? Он же сюда же в итоге придет. Данные сорсов синхронно пушаться в TaskRunner (TaskRunnerActor->AsyncInputPush), поэтому можно/нужно сразу вызывать AskContinueRun().

Т.е. тут даже можно прокинуть EResumeSource и анализировать: если пришли по CANewAsyncInput, то делаем (PollAsyncInput()); если пришли по другой причине, то безусловано делаем AskContinueRun().

Это всё неточно(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Если

  1. нам не выделили cpu quota;
  2. есть непереполненные входы;
    то мы можем продолжать их вычитывать, не дожидаясь квоты

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Я не вполне уверен, что от всей этой асинхронности много толку, с высокой вероятностью данные всё равно будут оседать в каких-то буферах, но я предпочёл заткнуть очевидную дырку ( busywait) и как можно меньше менять работу остальной части

@@ -1185,6 +1190,9 @@ class TDqAsyncComputeActor : public TDqComputeActorBase<TDqAsyncComputeActor, TC
CA_LOG_T("AsyncCheckRunStatus: TakeInputChannelDataRequests: " << TakeInputChannelDataRequests.size());
return;
}
if (ProcessOutputsState.LastRunStatus == ERunStatus::PendingInput && LastPollResult) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Как я понял тут именно CAPollAsyncNoSpace.
Еще вариант (теоретический) не использовать вообще CAPollAsyncNoSpace, а по TEvNewAsyncInputDataArrived сохранять какой то флажок, а не опрашивать в цикле

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Да, тут ожидается только NoSpace; других вариантов, кмк, тут уже быть не может.

@yumkam yumkam requested a review from kardymonds March 5, 2025 09:18
kardymonds
kardymonds previously approved these changes Mar 5, 2025
GrigoriyPA
GrigoriyPA previously approved these changes Mar 5, 2025
@yumkam yumkam dismissed stale reviews from GrigoriyPA and kardymonds via 66f2e27 March 5, 2025 11:11
Copy link

github-actions bot commented Mar 5, 2025

2025-03-05 11:14:34 UTC Pre-commit check linux-x86_64-release-asan for 4a0cb5d has started.
2025-03-05 11:14:49 UTC Artifacts will be uploaded here
2025-03-05 11:17:42 UTC ya make is running...
🟡 2025-03-05 12:28:38 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
11874 11696 0 120 18 40

2025-03-05 12:29:38 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-03-05 12:42:44 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
236 (only retried tests) 193 0 6 2 35

2025-03-05 12:42:55 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-03-05 12:54:52 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
75 (only retried tests) 41 0 0 0 34

🟢 2025-03-05 12:55:02 UTC Build successful.
🟢 2025-03-05 12:55:32 UTC ydbd size 3.7 GiB changed* by +6.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 5a1e03d merge: 4a0cb5d diff diff %
ydbd size 3 990 863 936 Bytes 3 990 870 440 Bytes +6.4 KiB +0.000%
ydbd stripped size 1 388 994 760 Bytes 1 388 995 496 Bytes +736 Bytes +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Mar 5, 2025

2025-03-05 11:14:44 UTC Pre-commit check linux-x86_64-relwithdebinfo for 4a0cb5d has started.
2025-03-05 11:14:59 UTC Artifacts will be uploaded here
2025-03-05 11:17:53 UTC ya make is running...
🟡 2025-03-05 12:14:48 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
26235 23641 0 2 2465 127

2025-03-05 12:17:18 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-03-05 12:35:12 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
174 (only retried tests) 65 0 0 0 109

🟢 2025-03-05 12:35:22 UTC Build successful.
🟢 2025-03-05 12:35:45 UTC ydbd size 2.1 GiB changed* by +11.3 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 5a1e03d merge: 4a0cb5d diff diff %
ydbd size 2 290 085 376 Bytes 2 290 096 944 Bytes +11.3 KiB +0.001%
ydbd stripped size 479 952 640 Bytes 479 954 816 Bytes +2.1 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@yumkam yumkam merged commit b2df7c8 into ydb-platform:main Mar 11, 2025
12 checks passed
yumkam added a commit to yumkam/ydb that referenced this pull request Mar 11, 2025
yumkam added a commit to yumkam/ydb that referenced this pull request Mar 11, 2025
yumkam added a commit to yumkam/ydb that referenced this pull request Mar 11, 2025
lberserq pushed a commit to lberserq/ydb that referenced this pull request Mar 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants