Fix transfer limiting in `_select_keys_for_gather` #7071

hendrikmakait · 2022-09-26T18:38:11Z

- Closes #7052 (25% performance regression in merges)
- Tests added / passed
- Passes `pre-commit run --all-files`

github-actions · 2022-09-26T20:38:53Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

- 15 files (+1), 15 suites (+1), 6h 7m 18s ⏱️ (+30m 14s)
- 3 122 tests (+6): 3 036 ✔️ passed (+17), 85 💤 skipped (-11), 1 ❌ failed (±0)
- 23 106 runs (+1 281): 22 198 ✔️ passed (+1 341), 907 💤 skipped (-60), 1 ❌ failed (±0)

For more details on these failures, see this check.

Results for commit ff4c4ed. ± Comparison against base commit 8c4133c.

♻️ This comment has been updated with latest results.

hendrikmakait · 2022-09-27T08:51:33Z

CI flakes:

wence-

Thanks! Can confirm that this change fixes the performance regression we observe.

Mostly minor comments to do with naming.

distributed/worker_state_machine.py

wence- · 2022-09-27T08:25:27Z


```diff
-                self.transfer_incoming_bytes
-                or to_gather
-            ) and total_nbytes + ts.get_nbytes() > bytes_left_to_fetch:
+            if self._task_exceeds_transfer_limits(ts, total_nbytes):
```
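To make the stop criterion in this diff concrete, here is a minimal, self-contained sketch of a greedy gather loop with a check shaped like the removed lines: the byte budget only applies once something is already in flight or already selected, so a single oversized task can still be fetched on its own. Everything here (`Task`, `exceeds_transfer_limits`, `select_for_gather`, `in_flight_bytes`, and the exact parameters) is hypothetical and only mirrors the removed condition; the actual body of `_task_exceeds_transfer_limits` is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a worker TaskState: just a key and a size."""
    key: str
    nbytes: int


def exceeds_transfer_limits(
    task: Task,
    total_nbytes: int,
    bytes_left_to_fetch: int,
    in_flight_bytes: int,
    already_selected: bool,
) -> bool:
    """Hypothetical check modeled on the removed condition in the diff.

    The byte budget is only enforced once something is already in flight or
    already selected, so one oversized task is never blocked forever.
    """
    if not (in_flight_bytes or already_selected):
        return False
    return total_nbytes + task.nbytes > bytes_left_to_fetch


def select_for_gather(tasks, bytes_left_to_fetch, in_flight_bytes=0):
    """Take tasks in priority order; stop at the first one that is over budget."""
    selected = []
    total_nbytes = 0
    for task in tasks:
        if exceeds_transfer_limits(
            task, total_nbytes, bytes_left_to_fetch, in_flight_bytes, bool(selected)
        ):
            break  # aggressive stop: later (possibly smaller) tasks are not considered
        selected.append(task)
        total_nbytes += task.nbytes
    return selected
```

With a 120-byte budget and nothing in flight, tasks of 100, 50 and 30 bytes yield only the first one, because 100 + 50 already exceeds the budget. A 500-byte task is still selected on its own when nothing is in flight, but is skipped entirely once any bytes are in flight.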


Not for this PR, but it seems that this gather logic is somewhat eager to terminate. Suppose we have many tasks available, a few already in flight, and the top priority task is large (will exceed transfer limits). We'll never go further through the available list to check if we could actually eagerly transfer some lower-priority (but smaller) tasks.

This may be deliberate, but I'm not sure.

hendrikmakait

Agreed, it's a rather aggressive stop criterion. The main reason we have it is that I haven't seen any evidence that would justify a more elaborate one. The alternative would be to implement some form of looking past the task that exceeds the limits (or a completely different packing algorithm). Depending on how that's done, it might lead to a lot of unnecessary work. Suppose we have many tasks available and they are all roughly equally sized. Chances are high that if the top-priority task no longer fits into the message, none of the following ones would either. I'm happy to iterate on this if it becomes an actual problem.
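The trade-off in this thread can be illustrated with a simplified model that compares the current "stop at the first misfit" rule with the "skip misfits and keep scanning" alternative. This is a hypothetical sketch over plain byte sizes (both function names are made up, and the first-task exception from the worker logic is deliberately left out); it only counts how many tasks each strategy inspects.

```python
def select_stop_at_first_misfit(sizes, budget):
    """Take sizes in priority order; stop at the first one that does not fit."""
    selected, total, checks = [], 0, 0
    for i, size in enumerate(sizes):
        checks += 1
        if total + size > budget:
            break
        selected.append(i)
        total += size
    return selected, checks


def select_skip_misfits(sizes, budget):
    """Alternative: skip tasks that do not fit and keep scanning for smaller ones."""
    selected, total, checks = [], 0, 0
    for i, size in enumerate(sizes):
        checks += 1
        if total + size > budget:
            continue
        selected.append(i)
        total += size
    return selected, checks


if __name__ == "__main__":
    # Large top-priority task followed by small ones, with a 100-byte budget.
    print(select_stop_at_first_misfit([200, 5, 5, 5], 100))  # ([], 1)
    print(select_skip_misfits([200, 5, 5, 5], 100))          # ([1, 2, 3], 4)
    # Many equally sized tasks: same selection, but scanning inspects every task.
    print(select_stop_at_first_misfit([40] * 1000, 100))     # ([0, 1], 3)
    print(select_skip_misfits([40] * 1000, 100))             # ([0, 1], 1000)
```

The first pair reproduces the scenario raised in review, where skipping lets smaller, lower-priority tasks through. The second pair reflects the counter-argument: with equally sized tasks, scanning selects nothing extra but inspects every remaining task. Note also that skipping lets lower-priority tasks overtake the large one, which may not be desirable.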


wence-

Two minor nits, otherwise looks great, thanks for the quick fix!

distributed/worker_state_machine.py

distributed/tests/test_worker_state_machine.py


gjoseph92

Thanks @hendrikmakait and @wence-

hendrikmakait added 2 commits September 26, 2022 19:20

Fix limiting

b6380c7

Improve readability

0808455

hendrikmakait force-pushed the fix-transfer-limiting branch from 75bd8eb to 0808455 September 26, 2022 18:43

hendrikmakait added 2 commits September 26, 2022 20:54

Refactor tests

bdc9b1a

Replace default

c92add6

hendrikmakait marked this pull request as ready for review September 26, 2022 19:26

wence- suggested changes Sep 27, 2022


hendrikmakait and others added 4 commits September 27, 2022 11:35

Apply suggestions from code review

83f920b

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

Review comments

2f82305

Stick with existing vocab

73449c5

Improve variable naming and comments

a5b7efb

hendrikmakait requested a review from wence- September 27, 2022 10:36

wence- suggested changes Sep 27, 2022


distributed/worker_state_machine.py (outdated)

distributed/tests/test_worker_state_machine.py (outdated)

hendrikmakait and others added 3 commits September 27, 2022 13:13

Update distributed/worker_state_machine.py

081e721

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

Adjust tests

47fdab4

Minor

9919a21

wence- approved these changes Sep 27, 2022


hendrikmakait mentioned this pull request Sep 27, 2022

Expose message-bytes-limit in config #7074

Merged


Merge branch 'main' into fix-transfer-limiting

ff4c4ed

gjoseph92 approved these changes Sep 27, 2022


gjoseph92 merged commit a007345 into dask:main Sep 27, 2022

hendrikmakait mentioned this pull request Sep 28, 2022

25% performance regression in merges #7052

Closed

gjoseph92 pushed a commit to gjoseph92/distributed that referenced this pull request Oct 31, 2022

Fix transfer limiting in _select_keys_for_gather (dask#7071)

fce949a
