Don't end computations until cluster is truly idle #7790

crusaderky · 2023-04-18T14:46:09Z

XREF Fine performance metrics: apportion to Computations #7776

Checking that Scheduler.total_occupancy is zero is a poor criterion for declaring a computation finished and starting the next one.
With this PR, the scheduler will not create a new computation as long as there are

any long-running tasks
any tasks with unmet resource constraints
zero workers, but queued tasks

This reduces, but doesn't fully eliminate, the cases where Computation objects overlap (see #7776 for details).
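The three conditions above can be illustrated with a minimal sketch. It is a toy model, not the code in distributed/scheduler.py: ToyWorker, ToyScheduler and all of their attributes are hypothetical names, and the occupancy rule (long-running tasks count as zero) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a worker's scheduler-side state."""

    processing: set = field(default_factory=set)  # tasks assigned to this worker
    long_running: set = field(default_factory=set)  # subset of processing, flagged long-running


@dataclass
class ToyScheduler:
    """Hypothetical stand-in; not the real distributed.Scheduler."""

    workers: list = field(default_factory=list)
    queued: set = field(default_factory=set)  # tasks waiting for a worker
    unrunnable: set = field(default_factory=set)  # tasks with unmet resource constraints

    @property
    def total_occupancy(self) -> float:
        # Assumption: long-running tasks do not count towards occupancy;
        # every other processing task counts as one unit of work.
        return float(sum(len(w.processing - w.long_running) for w in self.workers))

    @property
    def is_idle(self) -> bool:
        # Queued or unrunnable tasks mean there is still work to do,
        # even when no worker is busy (or no worker exists at all).
        if self.queued or self.unrunnable:
            return False
        # Any assigned task, long-running ones included, keeps the cluster busy.
        return not any(w.processing for w in self.workers)


# Occupancy is zero here, yet the cluster is not idle.
scheduler = ToyScheduler(workers=[ToyWorker(processing={"x"}, long_running={"x"})])
print(scheduler.total_occupancy, scheduler.is_idle)
```

In this model, a check based on total_occupancy alone would end the computation while task "x" is still running, whereas is_idle would not.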

github-actions · 2023-04-18T15:55:56Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      26 files ±  0       26 suites ±0 15h 12m 20s ⏱️ - 43m 22s
  3 606 tests +  3   3 496 ✔️ +  2   105 💤 ±0 5 ❌ +1
45 629 runs +39 43 455 ✔️ +38 2 169 💤 +1 5 ❌ ±0

For more details on these failures, see this check.

Results for commit 0cf5145. ± Comparison against base commit e887fde.

This pull request removes 2 and adds 5 tests. Note that renamed tests count towards both.

distributed.tests.test_scheduler ‑ test_computations
distributed.tests.test_scheduler ‑ test_computations_futures

distributed.tests.test_computations ‑ test_computations
distributed.tests.test_computations ‑ test_computations_futures
distributed.tests.test_computations ‑ test_computations_long_running
distributed.tests.test_computations ‑ test_computations_no_resources
distributed.tests.test_computations ‑ test_computations_no_workers

♻️ This comment has been updated with latest results.

hendrikmakait

Thanks, @crusaderky! This logic is much more sensible.

hendrikmakait · 2023-05-22T16:00:00Z

distributed/scheduler.py

@@ -1775,6 +1775,19 @@ def _clear_task_state(self) -> None:
        ):
            collection.clear()  # type: ignore

+    @property
+    def fully_idle(self) -> bool:


nit: Is there a way we can stress the semantic difference between idle and fully_idle (e.g., by renaming fully_idle -> is_idle or idle -> idle_workers)?

renamed fully_idle -> is_idle

crusaderky force-pushed the computation_idle branch from 707afb1 to 777521d April 18, 2023 14:59

crusaderky force-pushed the computation_idle branch from 777521d to 57b8868 April 19, 2023 14:18

Don't let no-worker and long-running tasks break Computations

37d6e01

crusaderky force-pushed the computation_idle branch from 57b8868 to 37d6e01 May 10, 2023 11:06

crusaderky self-assigned this May 10, 2023

crusaderky marked this pull request as ready for review May 10, 2023 11:06

crusaderky requested a review from fjetter as a code owner May 10, 2023 11:06

crusaderky removed the request for review from fjetter May 10, 2023 11:06

crusaderky mentioned this pull request May 10, 2023

Worker crash causes computations to overlap #7825

Open

hendrikmakait added the needs review Needs review from a contributor. label May 10, 2023

crusaderky mentioned this pull request May 10, 2023

Computations meta-issue #7830

Open

crusaderky mentioned this pull request May 22, 2023

Fine performance metrics meta-issue #7665

Open

Merge branch 'main' into computation_idle

6251ded

hendrikmakait approved these changes May 22, 2023

hendrikmakait removed the needs review Needs review from a contributor. label May 22, 2023

rename fully_idle -> is_idle

0cf5145

crusaderky merged commit fcd921c into dask:main May 23, 2023

crusaderky deleted the computation_idle branch May 23, 2023 09:01

crusaderky mentioned this pull request Jun 4, 2023

Refactor Scheduler.is_idle #7881

Merged
