Skip to content

Test restart_limit#813

Merged
jan-janssen merged 1 commit intomainfrom
restart_limit
Sep 8, 2025
Merged

Test restart_limit#813
jan-janssen merged 1 commit intomainfrom
restart_limit

Conversation

@jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Sep 8, 2025

Summary by CodeRabbit

  • Tests
    • Expanded test coverage for single-node execution scenarios with block allocation and restart limits, improving confidence in crash/restart handling under multi-worker conditions.
    • Strengthens validation of failure-handling behavior to reduce regressions and enhance overall reliability.
    • No user-facing functionality or API changes; runtime behavior remains unchanged.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 8, 2025

Walkthrough

A test in tests/test_singlenodeexecutor_noblock.py was updated to pass an additional argument resource_dict={"restart_limit": 2} when constructing SingleNodeExecutor in TestFunctionCrashes.test_single_node_executor_block_allocation. No other files or interfaces were changed.

Changes

Cohort / File(s) Summary
Tests
tests/test_singlenodeexecutor_noblock.py
Updated SingleNodeExecutor instantiation in TestFunctionCrashes.test_single_node_executor_block_allocation to include resource_dict={"restart_limit": 2} alongside max_workers=2 and block_allocation=True.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • Add tests for function crashes #802 — Also modifies TestFunctionCrashes.test_single_node_executor_block_allocation in the same test file; likely part of coordinating changes to SingleNodeExecutor test parameters.

Poem

A rabbit taps the keys with cheer,
Adds restart_limit, crisp and clear.
Two workers hop, allocations block—
Tests now burrow, tick and clock.
In nested nodes, we safely roam,
Green carrots greet a passing comb. 🥕

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch restart_limit

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Sep 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.12%. Comparing base (9bacadf) to head (32a9f6e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #813   +/-   ##
=======================================
  Coverage   98.12%   98.12%           
=======================================
  Files          33       33           
  Lines        1649     1649           
=======================================
  Hits         1618     1618           
  Misses         31       31           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_singlenodeexecutor_noblock.py (2)

160-160: Cover the non-block-allocation path too.

To ensure parity, add a sibling test that sets restart_limit without block allocation.

Apply:

@@
     def test_single_node_executor_block_allocation(self):
         with self.assertRaises(ExecutorlibSocketError):
-            with SingleNodeExecutor(max_workers=2, block_allocation=True, resource_dict={"restart_limit": 2}) as exe:
+            with SingleNodeExecutor(max_workers=2, block_allocation=True, resource_dict={"restart_limit": 2}) as exe:
                 f = exe.submit(exit_funct)
                 print(f.result())
 
+    def test_single_node_executor_no_block_allocation_with_restart_limit(self):
+        with self.assertRaises(ExecutorlibSocketError):
+            with SingleNodeExecutor(max_workers=2, resource_dict={"restart_limit": 2}) as exe:
+                f = exe.submit(exit_funct)
+                print(f.result())

160-160: Optional: add a behavior test that proves limited restarts happen.

Current test asserts the error path; consider an additional test where a function fails twice (e.g., via a counter persisted in a temp file) and then succeeds, and assert success only when restart_limit >= 2.

I can draft such a deterministic test if you want.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9bacadf and 32a9f6e.

📒 Files selected for processing (1)
  • tests/test_singlenodeexecutor_noblock.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_singlenodeexecutor_noblock.py (1)
executorlib/executor/single.py (1)
  • SingleNodeExecutor (20-190)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_win
  • GitHub Check: unittest_old
  • GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: notebooks
  • GitHub Check: notebooks_integration
  • GitHub Check: unittest_slurm_mpich
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)

def test_single_node_executor_block_allocation(self):
with self.assertRaises(ExecutorlibSocketError):
with SingleNodeExecutor(max_workers=2, block_allocation=True) as exe:
with SingleNodeExecutor(max_workers=2, block_allocation=True, resource_dict={"restart_limit": 2}) as exe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Good addition; please verify the key propagates end-to-end.

Passing resource_dict={"restart_limit": 2} looks correct and aligns with the SingleNodeExecutor docstring. However, default_resource_dict in executorlib/executor/single.py doesn’t include restart_limit, so this key must be explicitly handled downstream (e.g., in create_single_node_executor). Please verify that restart_limit is actually consumed and not silently ignored in both block_allocation=True/False paths.

Run this to confirm usage coverage and call paths:


🏁 Script executed:

#!/bin/bash
# Where is restart_limit referenced?
rg -nC3 -P '\brestart_limit\b'

# Where is create_single_node_executor defined and how does it use resource_dict?
rg -nC5 -P 'def\s+create_single_node_executor\b'
rg -nC5 -P 'create_single_node_executor\('

# Check that worker (re)spawn logic reads restart_limit
rg -nC3 -P '(restart|respawn|relaunch).*restart_limit|restart_limit.*(restart|respawn|relaunch)'

Length of output: 17219


🏁 Script executed:

#!/bin/bash
# Inspect merging of default_resource_dict and resource_dict
sed -n '140,200p' executorlib/executor/single.py

# Inspect create_single_node_executor implementation
sed -n '360,450p' executorlib/executor/single.py

# Check for any direct usage of restart_limit in this file
rg -n 'restart_limit' -C5 executorlib/executor/single.py

Length of output: 9024


restart_limit ignored in executor pipeline
The restart_limit key from resource_dict isn’t read anywhere in create_single_node_executor (executorlib/executor/single.py:create_single_node_executor) nor by the spawned schedulers, so setting resource_dict={"restart_limit":2} has no effect. Add explicit handling—e.g. extend create_single_node_executor to pop restart_limit and pass it into the OneProcessTaskScheduler/BlockAllocationTaskScheduler or their spawners.

🤖 Prompt for AI Agents
In tests/test_singlenodeexecutor_noblock.py around line 160, the test sets
resource_dict={"restart_limit": 2} but this key is ignored by
create_single_node_executor and the spawned schedulers; update
executorlib/executor/single.py:create_single_node_executor to explicitly extract
(pop) "restart_limit" from resource_dict and forward it as a parameter to the
scheduler factory or spawner
(OneProcessTaskScheduler/BlockAllocationTaskScheduler) when
constructing/starting schedulers; ensure scheduler constructors/spawners accept
a restart_limit argument (add parameter and store/use it) and propagate it into
the process/task restart logic, and update any call sites that create schedulers
to pass the extracted restart_limit so the test's resource_dict takes effect.

@jan-janssen jan-janssen merged commit 79debaa into main Sep 8, 2025
86 of 91 checks passed
@jan-janssen jan-janssen deleted the restart_limit branch September 8, 2025 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant