Draft a slurm test by liamhuber · Pull Request #704 · pyiron/pyiron_workflow

liamhuber · 2025-07-17T17:58:27Z

based on Jan's work in pyiron/executorlib#726

github-actions · 2025-07-17T17:58:37Z

👈 Launch a binder notebook on branch pyiron/pyiron_workflow/slurm_test

codecov · 2025-07-17T18:02:24Z

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.02%. Comparing base (0182d3a) to head (0175251).

Files with missing lines	Patch %	Lines
pyiron_workflow/workflow.py	66.66%	1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (66.66%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff              @@
##           executor     #704      +/-   ##
============================================
- Coverage     92.10%   92.02%   -0.09%     
============================================
  Files            34       34              
  Lines          3726     3725       -1     
============================================
- Hits           3432     3428       -4     
- Misses          294      297       +3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

liamhuber · 2025-07-17T19:50:28Z

 Traceback (most recent call last):
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/cached-miniforge/my-env/lib/python3.12/site-packages/pyiron_snippets/dotdict.py", line 4, in __getattr__
    return self.__getitem__(item)
           ^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'add_done_callback'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/tests/cluster/slurm_discovery.py", line 13, in <module>
    wf.run_in_thread()
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/cached-miniforge/my-env/lib/python3.12/site-packages/pyiron_workflow/workflow.py", line 404, in run_in_thread
    f.add_done_callback(lambda _: setattr(self, "executor", None))
    ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/cached-miniforge/my-env/lib/python3.12/site-packages/pyiron_snippets/dotdict.py", line 6, in __getattr__
    raise AttributeError(
AttributeError: DotDict object has no attribute 'add_done_callback'

Interesting, the internal run call is resolving without returning a future, it's just giving the output dotdict directly.

liamhuber · 2025-07-18T16:43:19Z

Interesting, the internal run call is resolving without returning a future, it's just giving the output dotdict directly.

Because it was accessing a cached result. Cf. #707

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

Otherwise we run into trouble where it loads saved executor instructions (that already have what it would use anyhow) Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

@jan-janssen

* Wrap executorlib executors To exploit the new caching interface so that nodes can rely on their running state and lexical path to access previously executed results. Locally with the SingleNodeExecutor everything is looking good, but that doesn't natively support terminating the process that submit the job. I'd like to play around with this using the SlurmClusterExecutor on the cluster before making further changes. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Be kinder to fstrings Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Remove prints Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Black Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Bump lower bound of executorlib Since we now depend explicitly on a new feature Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Wrap SlurmClusterExecutor Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Don't re-parse executor tuples It causes a weird hang that blocks observability. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Exploit lexical path cleaning And make the expected file independently accessible Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Move cache cleaning into the finally method And hide it behind its own boolean flag for testing Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Update executorlib syntax And make the file name fixed and accessible at the class level Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Test the single node executor Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Clean the associated cache subdirectory The slurm executor populates this with a submission script, etc. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Clean up the slurm stuff too Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Add local file executor From @jan-janssen in [this comment](pyiron/executorlib#708 (comment)) Co-authored-by: Jan Janssen Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * lint Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Test local both executors Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Add prefix to cleaning directory Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Use test executor The local file executor got directly included in executorlib as a testing tool. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Validate executor at assignment Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Delay executor tuple parsing And always with-execute tuples since there is only ever one instance of this executor. If we have already been assigned an executor _instance_ then we trust the user to be managing its state and submit directly rather than wrapping in a with-clause Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Decrease improvement expectation Recent changes threw off the balance of times in the first vs second run, so rather compare to what you actually care about: that the second run is bypassing the sleep call. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Use explicit shutdown Instead of a with-clause. This way the executor is still permitted to release the thread before the job is done, but we still guarantee that executors created by bespoke instructions get shutdown at the end of their one-future lifetime. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Clean up written file Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Don't wait There was necessarily only the one future, so don't wait at shutdown. This removes the need for accepting the runtime error and prevents the wrapped executorlib executors from hanging indefinitely. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Bump executorlib version Including the lower bound Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Test application to non-node And debug the error message Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Validate resources at init too Since that's the way users will typically interact with this field. I also had to change the inheritance order to make sure we were dealing with the user-facing executor and not the task scheduler, but this doesn't impact the submit loop. Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Test file cleaning Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Test uninterpretable executor setting Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Modify test for coverage So we pass throught the Runnable._shutdown_executor_callback process Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Decrease lower bound Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Draft a slurm test (#704) * Test slurm submission Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Don't apply callbacks to cached returns Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Only validate submission-time resources Otherwise we run into trouble where it loads saved executor instructions (that already have what it would use anyhow) Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Mark module Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> * Test cached result branch Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> --------- Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com> --------- Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

liamhuber added 4 commits July 18, 2025 10:01

Test slurm submission

de9aacf

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

Don't apply callbacks to cached returns

2483d61

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

Only validate submission-time resources

fcf07c1

Otherwise we run into trouble where it loads saved executor instructions (that already have what it would use anyhow) Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

Mark module

0175251

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

liamhuber force-pushed the slurm_test branch from 9f71ccb to 0175251 Compare July 18, 2025 17:17

liamhuber marked this pull request as ready for review July 18, 2025 17:17

liamhuber mentioned this pull request Jul 18, 2025

Expose executorlib SLURM power in API #708

Closed

Test cached result branch

76e42b7

Signed-off-by: liamhuber <liamhuber@greyhavensolutions.com>

liamhuber merged commit f579159 into executor Jul 18, 2025
18 checks passed

liamhuber deleted the slurm_test branch July 18, 2025 17:53

liamhuber mentioned this pull request Jul 18, 2025

Wrap executorlib executors #678

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft a slurm test#704

Draft a slurm test#704
liamhuber merged 5 commits intoexecutorfrom
slurm_test

liamhuber commented Jul 17, 2025

Uh oh!

github-actions bot commented Jul 17, 2025

Uh oh!

codecov bot commented Jul 17, 2025 •

edited

Loading

Uh oh!

liamhuber commented Jul 17, 2025

Uh oh!

liamhuber commented Jul 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

liamhuber commented Jul 17, 2025

Uh oh!

github-actions bot commented Jul 17, 2025

Uh oh!

codecov bot commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

liamhuber commented Jul 17, 2025

Uh oh!

liamhuber commented Jul 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov bot commented Jul 17, 2025 •

edited

Loading