Push the task back on the queue if the daemon crashes? #95

Closed Answered by shikokuchuo
wlandau asked this question in Q&A
@wlandau, this seems at odds with wlandau/crew#101 and the retry mechanism you have already implemented...

The current behaviour is not surprising, and it also seems to have remained the same throughout: I couldn't find a changelog entry. By design, the crashed task is isolated to the single daemon instance, so (assuming the crash is caused by bad code) it doesn't go on to crash all 1,000 nodes in your HPC cluster!

Once the daemon has crashed (you will see assigned > complete and online == 0), you have two options: (i) relaunch the daemon, or (ii) use saisei(force = TRUE) to return the task as an 'errorValue'. The consuming application, e.g. targets, can then contain logic to re-submit the task or handle it otherwise.
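A minimal, language-neutral sketch of the consumer-side logic described above may help. It is written in Python purely for illustration and is not the mirai, crew or targets API. The class and function names (DaemonStatus, crashed, ErrorValue, run_with_retries, submit) are hypothetical. The crash check mirrors the "assigned > complete and online == 0" condition. The retry loop stands in for an application that re-submits a task whose result came back as an error value, with a bounded number of attempts so that bad code cannot be retried forever.

```python
from dataclasses import dataclass


@dataclass
class DaemonStatus:
    # Hypothetical per-daemon counters, modelled on the
    # online / assigned / complete values described above.
    online: int
    assigned: int
    complete: int


def crashed(status: DaemonStatus) -> bool:
    """A daemon looks crashed if it went offline with work still outstanding."""
    return status.online == 0 and status.assigned > status.complete


class ErrorValue:
    """Hypothetical stand-in for the 'errorValue' returned for a lost task."""

    def __init__(self, reason: str):
        self.reason = reason


def run_with_retries(task, submit, max_retries=2):
    """Consumer-side policy: re-submit a task whose result is an ErrorValue.

    `submit` is a hypothetical callable that runs `task` on some daemon and
    returns either a normal result or an ErrorValue.
    """
    attempts = 0
    while True:
        result = submit(task)
        if not isinstance(result, ErrorValue):
            return result
        attempts += 1
        if attempts > max_retries:
            # Give up and surface the failure rather than retrying indefinitely.
            raise RuntimeError(
                f"task {task!r} failed after {attempts} attempts: {result.reason}"
            )


if __name__ == "__main__":
    print(crashed(DaemonStatus(online=0, assigned=3, complete=2)))  # True
    print(crashed(DaemonStatus(online=1, assigned=3, complete=2)))  # False
    outcomes = iter([ErrorValue("daemon crashed"), 42])
    print(run_with_retries("task-1", lambda t: next(outcomes)))  # 42
```

Keeping the retry decision in the consumer, rather than having the dispatcher silently re-queue, matches the design point above: a task that crashes its daemon stays isolated unless the application explicitly chooses to try it again.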

W…

Replies: 1 comment 1 reply

1 reply
@wlandau
Answer selected by wlandau
Category
Q&A
2 participants