-
Notifications
You must be signed in to change notification settings - Fork 11.3k
Fail-fast on prompt_worker crash (prevent “accept but not execute”) #11513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@Kosinkadink @guill @comfyanonymous |
| logging.error(f"Failed to initialize database. Please ensure you have installed the latest requirements. If the error persists, please report this as in future the database will be required: {e}") | ||
|
|
||
|
|
||
| def prompt_worker_failfast(*args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, what happens in your test case if you infinite loop here?
Instead of os._exit() if you just while(True) with your exception discard does is recover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
In fact, by the time we reach os._exit(), the prompt_worker thread is already no longer alive. If we replace os._exit() with while True, the process won’t exit; requests will continue to be enqueued, but the queue will never be consumed.
Reason
We reproduced a scenario where an exception triggers
handle_execution_errorand the code path accesses:class_type = prompt[node_id]["class_type"]If the reported
node_idis modified to a non-existent ID, this lookup can raise (e.g.,KeyError), causing theprompt_workerthread to crash.After that, ComfyUI still appears to accept requests and successfully
putthem into the queue("got prompt" looks normal), but since
prompt_workeris no longer alive, the queue is never consumed.This results in a silent “accept but not execute” state.
This PR is done:
The goal of this PR is to prevent ComfyUI from getting into a state where it accepts requests but does not process them, by adopting a fail-fast that terminates the process immediately.