Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to handle startup failure in custom Worker? #2867

Closed
Dreamsorcerer opened this issue Sep 25, 2022 · 12 comments
Closed

How to handle startup failure in custom Worker? #2867

Dreamsorcerer opened this issue Sep 25, 2022 · 12 comments

Comments

@Dreamsorcerer
Copy link

It would be great if anyone could provide some feedback on how a custom worker is meant to handle an exception when starting up the application. Please take a look at this PR, where it has been suggested that this should become a WORKER_BOOT_ERROR, and my comments: aio-libs/aiohttp#6968

Any feedback on correct handling would be greatly appreciated.

@benoitc
Copy link
Owner

benoitc commented Oct 14, 2022

why would you do that? the application should be stable enough so such error should no( be catched before it starts. Let it crash and cfix the error should be the rule there. Hacking around in the worker would just hide the error. Known exception themselves need to be catched by the application.

Gunicorn itself offer a way to test the configuration and start the application uysing the command line.

@benoitc
Copy link
Owner

benoitc commented Oct 14, 2022

if you really want to provide a different level, then you can sys exit with a custom error code if needed. But i thing logging and letting it crash would be safer and easier to fix.

@Dreamsorcerer
Copy link
Author

Please see the linked PR (I'm not the one using or having issues with Gunicorn, I just want to know what we're expected to do, if anything):

I've found out that GunicornWebWorker exits with 0 error code even when an error occurs during startup. This makes Gunicorn endlessly spawn new workers.

I did find there is an APP_LOAD_ERROR, so the proposal has now changed from WORKER_BOOT_ERROR to APP_LOAD_ERROR.

@Dreamsorcerer
Copy link
Author

Hacking around in the worker would just hide the error.

To be clear, we're talking about what the correct implementation should be for the aiohttp worker, which looking through git history actually lived in this repository a few years ago.

@benoitc benoitc closed this as completed May 7, 2023
@Dreamsorcerer
Copy link
Author

I guess we'll just take an educated guess then..

@benoitc
Copy link
Owner

benoitc commented May 7, 2023

what do you mean? I answered to the details provided there.

Just let it crash and don't try to be smart and catch mre error than needed in your worker. ie don't catch apps error and co. this is what is actually done. the arbiter will catch the exit code. If you want to provide a cust exit code then pass it via sys.exit.

If you want more details then please add the needed info there. I will reopen the ticket since you seem to expect more but please describe more precisely what part you're missing.

@Dreamsorcerer
Copy link
Author

As pointed out in my previous comments, I'm under the impression you are talking about changes to a specific app, but I'm asking about changes to a worker implementation used by hundreds of apps (which used to be maintained in this repo).

I figured you'd probably be in the best place to review the proposed change (as asvetlov is not really available currently), and linked the PR at the top. I was hoping you'd take 5 mins to look at it and confirm if that makes sense or not. Otherwise, I'm just making an educated guess on the correct behaviour, as I have no familiarity with gunicorn implementation.

@benoitc benoitc reopened this May 7, 2023
@benoitc
Copy link
Owner

benoitc commented May 7, 2023

I was talking about workers.

See what is done in the simpler sync worker to load the application:

def load_wsgi(self):
try:
self.wsgi = self.app.wsgi()
except SyntaxError as e:
if not self.cfg.reload:
raise
self.log.exception(e)
# fix from PR #1228
# storing the traceback into exc_tb will create a circular reference.
# per https://docs.python.org/2/library/sys.html#sys.exc_info warning,
# delete the traceback after use.
try:
_, exc_val, exc_tb = sys.exc_info()
self.reloader.add_extra_file(exc_val.filename)
tb_string = io.StringIO()
traceback.print_tb(exc_tb, file=tb_string)
self.wsgi = util.make_fail_app(tb_string.getvalue())
finally:
del exc_tb

If application would not load it would simply crash the spawned worker and be catched by the arbiter since the loader would have raised the exception AppImportError via the util function:

raise AppImportError(

And then to tell to the arbiter the worker booted it's really simple, self.booted need to be set to true or false if booted:

self.booted = True

For others error it is just raising exeception. Only signals set the sys._exit. BOOT and APP_LOAD_ERROR are set by the arbiter:

try:
util._setproctitle("worker [%s]" % self.proc_name)
self.log.info("Booting worker with pid: %s", worker.pid)
self.cfg.post_fork(self, worker)
worker.init_process()
sys.exit(0)
except SystemExit:
raise
except AppImportError as e:
self.log.debug("Exception while loading the application",
exc_info=True)
print("%s" % e, file=sys.stderr)
sys.stderr.flush()
sys.exit(self.APP_LOAD_ERROR)
except Exception:
self.log.exception("Exception in worker process")
if not worker.booted:
sys.exit(self.WORKER_BOOT_ERROR)
sys.exit(-1)
finally:
self.log.info("Worker exiting (pid: %s)", worker.pid)
try:
worker.tmp.close()
self.cfg.worker_exit(self, worker)
except Exception:
self.log.warning("Exception during worker exit:\n%s",
traceback.format_exc())

Hope this helps.

@benoitc
Copy link
Owner

benoitc commented May 7, 2023

updated previous comment for clarity.

@Dreamsorcerer
Copy link
Author

Wait, in run() it currently catches all exceptions and then exits with a 0 exit code:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/worker.py#L53-L60

Should that be completely changed and just allow exceptions to be raised instead? Or am I misunderstanding that?

Seem like it already did that originally, but received updates as if that wasn't the desired behaviour:
The exit call was added 8 years ago: aio-libs/aiohttp@763d07b
And exception suppression about 6 years ago: aio-libs/aiohttp@3d00ec1#diff-fa9062813a095af1b8f81504f078fedfc4440c1ffbcb29e4fc8ce6e713cf8cd9R63

@benoitc
Copy link
Owner

benoitc commented May 8, 2023

To clarify, generally speaking the worker should only catch the errors related to its own activity but raise when application is failing to load or is erroring. Once the worker is booted, ie. ready to accept requests it should set worker.booted to true.

Exit when expected (signaling) should exit with the codde 0.

If I get sometimes today I will check your code but you have the idea :)

@benoitc
Copy link
Owner

benoitc commented Aug 6, 2024

no activity since awhile. closing feel free to create a new ticket if needed.

@benoitc benoitc closed this as completed Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants