Propagate logs to stdout when in k8s executor pod #28440
Conversation
@dstandish I think an abstraction point in the executor framework would be better. In terms of actual implementation, rather than adjusting log handlers at runtime, perhaps an internal pub/sub mechanism could be devised.
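Purely to illustrate the shape of that suggestion, a hypothetical pub/sub layer for log records might look like the sketch below (none of these names exist in Airflow; this is not a proposed implementation):

```python
import logging


class LogRecordBus:
    """Fan log records out to any number of subscribers (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: list = []

    def subscribe(self, callback) -> None:
        self._subscribers.append(callback)

    def publish(self, record: logging.LogRecord) -> None:
        for callback in self._subscribers:
            callback(record)


class BusHandler(logging.Handler):
    """Publishes records onto the bus instead of writing to a stream, so each
    executor could subscribe with whatever sink it needs (file, stdout, ...)."""

    def __init__(self, bus: LogRecordBus) -> None:
        super().__init__()
        self.bus = bus

    def emit(self, record: logging.LogRecord) -> None:
        self.bus.publish(record)
```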
We already adjust handlers; this fixes a deficiency in how we do it. It does not add meaningful backcompat surface area: it is basically a small adjustment to what we already do. Your suggestion may be a good one, but that's not what this PR does, and this PR does not in any way make your approach more difficult, less likely, or less practical to implement. Meanwhile, logging in the k8s executor is broken, and this fixes it without, as far as I can tell, much in the way of downside.
orig_level = root_logger.level
root_logger.setLevel(task_logger.level)
orig_handlers = root_logger.handlers.copy()
root_logger.handlers[:] = task_logger.handlers
@malthe observe: we already copy (temporarily) to the root logger the handlers from `airflow.task`.
This causes problems, because leaving the handlers on `airflow.task` as well forces us to maintain complicated propagation rules at that logger.
While my solution here is a few more lines (and a lot more comments), it's not very complicated. There are two parts (see the sketch after this list):
- Instead of copying the handlers to root, I move them to root. We don't need them at `airflow.task` if they are already at root. This could ultimately allow for simplification of our propagation logic.
- Previously we removed our console handler from root at run time. Now, if we're in a k8s executor pod, I keep the console handler there.
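A minimal sketch of the "move rather than copy" idea (the helper name is hypothetical; this is not the actual Airflow code):

```python
import logging


def move_handlers_to_root(task_logger: logging.Logger) -> None:
    """Hypothetical helper illustrating the 'move' approach."""
    root = logging.getLogger()
    root.setLevel(task_logger.level)
    # Slice assignment copies the handler references onto root...
    root.handlers[:] = task_logger.handlers
    # ...and clearing the task logger's own list completes the *move*:
    # records logged to task_logger now propagate up and are handled
    # exactly once, at root, so propagate=False is no longer needed.
    task_logger.handlers.clear()
```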
I didn't quite understand what "stdout redirection" means. Is that within Python, or is that something happening outside of Python (in a shell script, perhaps)?
When I think about this problem, it seems there is an orthogonal concern in the logging setup of Airflow, namely whether or not the task logger (during execution) should be emitted to the stdout stream.
That's something you could want in any situation, not just K8S.
> I didn't quite understand what "stdout redirection" means. Is that within Python, or is that something happening outside of Python (in a shell script, perhaps)?

I am not sure which comment you refer to, but what this means is that there are stdlib helpers to take things that would ordinarily go to stdout (e.g. `print`) and send them to some other stream. See `from contextlib import redirect_stdout`; it just monkey-patches `sys.stdout` temporarily. We use them (for better or worse) to redirect stdout to task logs.
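For a concrete feel of that helper, here is a minimal, self-contained demonstration of `contextlib.redirect_stdout` (plain stdlib usage; Airflow's actual handler code is more involved):

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):
    print("hello")       # goes to buf, not the real stdout

print(buf.getvalue())    # prints "hello" captured from the buffer
```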
Our log-read logic while a task is running usually reads from the Flask log server. Most task logging is redirected to the log (and therefore to file), and our "console" handler respects this redirection, for obvious reasons: otherwise the celery worker log or local executor log would get unreasonably chatty. But in a k8s executor context there is no log server on the worker, so there is no problem with keeping the stdout; importantly, our log-read logic assumes everything will be forwarded to stdout, and this is what's broken that I am fixing here.
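A sketch of that conditional (the helper name and boolean flag are hypothetical; Airflow's actual pod detection and handler wiring differ):

```python
import logging


def set_root_handlers(handlers: list, in_k8s_executor_pod: bool) -> None:
    """Hypothetical helper: attach task handlers to root, keeping the
    console handler only when running inside a k8s executor pod."""
    root = logging.getLogger()
    for handler in handlers:
        # Outside k8s, drop the console handler so long-running worker
        # logs (celery worker, local executor) don't get unreasonably chatty.
        if handler.name == "console" and not in_k8s_executor_pod:
            continue
        root.addHandler(handler)
```

The point is only that the console handler survives on root in the k8s executor case, so everything a task logs reaches stdout, where the log-read logic expects to find it.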
OK, updated the description, so hopefully it's a little clearer what's going on.
OK @malthe, I have simplified the code, if you would like to have another look.
A few small nits, but looks good to me overall.
# processing hundreds of simultaneous tasks.
# this should be last thing before running, to reduce likelihood of an open session
# which can cause trouble if running process in a fork.
settings.reconfigure_orm(disable_connection_pool=True)
Is this related to this logging PR in any way?
I don't think this will fail if we remove this change. You could call it a drive-by, and I can remove it if you like.
Task runner tests failed with this change. Then I thought, "why is it just throwing away the whole log config? let's try not to do that." After leaving the FTH (FileTaskHandler) intact for that test, I discovered there was an extraneous session being created in `_render_filename`. As part of troubleshooting the failures and discovering that, I also saw that this reconfigure happens earlier than it probably ought to; it should be done as late as possible, I think.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
…text
After apache#28440, instead of having a task logger both at `airflow.task` and the root logger, we only have it at the root logger. This means we can remove the logic that sets propagate to False, because there is no longer a risk of a record being processed by the FTH twice. It also means we can remove the logic that walks up the logger hierarchy and sets context, because we don't need to hit both `airflow.task` and root: there will only ever be one such handler instance. So in effect we deprecate the MAINTAIN_PROPAGATE logic and no longer set propagate=False by default. While we could probably remove the DISABLE_PROPAGATE logic too (it's only used by the file processor), it doesn't really hurt to leave it.
(cherry picked from commit b9ed441f9127503f55e338f728e68f10bc77f3df)
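As a standalone illustration of why a handler at root suffices (plain stdlib `logging`, unrelated to Airflow's actual config):

```python
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(logging.StreamHandler(sys.stdout))

task = logging.getLogger("airflow.task")  # no handlers of its own
# propagate defaults to True, so the record climbs to root and is
# handled exactly once -- no need to set propagate=False anywhere.
task.info("emitted once")
```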
Currently, live logging (viewing logs from the webserver while the task is running) doesn't work for k8s executor pods.
The cause has a few components, stemming from things we do during a task run:
1. We redirect stdout to the task logs.
2. We remove the console handler from the root logger at run time.
3. We disable propagation at the `airflow.task` logger.
The reason for (3) is that we copy the task handlers to the root logger instead of moving them, which requires that we disable propagation at the task logger.
This PR fixes all of these things and ultimately simplifies the setup.