Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving stdlib integration #395

Open
Tinche opened this issue Jan 23, 2022 · 4 comments
Open

Improving stdlib integration #395

Tinche opened this issue Jan 23, 2022 · 4 comments
Labels

Comments

@Tinche
Copy link
Contributor

Tinche commented Jan 23, 2022

This is more of a discussion than an issue to be fixed.

I have recently refactored how our services log. Other folks might choose to do so too (and we could do some marketing to nudge them), so it might be useful to polish up this use case.

All our modules use structlog to perform logging calls. Before, the logs would pass through structlog and then enter the standard library logging system, where they would be printed out by a handler.

The flow was: structlog -> logging -> stdout.

Now, we still use structlog, but the messages never touch the standard library - they get emitted by the PrintLogger. As you yourself noted, it's faster and there's 0 reason for us to go through the stdlib.

So now the flow is: structlog -> stdout.

While doing this, I encountered two issues. One is that the standard library loggers support name, and the PrintLogger doesn't. I solved that one by subclassing and sticking a name attribute on it. I think logger names are very useful and maybe we could improve the situation somehow. Not necessarily adding a name attribute to PrintLogger, maybe we can be more clever somehow (maybe inject the 'name' key into a context somewhere when creating the logger? Not sure). But this required a little boilerplate and it got done.

The second issue is: other libraries still use the standard library to log. This was actually a problem before as well, but I just realized it as I was doing the transition. There's nothing we can really do about that. But I would like those logs to honor my structlog configuration.

So I was thinking we should write a StructlogHandler class that users can register with the stdlib logging system. It'd essentially just forward the logs into structlog. I want my exceptions to be a nice little JSON line instead of a multiline stack trace like it is by default.

The flow for 3rd party logs would be: logging -> structlog -> stdout.

The main benefit would be configuring the logging output only once.

@hynek
Copy link
Owner

hynek commented Jan 24, 2022

name should be easy, but isn't the second problem almost there with https://www.structlog.org/en/stable/standard-library.html#rendering-using-structlog-based-formatters-within-logging? Might be just a matter of documentation? 🤔

@Tinche
Copy link
Contributor Author

Tinche commented Jan 24, 2022

Hm, close. All of these terminate in a stdlib handler. It's also making the flow complex - the last diagram is user -> structlog -> stdlib -> structlog -> stdlib.

My thought was that if we write a class StructlogHandler(logging.Handler) we can have a better and faster flow.

So my code would be user -> structlog (the StructlogHandler isn't even used here), and 3rd party code would be 3rd party -> stdlib -> structlog (StructlogHandler is used here).

@hynek
Copy link
Owner

hynek commented Jan 29, 2022

Yeah I guess it could be kept simpler. Just processors and whatever is printing.

@hynek hynek added the stdlib label Apr 2, 2024
@sscherfke
Copy link
Contributor

We’ve been using structlog for a few projects since several years now and I’ve been wanting to write an article about this for some time.

Our requirements (wich are quite similar to @Tinche’s:

  • Use structlog alongside stdlib logging (which is still used by many other internal and 3rd party libs.
  • Have the same formatting for structlog and stdlib based loggers
  • Allow multiple named loggers (like stdlib) to have different log levels (current app logs on info, everything else on warning by default. Allow to set individual loggers to info or debug for debugging purposes)
  • Configure everything from external settings, exactly once on startup (with option to re-configure) (not shown in the example below)
  • Add additional fields
  • Add additional log levels (for b/w compat with legacy code, now shown below)

Here is a (nearly) minimal example for how we setup logging:

import logging
import functools
import sys
import logging.config
from datetime import UTC, datetime

import orjson
import structlog
import structlog._log_levels
from structlog.processors import _json_fallback_handler as orjson_fallback_handler
from structlog.tracebacks import ExceptionDictTransformer
from structlog.types import EventDict, Processor, WrappedLogger

import structlog
import structlog._log_levels


def render_orjson(
    dict_tracebacks: structlog.processors.ExceptionRenderer,
    /,
    logger: WrappedLogger,
    name: str,
    event_dict: EventDict,
) -> str:
    """
    Render log records as JSON with ``orjson`` and use dict tracebacks.

    This function must be wrapped with :func:`functools.partial()` and the
    *dict_tracebacks* function has to be passed to it::

        "processor": functools.partial(render_orjson, dict_tracebacks),

    We cannot use :class:`structlog.processors.JSONRenderer` because it returns bytes
    and the :class:`structlog.stdlib.LoggerFactory` needs strings. There is also the
    :class:`structlog.BytesLoggerFactory` but we cannot combine them.

    We also do the dict traceback formatting here, which allows us to simplify the
    stdlib log hander config::

        "()": structlog.stdlib.ProcessorFormatter,
        "processor": functools.partial(render_orjson, dict_tracebacks),

    vs::

        "()": structlog.stdlib.ProcessorFormatter,
        "processors": [
            structlog.stdlib.ProcessorFormatter.remove_processors_meta,
            structlog.processors.dict_tracebacks,
            render_orjson,
        ],
    """

    event_dict = dict_tracebacks(logger, name, event_dict)
    if event_dict.get("exception"):
        e = event_dict["exception"][0]
        event_dict["exc_type"] = e.get("exc_type", "?")
        event_dict["exc_value"] = e.get("exc_value", "?")

    try:
        # Create a new event dict with improved key sorting
        # because even JSON logs will also be viewed by humans. ;-)
        sorted_dict: EventDict = {
            "timestamp": event_dict["timestamp"],
            "level": event_dict["level"],
            "event": event_dict["event"],
            **event_dict,
        }
    except KeyError:
        # pytest-structlog removes the "timestamp" key.
        # In that case, just use the original event dict.
        sorted_dict = event_dict
    return orjson.dumps(sorted_dict, default=orjson_fallback_handler).decode()


def add_extra_fields(
    logger: logging.Logger, method_name: str, event_dict: EventDict
) -> EventDict:
    """
    A structlog processor that adds extra info fields to the event dict.

    This stems largely from structlog but this way, it is more customizable and maybe
    a bit faster, too.
    """
    # ISO UTC tiemstamp
    # structlog.processors.TimeStamper(fmt="iso", utc=True),
    event_dict["timestamp"] = datetime.now(tz=UTC).isoformat().replace("+00:00", "Z")

    # Logger name
    # structlog.stdlib.add_logger_name,
    record = event_dict.get("_record")
    if record is None:
        event_dict["logger"] = logger.name
    else:
        event_dict["logger"] = record.name

    # Log level in UPPER CASE
    # structlog.stdlib.add_log_level,
    if method_name == "warn":
        # The stdlib has an alias
        method_name = "warning"
    elif method_name == "exception":
        # exception("") method is the same as error("", exc_info=True)
        method_name = "error"
    event_dict["level"] = method_name.upper()

    # Log level number
    # structlog.stdlib.add_log_level_number,  # not needed right now
    # event_dict["level_number"] = structlog._log_levels._NAME_TO_LEVEL[method_name]

    return event_dict


def setup_logging() -> None:
    shared_processors: list[Processor] = [
        add_extra_fields,
        structlog.stdlib.ExtraAdder(),
    ]
    # Setup structlog
    #
    # The performance guide (https://www.structlog.org/en/stable/performance.html)
    # suggest to not pump structlog log entries through stdlib and use
    # a BytesLoggerFactory instead.
    #
    # However, that would not work for us, because structlog would than only use single
    # logger instance in all libs and apps.  But we want to be able to configure
    # different log levels for our app and the various libs.  This only works if we
    # use the stdlib logger factory that can create different loggers with different
    # names.
    pre_chain: list[Processor] = [
        structlog.stdlib.filter_by_level,
        structlog.contextvars.merge_contextvars,
    ]
    post_chain: list[Processor] = [
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.StackInfoRenderer(),
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
    ]
    structlog.configure(
        cache_logger_on_first_use=True,
        # The filtering_bound_logger is fast, but it does not allow us to change the
        # log level at a later stage since the filtering is hard code in the logger
        # instances themselves.
        # In addition, we'd have to compute the minimum log level for all loggers and
        # use this, which might drastically reduce the positive performance impact.
        # wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        wrapper_class=structlog.stdlib.BoundLogger,
        processors=pre_chain + shared_processors + post_chain,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )

    dict_tracebacks = structlog.processors.ExceptionRenderer(
        ExceptionDictTransformer(
            # ...  # Config from settings
        )
    )
    rich_tracebacks = structlog.dev.RichTracebackFormatter(
        # ...  # Config from settings
    )

    # Setup stdlib
    cfg = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "json": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": functools.partial(render_orjson, dict_tracebacks),
                "foreign_pre_chain": shared_processors,
            },
            "colored": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(
                    colors=True, exception_formatter=rich_tracebacks
                ),
                "foreign_pre_chain": shared_processors,
            },
        },
        "handlers": {
            "stream": {
                "formatter": "colored" if sys.stdout.isatty() else "json",
                "class": "logging.StreamHandler",
                "stream": "ext://sys.stderr",
            },
        },
        "root": {
            "handlers": ["stream"],
            "level": logging.INFO,  # Acutally loaded from settings
        },
        # Configure log levels for individual loggers from settings
        # "loggers": {name: {"level": level.value} for name, level in levels.items()},
    }
    logging.config.dictConfig(cfg)


def main() -> None:
    setup_logging()
    logging.getLogger("app").info("logging")
    structlog.get_logger("app").info("structlog")


if __name__ == "__main__":
    main()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants