Improving stdlib integration #395

Tinche · 2022-01-23T17:03:41Z

This is more of a discussion than an issue to be fixed.

I have recently refactored how our services log. Other folks might choose to do so too (and we could do some marketing to nudge them), so it might be useful to polish up this use case.

All our modules use structlog to perform logging calls. Before, the logs would pass through structlog and then enter the standard library logging system, where they would be printed out by a handler.

The flow was: structlog -> logging -> stdout.

Now, we still use structlog, but the messages never touch the standard library - they get emitted by the PrintLogger. As you yourself noted, it's faster and there's 0 reason for us to go through the stdlib.

So now the flow is: structlog -> stdout.

While doing this, I encountered two issues. One is that the standard library loggers support name, and the PrintLogger doesn't. I solved that one by subclassing and sticking a name attribute on it. I think logger names are very useful and maybe we could improve the situation somehow. Not necessarily adding a name attribute to PrintLogger, maybe we can be more clever somehow (maybe inject the 'name' key into a context somewhere when creating the logger? Not sure). But this required a little boilerplate and it got done.

The second issue is: other libraries still use the standard library to log. This was actually a problem before as well, but I just realized it as I was doing the transition. There's nothing we can really do about that. But I would like those logs to honor my structlog configuration.

So I was thinking we should write a StructlogHandler class that users can register with the stdlib logging system. It'd essentially just forward the logs into structlog. I want my exceptions to be a nice little JSON line instead of a multiline stack trace like it is by default.

The flow for 3rd party logs would be: logging -> structlog -> stdout.

The main benefit would be configuring the logging output only once.

The text was updated successfully, but these errors were encountered:

hynek · 2022-01-24T07:48:08Z

name should be easy, but isn't the second problem almost there with https://www.structlog.org/en/stable/standard-library.html#rendering-using-structlog-based-formatters-within-logging? Might be just a matter of documentation? 🤔

Tinche · 2022-01-24T23:06:28Z

Hm, close. All of these terminate in a stdlib handler. It's also making the flow complex - the last diagram is user -> structlog -> stdlib -> structlog -> stdlib.

My thought was that if we write a class StructlogHandler(logging.Handler) we can have a better and faster flow.

So my code would be user -> structlog (the StructlogHandler isn't even used here), and 3rd party code would be 3rd party -> stdlib -> structlog (StructlogHandler is used here).

hynek · 2022-01-29T17:12:56Z

Yeah I guess it could be kept simpler. Just processors and whatever is printing.

sscherfke · 2024-12-01T20:16:12Z

We’ve been using structlog for a few projects since several years now and I’ve been wanting to write an article about this for some time.

Our requirements (wich are quite similar to @Tinche’s:

Use structlog alongside stdlib logging (which is still used by many other internal and 3rd party libs.
Have the same formatting for structlog and stdlib based loggers
Allow multiple named loggers (like stdlib) to have different log levels (current app logs on info, everything else on warning by default. Allow to set individual loggers to info or debug for debugging purposes)
Configure everything from external settings, exactly once on startup (with option to re-configure) (not shown in the example below)
Add additional fields
Add additional log levels (for b/w compat with legacy code, now shown below)

Here is a (nearly) minimal example for how we setup logging:

import logging
import functools
import sys
import logging.config
from datetime import UTC, datetime

import orjson
import structlog
import structlog._log_levels
from structlog.processors import _json_fallback_handler as orjson_fallback_handler
from structlog.tracebacks import ExceptionDictTransformer
from structlog.types import EventDict, Processor, WrappedLogger

import structlog
import structlog._log_levels


def render_orjson(
    dict_tracebacks: structlog.processors.ExceptionRenderer,
    /,
    logger: WrappedLogger,
    name: str,
    event_dict: EventDict,
) -> str:
    """
    Render log records as JSON with ``orjson`` and use dict tracebacks.

    This function must be wrapped with :func:`functools.partial()` and the
    *dict_tracebacks* function has to be passed to it::

        "processor": functools.partial(render_orjson, dict_tracebacks),

    We cannot use :class:`structlog.processors.JSONRenderer` because it returns bytes
    and the :class:`structlog.stdlib.LoggerFactory` needs strings. There is also the
    :class:`structlog.BytesLoggerFactory` but we cannot combine them.

    We also do the dict traceback formatting here, which allows us to simplify the
    stdlib log hander config::

        "()": structlog.stdlib.ProcessorFormatter,
        "processor": functools.partial(render_orjson, dict_tracebacks),

    vs::

        "()": structlog.stdlib.ProcessorFormatter,
        "processors": [
            structlog.stdlib.ProcessorFormatter.remove_processors_meta,
            structlog.processors.dict_tracebacks,
            render_orjson,
        ],
    """

    event_dict = dict_tracebacks(logger, name, event_dict)
    if event_dict.get("exception"):
        e = event_dict["exception"][0]
        event_dict["exc_type"] = e.get("exc_type", "?")
        event_dict["exc_value"] = e.get("exc_value", "?")

    try:
        # Create a new event dict with improved key sorting
        # because even JSON logs will also be viewed by humans. ;-)
        sorted_dict: EventDict = {
            "timestamp": event_dict["timestamp"],
            "level": event_dict["level"],
            "event": event_dict["event"],
            **event_dict,
        }
    except KeyError:
        # pytest-structlog removes the "timestamp" key.
        # In that case, just use the original event dict.
        sorted_dict = event_dict
    return orjson.dumps(sorted_dict, default=orjson_fallback_handler).decode()


def add_extra_fields(
    logger: logging.Logger, method_name: str, event_dict: EventDict
) -> EventDict:
    """
    A structlog processor that adds extra info fields to the event dict.

    This stems largely from structlog but this way, it is more customizable and maybe
    a bit faster, too.
    """
    # ISO UTC tiemstamp
    # structlog.processors.TimeStamper(fmt="iso", utc=True),
    event_dict["timestamp"] = datetime.now(tz=UTC).isoformat().replace("+00:00", "Z")

    # Logger name
    # structlog.stdlib.add_logger_name,
    record = event_dict.get("_record")
    if record is None:
        event_dict["logger"] = logger.name
    else:
        event_dict["logger"] = record.name

    # Log level in UPPER CASE
    # structlog.stdlib.add_log_level,
    if method_name == "warn":
        # The stdlib has an alias
        method_name = "warning"
    elif method_name == "exception":
        # exception("") method is the same as error("", exc_info=True)
        method_name = "error"
    event_dict["level"] = method_name.upper()

    # Log level number
    # structlog.stdlib.add_log_level_number,  # not needed right now
    # event_dict["level_number"] = structlog._log_levels._NAME_TO_LEVEL[method_name]

    return event_dict


def setup_logging() -> None:
    shared_processors: list[Processor] = [
        add_extra_fields,
        structlog.stdlib.ExtraAdder(),
    ]
    # Setup structlog
    #
    # The performance guide (https://www.structlog.org/en/stable/performance.html)
    # suggest to not pump structlog log entries through stdlib and use
    # a BytesLoggerFactory instead.
    #
    # However, that would not work for us, because structlog would than only use single
    # logger instance in all libs and apps.  But we want to be able to configure
    # different log levels for our app and the various libs.  This only works if we
    # use the stdlib logger factory that can create different loggers with different
    # names.
    pre_chain: list[Processor] = [
        structlog.stdlib.filter_by_level,
        structlog.contextvars.merge_contextvars,
    ]
    post_chain: list[Processor] = [
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.StackInfoRenderer(),
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
    ]
    structlog.configure(
        cache_logger_on_first_use=True,
        # The filtering_bound_logger is fast, but it does not allow us to change the
        # log level at a later stage since the filtering is hard code in the logger
        # instances themselves.
        # In addition, we'd have to compute the minimum log level for all loggers and
        # use this, which might drastically reduce the positive performance impact.
        # wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        wrapper_class=structlog.stdlib.BoundLogger,
        processors=pre_chain + shared_processors + post_chain,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )

    dict_tracebacks = structlog.processors.ExceptionRenderer(
        ExceptionDictTransformer(
            # ...  # Config from settings
        )
    )
    rich_tracebacks = structlog.dev.RichTracebackFormatter(
        # ...  # Config from settings
    )

    # Setup stdlib
    cfg = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "json": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": functools.partial(render_orjson, dict_tracebacks),
                "foreign_pre_chain": shared_processors,
            },
            "colored": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(
                    colors=True, exception_formatter=rich_tracebacks
                ),
                "foreign_pre_chain": shared_processors,
            },
        },
        "handlers": {
            "stream": {
                "formatter": "colored" if sys.stdout.isatty() else "json",
                "class": "logging.StreamHandler",
                "stream": "ext://sys.stderr",
            },
        },
        "root": {
            "handlers": ["stream"],
            "level": logging.INFO,  # Acutally loaded from settings
        },
        # Configure log levels for individual loggers from settings
        # "loggers": {name: {"level": level.value} for name, level in levels.items()},
    }
    logging.config.dictConfig(cfg)


def main() -> None:
    setup_logging()
    logging.getLogger("app").info("logging")
    structlog.get_logger("app").info("structlog")


if __name__ == "__main__":
    main()

hynek added the stdlib label Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving stdlib integration #395

Improving stdlib integration #395

Tinche commented Jan 23, 2022

hynek commented Jan 24, 2022

Tinche commented Jan 24, 2022

hynek commented Jan 29, 2022

sscherfke commented Dec 1, 2024

Improving stdlib integration #395

Improving stdlib integration #395

Comments

Tinche commented Jan 23, 2022

hynek commented Jan 24, 2022

Tinche commented Jan 24, 2022

hynek commented Jan 29, 2022

sscherfke commented Dec 1, 2024