Move from `slog` to `tracing` #317

XAMPPRocky · 2021-07-07T07:07:47Z

tracing is a diagnostics framework built and maintained by the Tokio team. It handles logging, but it offers a lot more because it's designed for general instrumentation. For example, there's a companion crate that uses the output to produce flamegraphs from the events. So using tracing would give us a lot more insight into what's happening at runtime than our current logging solution.


markmandel · 2021-07-19T23:12:24Z

I'm curious - a while ago I was looking into https://github.com/tikv/pprof-rs to get better visibility into where bottlenecks are in the codebase. My thought was that we then wouldn't have to manually instrument the codebase. Would something like that be used in conjunction with something like tracing, or are they more of competitors?

(I've been meaning to ping you on what tooling you like for performance tracing for Rust).

XAMPPRocky · 2021-07-26T13:30:00Z

> Would something like that be used in conjunction with something like tracing, or are they more of competitors?

Well, these two particular libraries are probably competitors right now, because pprof uses backtraces to figure out what happened, whereas tracing uses crafted events. For our use case (async) I think tracing is the way forward for quite a few reasons. One, it's what is being focused on by the Rust and Tokio developers (you can read more about that here http://blog.pnkfx.org/blog/2021/04/26/road-to-turbowish-part-1-goals/ ). Two, it's like log in that it provides a clear separation between producing events and subscribing to them (it's essentially log + slog + spans). So, for example, you would be able to create a flamegraph from the same events that you send to your logging output, which IMO is a lot better than backtrace-based profiling, because that can be quite noisy with async code, since the tasks are passed into tokio several layers deep in the stack.
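
To make the producer/subscriber split concrete, here is a minimal std-only Rust sketch. It is not the tracing API: `Event`, `Subscriber`, `LineLogger`, `SpanTotals`, `emit`, and `demo` are hypothetical names, and the per-event timings are made-up inputs. It only illustrates how one stream of events can feed both a log writer and a per-span time aggregator (the kind of data a flamegraph is built from).

```rust
use std::collections::BTreeMap;

/// A hypothetical structured event (illustrative only, not tracing's type).
pub struct Event {
    pub level: &'static str,
    pub span: &'static str,
    pub message: String,
    /// Time attributed to this event; supplied by the caller in this sketch.
    pub micros: u64,
}

/// Anything that wants to consume events implements this trait.
pub trait Subscriber {
    fn on_event(&mut self, event: &Event);
}

/// Consumer 1: renders each event as a log line.
#[derive(Default)]
pub struct LineLogger {
    pub lines: Vec<String>,
}

impl Subscriber for LineLogger {
    fn on_event(&mut self, event: &Event) {
        self.lines
            .push(format!("{} {}: {}", event.level, event.span, event.message));
    }
}

/// Consumer 2: sums time per span name.
#[derive(Default)]
pub struct SpanTotals {
    pub micros_by_span: BTreeMap<&'static str, u64>,
}

impl Subscriber for SpanTotals {
    fn on_event(&mut self, event: &Event) {
        *self.micros_by_span.entry(event.span).or_insert(0) += event.micros;
    }
}

/// The producer side: it hands the event to every subscriber and knows
/// nothing about what each one does with it.
pub fn emit(event: &Event, subscribers: &mut [&mut dyn Subscriber]) {
    for subscriber in subscribers.iter_mut() {
        subscriber.on_event(event);
    }
}

/// Feeds the same three events to both consumers and returns what each collected.
pub fn demo() -> (Vec<String>, BTreeMap<&'static str, u64>) {
    let mut logger = LineLogger::default();
    let mut totals = SpanTotals::default();
    let events = [
        Event { level: "INFO", span: "recv", message: "packet received".to_string(), micros: 40 },
        Event { level: "INFO", span: "filter", message: "filter chain ran".to_string(), micros: 120 },
        Event { level: "INFO", span: "recv", message: "packet received".to_string(), micros: 35 },
    ];
    {
        let mut subscribers: [&mut dyn Subscriber; 2] = [&mut logger, &mut totals];
        for event in &events {
            emit(event, &mut subscribers);
        }
    }
    (logger.lines, totals.micros_by_span)
}
```

Calling `demo()` yields the formatted log lines and the per-span totals from a single pass over the events; neither consumer needed the code that emits events to change.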

markmandel · 2021-07-26T22:03:00Z

From a logging point of view, as long as we still get structured JSON logging, I'm happy 😄, and it looks like https://docs.rs/tracing-subscriber/0.2.19/tracing_subscriber/fmt/format/struct.Json.html or tracing-serde will help solve this. (also our INFO and ERROR level logging are quite quiet, so I don't think there is much adjustment needed).

> whereas tracing uses crafted events

That's the only concern I really have - how much work does it take to build out crafted events? Could we possibly miss a bottleneck / potential performance improvement spot if we have to manually craft our events?

That all being said - the integration with opentelemetry and distributed tracing is certainly very interesting (I've been thinking a bit about how we could do tracing across proxies, so this seems prescient):
https://github.com/tokio-rs/tracing/blob/master/examples/examples/opentelemetry.rs

As well as the general focus on how to debug async operations. 👍🏻

This is super interesting! I chose slog because it was simple, and closest to what I knew from doing similar things in Go - but the use cases I had in the past were different.

The other thing I like about this is that it decouples our logging from the quilkin library: if a consumer uses quilkin as a library and wants to align the logging output with what they do (which may be different from our own), they can do that through their own collector.

I would recommend one of us doing a small POC first, see what the end results are, and if we are all happy, let's move ahead with it?

I really liked this article for really getting my head around how tracing works:
https://betterprogramming.pub/production-grade-logging-in-rust-applications-2c7fffd108a6

👍🏻 on my end!

XAMPPRocky · 2021-07-27T07:02:01Z

> That's the only concern I really have - how much work does it take to build out crafted events? Could we possibly miss a bottleneck / potential performance improvement spot if we have to manually craft our events?

It's not a lot of work to get something useful. You just tag every function you want to instrument with #[tracing::instrument], replace slog's macros with tracing's and call it a day.
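
For readers who haven't used the attribute: `#[tracing::instrument]` wraps a function in a span named after it and records the arguments as fields. The sketch below is a rough std-only illustration of that idea, not the macro's actual expansion. `Trace`, `SpanGuard`, and `parse_port` are hypothetical names, and the in-memory `Vec<String>` stands in for a real subscriber.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared in-memory record of span activity (a stand-in for a real subscriber).
pub type Trace = Rc<RefCell<Vec<String>>>;

/// Hypothetical span guard: records entry when created and exit when dropped.
pub struct SpanGuard {
    name: &'static str,
    trace: Trace,
}

impl SpanGuard {
    pub fn enter(name: &'static str, args: &str, trace: &Trace) -> SpanGuard {
        trace.borrow_mut().push(format!("enter {}({})", name, args));
        SpanGuard { name, trace: Rc::clone(trace) }
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns via `?`.
        self.trace.borrow_mut().push(format!("exit {}", self.name));
    }
}

/// A manually "instrumented" function. With the real attribute, the guard
/// line below is roughly what gets generated for you.
pub fn parse_port(input: &str, trace: &Trace) -> Option<u16> {
    let _span = SpanGuard::enter("parse_port", &format!("input={:?}", input), trace);
    let port = input.trim().parse::<u16>().ok()?;
    // An event emitted inside the span.
    trace.borrow_mut().push(format!("event parsed port={}", port));
    Some(port)
}
```

Because the guard is dropped on every return path, the failed parse still produces a matching exit record, which is the property that makes span timings trustworthy.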

markmandel · 2021-07-29T16:19:35Z

Yeah - this is very exciting. I like this a lot.

One thing that came up recently in a chat I was having - since we can export this to OpenTelemetry, we can export traces with routing tokens to a distributed tracing store (Honeycomb, Google Cloud monitoring, Grafana Tempo, Datadog etc) -- then you can start to tie player data to tracing info and metrics, which can be another avenue to identify bad actors.

This has a lot of very interesting and useful applications both for performance testing as well as at scale 👍🏻

XAMPPRocky mentioned this issue Jul 14, 2021

Get packet buffer size from operating system #330

Open

XAMPPRocky mentioned this issue Aug 4, 2021

Move I/O and configuration out of runner::run #350

Merged

XAMPPRocky assigned rezvaneh Aug 5, 2021

markmandel mentioned this issue Aug 10, 2021

Create More Complete End-to-End Testing Framework #318

Open

rezvaneh mentioned this issue Aug 29, 2021

replace slog with tracing in Filter #385

Merged

XAMPPRocky mentioned this issue Dec 23, 2021

Completely remove slog and replace with tracing #457

Merged

markmandel closed this as completed in #457 Jan 11, 2022
