Skip to content

ADR for telemetry improvements #213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 8, 2021
Merged

ADR for telemetry improvements #213

merged 4 commits into from
Dec 8, 2021

Conversation

garypen
Copy link
Contributor

@garypen garypen commented Nov 29, 2021

A template (copied from internet) and a draft ADR for telemetry.

Please use the github "discussion" (shared in slack) for framing discussion about these documents.

A template (copied from internet) and a draft ADR for telemetry.
@BrynCooke
Copy link
Contributor

It would be worth referencing this: https://opentelemetry.lightstep.com/best-practices/using-attributes/
It helps to frame when span attributes should be used.

@o0Ignition0o
Copy link
Contributor

#29

#28

will probably come in handy!

@cecton
Copy link
Contributor

cecton commented Dec 1, 2021

I was commenting that the decorator is kinda nice and we should add it to the recommendations: #210 (comment)

Also there is the question about where the trace should go in the code:

  • on the caller
  • on the source function

I personally prefer on the source function. I think this should also go into the recommendations of everybody agrees.

@Geal
Copy link
Contributor

Geal commented Dec 2, 2021

I'm adding some random notes that came up while working on #224 and others, maybe some of them will be useful to define how we should use tracing:

  • tracing's Filter functionality is not well integrated yet, I often run into issues with trait bounds because some the Filtered type may not be compatible with something else, or runtime panics like:
thread 'tokio-runtime-worker' panicked at 'tracing_subscriber::fmt::Subscriber<tracing_subscriber::fmt::format::DefaultFields, tracing_subscriber::fmt::format::Format, tracing_subscriber::filter::env::EnvFilter>
does not currently support filters', /home/geal/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.3.2/src/registry/mod.rs:153:9
  • some clarification about vocabulary and usage: a Subscriber is tasked with creating spans and giving them an id, and passing those spans to a Layer that can then filter, edit or send them somewhere else (stdout, opentelemetry, etc). We can add a layer to a subscriber with the subscriber's with() method. With multiple with() calls we can have multiple layers see all the spans. Layers can be combined with the and_then() method. We can implement filtering and sampling that way. We can also have different "paths for spans, where they can be filtered in different ways depending on the output. Example usage: subscriber.with(LayerA).with(LayerB.and_then(LayerC)) gives the folowing tree:
subscriber
|\_ Layer A
\_ LayerC _ Layer B
  • tracing-opentelemetry's sampler makes the sampling decision at the span creation time, not when it is closed
  • be careful with dependency updates: different versions of tracing-opentelemetry or other crates can result in traces not being sent anymore
  • there, we have to explicitely set the parent span because apparently inside that call to stream::once(), we're in a different tree of spans:
    let span = Span::current();
    stream::once(
    async move {
    let response_task = self
    .query_plan
    .node()
    .expect("we already ensured that the plan is some; qed")
    .execute(
    Arc::clone(&request),
    Arc::clone(&self.service_registry),
    Arc::clone(&self.schema),
    )
    .instrument(tracing::info_span!(parent: &span, "execution"));
    let query_task = self
    .query_cache
    .get_query(&request.query)
    .instrument(tracing::info_span!(parent: &span, "query_parsing"));
    let (mut response, query) = tokio::join!(response_task, query_task);
    if let Some(query) = query {
    tracing::debug_span!(parent: &span, "format_response").in_scope(|| {
    query.format_response(&mut response, request.operation_name.as_deref())
    });
    }
    response
    }
    .with_current_subscriber(),

    I do not understand yet why that is the case, we need a good explanation for that and a way to avoid the mistake. I fixed it in bbefbb6 but the time spent sending the response to the client is still not tracked

I've incorporated the review comments that I think belong in the ADR.
Some of the comments not included maybe should be here as well...
@garypen
Copy link
Contributor Author

garypen commented Dec 3, 2021

It would be worth referencing this: https://opentelemetry.lightstep.com/best-practices/using-attributes/ It helps to frame when span attributes should be used.
Addressed in: 8a213ed

@garypen garypen changed the title Draft ADR for telemetry improvements ADR for telemetry improvements Dec 3, 2021
@garypen garypen marked this pull request as ready for review December 3, 2021 10:11
- Ensure that each instrumented function is named to promote understanding
and consistency

- Ensure that instrumentation is at the "info" level. Other levels, such
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using instrument at debug or trace level should be fine, it's actually a good use of that attribute, because when debugging we may want to know which function is called. While info level spans are collected in production telemetry, where that level of detail is less interesting, and we actually want logic spans (like "query planning phase")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's really the point I'm trying to make here, but perhaps not very clearly. I'm trying to indicate that most instrumentation should be considered as "production first" (i.e. info level). I want developers to think about the level and have a bias for thinking about how to report production useful telemetry.

@garypen garypen merged commit 82561ea into main Dec 8, 2021
@garypen garypen deleted the adr-discussion branch December 8, 2021 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants