ADR for telemetry improvements #213

garypen · 2021-11-29T15:38:24Z

A template (copied from internet) and a draft ADR for telemetry.

Please use the GitHub "discussion" (shared in Slack) for framing discussion about these documents.

adr/telemetry.md

BrynCooke · 2021-11-29T18:11:10Z

It would be worth referencing this: https://opentelemetry.lightstep.com/best-practices/using-attributes/
It helps to frame when span attributes should be used.

o0Ignition0o · 2021-11-30T08:41:31Z

#29 and #28 will probably come in handy!

cecton · 2021-12-01T13:10:17Z

I was commenting that the decorator is kinda nice and we should add it to the recommendations: #210 (comment)

Also there is the question about where the trace should go in the code:

on the caller
on the source function

I personally prefer on the source function. I think this should also go into the recommendations if everybody agrees.

Geal · 2021-12-02T10:41:24Z

I'm adding some random notes that came up while working on #224 and others, maybe some of them will be useful to define how we should use tracing:

tracing's Filter functionality is not well integrated yet. I often run into issues with trait bounds because the Filtered type may not be compatible with something else, or runtime panics like:

thread 'tokio-runtime-worker' panicked at 'tracing_subscriber::fmt::Subscriber<tracing_subscriber::fmt::format::DefaultFields, tracing_subscriber::fmt::format::Format, tracing_subscriber::filter::env::EnvFilter>
does not currently support filters', /home/geal/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.3.2/src/registry/mod.rs:153:9

some clarification about vocabulary and usage: a Subscriber is tasked with creating spans and giving them an id, and passing those spans to a Layer that can then filter, edit or send them somewhere else (stdout, opentelemetry, etc). We can add a layer to a subscriber with the subscriber's with() method. With multiple with() calls, multiple layers each see all the spans. Layers can be combined with the and_then() method, which is one way to implement filtering and sampling. We can also have different "paths" for spans, where they are filtered in different ways depending on the output. Example usage: subscriber.with(LayerA).with(LayerB.and_then(LayerC)) gives the following tree:

subscriber
|\_ LayerA
 \_ LayerC _ LayerB

tracing-opentelemetry's sampler makes the sampling decision at the span creation time, not when it is closed
be careful with dependency updates: different versions of tracing-opentelemetry or other crates can result in traces not being sent anymore

there, we have to explicitly set the parent span because apparently, inside that call to stream::once(), we're in a different tree of spans:

router/apollo-router/src/apollo_router.rs

Lines 92 to 120 in 81720e8

    
    let span = Span::current();
    stream::once(
        async move {
            let response_task = self
                .query_plan
                .node()
                .expect("we already ensured that the plan is some; qed")
                .execute(
                    Arc::clone(&request),
                    Arc::clone(&self.service_registry),
                    Arc::clone(&self.schema),
                )
                .instrument(tracing::info_span!(parent: &span, "execution"));
            let query_task = self
                .query_cache
                .get_query(&request.query)
                .instrument(tracing::info_span!(parent: &span, "query_parsing"));
            let (mut response, query) = tokio::join!(response_task, query_task);
            if let Some(query) = query {
                tracing::debug_span!(parent: &span, "format_response").in_scope(|| {
                    query.format_response(&mut response, request.operation_name.as_deref())
                });
            }
            response
        }
        .with_current_subscriber(),

I do not understand yet why that is the case; we need a good explanation for that and a way to avoid the mistake. I fixed it in bbefbb6, but the time spent sending the response to the client is still not tracked.

I've incorporated the review comments that I think belong in the ADR. Some of the comments not included maybe should be here as well...

garypen · 2021-12-03T10:09:49Z

It would be worth referencing this: https://opentelemetry.lightstep.com/best-practices/using-attributes/ It helps to frame when span attributes should be used.
Addressed in: 8a213ed

Geal · 2021-12-03T14:49:13Z

adr/telemetry.md

+ - Ensure that each instrumented function is named to promote understanding
+   and consistency
+
+ - Ensure that instrumentation is at the "info" level. Other levels, such


Using instrument at debug or trace level should be fine; it's actually a good use of that attribute, because when debugging we may want to know which function is called. Info-level spans, by contrast, are collected in production telemetry, where that level of detail is less interesting and we actually want logical spans (like "query planning phase").

garypen · 2021-12-06

I think that's really the point I'm trying to make here, but perhaps not very clearly. I'm trying to indicate that most instrumentation should be considered "production first" (i.e. info level). I want developers to think about the level and have a bias towards reporting telemetry that is useful in production.

Draft ADR for telemetry improvements

9def83b

garypen requested review from Geal, BrynCooke, abernix, cecton and o0Ignition0o November 29, 2021 15:38

Geal reviewed Nov 29, 2021

o0Ignition0o mentioned this pull request Dec 1, 2021

test trace sampling #193

Closed

Geal mentioned this pull request Dec 1, 2021

set OTLP service name and namespace #210

Merged

garypen added 2 commits December 3, 2021 09:45

Merge branch 'main' into adr-discussion

ebfb445

Add some review comments

8a213ed

garypen changed the title ~~Draft ADR for telemetry improvements~~ ADR for telemetry improvements Dec 3, 2021

garypen marked this pull request as ready for review December 3, 2021 10:11

Geal reviewed Dec 3, 2021

Merge branch 'main' into adr-discussion

b5cda9b

o0Ignition0o approved these changes Dec 8, 2021

garypen merged commit 82561ea into main Dec 8, 2021

garypen deleted the adr-discussion branch December 8, 2021 11:16

abernix mentioned this pull request Mar 22, 2022

Tracing: write down what we have learned when investigating #28

Closed
