Add OpenTelemetry middleware #769

Closed · wants to merge 11 commits

Conversation

@davidpdrsn (Member) commented Feb 18, 2022:

This adds axum_extra::middleware::opentelemetry_tracing_layer, which returns a TraceLayer that will use OpenTelemetry's conventional span field names.

Fixes #750
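
For context, a minimal usage sketch of the proposed layer (assuming the function name and module path described above; this is not code from the diff itself):

use axum::{routing::get, Router};
use axum_extra::middleware::opentelemetry_tracing_layer;

fn app() -> Router {
    // The layer is attached like any other tower layer; the returned
    // TraceLayer records spans using OpenTelemetry's conventional field names.
    Router::new()
        .route("/", get(|| async { "Hello, World!" }))
        .layer(opentelemetry_tracing_layer())
}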

@davidpdrsn changed the title from "WIP: OpenTelemetry middleware" to "Add OpenTelemetry middleware" on Feb 21, 2022
@davidpdrsn davidpdrsn marked this pull request as ready for review February 21, 2022 08:44
@jplatte (Member) left a comment

Cursory review, don't think I wanna fully review the middleware implementation.

axum-extra/src/middleware/opentelemetry.rs — review comment resolved
examples/opentelemetry-jaeger/src/main.rs — review comment resolved
@jplatte (Member) commented Feb 22, 2022:

@davidpdrsn w.r.t. the tracing initialization, how would you feel about a PR that gets rid of set_var for the whole repo at once?

@davidpdrsn (Member, Author) replied:

If you wanna grind it out then fine by me 😅

@jplatte (Member) left a comment

Since merging is blocked on review, I'll approve. Or could you self-approve your PR too? 😅

@davidpdrsn (Member, Author) commented:

I'm gonna ask around to see if someone familiar with opentelemetry wants to take a look as well.

@neoeinstein (Contributor) left a comment

I do have some questions about this approach to hooking into OpenTelemetry, however. In particular, I'd want to see how propagation to an underlying service would work (how do we continue the trace out to further backend services?). It may just be that I can't quite picture it compared to how the actix-web-opentelemetry middleware is set up. This proposal tries to keep a half foot in both camps by not being a pure OpenTelemetry implementation and trying to stuff the OpenTelemetry constructs through the tracing pipeline. That may not be the most effective way to accomplish that goal. Instead, it may be better to have a pure OpenTelemetry-focused middleware, and if there is a desire for logging into tracing as well, that should be its own middleware.

Comment on lines 150 to 154
let client_ip = req
.extensions()
.get::<ConnectInfo<SocketAddr>>()
.map(|ConnectInfo(client_ip)| Cow::from(client_ip.to_string()))
.unwrap_or_default();
Contributor commented:

Right now this only duplicates the functionality of a lower layer providing net.peer.ip, which could probably be done here or in a distinct transport-level layer. But if this is just providing the network peer's IP, the spec indicates that client_ip should not be provided. As deployed, this is most likely to just report the load balancer's IP (if one is used) rather than the IP of the client. If you were instead to inspect the X-Forwarded-For header to try to find the IP of the "true client", then this field becomes useful.

@davidpdrsn (Member, Author) replied Feb 23, 2022:

Good point! I've added a check for x-forwarded-for as well.

Honestly not sure if we should set net.peer.ip as well 🤔
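
For reference, a rough sketch of that fallback (a hypothetical helper, not the exact code from this PR):

use std::{borrow::Cow, net::SocketAddr};

use axum::extract::ConnectInfo;
use http::Request;

// Hypothetical helper: prefer the first entry of `x-forwarded-for` (the
// closest thing to the "true client" when behind a proxy) and fall back to
// the peer address from `ConnectInfo` otherwise.
fn client_ip<B>(req: &Request<B>) -> Cow<'static, str> {
    req.headers()
        .get("x-forwarded-for")
        .and_then(|value| value.to_str().ok())
        .and_then(|value| value.split(',').next())
        .map(|ip| Cow::from(ip.trim().to_owned()))
        .or_else(|| {
            req.extensions()
                .get::<ConnectInfo<SocketAddr>>()
                .map(|ConnectInfo(addr)| Cow::from(addr.ip().to_string()))
        })
        .unwrap_or_default()
}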

Contributor replied:

Here is probably a good place to do net.peer.ip and net.peer.port if available. I think that adding another layer deeper down the stack would just be more complicated, as you would need to be this high up in the stack to know about the trace context to attach the attribute.
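
A rough sketch of that idea (hypothetical helper; note that a tracing span can only record fields that were declared, e.g. as Empty, when the span was created):

use std::net::SocketAddr;

use axum::extract::ConnectInfo;
use http::Request;

// Hypothetical: read the peer address from the request extensions and record
// it on the already-created span as the conventional net.peer.* fields.
fn record_peer<B>(req: &Request<B>, span: &tracing::Span) {
    if let Some(ConnectInfo(addr)) = req.extensions().get::<ConnectInfo<SocketAddr>>() {
        span.record("net.peer.ip", &tracing::field::display(addr.ip()));
        span.record("net.peer.port", &(addr.port() as u64));
    }
}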

axum-extra/src/middleware/opentelemetry.rs — outdated, review comment resolved
Comment on lines +269 to +270
// assume there is no error, if there is `OtelOnFailure` will be called and override this
span.record("otel.status_code", &"OK");
Contributor commented:

According to the spec, the span status should only be set by the server when there was a 500 series error. Otherwise it should be left unset.

In general, instrumentation middleware should not set this value unless it is to report an error.

Suggested change (delete these lines):
// assume there is no error, if there is `OtelOnFailure` will be called and override this
span.record("otel.status_code", &"OK");

Member (Author) replied:

That spec doesn't mention otel.status_code. Where do you see it?

@neoeinstein (Contributor) replied Feb 23, 2022:

The StatusCode in an OpenTelemetry span is translated into otel.status_code by the various SDKs for serialization on the wire. The semantics of Ok > Error > Unset are the primary reason the field shouldn't be set directly, though some SDKs, such as .NET, do document setting it directly while also recommending the shim, where a specific SetStatus method is provided.

The other reason we shouldn't set this here is that this span is the current span when the handler for a request is called (presuming no intervening #[instrument] calls). If the user hits and reports their own error in the middle, that will appropriately trigger the Error status code for the span. Having this here will, at the end of the request, override that status and mark the span as Ok, which is why the spec calls this out as a MUST NOT.

Contributor commented:

I guess it refers to this part of the spec,

And otel.status_code is a tracing_opentelemetry special attribute for setting the OpenTelemetry StatusCode, so it seems we should not be setting it manually to OK 🤔

match failure {
ServerErrorsFailureClass::StatusCode(status) => {
if status.is_server_error() {
span.record("otel.status_code", &"ERROR");
Contributor commented:

It seems that this is a bit of a backward way to do this. There is a specific call on the normal OpenTelemetry API to set the status code, which should be preferred. The tracing_opentelemetry glue would need to expose a new function to allow setting this, because it is in there, just not exposed. In the absence of that, this is reasonable, but it won't properly enforce the semantics of Ok vs Error vs Unset that the API uses.

Member (Author) replied:

I'm not sure I understand what you mean. tracing_opentelemetry's docs mention that you can set these fields, but it doesn't set them automatically and it doesn't have APIs to set them.

Contributor replied:

Great. I missed that in the documentation for the crate. Because of the specific OTel semantics of "Ok overrides all, Error overrides all, Unset", it would probably be preferable to have this exposed on their extension trait, but if this is how they say to do it, it just pushes the responsibility for correctly implementing those semantics onto us.

Another reply:

The OpenTelemetry Span trait has a set_status method.
I've used it previously through the context's SpanRef.
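
For reference, a rough sketch of that approach (the exact set_status signature has changed between opentelemetry versions; this assumes the 0.17-era API where it takes a status code plus a description):

use opentelemetry::trace::{StatusCode, TraceContextExt};
use tracing_opentelemetry::OpenTelemetrySpanExt;

// Hypothetical: set the OTel span status through the Context's SpanRef
// instead of recording the otel.status_code field on the tracing span.
fn mark_error(description: &str) {
    let context = tracing::Span::current().context();
    context
        .span()
        .set_status(StatusCode::Error, description.to_string());
}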

@davidpdrsn (Member, Author) commented:

@neoeinstein Thanks for the feedback. I really appreciate it!

As mentioned on Discord, since it is possible, I'd like to use the tracing stuff because that's what users are most likely to already be familiar with and use. If someone wants to write a middleware that uses the opentelemetry SDK directly, and not tracing, then they can do that.

I don't see how that is keeping a foot in both camps. tracing_opentelemetry is a library that exists to bridge the gap; in fact, it is maintained by the same people who maintain the opentelemetry crate, so why not use it?

@sergeyshaykhullin commented:

Also, mixing them both will lead to parent-hierarchy problems (tokio-rs/tracing#1458) if otel is used after tracing.

Opinionated part: I prefer using tracing because of the brilliant instrument macro, which doesn't pollute fn scopes.

@floriantz (Contributor) commented:

I do agree on relying on tracing as well. It makes adding opentelemetry to a codebase already instrumented with tracing really simple. And I like not being entirely dependent on otel, as we might want other kinds of exporters as well.

But that's a very opinionated take from me.

@neoeinstein (Contributor) commented:

I am already a heavy user of both tracing and OpenTelemetry, though my current use primarily involves actix-web-opentelemetry, and I am also a big fan of being able to drop #[instrument] wherever I need it. My opinions are thus based on my current use of the technology in production.

actix-web-opentelemetry doesn't interact with the tracing world directly. Instead, it has one concern: the propagation of the tracing context. In my case, I will generally then throw a #[instrument] on the handler function immediately after, as sketched below. The tracing-opentelemetry subscriber then receives those tracing spans and passes them to OpenTelemetry to place in context within the propagated trace.
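
For illustration, that pattern looks roughly like this (hypothetical handler, not code from this PR):

// The propagation middleware only extracts and injects the trace context;
// the handler gets its own span from #[instrument], which the
// tracing-opentelemetry subscriber then attaches to the propagated trace.
#[tracing::instrument]
async fn get_user(id: u64) -> String {
    tracing::info!("looking up user");
    format!("user {}", id)
}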

I did pull down the PR here and play around with it. I did note that the recording of the trace_id in info_span! reports nothing if there was no incoming trace, even though a new trace root and id are getting generated by the OTel tracer. However when present, the trace_id from the traceparent header does get propagated into the tracing spans.

I also discovered that the OpenTelemetry "context" is not actually being set, even if a traceparent was given. This means that a later propagator calling Context::current() is not going to see the current span from tracing (I get an empty context in my testing) and instead generate a new root trace and span, unconnected to the inbound request. At a minimum it would be desirable to have the outgoing requests show as part of the same trace as is recorded for the incoming trace.

I did try doing something like tracing::Span::current().context().attach(), but that just creates a !Send context guard that reverts back once the guard is dropped. It also pollutes the handler, making it !Send if attempting to hold the guard across an await (in the code below, move other().await into the guarded block).

A potential story here is that outgoing propagation libraries would need to do a bit of a dance to get the current context. This seems easy enough to wrap in a #[cfg(feature = "tracing")] if tracing isn't being used, but it does mean that we introduce this concern into the ecosystem.

let context = Some(tracing::Span::current().context())
    .filter(|c| c.has_active_span())
    .unwrap_or_else(opentelemetry::Context::current);

// Or, in a more imperative form
let context = if tracing::Span::current().context().has_active_span() {
    tracing::Span::current().context()
} else {
    opentelemetry::Context::current()
};

Below is my code sample for handler in the example file:

async fn handler() -> &'static str {
    use opentelemetry::trace::TraceContextExt;
    use tracing_opentelemetry::OpenTelemetrySpanExt;
    let context = opentelemetry::Context::current();
    tracing::info!(
        context.span = %context.span().span_context().span_id(),
        context.trace = %context.span().span_context().trace_id(),
        context.is_recording = %context.span().is_recording(),
        "context pulled from opentelemetry::Context::current()"
    );
    let context = tracing::Span::current().context();
    tracing::info!(
        context.span = %context.span().span_context().span_id(),
        context.trace = %context.span().span_context().trace_id(),
        context.is_recording = %context.span().is_recording(),
        "context pulled from tracing::Span::current().context()"
    );
    {
        let _guard = context.attach();
        let context = opentelemetry::Context::current();
        tracing::info!(
            context.span = %context.span().span_context().span_id(),
            context.trace = %context.span().span_context().trace_id(),
            context.is_recording = %context.span().is_recording(),
            "context pulled from opentelemetry::Context::current() after .attach()"
        );
    }
    other().await;
    "Hello, World!"
}

async fn other() {
    tracing::info!("in other");
}

And some screenshots of the relevant Jaeger output and console messages.

[Screenshots: Jaeger UI trace view and console output]

.unwrap_or_default();

let span = tracing::info_span!(
"HTTP request",
@neoeinstein (Contributor) commented Feb 23, 2022:

This is going to form the "name" of the span. We do want it to be relatively low-cardinality (as per the spec), but the generally accepted form on the server side is something like {http.method} {http.route}, presuming a route matched (so only if MatchedPath came back as Some). In the event we don't have a matched path, this should just be something like the "HTTP request" you have here, or HTTP {http.method} request.

That keeps the cardinality in check but also makes the span more easily identifiable.

@tiagolobocastro commented Feb 23, 2022:

AFAIK non-constant values cannot be used to name a tracing span. I got around this in my tracing layer by using the opentelemetry span builder instead.

@floriantz (Contributor) commented Feb 28, 2022:

You can set the name by recording a value in the format {http.method} {http.route} to otel.name. I tested it with the otel exporter going to Jaeger afterwards, and I get the otel.name for the root span instead of HTTP request.

otel.name is a special attribute that overrides the span name when using the otel exporter; it exists specifically for this use case, if I understand correctly: https://docs.rs/tracing-opentelemetry/latest/tracing_opentelemetry/
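
A rough sketch of that approach (hypothetical make_span function; otel.name is the tracing-opentelemetry field that becomes the span name on export):

use axum::extract::MatchedPath;
use http::Request;

// Hypothetical: name the span "{http.method} {http.route}" when a route
// matched, and fall back to a generic name otherwise, via the special
// otel.name field.
fn make_span<B>(req: &Request<B>) -> tracing::Span {
    let name = match req.extensions().get::<MatchedPath>() {
        Some(path) => format!("{} {}", req.method(), path.as_str()),
        None => format!("HTTP {} request", req.method()),
    };
    tracing::info_span!(
        "HTTP request",
        otel.name = %name,
        http.method = %req.method(),
    )
}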

@pauljamescleary commented:

Any update on this PR? Looks exactly like what we need

@davidpdrsn (Member, Author) commented:

@pauljamescleary no news yet. Honestly got kinda overwhelmed with all the feedback. I do wanna finish it eventually though.

If this is something you need urgently I'd recommend you copy the code into your codebase and make the necessary adjustments.

@pauljamescleary replied:

@davidpdrsn thanks for the update :)

davidB added a commit to davidB/sandbox_axum_observability that referenced this pull request Apr 24, 2022
- based on previous work
- based on [Add OpenTelemetry middleware by davidpdrsn · Pull Request #769 · tokio-rs/axum](tokio-rs/axum#769) (code and comments)
@davidB (Contributor) commented May 26, 2022:

I copied the code of the middleware as part of an exploration of axum + reqwest + distributed tracing (with otlp + tempo or jaeger) at https://github.com/davidB/sandbox_axum_observability. I included some of your suggestions (thanks):

  • to have a name that includes the route, using otel.name
  • to have a trace_id before creating the tracing span, so that logs have a non-empty trace_id
  • to retrieve the trace_id in application code (for example, to include the trace_id in a response for errors, debugging, ...)

It's WIP. If you're OK with it, I can contribute back to this PR and collect feedback.
https://github.com/davidB/sandbox_axum_observability/blob/development/app/src/middleware/opentelemetry.rs

@davidpdrsn (Member, Author) commented:

I'm gonna close this for now. If someone wants to continue the work, please do, and post here to let people know. It's possible to implement this in a third-party crate, so there's no reason this must be built into axum.

@davidpdrsn davidpdrsn closed this Jun 6, 2022
@davidpdrsn davidpdrsn deleted the opentelemetry-middleware branch June 6, 2022 11:44
@davidB (Contributor) commented Jun 6, 2022:

@davidpdrsn Are you OK if I publish my fork of your code as a crate (mainly based on the repo I shared in my previous message)?

Thanks for your work

(Sorry for the double notification, I first sent it with my pro account.)

@davidpdrsn (Member, Author) replied:

@davidB go for it! You're also welcome to add it to ECOSYSTEM.md.

@davidB (Contributor) commented Jun 11, 2022:

I published a first version of the crate as https://crates.io/crates/axum-tracing-opentelemetry

Successfully merging this pull request may close these issues.

OpenTelemetry compatible tracing middleware
9 participants