Add OpenTelemetry middleware #769
Conversation
Cursory review, don't think I wanna fully review the middleware implementation.
@davidpdrsn w.r.t. the tracing initialization, how would you feel about a PR that gets rid of …

If you wanna grind it out then fine by me 😅
Since merging is blocked on review, I'll approve. Or could you self-approve your PR too? 😅

I'm gonna ask around to see if someone familiar with opentelemetry wants to take a look as well.
I do have some questions about this approach to hooking into OpenTelemetry, however. In particular, I'd want to see how propagation would work out to an underlying service (how do we continue the trace out to further backend services?). It could be that I can't picture it quite the same way as what I've observed in how the actix-web-opentelemetry middleware is set up. This proposal tries to keep half a foot in both camps, by not being a pure OpenTelemetry implementation and trying to stuff the OpenTelemetry constructs through the tracing pipeline. That may not be the most effective way to accomplish that goal. Instead, it may be better to have a pure OpenTelemetry-focused middleware, and if there is a desire for logging into tracing as well, that should be its own middleware.
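(For concreteness: continuing the trace out to a backend service means injecting the current context into the outgoing request's headers. Below is a minimal sketch assuming the `opentelemetry-http` crate's `HeaderInjector` and a globally installed propagator — an illustration, not code from this PR:)

```rust
use opentelemetry::global;
use opentelemetry_http::HeaderInjector;

// Build an outgoing request and inject the current trace context into its
// headers so the backend service can continue the same trace.
let mut outgoing = http::Request::builder()
    .uri("http://backend.example/api") // hypothetical backend URL
    .body(())
    .unwrap();
global::get_text_map_propagator(|propagator| {
    propagator.inject_context(
        &opentelemetry::Context::current(),
        &mut HeaderInjector(outgoing.headers_mut()),
    );
});
```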
```rust
let client_ip = req
    .extensions()
    .get::<ConnectInfo<SocketAddr>>()
    .map(|ConnectInfo(client_ip)| Cow::from(client_ip.to_string()))
    .unwrap_or_default();
```
Right now this would only duplicate the functionality of a lower layer providing `net.peer.ip`, which could probably be done here or in a distinct transport-level layer. But if this is just providing the network peer's IP, the spec indicates that `client_ip` should not be provided. As deployed, this is most likely to just be reporting the load balancer's IP (if using one) rather than the IP of the client. If you were instead doing some inspection of the `X-Forwarded-For` headers to try to find the IP of the "true client", then this field becomes useful.
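(As an illustration of that suggestion, a minimal sketch of preferring `X-Forwarded-For` over the socket peer — the header parsing and fallback behavior here are assumptions, not code from this PR:)

```rust
use std::{borrow::Cow, net::SocketAddr};

use axum::extract::ConnectInfo;
use http::Request;

// Prefer the first (leftmost) entry of `X-Forwarded-For`, which proxies
// conventionally use for the original client, falling back to the socket
// peer address when the header is absent.
fn client_ip<B>(req: &Request<B>) -> Cow<'static, str> {
    req.headers()
        .get("x-forwarded-for")
        .and_then(|value| value.to_str().ok())
        .and_then(|value| value.split(',').next())
        .map(|ip| Cow::from(ip.trim().to_owned()))
        .or_else(|| {
            req.extensions()
                .get::<ConnectInfo<SocketAddr>>()
                .map(|ConnectInfo(addr)| Cow::from(addr.ip().to_string()))
        })
        .unwrap_or_default()
}
```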
Good point! I've added a check for `x-forwarded-for` as well.

Honestly not sure if we should set `net.peer.ip` as well 🤔
Here is probably a good place to do `net.peer.ip` and `net.peer.port` if available. I think that adding another layer deeper down the stack would just be more complicated, as you would need to be this high up in the stack to know about the trace context to attach the attribute.
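(A sketch of what that could look like; the field names follow the OpenTelemetry semantic conventions, and `req`/`span` are assumed to be the incoming request and its span — not code from this PR:)

```rust
use std::net::SocketAddr;

use axum::extract::ConnectInfo;

// Record the transport-level peer on the request span. Assumes the span
// declared empty `net.peer.ip` / `net.peer.port` fields up front, since
// tracing spans cannot grow new fields after creation.
if let Some(ConnectInfo(addr)) = req.extensions().get::<ConnectInfo<SocketAddr>>() {
    span.record("net.peer.ip", &tracing::field::display(addr.ip()));
    span.record("net.peer.port", &(addr.port() as u64));
}
```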
```rust
// assume there is no error, if there is `OtelOnFailure` will be called and override this
span.record("otel.status_code", &"OK");
```
According to the spec, the span status should only be set by the server when there was a 500 series error. Otherwise it should be left unset.
In general, instrumentation middleware should not set this value unless it is to report an error.
That spec doesn't mention `otel.status_code`. Where do you see it?
The `StatusCode` in an OpenTelemetry span is translated into `otel.status_code` by the various SDKs for serialization on the wire. The semantics of Ok > Error > Unset are the primary reason why the fields shouldn't be directly set. There are some SDKs, such as .NET's, which do document setting it directly, but they also recommend looking at the shim, where a specific `SetStatus` method is provided.

The other reason we shouldn't set this here is that this span is the current span when the handler for a request is called (presuming no intervening `#[instrument]` calls). If the user reports their own error in the middle, that will appropriately trigger the `Error` status code for the span. Having this here will, at the end of the request, override that status and mark the span as `Ok`, which is why the spec calls this out as a MUST NOT.
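(To make that failure mode concrete — a hypothetical handler, not from this PR:)

```rust
// The middleware's request span is the current span inside the handler
// (no intervening #[instrument]), so the user can flag an error on it.
async fn handler() -> &'static str {
    tracing::Span::current().record("otel.status_code", &"ERROR");
    "degraded response"
}
// If the middleware then unconditionally records "OK" when the response
// completes, the "ERROR" recorded above is overwritten and lost.
```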
I guess it refers to this part of the spec. And `otel.status_code` is a `tracing_opentelemetry` special attribute to set the OpenTelemetry `StatusCode`, so it seems we should not be setting it manually to `OK` 🤔
```rust
match failure {
    ServerErrorsFailureClass::StatusCode(status) => {
        if status.is_server_error() {
            span.record("otel.status_code", &"ERROR");
```
It seems that this is a bit of a backward way to do this. There is a specific call on the normal OpenTelemetry API to set the status code, which should be preferred. The `tracing_opentelemetry` glue would need to expose a new function to allow setting this, because it is in there, just not exposed. In the absence of that, this is reasonable, but it won't properly enforce the semantics of Ok vs Error vs Unset that the API uses.
Not sure I understand what you mean. `tracing_opentelemetry`'s docs mention that you can set these fields, but it doesn't set them automatically and it doesn't have APIs to set them.
Great. I missed that in the documentation for the crate. Because of the specific OTel semantics for how "Ok overrides Error overrides Unset", it is probably preferable to have this be exposed on their extension trait, but if this is how they say to do it, it just pushes the responsibility for ensuring those semantics are correctly implemented onto us.
The opentel `Span` trait has a `set_status` method. I've used it previously through the context's `SpanRef`.
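(For reference, a sketch of that pure-OpenTelemetry route — this uses the `opentelemetry` crate's API and is not what this PR does; older versions spell it `set_status(StatusCode::Error, description)` rather than taking a `Status` value:)

```rust
use opentelemetry::trace::{Status, TraceContextExt};

// Set the status directly on the OpenTelemetry span via the context's
// SpanRef; the SDK enforces the Ok > Error > Unset precedence itself.
let cx = opentelemetry::Context::current();
cx.span().set_status(Status::error("internal server error"));
```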
@neoeinstein Thanks for the feedback. I really appreciate it! As mentioned on Discord, if possible (which it is), then I'd like to use the tracing stuff because that's what users are most likely to be familiar with and use already. If someone wants to write a middleware that uses the opentelemetry SDK directly, and not tracing, then they can do that. I don't see how that is keeping a foot in both camps.
Also, mixing them both will lead to parent hierarchy problems (tokio-rs/tracing#1458) if otel is used after tracing.

Opinionated part: I do agree on relying on … But that's a very opinionated take from me.
I am already a heavy user of both tracing and OpenTelemetry, though my current use is primarily involving …

I did pull down the PR here and play around with it. I did note that the recording of the …

I also discovered that the OpenTelemetry "context" is not actually being set, even if a …

I did try doing something like …

A potential story here is that outgoing propagation libraries would need to do a bit of a dance to get the current context. This seems easy enough to wrap in a helper:

```rust
use opentelemetry::trace::TraceContextExt; // for `has_active_span`
use tracing_opentelemetry::OpenTelemetrySpanExt; // for `Span::context`

let context = Some(tracing::Span::current().context())
    .filter(|c| c.has_active_span())
    .unwrap_or_else(opentelemetry::Context::current);

// Or, in a more imperative form
let context = if tracing::Span::current().context().has_active_span() {
    tracing::Span::current().context()
} else {
    opentelemetry::Context::current()
};
```
Below is my code sample for the handler:

```rust
async fn handler() -> &'static str {
    // `OpenTelemetrySpanExt` provides `Span::context`;
    // `TraceContextExt` provides `Context::span`.
    use opentelemetry::trace::TraceContextExt;
    use tracing_opentelemetry::OpenTelemetrySpanExt;

    let context = opentelemetry::Context::current();
    tracing::info!(
        context.span = %context.span().span_context().span_id(),
        context.trace = %context.span().span_context().trace_id(),
        context.is_recording = %context.span().is_recording(),
        "context pulled from opentelemetry::Context::current()"
    );

    let context = tracing::Span::current().context();
    tracing::info!(
        context.span = %context.span().span_context().span_id(),
        context.trace = %context.span().span_context().trace_id(),
        context.is_recording = %context.span().is_recording(),
        "context pulled from tracing::Span::current().context()"
    );

    {
        // Bind the guard so the context stays attached for this scope.
        let _guard = context.attach();
        let context = opentelemetry::Context::current();
        tracing::info!(
            context.span = %context.span().span_context().span_id(),
            context.trace = %context.span().span_context().trace_id(),
            context.is_recording = %context.span().is_recording(),
            "context pulled from opentelemetry::Context::current() after .attach()"
        );
    }

    other().await;

    "Hello, World!"
}

async fn other() {
    tracing::info!("in other");
}
```

And some screenshots of the relevant Jaeger output and console messages.
```rust
let span = tracing::info_span!(
    "HTTP request",
    // ...
```
This is going to form the "name" of the span. We do want it to be relatively low cardinality (as per the spec), but the generally accepted form on the server side is something like `{http.method} {http.route}`, presuming the route had a matched path (so only if `MatchedPath` came back as `Some`). In the event we don't have a matched path, then this should just be something like the `HTTP request` that you have here, or `HTTP {http.method} request`.

That keeps the cardinality in check but also makes the span more easily identifiable.
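(A sketch of that naming scheme; access to the request and its `MatchedPath` extension is assumed — not code from this PR:)

```rust
use axum::extract::MatchedPath;
use http::Request;

// "{http.method} {http.route}" when a route matched, otherwise a
// constant low-cardinality fallback.
fn span_name<B>(req: &Request<B>) -> String {
    match req.extensions().get::<MatchedPath>() {
        Some(matched) => format!("{} {}", req.method(), matched.as_str()),
        None => format!("HTTP {} request", req.method()),
    }
}
```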
AFAIK non-constant values cannot be used to name a tracing span. I got around this in my tracing layer by using the opentelemetry span builder instead.
You can set the name, in the format `{http.method} {http.route}`, via `otel.name`. I tested it with the otel exporter going to Jaeger afterwards, and I get the `otel.name` for the root span instead of `HTTP request`.

`otel.name` is a special attribute that overrides the span name when using the otel exporter, which exists specifically for this use case if I understand correctly: https://docs.rs/tracing-opentelemetry/latest/tracing_opentelemetry/
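(Concretely, a sketch of that pattern — `method` and `route` are assumed to be in scope from the request and the matched path:)

```rust
// The literal span name must be a constant, but the `otel.name` field
// value can be computed at runtime; tracing-opentelemetry uses it to
// override the exported span name.
let name = format!("{} {}", method, route);
let span = tracing::info_span!("HTTP request", otel.name = %name);
```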
Any update on this PR? Looks exactly like what we need.

@pauljamescleary no news yet. Honestly got kinda overwhelmed with all the feedback. I do wanna finish it eventually though. If this is something you need urgently I'd recommend you copy the code into your codebase and make the necessary adjustments.

@davidpdrsn thanks for the update :)
I copied the code of the middleware (based on previous work: [Add OpenTelemetry middleware by davidpdrsn · Pull Request #769 · tokio-rs/axum](tokio-rs/axum#769), code and comments) as part of an exploration of axum + reqwest + distributed tracing (with otlp + tempo or jaeger) at https://github.com/davidB/sandbox_axum_observability. I included some of your suggestions (thanks).

It's WIP. If you're ok with it, I can contribute back to this PR and collect feedback.
I'm gonna close this for now. If someone wants to continue the work please do and post here to let people know. It's possible to implement this in a third-party crate, so there's no reason this must be built into axum.
@davidpdrsn Are you ok if I publish my fork of your code as a crate (mainly based on the repo I shared in my previous message)? Thanks for your work. (Sorry for the double notification, I first sent it with my pro account.)
@davidB go for it! You're also welcome to add it to ECOSYSTEM.md. |
I published a first version of the crate as https://crates.io/crates/axum-tracing-opentelemetry |
This adds `axum_extra::middleware::opentelemetry_tracing_layer`, which returns a `TraceLayer` that will use OpenTelemetry's conventional span field names.

Fixes #750
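(A usage sketch based purely on this description — the function path follows this PR and may differ from any published version:)

```rust
use axum::{routing::get, Router};

async fn root() -> &'static str {
    "Hello, World!"
}

// Attach the OpenTelemetry-flavored TraceLayer to the whole router.
let app = Router::new()
    .route("/", get(root))
    .layer(axum_extra::middleware::opentelemetry_tracing_layer());
```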