-
Notifications
You must be signed in to change notification settings - Fork 588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reworked Distributed Tracing Extension #607
Conversation
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear what the motivation is for these changes or how they would make the extension more useful. I also don't think we should adopt such a large change to this extension's specification especially since the proposed change appears to severely limit the usefulness of this extension as a trace propagation mechanism.
|
||
To integrate with existing tracing libraries, the Distributed Tracing attributes | ||
MUST be encoded over HTTP(S) as headers. E.g. | ||
The Distributed Tracing Extension is not intended to replace the protocol specific headers for tracing, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more accurate to say that it is intended to function like the HTTP tracecontext headers in non-HTTP protocols or for event storage.
Given a multi hop event transmission, the Distributed Tracing Extension, if used, MUST | ||
carry the trace information of the starting trace of the transmission. | ||
In other words, It MUST NOT carry trace information of the actual hop, since these information are usually | ||
carried using protocol specific defined headers, understood by tools like [OpenTelemetry](https://opentelemetry.io/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many protocols do not define a trace propagation mechanism, hence the need for this extension as the eventing trace propagation standard. If this extension is changed to specify that it should not track the actual trace context, then we would need a new protocol agnostic standard for trace propagation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure cloudevents is the right place where this standardization should happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case where do you propose that this happens?
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Hey @adriancole, perhaps you can provide some input here? |
|
||
To integrate with existing tracing libraries, the Distributed Tracing attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2p and take it for what its worth. I don't see the update of this section bringing clarity the linked specification doesn't already cover. Maybe just underscore that the headers themselves are only the context of the trace.
Other event attributes may be helpful to add to trace data (or even some as log correlations or metrics dimensions), but specifically how to do log correlation, metrics or tracing is out of scope for this spec.
I'm aware that mentioning Zipkin may not be ok here, but if you did there's a lot of existing art with messaging tracing. Your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracing is out of scope for this spec.
That's what we're trying to establish here 😄 The reason of this PR is exactly because somebody thought that the goal of this spec extension is to enable tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If the goal is to enable tracing, it seems limiting to only discuss the tracecontext spec. many years of tools, thousands of users are familiar with B3. You might consider linking to this also, if adoption is a concern. https://github.com/openzipkin/b3-propagation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another reason why i feel we should not touch the distributed tracing argument at all, otherwise we have a clear overlap with what openzipkin/opentelemetry/w3c is doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that specifically how to do log correlation, metrics, and tracing are out of scope of this extension. That does not imply, however, that trace context propagation is out of scope. Since this extension is modeled after the W3C trace propagation format it seems that trace context propagation must be in-scope here. Nor do I think that adding a trace propagation mechanism for events conflicts with those other projects especially given that they are unlikely to define trace propagation standards either for cloudevents or for all possible eventing protocols.
@bsideup sure, hope it helped. |
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
A couple of rambling thoughts/comments:
In my chats with people, I've heard pros and cons for this extension. However, it is JUST an extension, which means we actually don't need to all agree. And, nor is there any requirement for any SDK to support it, or any other extension. From that alone, it seems like it's ok for it to stick-around since at least some people do seem to find it useful. However, it is concerning that we seem to be revisiting this so often. What I'd like to propose is this:
Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the tracing info feels like something outside of the CE domain as it's not really about the event as much as it's about the transport path it took. And as such, feels like an incorrect bit of metadata for CE to deal with
The CE spec is not just a data format, it also deals with transmission and protocols. Therefore I don't think it's necessarily out of bounds for an extension to handle metadata related to event transmission. Such extensions have already proved to be useful for other purposes in knative such as for TTL and cycle detection.
- if this information does not contain all of the tracing info (from entire entire transport path) then what is it's value? And this comes down to whether or not there is value in knowing just the tracing info from the initial sender.
I agree that there seems to be little purpose in storing an expressly immutable trace context.
In my chats with people, I've heard pros and cons for this extension. However, it is JUST an extension, which means we actually don't need to all agree. And, nor is there any requirement for any SDK to support it, or any other extension.
I agree that one possible solution would be to be less prescriptive about how the extension is used. This PR does not do that, however. It instead adds additional restrictions that prevent this extension from being a useful mechanism for trace propagation.
From that alone, it seems like it's ok for it to stick-around since at least some people do seem to find it useful. However, it is concerning that we seem to be revisiting this so often. What I'd like to propose is this:
- if there are no concerns with the PR itself (ignoring the confusion around the purpose of the PR), and the PR makes the text better than it was before, then I think we should merge it
I do have concerns beyond the purpose of the PR. The requirements that would be added around transmission make the extension less useful than before as they would prevent the use of this extension for trace propagation.
- ask for people to come forward with interest or intent to support this extension. I'd like to see if we can get more than "this sounds interesting" type of feedback. If no one has any intention of actually supporting it then we could consider dropping it via some deprecation/warning period. Actually, we may want to define a process in which all extensions go thru some periodic review to ensure we remove stale ones.
Thoughts?
I think there is clearly value in an extension which can provide trace propagation across event transmission and I have yet to see a compelling use-case presented for an immutable trace context which could not be used for trace propagation. I think this discussion would be more fruitful if such a use-case was brought forward. In the meantime I don't see why we should replace an extension with a clear use case (and one that is already being used in knative eventing) with an extension that seems to be strictly less useful.
Given a multi hop event transmission, the Distributed Tracing Extension, if used, MUST | ||
carry the trace information of the starting trace of the transmission. | ||
In other words, It MUST NOT carry trace information of the actual hop, since these information are usually | ||
carried using protocol specific defined headers, understood by tools like [OpenTelemetry](https://opentelemetry.io/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case where do you propose that this happens?
|
||
To integrate with existing tracing libraries, the Distributed Tracing attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that specifically how to do log correlation, metrics, and tracing are out of scope of this extension. That does not imply, however, that trace context propagation is out of scope. Since this extension is modeled after the W3C trace propagation format it seems that trace context propagation must be in-scope here. Nor do I think that adding a trace propagation mechanism for events conflicts with those other projects especially given that they are unlikely to define trace propagation standards either for cloudevents or for all possible eventing protocols.
It really only deals with transmission and protocol enough to show how CE metadata appears in certain protocols. We specifically tried to avoid transport level metadata as CE properties. See the discussions of "destination" in the primer, e.g. https://github.com/cloudevents/spec/blob/master/primer.md#cloudevents-concepts That's not to say some of those bits of metadata aren't useful - but they're not in-scope for CE right now.
which requirements are you referring to? I kind of feel like there are two conversations going on here: I'd prefer to let @slinkydeveloper developer off the hook and let him do just 1 since that's all he signed up for. If someone wants to propose 2 then I think that's a different PR. |
I like the sound of that: we add an extension, we check the adoption after a while and then we understand if we want to keep it or not
I see your concerns @ian-mi but I feel that, if we don't clarify the scope of this extension, we still can't use it for our production use case, because we'll still have unanswered questions:
The whole goal of this PR is to clearly answers these questions, following the original intentions of the extension. |
I'm one of the few people implementing tracestate, literally am right now actually, but you can ask advice from instana who notably also implement this dance. In general, you need to be able to identify your own entry in tracestate. Once you can assert that you ignore the traceparent mostly as it is not helpful. I think they prefer you to re-use the same trace ID, but the spec is squishy. The other things to consider are just "pass through" meaning you apply zero semantic value to anything. This is what jaeger and elastic do. |
@adriancole are you implementing this CE extension? Also, without getting into whether or not the CE attribute is useful, does this PR help clarify what its expect value is and how it can be used? Or at least not make it worse? https://github.com/cloudevents/spec/pull/607/files#r418372301 seems to indicate that it doesn't make it any worse :-) |
@duglin my only commentary is about the usefulness of the "traceparent" field in the specification that this current work is bound to. I implement trace propagation for dozens of things including messaging rpc you name it. The implementation will be limited to the semantics available in the spec. In other words, by joining this directly to the trace-context spec, it is probably best to either punt to that spec or decide some application here. If you do decide to apply it here, I'd suggest looking at the code I mentioned from instana, but this is just a suggestion. I feel like some of this chat is cyclic as I'm really not sure who is implementing anything and it is odd to choose a concrete header spec without experience with it first. How you navigate this is up to you. I've only put commentary here as someone asked me to. For now, I'll stop giving advice and let y'all just decide whatever it is you are going to do. |
Approved on 5/7 call |
* Reworked Distributed Tracing Extension Signed-off-by: Francesco Guardiani <francescoguard@gmail.com> * suggestions Signed-off-by: Francesco Guardiani <francescoguard@gmail.com> * suggestions Signed-off-by: Francesco Guardiani <francescoguard@gmail.com> * Grammar Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Fixes #603
Signed-off-by: Francesco Guardiani francescoguard@gmail.com