Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting non-random trace IDs #896

Closed
anuraaga opened this issue Aug 28, 2020 · 13 comments · Fixed by #1006
Closed

Supporting non-random trace IDs #896

anuraaga opened this issue Aug 28, 2020 · 13 comments · Fixed by #1006
Labels
area:sdk Related to the SDK priority:p3 Lowest priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:trace Related to the specification/trace directory

Comments

@anuraaga
Copy link
Contributor

anuraaga commented Aug 28, 2020

Currently many SDKs provide an interface for configuring the way IDs are generated. This is important for X-Ray where all trace IDs have to start with 4 bytes representing the epoch seconds (otherwise, the total number of bytes is the same, it's only a semantic restriction on the first 4 bytes). Being able to ingest arbitrary IDs is a nice thing we'd like to support someday but, the backend relies on this heavily in its indexing scheme currently and it would take a long time, if ever.

https://github.com/open-telemetry/opentelemetry-java/blob/master/sdk/tracing/src/main/java/io/opentelemetry/sdk/trace/IdsGenerator.java
https://github.com/open-telemetry/opentelemetry-go/blob/master/sdk/trace/id_generator.go
https://github.com/open-telemetry/opentelemetry-js/blob/master/packages/opentelemetry-core/src/trace/IdGenerator.ts

Python currently doesn't but it is generated by OpenTelemetry and extracting an interface is simple (actually the JS interface was extracting by a team member to allow JS to work with X-Ray)

https://github.com/open-telemetry/opentelemetry-python/blob/9b41a57b2e8a91ae38ba6f268b90944adcbebd15/opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py#L666

We've found OpenTelemetry .NET does not provide an interface and actually uses ID generation logic from the .NET runtime itself. It defines ID formats as an enum

https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.activityidformat?view=netcore-3.1

I was surprised to see the runtime provide tracing functionality built in! But it means trace ID generation cannot be customized at the OpenTelemetry level, we would need to add knobs in the .NET runtime itself.

We have some ideas of trying to work around this by filling epoch in tracestate and have propagators and exporters reconstruct an epoch-prefixed trace ID based on it. It seems like it may not work completely, though, in particular, I expect a lot of trouble making sure a correct trace ID is synced across metrics / logs assuming it's even possible to modify the trace ID during propagation in a generic way. To clarify, it's not enough for only the X-Ray propagator / exporter to support this trace ID translation, we would not be able to interop with any other servers that e.g., use B3 without complete coverage of the trace ID conversion.

Sorry for the long background, but for the spec, I am wondering if it's possible to require all SDKs to support custom ID generation. Not allowing it could break interop with certain backends that require IDs such as epoch-based IDs and this seems against the spirit of OpenTelemetry. With a hard requirement in the spec, we would have a lot more leverage, for example, in making sure the .NET runtime can satisfy our needs with its API.

@anuraaga anuraaga added the spec:trace Related to the specification/trace directory label Aug 28, 2020
@anuraaga
Copy link
Contributor Author

/cc @lupengamzn @srprash

@Oberon00
Copy link
Member

@reyang

@andrewhsu andrewhsu added release:after-ga Not required before GA release, and not going to work on before GA area:api Cross language API specification issue labels Aug 28, 2020
@anuraaga
Copy link
Contributor Author

anuraaga commented Sep 1, 2020

@reyang Any thoughts on this? Thanks!

@anuraaga
Copy link
Contributor Author

anuraaga commented Sep 2, 2020

@yurishkuro Also I remember you commenting on the w3c spec that we shouldn't add verification of randomness since there are legitimate IDs with non-random parts. Do you have any thoughts on OpenTelemetry requiring the ability to generate such IDs? Thanks!

@yurishkuro
Copy link
Member

I think the ID generator should be pluggable in the SDK, if that's what you're asking. And I don't think that the spec should insist on pure random IDs, since it does not affect the API or SDK behavior in any way.

@anuraaga
Copy link
Contributor Author

anuraaga commented Sep 2, 2020

@yurishkuro The key word there is SHOULD :) Do you think it's SHOULD or MUST? If any SDK didn't support pluggable ID generation, than that language would potentially not be able to support backends that rely on them.

@yurishkuro
Copy link
Member

I am fine with MUST for SDKs, but I am not currently working directly on any SDKs, so this guidance should be validated by maintainers in different languages. In my prior experience it was never a problem to expose the ID generator to be swappable, but your prior comments indicate that this might be somehow more complicated in the .NET runtime, so I don't have an opinion there.

@Oberon00 Oberon00 added area:sdk Related to the SDK and removed area:api Cross language API specification issue labels Sep 2, 2020
@anuraaga
Copy link
Contributor Author

anuraaga commented Sep 3, 2020

Thanks - definitely need some input from various languages. But I also think we need a high level discussion by the spec maintainers since it relates to OpenTelemetry's goals. If OpenTelemetry considers the custom ID format use case to be important, then we need to add the requirement to the spec itself. Interested in any thoughts @open-telemetry/specs-trace-approvers.

@Oberon00
Copy link
Member

Oberon00 commented Sep 3, 2020

I would be in favor of forcing implemenations to allow non-random trace (and span) IDs. In fact, they already have to deal with them right now, as a parent span ID could come from a non-OpenTelemetry system that e.g. uses a non-random GUID.

@lupengamzn
Copy link

Hi all, so we're trying to replace OT's trace IDs with non-random trace IDs, but the only feasible public API allows us to do so is SetParentId, which will reset the trace ID and parent span ID of current activity. However, when we're applying this API on a root activity, its parent ID will no longer be null and will not pass the default sampling mechanism check in the ActivitySourceAdapter when we didn't provide sampler in advance. We're almost running out of ideas and if you're planning to provide APIs to directly modify the trace ID, that would be awesome, but if not, is there any possibility that you can improve this checker to take our case into consideration?

@lupengamzn
Copy link

Issue 1240

@Oberon00
Copy link
Member

Oberon00 commented Sep 15, 2020

@lupengamzn Indeed the spec is currently lacking a description of how to create a SpanContext. If you only implement what is specified, you could not create one at all. Seems like a bug to me, but different from this issue which talks about exchanging the generator in cases where the SDK generates a trace/span ID for you.

@andrewhsu
Copy link
Member

from the spec sig mtg today, talked about this issue and decided to relabel as allowed-for-ga since the expected changes in PR #1006 are not major breaking changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK priority:p3 Lowest priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants