OpenTelemetry Tracing API vs Tokio-Tracing API for Distributed Tracing #1571
Comments
Tagging @open-telemetry/rust-approvers |
If we were to use tracing as the API, this is the deviation between the existing tracing API and the OTel tracing API
|
From the metrics perspective exemplars are also something to take into account. |
As requested in the community meeting: I would like to say that we should probably try to see whether it is possible to improve the inter-compatibility, as people will still try to use it directly. Questions that are open from my perspective:
We know that the Update: I think really I'd be more 3 than 2. If we can promote inter-compatibility between the two then I think that's a greater win for the community at large. Because as I mentioned during the meeting we will still need to have "some" API anyway. |
As a heavy user of direct OpenTelemetry instrumentation (e.g., using Which interfaces specifically would be deprecated?
I suspect a lot of other OpenTelemetry users are also doing so in private repositories, so I agree that it's hard to measure. I would caution against inferring much from these public GitHub usage stats. |
We don't know exactly yet. The idea is to bridge the gap between the |
I vote for option 2, as there are challenges with other options:
Going with Option 2, we also need to evaluate introducing an extension API within OpenTelemetry. This is to effectively bridge the existing gaps between the OTel specifications and Tokio-Tracing's functionalities (e.g., Baggage support, Propagators). |
Direct consumption of opentelemetry-api could be for traces, metrics and logs, and I agree it is really hard to get the actual statistics for "traces" only :) |
OpenTelemetry comes from the OpenCensus and OpenTracing merge. IDK if I have a say because I don't maintain OTel Rust, but I'd vote for Option 1 and invite the maintainers of IDK how much
I'm biased but I see OTel as the future for Observability signals. |
Tagging for more inputs. |
Just to provide context here. I think if we move to
|
I haven't had much time recently to work on open source, but my perspective is that option 3 is likely optimal in the near term. I suspect that expressing the full otel API via Option 3 could be done via clearer purposes for each API (e.g. low level "full" api via otel, or high level "limited but ergonomic and user-friendly" api via |
I'm on vacation, so I'll be brief and try to expand next week/summarize my thoughts from Slack: pulling a .NET (paying attention to the intent, not the letter of the spec) is very much possible, down to the fact that propagators remained in a dedicated OTEL library for 2.5 years. I think |
In practice, this is still the case! So is I'll also be on vacation for ~1 week. Once back, I'll write down more details on what option 2 could potentially look like. I didn't want to spend too much time exploring any of the options without observing which one the community as a whole would lean toward. It does not look like there are any clear winners so far, but part of the reason could be the lack of specifics/details on what each option would really entail. I'm not yet in a position to strongly support any option; however, I'll take a stab at exploring option 2 further. |
I guess I'll try to take a look at how we could go for option 3. Off the top of my head, use cases to look at:
Then some variant of the two where both Those will be "advanced cases", but honestly it might be more common than one might think. |
Comment/Discussion from Community Meeting for option 3: Test to validate option 3 A - uses tracing for producing span 3 spans It may not be feasible to ask users to use the same API for all 3, as they may not own/control some of them, e.g., B could be the reqwest crate. #1378 (comment) shows an example where logging and tracing (distributed tracing aka spans) are used, and correlation is broken when |
Took some time to get to this due to other priorities, but here are more details on one possible way to go with option2, including a prototype: |
👋🏻 I am not a Rust developer so am coming from a very different perspective. My take is that, to my knowledge, every other ecosystem has opted for Option 1 long-term, Option 3 near-term. Specifying the API in OTel was (I assume) a large effort and we have seen the API evolve as developers have battle-tested it and provided feedback (e.g., lack of a synchronous gauge instrument, which is now in the spec.) My impression is that spec evolution is a pretty collaborative process, which is nice to observe. In my view it would be a mistake to align on pre-existing instrumentation conventions as OTel's mission has been to provide a standard API that instrumentations across languages/systems can adhere to. This is particularly important as it provides a path for libraries to provide instrumentation hooks to, e.g. automatically generate traces and metrics as part of their own business logic, kind of like bpf kernel tracepoints or USDT. And those hooks are written according to a wider specification and hence less vulnerable to governance issues that tend to come up in external libraries from time to time. In the Go OTel SDK there are several "bridge" interfaces that help to close the gap b/w the OTel API and existing instrumentation libraries, e.g., the opencensus bridge. Perhaps this would be a way to pave the path towards wider OTel API adoption. /$0.02 🙇🏻 |
@hdost After re-reading this, I am not entirely sure if I understand the part where you said "I mentioned during the meeting we will still need to have "some" API anyway". I think @TommyCpp also mentioned this (in metrics context though). If you look at the prototype, it has tracing sdk only! No tracing api, i.e., there is no span/span.start()/end() etc. We'll need Could you check this. We can discuss in the next SIG call and figure out what the gaps in our understanding are. |
I've stumbled upon this issue about an issue with tracing and OpenTelemetry API compatibility. I didn't test if it is still current, but it might be another data point to consider. Just checking, was some kind of decision made anywhere? The original timeline aimed for end of April which has already passed, but I didn't attend the weekly calls if it was discussed there. |
It's similar to/the same as #1690? No decision is finalized. I have done an initial exploration of option 2 here: #1689. We discussed some ideas in yesterday's community call as well. @TommyCpp is further exploring this approach to come up with a list of issues (along with severity - nice-to-have vs blockers) that we need to discuss with We are really short on manpower, especially people with experience in |
You're right, it seems to be the same issue. I'll see if I can spare some time but honestly I don't have that much experience with OpenTelemetry itself. |
See if you can join the community meetings. It is 9 AM PT (Tuesdays). If the timing does not work, happy to discuss in separate calls. (You can reach out to me/other maintainers on Slack/Discord as well.) This is the most foundational problem that needs to be resolved in this repo, but unfortunately, it is a somewhat hard problem, and we also lack manpower :( |
Update from July 30 OTel Rust Community Meeting: We recognize it’ll be a while before this can be fully sorted out. We continually see issues - upgrades are hard, otel-demo is broken, and users are unsure which versions are compatible and the list goes on. To mitigate the short/medium term pain, while also being not-too-far from the long term plans, it was decided to offer Note that this does not support interoperating both APIs for spans - either use tracing or otel tracing api, but mixing them up won't work. If option 3 is settled on, then this will need to be solved, but not part of the immediate release. This does not deprecate @TommyCpp will make the above happen and we are targeting to include it in the next release (~Aug 30) |
After some experiments using our custom tracing implementation, I reached the conclusion that it isn't worth the extra complexity. Most of the ecosystem is using tokio's tracing implementation, and the Rust OpenTelemetry WG is evaluating replacing their implementation with only the tracing layer[1]. Thus, I decided to replace it with a tracing integration setup. The setup is pretty standard, but the implementation uses a custom Layer to pass data from tracing to OpenTelemetry, continuing to use the background worker to do most of the heavy lifting. This makes the hot path that runs in the application loop more efficient. The implementation also uses a custom context to allow for faster retrieval of tracing information for propagation. The result is that kiso's users don't have to worry about OpenTelemetry crates unless they need dynamic attributes or links. Everything else is handled by the tracing crate. This commit contains only the necessary code for the migration. There are still some things to sort out, primarily performance improvements to implement. These will be done in other patches as this one is already too big to properly review. [1]: open-telemetry/opentelemetry-rust#1571
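The commit's code is not shown in this thread. As a rough illustration of the hand-off pattern it describes (a cheap hot path that only forwards span data, with a background worker doing the heavy lifting), here is a minimal std-only sketch. `SpanRecord`, `Handoff`, `on_span_close`, and `spawn_worker` are hypothetical names for this example; they are not taken from kiso, `tracing`, or `opentelemetry`, and a real integration would presumably implement the `Layer` trait from `tracing-subscriber` instead.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical record of a finished span; not a type from any real crate.
struct SpanRecord {
    name: String,
    duration_micros: u64,
}

// Stand-in for the custom layer: the hot path only pushes onto a channel.
struct Handoff {
    sender: mpsc::Sender<SpanRecord>,
}

impl Handoff {
    // Called from application code when a span closes. It never blocks on export.
    fn on_span_close(&self, name: &str, duration_micros: u64) {
        // A send error only means the worker has already shut down; ignore it here.
        let _ = self.sender.send(SpanRecord {
            name: name.to_string(),
            duration_micros,
        });
    }
}

// Spawns the background worker that does the (pretend) export work.
fn spawn_worker() -> (Handoff, thread::JoinHandle<Vec<String>>) {
    let (sender, receiver) = mpsc::channel::<SpanRecord>();
    let handle = thread::spawn(move || {
        let mut exported = Vec::new();
        // The loop ends once every sender has been dropped.
        for record in receiver {
            exported.push(format!("{} took {}us", record.name, record.duration_micros));
        }
        exported
    });
    (Handoff { sender }, handle)
}

fn main() {
    let (handoff, worker) = spawn_worker();
    handoff.on_span_close("handle_request", 120);
    handoff.on_span_close("db_query", 45);
    drop(handoff); // closes the channel so the worker can finish
    for line in worker.join().expect("worker panicked") {
        println!("{line}");
    }
}
```

The design point is that the application thread pays only for a channel send; formatting, batching, and network export would all happen on the worker.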
This work is delayed, and won't be part of the coming release (expected in a day). Will post new ETA for this soon. |
@cijothomas is there an update on when it will be implemented? |
Background
The Rust ecosystem has two prominent tracing APIs: the OpenTelemetry Tracing API (OTel for short), delivered through the opentelemetry crate, and the Tokio tracing API, provided by the tracing crate. The OTel Tracing API adheres to the OpenTelemetry specification, ensuring alignment with OpenTelemetry Tracing implementations in other languages such as C++ and Java. Conversely, the Tokio tracing ecosystem, which predates OpenTelemetry, boasts widespread adoption, with many popular libraries already instrumented. The tracing-opentelemetry crate, maintained outside of OpenTelemetry repositories, acts as a "bridge", enabling applications instrumented with tracing to work with OpenTelemetry.
The issue
The coexistence of the OTel Tracing API and Tokio-Tracing poses a dilemma, forcing end users to choose between two competing APIs. The choice is harder because there is no comprehensive documentation comparing the two options. A significant concern is the lack of tested interoperability between the APIs, which can result in issues, especially in applications where different layers use different tracing APIs, potentially leading to incomplete traces. This also impacts log correlation scenarios.
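To make the "incomplete traces" concern concrete, here is a toy std-only model, not code from either crate. It assumes each API keeps its own private notion of the "current span" and looks for a parent only there. `Api`, `Span`, `start_span`, and `run_mixed_layers` are hypothetical names invented for this sketch.

```rust
// Toy model only: these types are hypothetical and do not come from the
// `opentelemetry` or `tracing` crates.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    name: &'static str,
    trace_id: u32,
    parent: Option<&'static str>,
}

// One tracing API with its own private stack of active spans.
struct Api {
    next_trace_id: u32,
    stack: Vec<Span>,
}

impl Api {
    fn new(first_trace_id: u32) -> Self {
        Api { next_trace_id: first_trace_id, stack: Vec::new() }
    }

    // Starts a span. The parent is looked up only in this API's own stack.
    fn start_span(&mut self, name: &'static str) -> Span {
        let span = match self.stack.last() {
            Some(parent) => Span {
                name,
                trace_id: parent.trace_id,
                parent: Some(parent.name),
            },
            None => {
                // No visible parent: a new trace begins.
                let id = self.next_trace_id;
                self.next_trace_id += 1;
                Span { name, trace_id: id, parent: None }
            }
        };
        self.stack.push(span.clone());
        span
    }
}

// Layers A and C use one API; layer B, called in between, uses the other.
fn run_mixed_layers() -> Vec<Span> {
    let mut api_one = Api::new(1);
    let mut api_two = Api::new(100);
    let a = api_one.start_span("A");
    let b = api_two.start_span("B"); // cannot see A, so it starts its own trace
    let c = api_one.start_span("C"); // sees A as current and skips B entirely
    vec![a, b, c]
}

fn main() {
    for span in run_mixed_layers() {
        println!("{:?}", span);
    }
}
```

In this model, B ends up in a separate trace and C is parented directly to A, so neither trace shows the real A, B, C call chain. An interoperability layer would have to make each API see the other's current span.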
A Comparison with OTel .NET
The OpenTelemetry .NET community encountered a similar challenge when the OTel Tracing API was introduced, as the .NET runtime library (shipped as the DiagnosticSource package) already had a similar API in place. This issue was resolved through collaboration between OTel .NET maintainers and the .NET runtime team, leading to the alignment of the .NET runtime's tracing API with the OTel specifications. This approach was later applied to the Metrics API as well. While the decision by OTel .NET to prioritize the .NET Runtime library's API over its own for tracing/metrics has generally been successful, it has not been without its challenges. Despite declaring stability years ago, OTel .NET has yet to implement certain aspects of the OTel specification fully.
Although the outcomes in the .NET ecosystem might not directly forecast the success of similar efforts in Rust, they provide a valuable reference point.
Options for Consideration
1. Deprecate Tokio-Tracing: This approach would align Rust with the OpenTelemetry strategies adopted by other languages. However, considering the popularity and active maintenance of the tracing crate in the Rust ecosystem, this path has the highest friction and is highly improbable.
2. Deprecate OTel Tracing: Promoting Tokio-Tracing as the standard could be a feasible option, albeit requiring comprehensive evaluation. This strategy would cause OTel Rust to deviate from its counterparts in other languages. Potential alignment of Tokio-Tracing with OTel Tracing specifications could mitigate this concern but necessitates groundwork to identify gaps and propose solutions. Tokio-Tracing maintainers have shown willingness to accommodate reasonable changes, pending a clear set of requirements. This option does not eliminate the OTel Tracing API completely; it will remain to compensate for things missing from Tokio-Tracing. Only those APIs that overlap or compete with Tokio-Tracing need to be deprecated/removed.
3. Maintain Both APIs: This alternative emphasizes the importance of ensuring seamless interoperability between the two APIs, allowing users to choose based on preference or specific needs without compromising trace completeness. Achieving this goal requires significant effort to identify and bridge any existing gaps in the interoperability story. Users should be able to choose freely between the two without worrying about broken traces.
4. Do nothing: OTel Rust has made some special accommodations to help the tracing crate (and vice versa). We can just remove them and let each crate follow its own destiny. (Highly undesirable state, listed only for completeness.)
Are there more options? Please let us know in the comments!
Current State
The Rust tracing ecosystem is at a critical juncture. Active discussions between the OTel Rust team and the Tracing Rust team are taking place, with updates and deliberations shared on Cloud Native Slack. Interested individuals are encouraged to join the discussion on Slack (or right in this GitHub issue). All decisions and considerations will be posted on GitHub as well for wider visibility and to gather feedback.
Timeline
Resolving this issue is a prerequisite (though not the only one) for declaring the Tracing signal as GA (General Availability) for OTel Rust. Given the goal to achieve Tracing GA (alongside other milestones) soon, it's crucial that this issue is resolved promptly. A tentative deadline to reach a decision on the chosen path forward is set for April 30th, 2024, approximately 2 months from today.
Related issues
#1378 Tracing Propagation.
#1394 (comment)
Broken Trace example : #1690