Provide examples for Inter-Service-Tracing #1949

Open
NobbZ opened this issue Oct 28, 2022 · 9 comments
Labels
discussion (Input from everyone is helpful to drive this forward), enhancement (New feature or request)

Comments

@NobbZ

NobbZ commented Oct 28, 2022

What needs to be changed?

Currently "distributed tracing" is mentioned as a major selling point, and it is also mentioned that this works "somehow" via Context Propagation and Baggage.

There is not a single word, though, on how to use this between two independent services.

Therefore it would be nice if there were some examples showing this.

This example could be implemented as a Docker Compose setup that provides a visualizer as well as two services which interact with each other.

An example would be a webserver exposed as the "frontend" which can be curled like an echo server. It expects a name in the query (or in the path of the URL) and forwards it to the other service, which reverses it. The reversed string is then returned to the client in the response body.
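A minimal sketch of the two services described above, using only the Python standard library. It is not the OpenTelemetry API: the trace context is passed by hand in a W3C `traceparent` header, and the names `SEEN`, `new_traceparent`, `Reverser`, `Frontend`, and `demo` are made up for this illustration. A real service would let the propagators of its OpenTelemetry SDK do this step.

```python
import secrets
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SEEN = []  # (service, traceparent) pairs, standing in for exported spans

# Ignore any system proxy settings: both services run on localhost.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def new_traceparent(trace_id=None):
    """Return a W3C traceparent value, reusing trace_id to continue a trace."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)  # 16 hex characters, new for every span
    return f"00-{trace_id}-{span_id}-01"


def query_name(path):
    """Read the `name` query parameter from a request path."""
    query = urllib.parse.urlparse(path).query
    return urllib.parse.parse_qs(query).get("name", [""])[0]


class Service(BaseHTTPRequestHandler):
    def log_message(self, *args):
        pass  # keep the demo output quiet

    def reply(self, text):
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class Reverser(Service):
    def do_GET(self):
        # The header sent by the frontend carries the caller's trace ID.
        SEEN.append(("backend", self.headers.get("traceparent")))
        self.reply(query_name(self.path)[::-1])


class Frontend(Service):
    backend_url = None  # filled in by demo() once the backend port is known

    def do_GET(self):
        name = query_name(self.path)
        incoming = self.headers.get("traceparent")
        # Continue the caller's trace if there is one, else start a new one.
        trace_id = incoming.split("-")[1] if incoming else None
        outgoing = new_traceparent(trace_id)
        SEEN.append(("frontend", outgoing))
        request = urllib.request.Request(
            f"{self.backend_url}/?name={urllib.parse.quote(name)}",
            headers={"traceparent": outgoing},
        )
        with OPENER.open(request) as response:
            self.reply(response.read().decode("utf-8"))


def serve(handler):
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo(name):
    backend = serve(Reverser)
    Frontend.backend_url = f"http://127.0.0.1:{backend.server_port}"
    frontend = serve(Frontend)
    url = f"http://127.0.0.1:{frontend.server_port}/?name={urllib.parse.quote(name)}"
    try:
        with OPENER.open(url) as response:
            return response.read().decode("utf-8")
    finally:
        frontend.shutdown()
        backend.shutdown()
```

Calling `demo("hello")` returns the reversed name, and afterwards `SEEN` holds one entry per service with the same `traceparent` value, which is the property the requested example is meant to demonstrate.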

Ideally this example would exist per language.

Of course I see how the details might differ between transports, but the important thing is how to get the context serialized and deserialized. Getting it into and out of the transport is the easy part.

The main reason for this request: for a simple example implementation in Elixir, it took me ~4 hours of reading documentation (which is mostly empty in this area of the library) and source code before I could properly see the client's trace ID as the server's parent trace ID.

(and this implementation is probably not as intended)
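For illustration, here is a sketch of the serialize/deserialize step in isolation, assuming the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`). `SpanContext`, `inject`, and `extract` are hypothetical names chosen for this sketch, not the API of any OpenTelemetry library, and only version `00` is handled.

```python
import re
from typing import NamedTuple, Optional

# version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, carrier: dict) -> None:
    """Serialize the context into a carrier, e.g. a dict of HTTP headers."""
    flags = "01" if ctx.sampled else "00"
    carrier["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(carrier: dict) -> Optional[SpanContext]:
    """Deserialize the context from a carrier; None if absent or invalid."""
    match = TRACEPARENT_RE.match(carrier.get("traceparent", "").strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the W3C format.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

The client calls `inject` before sending a request, and the server calls `extract` on the received headers and uses the result as the parent of its own span. The carrier is a plain dict here so that the same two functions work for any transport that can carry key/value metadata.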

@NobbZ added the bug (Something isn't working) label on Oct 28, 2022
@svrnm
Member

svrnm commented Oct 28, 2022

I agree that it is something our docs should have eventually; see also the discussion in this ticket: #1862

A good starting point to learn is the demo, which gives you a complex environment to play with. Beyond that, as said in the other ticket, ideally you should not have to worry about context propagation: instrumentation libraries can take care of that for you, so if you use any of the following you should be covered:

https://opentelemetry.io/registry/?language=erlang&component=instrumentation

If not, let the Erlang community know that you need a specific library that is not yet covered.

cc @open-telemetry/erlang-approvers

@tsloughter
Member

I've wanted something similar to what is described. If every language implemented the same two client/server services, they could be swapped in and out in a docker-compose file and you would expect the same result.

I also want this to do actual testing of implementations matching the spec, but that is a separate issue.

@cartermp
Contributor

@NobbZ where did you look when you expected to find an explanation + sample?

I agree, it is a big gap. We don't describe generally how context propagation works, nor do we have a dedicated section + example for each language.

@svrnm
Member

svrnm commented Oct 28, 2022

> I've wanted something similar to what is described. If every language implemented the same two client/server services, they could be swapped in and out in a docker-compose file and you would expect the same result.

ACK, here's how I see this in the future:

  • Step 1, we have the "roll the dice" web server as the "getting started" app for all languages
  • Step 2, there is a second route/endpoint called /rolldiceRemotely (or something) that calls /rolldice on a downstream service, and a further route/endpoint called /battle or /compete which calls /rolldice on 2 downstream services and compares the results
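A rough sketch of what the `/battle` step would demonstrate, with plain function calls standing in for the HTTP calls to the downstream services. `SPANS`, `start_span`, and `headers_for` are hypothetical helpers for this illustration, not part of any OpenTelemetry SDK; the `traceparent` header layout is the W3C one.

```python
import random
import secrets

SPANS = []  # stand-in for a span exporter


def start_span(name, headers):
    """Record a span that continues the trace in `headers`, if there is one."""
    parent = headers.get("traceparent")
    if parent:
        _version, trace_id, parent_id, _flags = parent.split("-")
    else:
        trace_id, parent_id = secrets.token_hex(16), None  # root span
    span = {
        "name": name,
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),
        "parent_id": parent_id,
    }
    SPANS.append(span)
    return span


def headers_for(span):
    """Headers a client would attach so the callee becomes a child of `span`."""
    return {"traceparent": f"00-{span['trace_id']}-{span['span_id']}-01"}


def rolldice(headers):
    """Downstream service: one span per roll."""
    start_span("/rolldice", headers)
    return random.randint(1, 6)


def battle(headers):
    """Calls /rolldice on two downstream services and compares the results."""
    span = start_span("/battle", headers)
    first = rolldice(headers_for(span))
    second = rolldice(headers_for(span))
    if first == second:
        return "draw"
    return "first" if first > second else "second"
```

After one call to `battle({})`, `SPANS` contains three spans that share a trace ID, and both `/rolldice` spans name the `/battle` span as their parent. That parent/child shape is what a visualizer would render as a single trace across three services.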

But there's a few things that need to be done to get there :)

@NobbZ
Author

NobbZ commented Oct 28, 2022

> A good starting point to learn is the demo which gives you a complex environment to play with,

The demo is well hidden. I searched the documentation and found no hints, I asked Google with a couple of keyword combinations, and I tried various things in the GitHub org repo search (but not "demo", obviously).

Perhaps linking it from the Documentation instead of "community" might help with discoverability.

Another big problem: I cannot make it work…

Something wants to bind [::]:8080, which is already in use. I do not want to stop that service; I want to run the demo bound to another port, and also not to ::!

The system lacks documentation on how to change this.

By digging through the compose file, I found ENVOY_PORT. Setting that to 8081 does make the services start and stay up, but even after 15 minutes of waiting I get a 503:

upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: immediate connect error: Cannot assign requested address

No clue how to proceed…

> instrumentation libraries

I checked hex.pm, and the plug, cowboy and phoenix (server-side) instrumentation libraries I found did not mention accepting a context from the incoming request. They only start a new span for each request, so each service's traces remain isolated.

I haven't found an instrumentation library for httpoison (HTTP client) at all.

From skimming the sources of the Elixir/Erlang-based service in the demo, it seems that I need an interceptor, but this word is not mentioned in the docs at all, and I have not found any documentation on how to create one. In general, the sparse docs and the source code of the Elixir and Erlang libraries use a lot of domain vocabulary that is not explained anywhere.

@tsloughter
Member

tsloughter commented Oct 28, 2022

An interceptor is gRPC-specific. Propagation to the featureflag service happens over gRPC, so it has the grpcbox interceptor enabled.

There is no httpoison instrumentation library that I am aware of. Taking a quick look, httpoison appears to support a sort of middleware (handle_request_headers), so it would be simple enough to create one. "Simple" relatively speaking: I mean compared to hackney, which has no such helpers :), not that the docs are in a state that makes it simple to create a new library.
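Since the terms "interceptor" and "middleware" came up without an explanation, here is a generic sketch of the idea: a wrapper around a request handler, or around an outgoing call, that moves the trace context between request metadata and some per-request state. It does not use gRPC, grpcbox, or httpoison; `server_interceptor`, `client_interceptor`, and `fake_transport` are hypothetical names for this illustration.

```python
import contextvars

# Per-request "current trace context", as an instrumentation library might keep it.
current_traceparent = contextvars.ContextVar("current_traceparent", default=None)


def server_interceptor(handler):
    """Wrap a handler so the caller's trace context is active while it runs."""
    def wrapped(request, metadata):
        token = current_traceparent.set(metadata.get("traceparent"))
        try:
            return handler(request)
        finally:
            current_traceparent.reset(token)  # do not leak into the next request
    return wrapped


def client_interceptor(send):
    """Wrap an outgoing call so the active trace context is attached to it."""
    def wrapped(request, metadata=None):
        metadata = dict(metadata or {})
        active = current_traceparent.get()
        if active is not None:
            # Simplification: a real library would first start a client span
            # and send that span's ID instead of forwarding the value as-is.
            metadata.setdefault("traceparent", active)
        return send(request, metadata)
    return wrapped


def fake_transport(request, metadata):
    """Stand-in for the network: returns the metadata that would be sent."""
    return metadata


call_downstream = client_interceptor(fake_transport)


@server_interceptor
def handle(request):
    # The handler never touches trace headers itself.
    return call_downstream(request)
```

The point of the pattern is that `handle` contains no tracing code at all: the wrappers on both sides carry the context through, which is what "instrumentation libraries can take care of that for you" amounts to in practice.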

@tsloughter
Member

Oops, I guess I should have googled first: https://github.com/primait/telepoison

I need to get them to submit that to the contrib repo :)

@NobbZ
Author

NobbZ commented Oct 28, 2022

In this case httpoison was just the library I happened to use for the prototype.

And I wouldn't actually need to trace that one (unless I can extend the trace to include ElasticSearch processing the query).

It is basically the end of everything we can observe. That said, I will need to find ways to trace from a JS frontend through backends in JS, Ruby and Elixir, which call each other back and forth.

And despite the documentation situation, I still think OT.io is the correct tool for that job.

This is something I will continue to prototype even in my free time, as it would really solve some urgent pain I have with the reliance on tribal knowledge within the team…

@chalin
Contributor

chalin commented Dec 16, 2022

@svrnm - tag this as an enhancement request rather than a bug?

@cartermp added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Dec 16, 2022
@theletterf added the discussion (Input from everyone is helpful to drive this forward) label on Apr 9, 2024

6 participants