-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate from OpenTracing to OpenTelemetry instrumentation #3646
Migrate from OpenTracing to OpenTelemetry instrumentation #3646
Conversation
f8ecdd1
to
6e91af6
Compare
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
6e91af6
to
59ea438
Compare
This is a first draft to replace OpenTracing with OpenTelemetry SDK. Some open questions
|
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
b8571ce
to
b625520
Compare
Thanks for the PR, this looks like its on the right track to me.
One concern I have is whether we lose the server spans. I see the dskit change linked above, but I'm not clear this addresses the issue. Currently, many spans are generated on the dskit server, which connects various components since the server is used for all components for grpc and http, and these spans are using the opentracing SDK. Are you able to confirm the impact here? I can test this PR out once you give the nod that you're ready. I will mention that there has been some effort in the past to align this between all databases, and since dskit is in the mix, we may not be able to make this change on our own. I've had a coupe branches of my own which look similar, but the concern above blocked forward movement. I expect that we can have updated information here next quarter, but the complexity of moving all databases gave us some pause previously, and so some coordination may be prudent. |
So far my testing consisted of running two tempo instances (monolithic mode), one with this PR and one without, setting up two data sources in Grafana and opening the explore page in separate tabs, and then comparing random samples of traces. Initially the context propagation of the The Yes, you can test this PR already.
I think it should work fine if dskit is instrumented with OpenTracing, as long as we don't remove the OpenTracing-OpenTelemetry bridge. |
I tested it in microservices mode today, looks good so far. |
Greetings. Wondering if we'll need to update the docs for your PR changes. Will this migration impact users? For example, are there changes to configuration files? IF there are updates, we can create a doc issue for your work or you can add the doc updates to this or a separate PR. |
I took some time to deploy this, and it looks pretty good to me. Nice work. Here is a 6 hour window, and you can't really tell when the deployment happened. Looking at alloy for the same period, you can see the transition in receiver protocol, but not much else. It might be nice to include an option for using GRPC. Additionally, to @knylander-grafana 's point, we could call out the change in environment variables where it makes sense. I changed |
@knylander-grafana Yes, there is impact for some users. I'll update the docs in this PR (this section: https://grafana.com/docs/tempo/latest/operations/monitor/#traces). I'll also add gRPC support and update the env vars next week. @zalegrala do you have a metric for the (average) number of spans per trace in your deployment? I'm mainly concerned about broken traces (broken context propagation), this should be visible if the number of spans per trace is different before and after applying this PR. |
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
I don't have that metric currently, but I think we can work that up, perhaps next week. I expect the change from grpc -> http will mean there is more negotiation and slightly increase in spans, but I share the concern of broken traces. If we do have some after merge, then we can follow up, but it would be good to identify as much as we can ahead of time. |
I moved from the I also updated the docs, changelog, docker compose example file and resolved all merge conflicts. Today I noticed one issue in the logs around OpenCensus/metrics:
afaics it's due to the OpenCensus bridge at https://github.com/census-instrumentation/opencensus-go/blob/master/metric/metricexport/reader.go#L191 I don't know where Tempo uses OpenCensus metrics though, afaics Tempo does use prometheus/client_golang for metrics. |
The error you point to isn't new with this change, so probably safe to ignore, though would be nice to clean up. |
Found the one occurrence of OpenCensus metrics: But I'm unsure what the code is doing, I can't find any place where a metric value is set. The return value of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc updates look good. This approval only applies to the doc and not to code changes.
@andreasgerstmayr You have some good work in this PR but there are some conflicts that need to be resolved. |
Yes, unfortunately this PR gets out of date very quickly because it modifies many files. I've split off the update of the dskit dependency to a separate PR #3915 to make this PR a bit smaller. Once the other PR is merged, I'll fix the merge conflicts of this PR again. |
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
@@ -27,7 +27,6 @@ require ( | |||
github.com/gorilla/mux v1.8.1 | |||
github.com/grafana/dskit v0.0.0-20240801171758-736c44c85382 | |||
github.com/grafana/e2e v0.1.1 | |||
github.com/grpc-ecosystem/grpc-opentracing v0.0.0-20180507213350-8e809c8a8645 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
Nice work @andreasgerstmayr. Thanks for following up on this. |
Thanks for the review & merging! :) |
What this PR does:
Which issue(s) this PR fixes:
Fixes #2224
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]