-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate from OpenTracing to OpenTelemetry #1548
Conversation
Thanks very much @rajatvig, that looks fantastic. I'm hoping we can soon find time to review it, and think about compatibility/migration for existing Datadog/OpenTracing users. |
It's really exciting.
|
@rajatvig Firstly, thanks for taking the time to open this PR :) I've tried to test these changes locally, but I'm not seeing any data sent through to Jaeger. FYI, I'm doing: ❯ env |grep OTEL
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=development
❯ scripts/build-binary.sh linux amd64 test1
Building buildkite-agent with:
GOOS=linux
GOARCH=amd64
BUILD_VERSION=test1
Done: pkg/buildkite-agent-linux-amd64 💪
❯ cd pkg
❯ ll
total 25M
lrwxrwxrwx 1 fatmcgav fatmcgav 27 Jan 11 14:12 buildkite-agent -> buildkite-agent-linux-amd64
-rwxr-xr-x 1 fatmcgav fatmcgav 25M Jan 11 14:21 buildkite-agent-linux-amd64
❯ ./buildkite-agent start --token $BUILDKITE_AGENT_TOKEN --tracing-backend otel --build-path "$PWD/builds" --tags "queue=test"
_ _ _ _ _ _ _ _
| | (_) | | | | (_) | | |
| |__ _ _ _| | __| | | ___| |_ ___ __ _ __ _ ___ _ __ | |_
| '_ \| | | | | |/ _` | |/ / | __/ _ \ / _` |/ _` |/ _ \ '_ \| __|
| |_) | |_| | | | (_| | <| | || __/ | (_| | (_| | __/ | | | |_
|_.__/ \__,_|_|_|\__,_|_|\_\_|\__\___| \__,_|\__, |\___|_| |_|\__|
__/ |
https://buildkite.com/agent |___/
2022-01-11 14:23:15 NOTICE Starting buildkite-agent v3.33.3 with PID: 3026700
2022-01-11 14:23:15 NOTICE The agent source code can be found here: https://github.com/buildkite/agent
2022-01-11 14:23:15 NOTICE For questions and support, email us at: hello@buildkite.com
2022-01-11 14:23:15 INFO Registering agent with Buildkite...
2022-01-11 14:23:15 INFO Successfully registered agent "fatmcgav-p1g2" with tags [queue=test]
2022-01-11 14:23:15 INFO Starting 1 Agent(s)
2022-01-11 14:23:15 INFO You can press Ctrl-C to stop the agents
2022-01-11 14:23:15 INFO fatmcgav-p1g2 Connecting to Buildkite...
2022-01-11 14:23:16 INFO fatmcgav-p1g2 Waiting for work...
However I don't see any traces hitting my local Jaeger instance... I do see a couple of additional log lines in my build output which weren't there prior to these changes:
Any suggestions? Ultimately, I want to test sending traces to an Elastic APM server instance, but thought I'd test with Jaeger first :) Thanks |
@rajatvig sorry, I should've said that I am using the example OTEL collector included in this PR... |
@fatmcgav That seems odd. I just ran it locally and it works with almost the same steps as you outlined. The only difference being I am on a macOS and not on a Linux host. It might have something to do with how docker compose is binding the network. Will try and run on a Linux machine tomorrow. |
So I've spent a bit more time debugging this one, and was able to get traces into my local Jaeger instance by tweaking diff --git a/bootstrap/bootstrap.go b/bootstrap/bootstrap.go
index 17365239..31249947 100644
--- a/bootstrap/bootstrap.go
+++ b/bootstrap/bootstrap.go
@@ -272,7 +278,12 @@ func (b *Bootstrap) startTracing(ctx context.Context) (trace.Span, context.Conte
var stopper stopper
switch b.Config.TracingBackend {
case "datadog", "otel":
- client := otlptracegrpc.NewClient()
+ client := otlptracegrpc.NewClient(
+ otlptracegrpc.WithInsecure(),
+ otlptracegrpc.WithEndpoint("localhost:4317"),
+ otlptracegrpc.WithDialOption(grpc.WithBlock()),
+ otlptracegrpc.WithReconnectionPeriod(time.Second),
+ )
exporter, err := otlptrace.New(ctx, client)
if err != nil {
b.shell.Errorf("Error creating OTLP trace exporter %s", err) So it seems like there's some kinda issue with loading the config from env vars, atleast in my env... I've confirmed that the env vars are set:
|
My thoughts aswell, but it just isn't working for me... I even went as far as building an agent from this PR, pushing a local docker image and testing on K8s with the relevant Unfortunately setting |
@fatmcgav I am guessing the ENV variables are not getting passed to the Agent process you are running. Maybe add debugging from the Agent to spit out the environment variables it sees. Also, the endpoint does need to be prefixed with |
So my test pipeline runs the following step:
And I can see the following output in Buildkite build log:
|
The environment variable is |
My understanding is that the Either way, I'm pretty sure I tested both with and without the Edit: Yeh, so the logic in https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/internal/otlpconfig/envconfig.go#L63 looks for the |
2. Export | ||
|
||
```shell | ||
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might need to be OTEL_EXPORTER_OTLP_ENDPOINT
or have /v1/traces
appended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is using the GRPC based endpoint so it does not need that appended
Thanks so much @rajatvig @cyrille-leclerc @fatmcgav @cbandy for the thinking, testing and discussion around this. |
.. and
This is very cool info, I didn't realise the datadog agent now ships with native support for ingestion of OTLP traces. |
I was able to get this to work on the elastic stack for buildkite with a bootstrap script by patching with and And by running an otel collector on the agent. Looking forward to seeing this merged :) |
In #1563 I kinda assumed buildkite-agent would adopt OpenTelemetry. Here's a patch on this PR that is a rough draft of some more fields I would like to see on spans. I am very excited to see this merged, either way! Thank you! From fc0e463fc21675bc4a5d155b4d4733f104303083 Mon Sep 17 00:00:00 2001
From: John Soo <john.soo@arista.com>
Date: Fri, 18 Feb 2022 07:04:20 -0800
Subject: [PATCH] tracing: Add buildkite_{base_branch,pull_request,label}
attributes.
---
bootstrap/bootstrap.go | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/bootstrap/bootstrap.go b/bootstrap/bootstrap.go
index 95e5365c..8d2a8939 100644
--- a/bootstrap/bootstrap.go
+++ b/bootstrap/bootstrap.go
@@ -240,6 +240,7 @@ func (b *Bootstrap) startTracing(ctx context.Context) (trace.Span, context.Conte
buildNumber, _ := b.shell.Env.Get("BUILDKITE_BUILD_NUMBER")
buildURL, _ := b.shell.Env.Get("BUILDKITE_BUILD_URL")
jobURL := fmt.Sprintf("%s#%s", buildURL, b.JobID)
+ pullRequestBaseBranch, _ := b.shell.Env.Get("BUILDKITE_PULL_REQUEST_BASE_BRANCH")
source, _ := b.shell.Env.Get("BUILDKITE_SOURCE")
label, hasLabel := b.shell.Env.Get("BUILDKITE_LABEL")
if !hasLabel {
@@ -288,12 +289,15 @@ func (b *Bootstrap) startTracing(ctx context.Context) (trace.Span, context.Conte
attribute.String("buildkite.org", b.OrganizationSlug),
attribute.String("buildkite.pipeline", b.PipelineSlug),
attribute.String("buildkite.branch", b.Branch),
+ attribute.String("buildkite.pull_request", b.PullRequest),
+ attribute.String("buildkite.pull_request_base_branch", pullRequestBaseBranch),
attribute.String("buildkite.job_id", b.JobID),
attribute.String("buildkite.job_url", jobURL),
attribute.String("buildkite.build_id", buildID),
attribute.String("buildkite.build_number", buildNumber),
attribute.String("buildkite.build_url", buildURL),
attribute.String("buildkite.source", source),
+ attribute.String("buildkite.label", label),
attribute.String("buildkite.retry", fmt.Sprintf("%v", retry)),
attribute.String("buildkite.parallel", fmt.Sprintf("%v", parallel)),
attribute.String("buildkite.rebuilt_from_id", rebuiltFromID),
--
2.34.0
|
Bumping this, is there anything I can help with to get it over the line and merged? We've found this data very valuable, and would love to get it into the next release. |
Hey friends! First off, an apology: we haven't given this PR the attention it deserves, and it's languished a bit. We've had competing priorities within the team, but we should've been clearer and more upfront about what was going on. We're working on processes within the team so that we can be better opensource stewards of Agent and Friends, but these things take time. With that said, wrt. this PR, I have some good news and some bad news. The good news is that OpenTelemetry support is a high priority for us at the moment. It's clear that this is something that people want in the agent, and they have good reason, so I'm going to be working on adding OpenTelemetry support to the agent over the next couple of weeks. The bad news is that unfortunately, we won't be able to merge this PR. The PR is in good shape as it stands, but unfortunately, there are breaking changes in Datadog's behaviour between ingesting OpenTracing traces and OpenTelemetry traces. This means that were we to ship this as is, users currently using the (older) OpenTracing/Datadog integration would see differently shaped traces, and would need to do some extra work to get their Datadog agents. With that in mind, I'm going to work on a separate PR that includes OpenTelemetry support in such a way that existing workflows aren't broken - likely taking liberal inspiration from this PR. I'm happy to chat about this in the comments here, otherwise, feel free to get in touch with me on either the Enterprise or Community slacks (@Benno Moskovitz) or via email at ben.m@buildkite.com. As a matter of logistics, I'll close this PR 2022-05-06, or once we've reached consensus in the comments, whichever comes later. |
Thanks for the update @moskyb, great to hear this is still happening 😃 |
Thanks for the update @moskyb. Really glad this is moving forward. Am fine closing the PR also long as we get OpenTelemetry support in the Agent |
This moves the tracing setup from OpenTracing (now deprecated) to OpenTelemetry. The code is vendor agnostic and tracing data is sent over GRPC to the collector as configured in the environment.
Have removed the code in
tracetools
and inlined the span create and end inbootstrap
.There is a small demo setup of the OpenTelemetry Collector and Jaeger to see traces from the agents locally.
This would very likely break any consumers of the current Datadog based tracing setup as it relies on a running Collector. However, as documented here, it is easy to get it stood up for Datadog.