
Add HTTP exporting to OpenTelemetry Tracer #28454

Closed

Conversation

AlexanderEllis
Contributor

@AlexanderEllis AlexanderEllis commented Jul 18, 2023

Commit Message: Add HTTP exporting to OpenTelemetry Tracer
Additional Description: This PR refactors the OpenTelemetry exporting to allow for a second option: OTLP spans over HTTP according to the OTLP HTTP spec. It refactors the existing gRPC exporter into a general exporter that is then used by the gRPC and the HTTP exporters. The HTTP exporter is relatively simple: it accesses the thread local cluster for the collector, then sends it a request.
Risk Level: Low (new, optional functionality for existing tracing extension)
Testing: Unit test and manual testing with OpenTelemetry collector
Fixes #26352

Meta notes about the PR:

  • I feel my C++ skills slowly slipping away from me — please don't hesitate to leave feedback if I've missed anything basic
  • I wasn't sure what the best way to update the proto config was. Ideally, it would be a oneof, with either gRPC or HTTP, but I hesitate to make a breaking change, opting instead for an optional new field that is ignored if the gRPC config is present. Happy to hear suggestions there.
  • Additionally, I wasn't sure what the best practices were for accessing the thread local cluster. I've opted to pass in the cluster manager and call getThreadLocalCluster, but would love recommendations there as well
  • Open question: what would be the best way to add Authorization headers to the request to the collector? Header mutation as part of the upstream_http_protocol_options in the collector cluster config? Something new here? Wasn't sure, but open to ideas.
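A sketch of the oneof idea raised above, purely illustrative (the message shape and field names are hypothetical, not the actual Envoy API; a oneof like this would be a breaking change for existing gRPC users, which is why the PR uses an optional field instead):

```proto
// Hypothetical sketch only: what a oneof-based config could look like.
message OpenTelemetryConfig {
  oneof exporter {
    // Existing gRPC collector configuration.
    core.v3.GrpcService grpc_service = 1;
    // New OTLP/HTTP collector configuration.
    OtlpHttpExporterConfig http_config = 2;
  }
}
```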

Signed-off-by: Alex Ellis <ellisonjtk@gmail.com>
@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #28454 was opened by AlexanderEllis.


@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #28454 was opened by AlexanderEllis.


@AlexanderEllis
Contributor Author

/retest

@AlexanderEllis
Contributor Author

AlexanderEllis commented Jul 19, 2023

Looks like the only failing presubmit was Verify/examples with [grpc-bridge] Failed, which seems likely unrelated (also not sure if there's a quick way to rerun — it didn't like my /retest attempt). With the other presubmits passing, I think this is ready for review 👍

@AlexanderEllis AlexanderEllis marked this pull request as ready for review July 19, 2023 04:38
Member

@lizan lizan left a comment


api review

// See https://opentelemetry.io/docs/specs/otlp/#otlphttp.
HttpFormat http_format = 2;

// Optional path. If omitted, the default path of "/v1/traces" will be used.
Member

The HttpUri contains path and host as part of URI, so just derive from there?
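To illustrate the suggestion: `HttpUri.uri` already carries both the authority and the path, so the exporter could parse the path from there instead of a separate field. A hedged config sketch (collector host and cluster name are made up):

```yaml
http_config:
  http_uri:
    # Host and path both live in the URI; no separate collector_path needed.
    uri: "http://otel-collector:4318/v1/traces"
    cluster: opentelemetry_collector
    timeout: 0.5s
```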

BINARY_PROTOBUF = 0;

// JSON Protobuf Encoding.
JSON_PROTOBUF = 1;
Member

Mark this not implemented as it is not in the implementation?

@@ -17,6 +18,31 @@ option (udpa.annotations.file_status).package_version_status = ACTIVE;
// Configuration for the OpenTelemetry tracer.
// [#extension: envoy.tracers.opentelemetry]
message OpenTelemetryConfig {
// Configuration for sending traces to the collector over HTTP. Note that
Contributor

Would it make sense to also refactor the config in the same way done for the exporters? For example, keeping the base config in here, but then adding new types for OtlpHttpExporterConfig and OtlpGrpcExporterConfig?

That's also what OTel C++ does (see its HTTP and gRPC exporter options).

Now we will have a config with HTTP-related things (HTTP Format, HTTP headers) that are not relevant for gRPC. Same would apply for possible future gRPC config only.
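One possible shape for the split suggested here, mirroring how OTel C++ separates its exporter options. This is illustrative only; the message and field names are hypothetical:

```proto
// Hypothetical sketch of per-exporter config messages.
message OtlpGrpcExporterConfig {
  // gRPC collector endpoint.
  core.v3.GrpcService grpc_service = 1;
}

message OtlpHttpExporterConfig {
  // HTTP collector endpoint (host and path in the URI).
  core.v3.HttpUri http_uri = 1;
  // Encoding of the request body (binary vs JSON protobuf).
  HttpFormat http_format = 2;
  // Extra request headers, e.g. for auth.
  repeated core.v3.HeaderValue headers = 3;
}
```

This keeps HTTP-only knobs (format, headers) out of the gRPC path, and leaves room for future gRPC-only options.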

}

// The upstream cluster that will receive OTLP traces over HTTP.
core.v3.HttpUri http_uri = 1;
Contributor

Not sure if this is appropriate for this PR, but the OTEL spec defines default URL/Port for gRPC and HTTP that targets these endpoints in the OTel collector. https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-default-port.

OTel users using these exporters in Envoy might be surprised they have to configure these, since in other "OTel components" they are set OOTB.


// Optional path. If omitted, the default path of "/v1/traces" will be used.
string collector_path = 3;

Contributor

@joaopgrassi
Contributor

joaopgrassi commented Jul 25, 2023

@AlexanderEllis Hi! I did some experimentation with the changes here. I basically used the sandbox from https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/opentelemetry#step-1-build-the-sandbox and replaced the envoy binary on the containers with the one built locally.

I struggled a bit to configure it but after some trial/error I got it to work. Here are some points which are unclear to me:

  • http_uri:uri: What is this property for? I initially thought this was the URL the exporter would use to send the OTLP messages to, as it's described in the opentelemetry.proto

// The upstream cluster that will receive OTLP traces over HTTP.

But it seems it's not being used at all. I realized that the cluster from http_config:cluster is what's actually used; I can put any URL in http_uri:uri and it doesn't matter.

  • collector_path: In opentelemetry.proto it says it's optional, but without it I see 400s from Envoy, probably because the requests aren't going to the right place on my OTel collector. Also, if this field is needed, I'm afraid we might want a different name for it. I think the point of having the HTTP exporter is that people who don't run an OTel collector can send traces directly to their backends, so calling it collector_path might be confusing. What about calling it otlp_traces_path? Or just traces_path?

General question: If all the above is true (the cluster is what the HTTP client uses to resolve the URL and send the requests), how would I configure an external trace backend in the cluster section? My limited understanding is that the HTTP client is "derived" from the cluster provided in http_config:cluster. Could HTTP headers also be defined in the cluster definition so that they apply to all requests made from that HTTP client? Sorry if I'm just saying wrong things, I'm very new to Envoy's architecture :)
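For context, a hedged sketch of how the tracer side of this PR's draft config could be wired up (host, cluster name, and timeout are made up; the cluster referenced here supplies the actual endpoint, while the URI's host is effectively informational):

```yaml
tracing:
  provider:
    name: envoy.tracers.opentelemetry
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
      http_config:
        http_uri:
          # Path is taken from the URI; the request is sent to the cluster below.
          uri: "http://otel-collector:4318/v1/traces"
          cluster: opentelemetry_collector
          timeout: 0.5s
        http_format: BINARY_PROTOBUF
```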

@htuch
Member

htuch commented Aug 4, 2023

@AlexanderEllis what are the next steps for this PR? (friendly maintainer on-call stale PR ping ;)

@joaopgrassi
Contributor

joaopgrassi commented Aug 4, 2023

Hi @htuch I'm interested in getting this feature into Envoy, but I have some questions. I asked over on Slack but didn't get much response there, so maybe you can help clarify? I could also work on it, if @AlexanderEllis doesn't have the cycles.

Basically it's my question on the last comment: The HTTP exporter here, depends on a cluster in order to obtain an HTTP client that does the post request with the OTLP data. Is there another way to obtain an HTTP client in Envoy, other than via the cluster "route"? I'm a beginner in Envoy so most of the concepts are new to me. I have read the docs but it's not clear if this is the way it works.

The reason I'm asking is that while some people run an "agent" locally, like the OTel collector, others might not run a collector at all and instead want to send the telemetry data directly to a backend that's publicly available on the internet. I think that's the main goal of having an HTTP exporter like this. Does it still make sense to put the public backend's host name in a cluster definition?

For example, is it possible to have a config like this, without a cluster configured? The HTTP exporter would then have its own HTTP client configured with the values in the config (URI, headers, etc.):

 http_config:
   http_uri:
     uri: "http://my-observability-backend/v1/traces"
     timeout: 0.250s
   http_format: BINARY_PROTOBUF
   headers:
     Authorization: "Token"
     Other: "some custom header"

@htuch
Member

htuch commented Aug 4, 2023

You can easily point clusters at Internet hosts with something like strict or logical DNS clusters.
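A sketch of that suggestion: a STRICT_DNS cluster pointing at a public backend over TLS (hostname and cluster name are illustrative):

```yaml
clusters:
- name: observability_backend
  type: STRICT_DNS
  dns_lookup_family: V4_ONLY
  load_assignment:
    cluster_name: observability_backend
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: my-observability-backend.example.com
              port_value: 443
  # TLS upstream so the exporter can talk HTTPS to the public backend.
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      sni: my-observability-backend.example.com
```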

@KBaichoo
Contributor

KBaichoo commented Aug 8, 2023

friendly ping @AlexanderEllis

I think this is pending changes on your end.

/wait

@joaopgrassi
Contributor

Just to reiterate: I could take this over and continue the work if @AlexanderEllis doesn't have the cycles right now. We are interested in getting this in, especially because of #28851

@htuch
Member

htuch commented Aug 14, 2023

Would be great to hear from @AlexanderEllis on how we can collaborate with @joaopgrassi here. If @AlexanderEllis doesn't have cycles to push this PR right now we can have @joaopgrassi take it over.

@AlexanderEllis
Contributor Author

Hey @htuch and @joaopgrassi ! Sorry for the delay here, ended up extremely busy between work and personal life, and my OSS Envoy hobby work fell off the map. Maybe some day I'll be doing it for work again :)

@joaopgrassi if you'd like to take over this PR, that would be totally fine with me! That would also be a great introduction for you for an Envoy PR — I know you mentioned in the other issue that you're new to making changes to the project. I'm also happy to advise as possible, with the caveat that I may be in and out. I think it's pretty much ready to go as is, with the comments to address around the proto config.

@joaopgrassi
Contributor

joaopgrassi commented Aug 16, 2023

@AlexanderEllis totally understandable, no problem!

Alright! If it's OK with you I will fork your PR and submit a new one then. It would be awesome to get your feedback and be able to ask you things, since you wrote the initial OTel things in Envoy. Are you on the Envoy CNCF Slack? Would be great to have a channel to chat, of course with no expectations on your side :). Thanks again

@joaopgrassi
Contributor

joaopgrassi commented Aug 23, 2023

Hi all, the new PR is open! #29207

@github-actions

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Sep 22, 2023
@github-actions

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Sep 29, 2023
Labels: api, stale (stalebot believes this issue/PR has not been touched recently), waiting

Successfully merging this pull request may close these issues:

Support for sending OpenTelemetry Tracing over HTTP/Protobuf endpoint

5 participants