-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Google Pubsub as an OTLP exporter/receiver #1892
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1892 +/- ##
==========================================
- Coverage 90.70% 90.04% -0.67%
==========================================
Files 399 385 -14
Lines 19802 18730 -1072
==========================================
- Hits 17961 16865 -1096
- Misses 1381 1394 +13
- Partials 460 471 +11
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
a1c97ed
to
fb808a5
Compare
I'm going to split this PR up in different PR's as the CONTRIBUTING suggests. I'll keep this one open for a while. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an initial skim, in general looks good. Do you mind splitting this up per these guidelines for a new component?
|
||
type Config struct { | ||
configmodels.ExporterSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct. | ||
ProjectID string `mapstructure:"project"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some comments. Endpoint
in particular needs something on when it would be set (probably very rarely by normal users)
The following configuration options are supported: | ||
|
||
* `project` (Required): The Google Cloud Project of the topics. | ||
* `validate_existence`(Optional): Checks the existence of the topic, but this requires admin permissions to validate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to list out the actual permission names required, both for normal use and when setting this flag.
Also I would consider removing this flag as it might just complicate things and failing when sending messages isn't that bad, many of our exporters work that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is interesting. This verification could make sense to be implemented on exporter startup (how you implemented it), as this would have a fast feedback loop and prevent trying to send data that is going to fail. However, topics can be created at anytime. Since the collector uses a static config, would it be safe to assume that the topic name is known and has been created prior to configuring a collector? Or is it best to assume a fully asynchronous approach and assume a topic may exist at some time in the future?
|
||
func (ex *pubsubExporter) ConsumeTraces(ctx context.Context, td pdata.Traces) error { | ||
bytes, _ := td.ToOtlpProtoBytes() | ||
message := &pubsub.Message{Data: bytes} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should put a PR in for the opentelemetry-specification
to document how OTLP interacts with Pub/Sub? While it's straightforward to push the bytes into Data, there could still be different interpretations of how Pub/Sub is supposed to work so we probably want to make sure they're unified across implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @anuraaga. I'll make one comment in reply to your initial review. Thanks, I needed a second eye on this.
- I will add it to the spec. I'll ask in gitter to see if I should do a PR or an OTEP first. I see that Kafka does it the same way as I (sending the Request), so that's good.
- I will split up the PR but will keep this one open in the meantime (I converted it to draft), I want to have someone from Google involved todo a quick scan.
- All fix your concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to reviewing the spec you draft up, @alexvanboxel! 📘
|
||
This exporter sends OTLP messages to a Google Cloud [Pubsub](https://cloud.google.com/pubsub) topic. | ||
|
||
The following configuration options are supported: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing documentation for user agent
fb808a5
to
bc8c7fd
Compare
"context" | ||
"fmt" | ||
|
||
"cloud.google.com/go/pubsub" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shopify has been running a Google PubSub (trace) exporter in production for almost a year. The largest improvement in stability came from switching to the lower level package cloud.google.com/go/pubsub/apiv1
. The high level package used here provides an asynchronous Publish
method that does not allow back pressure in the collector, and limits the usefulness of the memorylimiter
processor, queued retry, etc. The lower level package provides a synchronous Publish
call, which is more appropriate for the collector.
cc @ibawt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @fbogsany I will certainly rewrite this as I was a bit struggling with this. I'm writing the OTEP now and will refer to this in this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shopify has been running a Google PubSub (trace) exporter in production for almost a year. The largest improvement in stability came from switching to the lower level package
cloud.google.com/go/pubsub/apiv1
. The high level package used here provides an asynchronousPublish
method that does not allow back pressure in the collector, and limits the usefulness of thememorylimiter
processor, queued retry, etc. The lower level package provides a synchronousPublish
call, which is more appropriate for the collector.cc @ibawt
At Shopify do you push the complete OTLP Request over PubSub or are you unraffling the content. Let me elaborate: For spans, the ExportTraceServiceRequest
contains a repeated ResourceSpans
. The request is already a mini-batch. I'm going to propose to put the ResourceSpans
on Pubsub (or any message bus).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We publish a full ExportTraceServiceRequest
. We treat PubSub as a transport for OTLP, which makes things easier for downstream consumers. In particular, we can reuse code between the collector and the consumer. For example, we have an exporter from PubSub to GCS (Avro) that we'll eventually move into the collector, so it helps if that receives the data it would receive if it were in the collector instead.
Specifically, for traces pdata.Traces
, we Publish
the gzipped bytes returned from traces.ToOtlpProtoBytes()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's a big effort, I do wish that OTLP had a FullSpan
or similar type which was a normalized span with resource / instrumentation library built in. Having the ability to pack a self-contained span can help with cases like message queues, downstream processing like metrics updating, or even updating a trace with a new span, generally model better with span-per-message when dealing with retries and such.
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Commenting... as this is a long-running branch. First OTEP needs to be written. |
Allows export and reception of OTLP data over Google Pubsub. With the ability to set subscriptions and topics for each of the components.
bc8c7fd
to
85ce288
Compare
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Description:
Allows export and reception of OTLP data over Google Pubsub. With the ability to set subscriptions and topics for each of the components.
Link to tracking Issue:
#1802
Testing:
Unit tests are included
Documentation:
The documentation includes a short description and the configuration.