Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Google Pubsub as an OTLP exporter/receiver #1892

Closed
wants to merge 2 commits into from

Conversation

alexvanboxel
Copy link
Contributor

Description:
Allows export and reception of OTLP data over Google Pubsub. With the ability to set subscriptions and topics for each of the components.

Link to tracking Issue:
#1802

Testing:
Unit tests are included

Documentation:
The documentation includes a short description and the configuration.

@alexvanboxel alexvanboxel requested a review from a team December 22, 2020 20:25
@codecov
Copy link

codecov bot commented Dec 22, 2020

Codecov Report

Merging #1892 (bc8c7fd) into main (9677f5b) will decrease coverage by 0.66%.
The diff coverage is 99.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1892      +/-   ##
==========================================
- Coverage   90.70%   90.04%   -0.67%     
==========================================
  Files         399      385      -14     
  Lines       19802    18730    -1072     
==========================================
- Hits        17961    16865    -1096     
- Misses       1381     1394      +13     
- Partials      460      471      +11     
Flag Coverage Δ
integration 69.73% <0.00%> (+0.46%) ⬆️
unit 88.77% <99.05%> (-0.74%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
receiver/gcloudpubsubreceiver/receiver.go 97.18% <97.18%> (ø)
cmd/otelcontribcol/components.go 86.48% <100.00%> (-0.53%) ⬇️
exporter/gcloudpubsubexporter/exporter.go 100.00% <100.00%> (ø)
exporter/gcloudpubsubexporter/factory.go 100.00% <100.00%> (ø)
receiver/gcloudpubsubreceiver/factory.go 100.00% <100.00%> (ø)
exporter/datadogexporter/metrics_exporter.go 45.45% <0.00%> (-54.55%) ⬇️
receiver/redisreceiver/redis_metric.go 63.15% <0.00%> (-36.85%) ⬇️
exporter/sumologicexporter/exporter.go 50.00% <0.00%> (-31.92%) ⬇️
exporter/datadogexporter/metadata/metadata.go 60.78% <0.00%> (-28.93%) ⬇️
exporter/datadogexporter/metadata/ec2/ec2.go 30.00% <0.00%> (-17.50%) ⬇️
... and 92 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9677f5b...85ce288. Read the comment docs.

@alexvanboxel alexvanboxel force-pushed the feature/pubsub branch 2 times, most recently from a1c97ed to fb808a5 Compare December 23, 2020 11:09
@alexvanboxel
Copy link
Contributor Author

I'm going to split this PR up in different PR's as the CONTRIBUTING suggests. I'll keep this one open for a while.

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an initial skim, in general looks good. Do you mind splitting this up per these guidelines for a new component?

https://github.com/open-telemetry/opentelemetry-collector/blob/master/CONTRIBUTING.md#when-adding-a-new-component


type Config struct {
configmodels.ExporterSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
ProjectID string `mapstructure:"project"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some comments. Endpoint in particular needs something on when it would be set (probably very rarely by normal users)

The following configuration options are supported:

* `project` (Required): The Google Cloud Project of the topics.
* `validate_existence`(Optional): Checks the existence of the topic, but this requires admin permissions to validate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to list out the actual permission names required, both for normal use and when setting this flag.

Also I would consider removing this flag as it might just complicate things and failing when sending messages isn't that bad, many of our exporters work that way.

Copy link
Member

@gramidt gramidt Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is interesting. This verification could make sense to be implemented on exporter startup (how you implemented it), as this would have a fast feedback loop and prevent trying to send data that is going to fail. However, topics can be created at anytime. Since the collector uses a static config, would it be safe to assume that the topic name is known and has been created prior to configuring a collector? Or is it best to assume a fully asynchronous approach and assume a topic may exist at some time in the future?


func (ex *pubsubExporter) ConsumeTraces(ctx context.Context, td pdata.Traces) error {
bytes, _ := td.ToOtlpProtoBytes()
message := &pubsub.Message{Data: bytes}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should put a PR in for the opentelemetry-specification to document how OTLP interacts with Pub/Sub? While it's straightforward to push the bytes into Data, there could still be different interpretations of how Pub/Sub is supposed to work so we probably want to make sure they're unified across implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anuraaga. I'll make one comment in reply to your initial review. Thanks, I needed a second eye on this.

  • I will add it to the spec. I'll ask in gitter to see if I should do a PR or an OTEP first. I see that Kafka does it the same way as I (sending the Request), so that's good.
  • I will split up the PR but will keep this one open in the meantime (I converted it to draft), I want to have someone from Google involved todo a quick scan.
  • All fix your concerns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to reviewing the spec you draft up, @alexvanboxel! 📘


This exporter sends OTLP messages to a Google Cloud [Pubsub](https://cloud.google.com/pubsub) topic.

The following configuration options are supported:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation for user agent

"context"
"fmt"

"cloud.google.com/go/pubsub"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shopify has been running a Google PubSub (trace) exporter in production for almost a year. The largest improvement in stability came from switching to the lower level package cloud.google.com/go/pubsub/apiv1. The high level package used here provides an asynchronous Publish method that does not allow back pressure in the collector, and limits the usefulness of the memorylimiter processor, queued retry, etc. The lower level package provides a synchronous Publish call, which is more appropriate for the collector.

cc @ibawt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @fbogsany I will certainly rewrite this as I was a bit struggling with this. I'm writing the OTEP now and will refer to this in this thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shopify has been running a Google PubSub (trace) exporter in production for almost a year. The largest improvement in stability came from switching to the lower level package cloud.google.com/go/pubsub/apiv1. The high level package used here provides an asynchronous Publish method that does not allow back pressure in the collector, and limits the usefulness of the memorylimiter processor, queued retry, etc. The lower level package provides a synchronous Publish call, which is more appropriate for the collector.

cc @ibawt

At Shopify do you push the complete OTLP Request over PubSub or are you unraffling the content. Let me elaborate: For spans, the ExportTraceServiceRequest contains a repeated ResourceSpans. The request is already a mini-batch. I'm going to propose to put the ResourceSpans on Pubsub (or any message bus).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We publish a full ExportTraceServiceRequest. We treat PubSub as a transport for OTLP, which makes things easier for downstream consumers. In particular, we can reuse code between the collector and the consumer. For example, we have an exporter from PubSub to GCS (Avro) that we'll eventually move into the collector, so it helps if that receives the data it would receive if it were in the collector instead.

Specifically, for traces pdata.Traces, we Publish the gzipped bytes returned from traces.ToOtlpProtoBytes().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's a big effort, I do wish that OTLP had a FullSpan or similar type which was a normalized span with resource / instrumentation library built in. Having the ability to pack a self-contained span can help with cases like message queues, downstream processing like metrics updating, or even updating a trace with a new span, generally model better with span-per-message when dealing with retries and such.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 22, 2021
@alexvanboxel
Copy link
Contributor Author

alexvanboxel commented Jan 22, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Commenting... as this is a long-running branch. First OTEP needs to be written.

@anuraaga anuraaga removed the Stale label Jan 22, 2021
Base automatically changed from master to main January 28, 2021 00:57
Allows export and reception of OTLP data over Google Pubsub. With the
ability to set subscriptions and topics for each of the components.
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 17, 2021
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants