RemoteTap Extension: a new start #34096

Open
wildum opened this issue Jul 16, 2024 · 8 comments
@wildum
Contributor

wildum commented Jul 16, 2024

Component(s)

extension/remotetap

Is your feature request related to a problem? Please describe.

The RemoteTap extension was abandoned because the first implementation was not merged. The extension has been left in a skeleton state since it was created 9 months ago.

The big advantage of this component compared to the remotetap processor is that the user does not need to modify the pipeline to see the data at any stage.

In Alloy (Grafana's OpenTelemetry collector distribution), we implemented a similar feature called live debugging. Observability is not an easy topic and we believe this feature is a big step in making it more accessible.

Describe the solution you'd like

The extension could maintain a registry where components can register themselves. Additionally, it could expose an endpoint that allows users to select from these registered components.

Processors would register with the extension on start and publish data to it after each processing step (some could also publish the data before processing, if relevant).
Receivers could also register with it and publish data before sending it to the next consumer.

Components would only publish data when a remotetap stream is open, to avoid any unnecessary computation.
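To make the registry idea concrete, here is a minimal sketch in Go. Everything in it is hypothetical (the `Registry` type, its method names, and the use of plain strings as component IDs are assumptions for illustration, not the extension's actual API). It only shows how registration and an "is a stream open?" check could fit together so that components skip publishing when nobody is watching.

```go
package main

import (
	"sort"
	"sync"
)

// Registry is a hypothetical stand-in for the extension's component registry.
// It tracks which components registered and how many tap streams each has open.
type Registry struct {
	mu      sync.RWMutex
	streams map[string]int // component ID -> number of open tap streams
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{streams: make(map[string]int)}
}

// Register makes a component selectable; it starts with no open streams.
func (r *Registry) Register(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.streams[id]; !ok {
		r.streams[id] = 0
	}
}

// Components lists the registered component IDs in sorted order,
// which is what a selection endpoint could return to the user.
func (r *Registry) Components() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ids := make([]string, 0, len(r.streams))
	for id := range r.streams {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// OpenStream records a new tap stream; it reports false for unknown components.
func (r *Registry) OpenStream(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.streams[id]; !ok {
		return false
	}
	r.streams[id]++
	return true
}

// CloseStream records that one tap stream for the component ended.
func (r *Registry) CloseStream(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.streams[id] > 0 {
		r.streams[id]--
	}
}

// IsActive reports whether at least one stream is open for the component.
// Components would check this before doing any publishing work.
func (r *Registry) IsActive(id string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.streams[id] > 0
}
```

With this shape, the cost for a component when no stream is open is a single read-locked map lookup.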

Describe alternatives you've considered

No response

Additional context

I would be happy to use my experience with live debugging in Alloy to contribute to the implementation of this feature.

@atoulme

@wildum added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Jul 16, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme
Contributor

atoulme commented Jul 16, 2024

The extension actually requires the remotetap processor to be deployed, and lists them in its configuration.

Whatever you can bring forward that helps is most welcome. Please feel free to expand on your proposal, given the component model of the collector. In particular, can you please describe the config yaml for this development?

@wildum
Contributor Author

wildum commented Jul 17, 2024

I would like the extension not to depend on the remotetap processor, with the idea of deprecating the remotetap processor once the extension is mature enough.

The config for the extension would only contain confighttp.ServerConfig `mapstructure:",squash"` and would look like this in YAML:

extensions:
  remotetap:
    endpoint: localhost:12001
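
As a rough illustration of what that config carries, the sketch below uses a hypothetical stand-alone `Config` struct with a single `Endpoint` field and a `Validate` method. This is an assumption made to keep the example dependency-free: in the proposal the fields would come from the embedded confighttp.ServerConfig rather than being declared by hand.

```go
package main

import (
	"fmt"
	"net"
)

// Config is a hypothetical, simplified stand-in for the extension config.
// The proposal embeds confighttp.ServerConfig instead of declaring Endpoint.
type Config struct {
	Endpoint string // host:port the tap server listens on, e.g. "localhost:12001"
}

// Validate checks that Endpoint has the host:port form.
func (c Config) Validate() error {
	if _, _, err := net.SplitHostPort(c.Endpoint); err != nil {
		return fmt.Errorf("invalid remotetap endpoint %q: %w", c.Endpoint, err)
	}
	return nil
}
```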

If you need to modify the config of a faulty collector to debug it, you run the risk of breaking it completely or of making the problem go away by reloading the config. Being able to remote tap any processor/receiver of a running collector without any disruption is a massive plus.

I believe that with the following design, users might be happy to keep the extension even in prod environments because it's not directly part of their pipelines and it does not impact performance:

On start:

  • remote tap extension starts the server
  • processors/receivers that support the feature register with the extension on start via host.GetExtensions()["remotetap"].Register(componentID)

Let's say that the user wants to remote tap the "metricsgeneration" processor:

  • the user sends the request localhost:12001/metricsgeneration via the UI
  • the remote tap extension receives the request, keeps the connection open and tracks it in a map
  • the metricsgeneration processor checks after each processing step whether a connection is open via host.GetExtensions()["remotetap"].IsActive(componentID)
  • if a connection is active, the processor will publish the metrics to the extension via host.GetExtensions()["remotetap"].PublishMetrics(componentID, md Metrics)
  • the extension will marshal the metrics into text and send the data over the open connection to the front-end
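
The flow above could be sketched roughly as follows. This is not the POC's code: the `Tap` type, its methods, the channel buffer size, and the choice to drop data for slow readers are all assumptions made for illustration, and the data is already text here so the marshalling step is left out. The sketch only shows connections being tracked in a map keyed by component ID and published data being written to each open connection.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"sync"
)

// Tap is a hypothetical stand-in for the extension: it tracks open
// connections per component ID and fans published data out to them.
type Tap struct {
	mu   sync.Mutex
	subs map[string]map[chan string]struct{}
}

// NewTap returns a Tap with no open connections.
func NewTap() *Tap {
	return &Tap{subs: make(map[string]map[chan string]struct{})}
}

// Subscribe tracks a new stream for id and returns its channel plus a
// function that stops tracking it.
func (t *Tap) Subscribe(id string) (<-chan string, func()) {
	ch := make(chan string, 16)
	t.mu.Lock()
	if t.subs[id] == nil {
		t.subs[id] = make(map[chan string]struct{})
	}
	t.subs[id][ch] = struct{}{}
	t.mu.Unlock()
	return ch, func() {
		t.mu.Lock()
		defer t.mu.Unlock()
		delete(t.subs[id], ch)
	}
}

// IsActive reports whether at least one connection is open for id.
func (t *Tap) IsActive(id string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.subs[id]) > 0
}

// PublishData hands data to every open connection for id without blocking.
func (t *Tap) PublishData(id, data string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for ch := range t.subs[id] {
		select {
		case ch <- data:
		default: // slow reader: drop rather than block the pipeline
		}
	}
}

// ServeHTTP keeps the connection open and streams one line per published item.
// The component ID is taken from the request path, e.g. /metricsgeneration.
func (t *Tap) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	id := strings.TrimPrefix(r.URL.Path, "/")
	ch, cancel := t.Subscribe(id)
	defer cancel()
	for {
		select {
		case <-r.Context().Done():
			return
		case msg := <-ch:
			fmt.Fprintln(w, msg)
			flusher.Flush()
		}
	}
}
```

Serving it with `http.ListenAndServe("localhost:12001", NewTap())` would match the endpoint used in the example config. Dropping data when a reader is slow is one way to keep the tap from applying backpressure to the pipeline; a real implementation might choose differently.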

The interface for the components to interact with the extension would be the following:

import (
	"go.opentelemetry.io/collector/pdata/plog"
	"go.opentelemetry.io/collector/pdata/pmetric"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// RemoteTapPublisher is implemented by the RemoteTap extension and called by components.
type RemoteTapPublisher interface {
	// Register registers the component with the RemoteTap extension.
	Register(componentID ComponentID)
	// IsActive returns true when at least one connection is open for the given componentID.
	IsActive(componentID ComponentID) bool
	// PublishMetrics sends metrics for a given componentID to the RemoteTap extension.
	PublishMetrics(componentID ComponentID, md pmetric.Metrics)
	// PublishTraces sends traces for a given componentID to the RemoteTap extension.
	PublishTraces(componentID ComponentID, td ptrace.Traces)
	// PublishLogs sends logs for a given componentID to the RemoteTap extension.
	PublishLogs(componentID ComponentID, ld plog.Logs)
	// PublishData sends data for a given componentID to the RemoteTap extension.
	PublishData(componentID ComponentID, data string)
}
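
From the component side, using such an interface could look like the sketch below. To stay self-contained it uses a reduced, hypothetical `Publisher` interface (only `IsActive` and `PublishData`, with string IDs) and a toy `upperProcessor`; neither exists in the collector. The point is the guard: the processor only touches the tap when a connection is active.

```go
package main

import "strings"

// Publisher is a reduced, hypothetical version of the proposed interface,
// using plain strings instead of collector types.
type Publisher interface {
	IsActive(componentID string) bool
	PublishData(componentID string, data string)
}

// upperProcessor is a toy processor that upper-cases its input.
type upperProcessor struct {
	id  string
	tap Publisher // nil when the extension is not configured
}

// Process transforms the input and publishes the result only if tapped.
func (p *upperProcessor) Process(in string) string {
	out := strings.ToUpper(in)
	if p.tap != nil && p.tap.IsActive(p.id) {
		p.tap.PublishData(p.id, out)
	}
	return out
}

// recordingPublisher is a test double that records what was published.
type recordingPublisher struct {
	active bool
	got    []string
}

func (r *recordingPublisher) IsActive(string) bool { return r.active }

func (r *recordingPublisher) PublishData(_ string, data string) {
	r.got = append(r.got, data)
}
```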

The UI should contain some basic controls to make debugging easier:

  • sample rate to handle heavy loads
  • filter
  • pause/resume the stream
  • clear the page

What do you think?
I could try a POC with support for one processor and a very basic UI.
If people are happy with it, we could gradually extend it to more components and improve the UI.

@jaronoff97
Contributor

@wildum I'd be happy to assist in reviewing and working on this. I am currently using the remotetap processor for exactly this:

  • sample rate to handle heavy loads
  • filter
  • pause/resume the stream
  • clear the page

In my tails project

@djaglowski
Member

In my opinion the ideal solution for this would be more deeply integrated into the collector so that individual component developers do not need to be concerned with managing it, and so that performance and correctness concerns are handled in a uniform way.

Roughly the following:

  • When the user has not enabled data tapping, there is no additional work done within the collector vs today.
  • When data is tapped, it respects consumer.Capabilities to make copies as appropriate.
  • Receiver outputs, processor inputs and outputs, and exporter outputs may be tapped.
  • Tapped data passes through an interface which may eventually have multiple implementations (e.g. as local web service, OpAMP, any exporter)
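
One way to picture this is a wrapper placed between a producer and the next consumer, sketched below with simplified, hypothetical types (`Consumer`, `tapConsumer`, and string slices in place of pdata). The `MutatesData` method only mirrors the idea behind consumer.Capabilities; it is not the collector's interface. When tapping is off, the wrapper just forwards; when it is on, it copies the data first if the next consumer may mutate it.

```go
package main

import (
	"strings"
	"sync/atomic"
)

// Consumer is a simplified, hypothetical stand-in for a pipeline consumer.
type Consumer interface {
	MutatesData() bool
	Consume(data []string)
}

// tapConsumer wraps the next consumer and optionally hands data to a sink.
type tapConsumer struct {
	next    Consumer
	enabled atomic.Bool         // false by default: tapping is off
	sink    func(data []string) // receives tapped data and may retain it
}

// MutatesData reports the wrapped consumer's behavior unchanged.
func (t *tapConsumer) MutatesData() bool { return t.next.MutatesData() }

// Consume forwards data and, only when tapping is enabled, taps it first.
func (t *tapConsumer) Consume(data []string) {
	if t.enabled.Load() {
		tapped := data
		if t.next.MutatesData() {
			// The sink may retain the data, so give it its own copy.
			tapped = append([]string(nil), data...)
		}
		t.sink(tapped)
	}
	t.next.Consume(data)
}

// mutatingConsumer is a toy consumer that upper-cases items in place.
type mutatingConsumer struct{}

func (mutatingConsumer) MutatesData() bool { return true }

func (mutatingConsumer) Consume(data []string) {
	for i := range data {
		data[i] = strings.ToUpper(data[i])
	}
}
```

In the disabled case the only extra work is one atomic load per call, which is close to, though not literally, the "no additional work" goal; removing the wrapper entirely when tapping is off would be another option.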

I would also point out that if we ever land open-telemetry/opentelemetry-collector#9077, then this becomes a trivial problem where the solution is just adding one more exporter that subscribes to all data producers.

@wildum
Contributor Author

wildum commented Aug 26, 2024

Hey @atoulme @djaglowski @jaronoff97, following the SIG meeting (and a week off), I worked on a 2nd POC.
As discussed, this time I implemented this concept in the core repository using the processorhelper pkg.
You can find the new POC here: open-telemetry/opentelemetry-collector#10962
And a clean lightweight version of the concept here: open-telemetry/opentelemetry-collector#10963

Please have a look at the concept branch when you have time and let me know what you think :)

@wildum
Contributor Author

wildum commented Aug 27, 2024

Following @atoulme's comment, I moved the component back to contrib and kept only the changes related to the processor helper in core. @djaglowski @jaronoff97
Here are the new relevant links:

Contributor

github-actions bot commented Dec 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 2, 2024