[cmd/opampsupervisor] Report bad remote config to OpAMP server #21079

Open
Tracked by #24327
evan-bradley opened this issue Apr 19, 2023 · 2 comments
Labels: cmd/opampsupervisor, enhancement, never stale

Comments

@evan-bradley
Contributor

Component(s)

No response

Is your feature request related to a problem? Please describe.

Currently, the Supervisor only composes the effective configuration for the Collector before reporting that the remote configuration has been applied. It should instead wait until the Collector either starts successfully or fails to start, and then report the configuration as applied or failed, respectively.

Describe the solution you'd like

Wait to receive a healthcheck or crash from the Collector before reporting the final remote configuration status to the server. The Supervisor can still report that it is applying the remote configuration as it composes the effective configuration and signals the Collector to load the new configuration.

It should be clarified whether changing telemetry/other connection settings should also trigger the Supervisor to report that a remote config has been applied.

I would consider a configuration successfully applied once all Collector pipelines have started, even if the configuration has bugs that only surface while handling telemetry records. We could also require a configurable waiting period or a configurable number of healthchecks before declaring the configuration applied.
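The "number of healthchecks" variant could look roughly like the sketch below. The `healthGate` type and its fields are hypothetical, and resetting the count on an unhealthy result (that is, requiring consecutive healthy checks) is an assumption rather than something decided in this issue.

```go
package main

// healthGate counts consecutive healthy healthcheck results and reports when
// enough have been seen to call the configuration applied.
type healthGate struct {
	required int // number of consecutive healthy checks needed (configurable)
	streak   int // healthy checks seen since the last unhealthy one
}

// observe records one healthcheck result and returns true once the required
// number of consecutive healthy results has been reached.
func (g *healthGate) observe(healthy bool) bool {
	if healthy {
		g.streak++
	} else {
		g.streak = 0 // assumption: an unhealthy check restarts the count
	}
	return g.streak >= g.required
}
```

With `required` set to 1, this reduces to accepting the first healthy healthcheck.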

Describe alternatives you've considered

No response

Additional context

No response

@evan-bradley evan-bradley added the enhancement New feature or request label Apr 19, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jun 19, 2023
@mwear mwear removed the Stale label Jun 21, 2023
@evan-bradley evan-bradley added the never stale Issues marked with this label will be never staled and automatically removed label Jul 17, 2023
@srikanthccv
Member

I can pick this up.

djaglowski pushed a commit that referenced this issue Nov 5, 2024
… to report remote config status (#34907)

**Description:** 

This pull request addresses the remote config status reporting issue
discussed in #21079 by introducing the following option to the Agent
config:

1. `config_apply_timeout`: config update is successful if we receive a
healthy status and then observe no failure updates for the entire
duration of the timeout period; otherwise, failure is reported.

**Link to tracking Issue:** #21079

**Testing:** Added e2e test

**Documentation:** <Describe the documentation added.>
michael-burt pushed a commit to michael-burt/opentelemetry-collector-contrib that referenced this issue Nov 7, 2024
… to report remote config status (open-telemetry#34907)
