Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix default discovery crash on Windows #5202

Merged

Conversation

pjanotti
Copy link
Contributor

@pjanotti pjanotti commented Aug 8, 2024

Replaces #5162

Description:
A bad default on the OTel docker_host (just fixed upstream, see open-telemetry/opentelemetry-collector-contrib#34358) exposed that if a observer extension fails to Start the discovery code still calls ListAndWatch what causes a panic.

Details: the Start of the observer is called via internal/confmapprovider/discovery, but if Start fails it logs a warning and proceeds with the execution. Later internal/receiver/discoveryreceiver uses the host.GetExtensions() and the observer is there since it was created successfully (it did fail to start in a step that the host doesn't have visibility). The extension is cast to an observer.Observable and a go routine calling ListAndWatch is launched, but, then the private that it needs to run are nil because the start failed and then panic.

Right now I am removing the docker_observer from Windows, but, it would be re-enabled as soon as we have it fixed on upstream.

Jira:
https://splunk.atlassian.net/browse/OTL-2883

Testing:
Added 2 new tests and validated manually.

Documentation:
N/A

@pjanotti pjanotti requested review from a team as code owners August 8, 2024 03:04
@pjanotti pjanotti self-assigned this Aug 8, 2024
args: []string{"otelcol", "--discovery", "--config=config/collector/agent_config.yaml"},
timeout: 30 * time.Second,
},
// Running the discovery with --dry-run in CI is not desirable because of the use of os.Exit(0) to end the execution.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--dry-run was originally intended as a helper/utility for the field/sales to capture what services are available but design requirements have changed. I wonder we need this test at all. But we can keep it like this. I do not have strong feelings either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I will ask around to see if the need/purpose is clear.

@pjanotti pjanotti merged commit c016bb9 into signalfx:main Aug 9, 2024
133 checks passed
@pjanotti pjanotti deleted the fix-default-discovery-crash-on-windows-01 branch August 9, 2024 20:48
@github-actions github-actions bot locked and limited conversation to collaborators Aug 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants