Flaky Test - prometheusexporter.TestEndToEndSummarySupport #8365

Closed
dmitryax opened this issue Mar 9, 2022 · 2 comments · Fixed by #9244
Labels
bug Something isn't working comp: exporter Exporter comp:prometheus Prometheus related issues flaky test a test is flaky

Comments

dmitryax commented Mar 9, 2022

Seen here.

=== RUN   TestEndToEndSummarySupport
    end_to_end_test.go:181: Left-over unmatched Prometheus scrape content: "test_scrape_samples_post_metric_relabeling 0 1646857445910test_scrape_samples_scraped 0 1646857445910test_scrape_series_added 0 1646857445910test_up 0 1646857445910"
--- FAIL: TestEndToEndSummarySupport (5.11s)
@dmitryax dmitryax added bug Something isn't working comp:prometheus Prometheus related issues comp: exporter Exporter flaky test a test is flaky labels Mar 9, 2022

gouthamve commented Apr 13, 2022

So, debugging this (with some debug logging config added):

❯ go test . -run=TestEndToEndSummarySupport -count=1 -v
=== RUN   TestEndToEndSummarySupport
{"level":"debug","msg":"Starting provider","provider":"static/0","subs":"map[otel-collector:{}]"}
{"level":"debug","msg":"Discoverer channel closed","provider":"static/0"}
{"level":"debug","msg":"Scrape failed","scrape_pool":"otel-collector","target":"http://127.0.0.1:62688/metrics","error":"Get \"http://127.0.0.1:62688/metrics\": context deadline exceeded"}
--- PASS: TestEndToEndSummarySupport (5.08s)

Basically, we scrape a test server 8 times with a per-job scrape interval of 10ms, but the global scrape interval is 2ms. No scrape timeout is set explicitly, so the global scrape timeout is capped at the global scrape interval (2ms), and the job then inherits that 2ms timeout. See: https://github.com/prometheus/prometheus/blob/fb2da1f26aec023b1e3c864222aaccbe01969f11/config/config.go#L382-L388 and https://github.com/prometheus/prometheus/blob/fb2da1f26aec023b1e3c864222aaccbe01969f11/config/config.go#L297-L303

If the 8th scrape fails, then we fail the test with the error:

=== RUN   TestEndToEndSummarySupport
    end_to_end_test.go:181: Left-over unmatched Prometheus scrape content: "test_scrape_samples_post_metric_relabeling 0 1649790752438test_scrape_samples_scraped 0 1649790752438test_scrape_series_added 0 1649790752438test_up 0 1649790752438"
--- FAIL: TestEndToEndSummarySupport (5.11s)

Proof (I added a fmt.Println("currentScrapeIndex", currentScrapeIndex)):

=== RUN   TestEndToEndSummarySupport
{"level":"debug","msg":"Starting provider","provider":"static/0","subs":"map[otel-collector:{}]"}
{"level":"debug","msg":"Discoverer channel closed","provider":"static/0"}
{"level":"debug","msg":"Scrape failed","scrape_pool":"otel-collector","target":"http://127.0.0.1:62815/metrics","error":"Get \"http://127.0.0.1:62815/metrics\": context deadline exceeded"}
currentScrapeIndex 1
currentScrapeIndex 2
currentScrapeIndex 3
currentScrapeIndex 4
currentScrapeIndex 5
currentScrapeIndex 6
currentScrapeIndex 7
currentScrapeIndex 8
{"level":"debug","msg":"Scrape failed","scrape_pool":"otel-collector","target":"http://127.0.0.1:62815/metrics","error":"Get \"http://127.0.0.1:62815/metrics\": context deadline exceeded"}
    end_to_end_test.go:182: Left-over unmatched Prometheus scrape content: "test_scrape_samples_post_metric_relabeling 0 1649814848841test_scrape_samples_scraped 0 1649814848841test_scrape_series_added 0 1649814848841test_up 0 1649814848841"
--- FAIL: TestEndToEndSummarySupport (5.46s)

I propose increasing the timeout to 100ms. Sending a PR for the same.

gouthamve added a commit to gouthamve/opentelemetry-collector-contrib that referenced this issue Apr 13, 2022
Fixes open-telemetry#8365

See: open-telemetry#8365 (comment)

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
jpkrohling pushed a commit that referenced this issue Apr 13, 2022
Fixes #8365

See: #8365 (comment)

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>