[internal/otelarrow] Resolve test flakes; skip one still-flaky test #34794

jmacd · 2024-08-21T17:59:21Z

Description: Fixes the causes of flakiness in most cases by using a callback to terminate the test without resorting to sleep statements. There is still one flaky test that, for reasons not yet understood, does not pass. Fortunately, it fails in a repeatable way, and I will debug it as part of #34719.

Link to tracking Issue: #34719

mwear

Can we leave the test active in non-Windows environments? Something like:

```go
if runtime.GOOS == "windows" {
	t.Skip("skipping flaky test on Windows, see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34719")
}
```

jmacd · 2024-08-21T20:14:44Z

This test is failing for other reasons in my local environment. The local failure was an assertion that more than one stream was used, and I found that the test wasn't controlling for that variable. I introduced retries for thoroughness and a tiny sleep call to ensure that the expected >1 streams are used. This can be no worse for the Windows problem, and I see it has passed at least once on Windows (per the history above, though see other test flakes on Windows).

mwear

The failing test is an instance of: #34792

pjanotti

I'm an ignoramus about the internals of this component, but when testing on Windows, adding a sleep can be advisable given the differences in scheduling and timer tick resolution relative to *nix, where most devs run their tests. LGTM.

internal/otelarrow/test/e2e_test.go

crobert-1 · 2024-08-22T20:58:50Z

I've submitted #34823 for the receiver/datadog test failures, unrelated to this change.

jmacd · 2024-08-22T23:53:02Z

There is still a flaky test; the cause appears to be a real problem combined with a slower test start than on non-Windows hosts. I have added a Skip() and will leave #34719 open to cover fixing it.

jmacd · 2024-08-23T13:43:02Z

The windows/internal test group passes with my changes. The other tests are still preventing a ✅. Hoping to unblock the release and CI process, I left one Skip() in here, which I will investigate next week.

jmacd · 2024-08-26T15:00:43Z

I recommend we merge this as-is, because (a) it will take some time to properly fix the one skipped test, and (b) I have four PRs open in this repository, and it is difficult to move anything forward in this situation.

jpkrohling

Nice, thanks!

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

**Description:** Restore a skipped test, after understanding the nature of the problem. The problem was mostly addressed in #34794, which left the test disabled. The test had been flaky because, while testing for an out-of-memory condition, it could fail due to a timeout or other reasons. To make the test more reliable, it now waits until at least one ArrowTraces span has been received by both components. Once a span is available, it checks that the expected log messages are present on both sides.

**Link to tracking Issue:** Fixes #34719.

**Testing:** ✅

---------

Co-authored-by: Curtis Robert <crobert@splunk.com>
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>

…pen-telemetry#34794) **Description:** Fixes the causes of flakiness in most cases by using a callback to terminate the test without resorting to sleep statements. There is still one flaky test that for reasons not understood, does not pass. Fortunately, it fails in a repeatable way, and I will debug as part of open-telemetry#34719. **Link to tracking Issue:** open-telemetry#34719 --------- Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de> Co-authored-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jmacd added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Aug 21, 2024

jmacd requested review from a team and andrzej-stencel August 21, 2024 17:59

github-actions bot assigned jpkrohling Aug 21, 2024

github-actions bot added the internal/otelarrow label Aug 21, 2024

github-actions bot requested a review from moh-osman3 August 21, 2024 17:59

jmacd requested a review from pjanotti August 21, 2024 17:59

mwear reviewed Aug 21, 2024

crobert-1 added the Run Windows Enable running windows test on a PR label Aug 21, 2024

jmacd changed the title ~~Disable the TestIntegrationSelfTracing test~~ Disable the OTel-Arrow TestIntegrationSelfTracing test Aug 21, 2024

jmacd changed the title ~~Disable the OTel-Arrow TestIntegrationSelfTracing test~~ Reduce flake potential in the OTel-Arrow TestIntegrationSelfTracing test Aug 21, 2024

crobert-1 added the os:windows label Aug 21, 2024

jmacd mentioned this pull request Aug 21, 2024

[internal/otelarrow] Flaky test disabled: TestIntegrationMemoryLimited #34719

Closed

mwear approved these changes Aug 21, 2024

pjanotti approved these changes Aug 21, 2024

jpkrohling requested changes Aug 22, 2024

internal/otelarrow/test/e2e_test.go (outdated)

jmacd changed the title ~~Reduce flake potential in the OTel-Arrow TestIntegrationSelfTracing test~~ Fix test flakes in the internal/otelarrow Aug 22, 2024

jmacd marked this pull request as draft August 22, 2024 18:57

jmacd changed the title ~~Fix test flakes in the internal/otelarrow~~ Fix test flakes in internal/otelarrow Aug 22, 2024

github-actions bot added the receiver/otelarrow label Aug 22, 2024

jmacd changed the title ~~Fix test flakes in internal/otelarrow~~ Resolve test flakes in internal/otelarrow; skip one still-flaky test Aug 22, 2024

jmacd marked this pull request as ready for review August 23, 2024 00:53

github-actions bot assigned dmitryax Aug 23, 2024

jmacd requested a review from jpkrohling August 26, 2024 21:10

jpkrohling approved these changes Aug 27, 2024

jpkrohling removed processor/geoip exporter/otelarrow receiver/otelarrow labels Aug 27, 2024

jpkrohling removed request for atoulme, bogdandrutu, dashpole, tigrannajaryan, djaglowski, mx-psi, dmitryax, songy23, evan-bradley, TylerHelmuth, MovieStoreGuy and crobert-1 August 27, 2024 09:16

fix missing go.mod

cf561c9

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

github-actions bot added the cmd/otelcontribcol otelcontribcol command label Aug 27, 2024

fix other gosums...

78762c8

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

github-actions bot added exporter/otelarrow receiver/otelarrow labels Aug 27, 2024

github-actions bot requested a review from lquerel August 27, 2024 09:28

Merge branch 'main' into jmacd/arrowflake

0069b3e

jpkrohling changed the title ~~Resolve test flakes in internal/otelarrow; skip one still-flaky test~~ [internal/otelarrow] Resolve test flakes; skip one still-flaky test Aug 27, 2024

jpkrohling merged commit 5c9325e into open-telemetry:main Aug 27, 2024
172 checks passed

github-actions bot added this to the next release milestone Aug 27, 2024

jmacd mentioned this pull request Aug 27, 2024

[internal/otelarrow] Fix test flake (for 34719) #34889

Merged
