-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[internal/otelarrow] Resolve test flakes; skip one still-flaky test #34794
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave the test active in non-windows environments. Something like:
if runtime.GOOS == "windows" {
t.Skip("skipping flaky test on Windows, see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34719")
}
This test is failing for other reasons in my local environment. The local failure was an expectation that more than one stream was used, and I find the test wasn't controlling for that variable. I introduced retries for thoroughness and a tiny sleep call to ensure the expected >1 streams takes place. It can be no worse for the Windows problem, and I see it has passed at least once on Windows (from the history above, though see other test flakes on Windows). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failing test is an instance of: #34792
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ignoramus about the internals of this component, but, when testing in Windows, adding a sleep can be advisable given the differences in scheduling and time tick resolution in relation *nix where most devs run their tests. LGTM.
I've submitted #34823 for the |
There is still a flaky test, the problem appears to be a real problem combined with a slower test start than on the non-windows host. I have added a |
The windows/internal test group passes with my changes. The other tests are still preventing a ✅. Hoping to unblock the release and CI process, I left one |
I recommend we merge this as-is, because (a) it will take some time to properly fix the one skipped test, (b) I have four PRs open in this repository and it becomes difficult to move anything in this situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks!
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
**Description:** Restore a skipped test, after understanding the nature of the problem. The problem was mostly addressed in #34794, which left the test disabled. The test had been flaky because while testing for an out-of-memory condition, the test could fail for timeout or other reason. To make the test more reliable, this now waits until at least one ArrowTraces span has been received by both components. After one span is available, it checks that the expected log messages are present on both sides. **Link to tracking Issue:** Fixes #34719. **Testing:** ✅ --------- Co-authored-by: Curtis Robert <crobert@splunk.com> Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
…pen-telemetry#34794) **Description:** Fixes the causes of flakiness in most cases by using a callback to terminate the test without resorting to sleep statements. There is still one flaky test that for reasons not understood, does not pass. Fortunately, it fails in a repeatable way, and I will debug as part of open-telemetry#34719. **Link to tracking Issue:** open-telemetry#34719 --------- Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de> Co-authored-by: Juraci Paixão Kröhling <juraci@kroehling.de>
**Description:** Restore a skipped test, after understanding the nature of the problem. The problem was mostly addressed in open-telemetry#34794, which left the test disabled. The test had been flaky because while testing for an out-of-memory condition, the test could fail for timeout or other reason. To make the test more reliable, this now waits until at least one ArrowTraces span has been received by both components. After one span is available, it checks that the expected log messages are present on both sides. **Link to tracking Issue:** Fixes open-telemetry#34719. **Testing:** ✅ --------- Co-authored-by: Curtis Robert <crobert@splunk.com> Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
**Description:** Restore a skipped test, after understanding the nature of the problem. The problem was mostly addressed in open-telemetry#34794, which left the test disabled. The test had been flaky because while testing for an out-of-memory condition, the test could fail for timeout or other reason. To make the test more reliable, this now waits until at least one ArrowTraces span has been received by both components. After one span is available, it checks that the expected log messages are present on both sides. **Link to tracking Issue:** Fixes open-telemetry#34719. **Testing:** ✅ --------- Co-authored-by: Curtis Robert <crobert@splunk.com> Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
Description: Fixes the causes of flakiness in most cases by using a callback to terminate the test without resorting to sleep statements. There is still one flaky test that for reasons not understood, does not pass. Fortunately, it fails in a repeatable way, and I will debug as part of #34719.
Link to tracking Issue: #34719