[internal/otelarrow] Flaky test disabled: TestIntegrationMemoryLimited #34719

Closed
pjanotti opened this issue Aug 16, 2024 · 9 comments · Fixed by #34889

Comments

@pjanotti
Contributor

Component(s)

internal/otelarrow

Describe the issue you're reporting

Hit on #34358, see https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/10203882356/job/28231140032?pr=34358#step:6:518

=== FAIL: test TestIntegrationSelfTracing (11.03s)
    e2e_test.go:369: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:369
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:205
        	            				C:/hostedtoolcache/windows/go/1.21.12/x64/src/runtime/asm_amd64.s:1650
        	Error:      	Received unexpected error:
        	            	rpc error: code = Canceled desc = send wait: context deadline exceeded
        	Test:       	TestIntegrationSelfTracing
    [the e2e_test.go:369 failure above is reported 10 times in total; the 9 identical repeats are omitted here]
    e2e_test.go:272: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:272
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:418
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:220
        	            				D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/internal/otelarrow/test/e2e_test.go:476
        	Error:      	Not equal: 
        	            	expected: 10000
        	            	actual  : 4664
        	Test:       	TestIntegrationSelfTracing
@pjanotti added the "needs triage" label (New item requiring triage) on Aug 16, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@pjanotti
Contributor Author

Since both hits are on Windows /label os:windows

@crobert-1
Member

> Since both hits are on Windows /label os:windows

FYI: To add a label using automation, the /label message has to be at the beginning of the comment. Source

@jmacd
Contributor

jmacd commented Aug 21, 2024

Will take a look.

@jmacd
Contributor

jmacd commented Aug 21, 2024

I would like to recommend #34794, and if that fails I'll be glad to disable the test on Windows. Without trying to fix this, I'm not sure how we'd ever resolve it.

@pjanotti
Contributor Author

Many test failures on Windows are due to scheduling and the default timer tick resolution being different from *nix. The sleep added in #34794 seems like a reasonable thing to try.
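
To make the timer-resolution point concrete, here is a minimal sketch (not code from this repository) that measures how long a requested 1 ms sleep really takes. The helper name `averageSleep` is made up for illustration. The observed value depends on the OS, the Go version, and machine load, so no particular number should be expected; the only guarantee is that each sleep lasts at least as long as requested.

```go
package main

import (
	"fmt"
	"time"
)

// averageSleep (hypothetical helper) calls time.Sleep(d) n times and returns
// the mean wall-clock time each call actually took.
func averageSleep(d time.Duration, n int) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		time.Sleep(d)
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	requested := time.Millisecond
	actual := averageSleep(requested, 50)
	// On a coarse timer the observed value can be many times the request.
	fmt.Printf("requested %v per sleep, observed about %v\n", requested, actual)
}
```

If the observed value is much larger than the request on one platform, any test that budgets its deadline as "N short sleeps" will run proportionally longer there, which is one plausible way to end up with `context deadline exceeded` only on Windows.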

@jmacd
Contributor

jmacd commented Aug 22, 2024

I have added one Skip to this test, will leave this issue open.

@jmacd changed the title from "[internal/otelarrow] Flaky test: TestIntegrationSelfTracing" to "[internal/otelarrow] Flaky test disabled: TestIntegrationMemoryLimited" on Aug 22, 2024
jpkrohling added a commit that referenced this issue Aug 27, 2024
…34794)

**Description:** Fixes the causes of flakiness in most cases by using a
callback to terminate the test without resorting to sleep statements.
There is still one flaky test that, for reasons not yet understood, does
not pass. Fortunately, it fails in a repeatable way, and I will debug it
as part of #34719.

**Link to tracking Issue:**
#34719

---------

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Co-authored-by: Juraci Paixão Kröhling <juraci@kroehling.de>
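
The callback approach described in that commit message can be sketched roughly as follows. This is an illustrative stand-in, not the code from #34794: `countingSink`, `consume`, and `runUntilDone` are hypothetical names, and the real test wires its callback into the collector components rather than a plain struct. The idea is that the consumer signals completion itself, so the test blocks on a channel instead of sleeping for a guessed duration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// countingSink is a hypothetical stand-in for a test consumer. It invokes
// onDone exactly once, when the expected number of items has arrived.
type countingSink struct {
	mu       sync.Mutex
	seen     int
	expected int
	onDone   func()
}

// consume records one item and fires the callback when the count is reached.
func (s *countingSink) consume() {
	s.mu.Lock()
	s.seen++
	reached := s.seen == s.expected
	s.mu.Unlock()
	if reached {
		s.onDone()
	}
}

// runUntilDone sends `expected` items from a background goroutine and blocks
// until the sink's callback fires or the timeout elapses. It reports whether
// the callback fired in time.
func runUntilDone(expected int, timeout time.Duration) bool {
	done := make(chan struct{})
	sink := &countingSink{expected: expected, onDone: func() { close(done) }}

	go func() {
		for i := 0; i < expected; i++ {
			sink.consume()
		}
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	ok := runUntilDone(10000, 5*time.Second)
	fmt.Println("finished via callback:", ok)
}
```

The timeout remains only as a safety net: on a fast machine the test returns as soon as the last item arrives, and on a slow one it is not cut short by an undersized sleep.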
codeboten added a commit that referenced this issue Sep 6, 2024
**Description:** Restore a skipped test, after understanding the nature
of the problem.

The problem was mostly addressed in
#34794,
which left the test disabled. The test had been flaky because, while
testing for an out-of-memory condition, it could fail because of a
timeout or for some other reason. To make the test more reliable, this
now waits until at least one ArrowTraces span has been received by both
components. After one span is available, it checks that the expected log
messages are present on both sides.

**Link to tracking Issue:** 
Fixes #34719.

**Testing:** ✅

---------

Co-authored-by: Curtis Robert <crobert@splunk.com>
Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
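
The "wait for a span on both sides, then check the logs" ordering described above can be sketched as below. Everything here is an assumption made for illustration: `recorder` is a hypothetical in-memory stand-in for whatever the real test uses to observe spans and logs, and the log text is a placeholder, since the thread does not quote the actual messages.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// recorder is a hypothetical in-memory view of the spans and log lines
// observed from one component (exporter or receiver).
type recorder struct {
	mu    sync.Mutex
	spans []string
	logs  []string
}

func (r *recorder) addSpan(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.spans = append(r.spans, name)
}

func (r *recorder) addLog(msg string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.logs = append(r.logs, msg)
}

func (r *recorder) hasSpan(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, s := range r.spans {
		if s == name {
			return true
		}
	}
	return false
}

func (r *recorder) hasLog(substr string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, l := range r.logs {
		if strings.Contains(l, substr) {
			return true
		}
	}
	return false
}

// bothHaveSpan polls until both recorders hold a span called name, or the
// timeout elapses. It reports whether both spans were seen in time.
func bothHaveSpan(a, b *recorder, name string, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if a.hasSpan(name) && b.hasSpan(name) {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(5 * time.Millisecond)
	}
}

func main() {
	exporter, receiver := &recorder{}, &recorder{}

	// Simulated traffic: each side logs first, then records its span.
	go func() {
		for _, r := range []*recorder{exporter, receiver} {
			r.addLog("placeholder: memory limit exceeded") // not the real message
			time.Sleep(10 * time.Millisecond)
			r.addSpan("ArrowTraces")
		}
	}()

	if !bothHaveSpan(exporter, receiver, "ArrowTraces", 5*time.Second) {
		fmt.Println("timed out waiting for spans")
		return
	}
	// Only now are the log assertions meaningful on both sides.
	fmt.Println("exporter log found:", exporter.hasLog("memory limit"))
	fmt.Println("receiver log found:", receiver.hasLog("memory limit"))
}
```

Gating the log checks on the span being present separates "the components have not produced anything yet" from "the components ran and logged the wrong thing", which is the distinction the restored test needed.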
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024