Flaky test: TestNewExporter_collectorConnectionDiesThenReconnects #1527
Actually, I see it failing 5% of the time on 1.14 as well. It's just that in one particular CI run, 1.15 failed and 1.14 passed. There are other CI runs where the opposite happens too. |
Possible duplicate of #1524 |
I saw such errors happen in CI again 🙈. Based on commit 8b1be11:
--- FAIL: TestNewExporter_collectorConnectionDiesThenReconnects (0.76s)
otlp_integration_test.go:248:
Error Trace: otlp_integration_test.go:248
Error: Received unexpected error:
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:49312: connect: connection refused"
Test: TestNewExporter_collectorConnectionDiesThenReconnects
FAIL
The offending test waits for the reconnect loop to re-establish the connection but, as the currently scheduled goroutine, never yields. Adding a |
There seems to be a correlation with the testing system. In the past month this error has occurred with the following frequency:
|
I think we can update the mock collector to wrap the underlying TCP listener and signal when a new connection is actually made. This will allow the tests to deterministically wait for the connection, or time out at the CLI-specified timeout.
I seem to have resolved this issue in b17df01. I created #1815 as a test runner to validate this. Without this commit, the failure occurs regularly: https://github.com/open-telemetry/opentelemetry-go/runs/2356749118 With the commit, the tests have succeeded 200+ times without a failure: https://github.com/open-telemetry/opentelemetry-go/pull/1815/checks?check_run_id=2357037389 That commit needs to be cleaned up, and then I'll submit it as a PR.
It looks like this issue is not resolved. I'm still seeing failures on my Arch Linux system as well as on the Ubuntu test runners (Go 1.14).
The grpc |
Initial manual testing of this approach seems to validate the connection itself handles retries and reconnection. It appears this is an approach worth pursuing. |
PoC that I plan to turn into a valid PR to remove this test: #2329
Reported to be failing about 5% of the time on Go 1.15 but not seen on 1.14.
Version tested: e50a1c8