[BUG] node drop on o.o.cluster.routing.allocation.decider.MockDiskUsagesIT.testRerouteOccursOnDiskPassingHighWatermark #1907

Closed
nknize opened this issue Jan 14, 2022 · 1 comment
Labels
flaky-test Random test failure that succeeds on second run >test-failure Test failure from CI, local build, etc.

Comments

@nknize
Collaborator

nknize commented Jan 14, 2022

Describe the bug
Caught on PR #1902. Another failure that can't be reproduced! (╯°□°)╯︵ ┻━┻

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.routing.allocation.decider.MockDiskUsagesIT.testRerouteOccursOnDiskPassingHighWatermark" -Dtests.seed=994DB46D3A71E388 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=pt -Dtests.timezone=Asia/Tel_Aviv -Druntime.java=17

Looks like a node timeout issue at MockDiskUsagesIT.java#L166

1> [2022-01-14T19:03:26,620][WARN ][o.o.c.NodeConnectionsService] [node_t1] failed to connect to {node_t0}{C3fT4Fp9SjmjuepSij5_0Q}{IqnPkDZ7TZu52WOm_HWKOA}{127.0.0.1}{127.0.0.1:43583}{dimr}{shard_indexing_pressure_enabled=true} (tried [1] times)
  1> org.opensearch.transport.ConnectTransportException: [node_t0][127.0.0.1:43583] connect_exception

Gave up after one try... valiant effort (。々°)
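
For longer CI logs, a filter along these lines could pull out connection warnings like the one above. This is only a sketch: `failed_connects` is a hypothetical helper name, and the pattern assumes the exact line shape shown in this report (`[node_x] failed to connect to {node_y}... (tried [N] times)`).

```shell
# Hypothetical helper: read a test log on stdin and print one summary line
# per "failed to connect" warning, as "<source> -> <target> tries=<N>".
# Assumes the log line layout quoted in this issue; other layouts are skipped.
failed_connects() {
  sed -n 's/.*\[\(node_[^]]*\)\] failed to connect to {\([^}]*\)}.*(tried \[\([0-9]*\)\] times).*/\1 -> \2 tries=\3/p'
}
```

Usage would be something like `failed_connects < test-output.log`, where the log file name is a placeholder.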

To Reproduce
Another failure that can't be reproduced! (╯°□°)╯︵ ┻━┻

Expected behavior
No node drops... happy WiFi, happy LiFi

Plugins
All core OpenSearch

Screenshots

Host/Environment (please complete the following information):

Additional context
relates #1715

@nknize nknize added >test-failure Test failure from CI, local build, etc. v2.0.0 Version 2.0.0 untriaged labels Jan 14, 2022
@anasalkouz anasalkouz added flaky-test Random test failure that succeeds on second run and removed untriaged labels Jan 18, 2022
@anasalkouz anasalkouz removed the v2.0.0 Version 2.0.0 label Apr 12, 2022
@minalsha minalsha assigned gauravruhela and unassigned owaiskazi19 Nov 15, 2023
@ankitkala
Member

Not able to reproduce even after 20K iterations.
Given that the issue was very generic (node timeout) and not specific to the test, resolving this as a one-off case.
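
The thread does not say how the 20K iterations were run. For anyone who wants to retry locally, a rerun loop might look like the sketch below. `run_until_failure` is a hypothetical helper, not part of the OpenSearch build; it reruns any command up to a given count and stops at the first non-zero exit.

```shell
# Hypothetical helper: run_until_failure MAX CMD [ARGS...]
# Reruns CMD up to MAX times; stops and returns 1 on the first failure.
run_until_failure() {
  ruf_max="$1"
  shift
  ruf_i=1
  while [ "$ruf_i" -le "$ruf_max" ]; do
    if ! "$@"; then
      echo "failed on iteration $ruf_i"
      return 1
    fi
    ruf_i=$((ruf_i + 1))
  done
  echo "no failure in $ruf_max iterations"
}

# Example (test name taken from the REPRODUCE WITH line in this issue):
# run_until_failure 100 ./gradlew ':server:internalClusterTest' \
#   --tests "org.opensearch.cluster.routing.allocation.decider.MockDiskUsagesIT.testRerouteOccursOnDiskPassingHighWatermark"
```

Whether to also pass the original `-Dtests.seed` is a judgment call: a fixed seed repeats the same randomization, while a timing-related node drop may not depend on the seed at all.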
