[CI] FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout #1450

Closed
@nknize

Description

The test failure introduced in #1167 was caught on PR #1440. It looks like a race condition, since this test juggles many timing variables. I think the whole test implementation needs to be simplified (perhaps as a separate issue).

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.monitor.fs.FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout" -Dtests.seed=58869CBBC128ED5C -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=uk-UA -Dtests.timezone=America/Kralendijk -Druntime.java=17
Suite: Test class org.opensearch.monitor.fs.FsHealthServiceTests
  1> [2021-10-26T13:58:12,070][INFO ][o.o.e.NodeEnvironment    ] [testLoggingOnHungIO] using [1] data paths, mounts [[/ (/dev/root)]], net usable_space [68.3gb], net total_space [96.8gb], types [ext4]
  1> [2021-10-26T13:58:12,086][INFO ][o.o.e.NodeEnvironment    ] [testLoggingOnHungIO] heap size [512mb], compressed ordinary object pointers [true]
  1> [2021-10-26T13:58:12,397][WARN ][o.o.m.f.FsHealthService  ] [testLoggingOnHungIO] health check of [/var/CITOOL/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/search/server/build/testrun/test/temp/org.opensearch.monitor.fs.FsHealthServiceTests_58869CBBC128ED5C-001/tempDir-003/nodes/0] took [401ms] which is above the warn threshold of [106ms]
  1> [2021-10-26T13:58:12,499][INFO ][o.o.m.f.FsHealthServiceTests] [testFailsHealthOnHungIOBeyondHealthyTimeout] before test
  1> [2021-10-26T13:58:12,572][INFO ][o.o.e.NodeEnvironment    ] [testFailsHealthOnHungIOBeyondHealthyTimeout] using [3] data paths, mounts [[/ (/dev/root)]], net usable_space [68.3gb], net total_space [96.8gb], types [ext4]
  1> [2021-10-26T13:58:12,572][INFO ][o.o.e.NodeEnvironment    ] [testFailsHealthOnHungIOBeyondHealthyTimeout] heap size [512mb], compressed ordinary object pointers [true]
  1> [2021-10-26T13:58:12,609][INFO ][o.o.m.f.FsHealthServiceTests] [testFailsHealthOnHungIOBeyondHealthyTimeout] --> Initial health status prior to the first monitor run
  1> [2021-10-26T13:58:12,611][INFO ][o.o.m.f.FsHealthServiceTests] [testFailsHealthOnHungIOBeyondHealthyTimeout] --> First monitor run
  1> [2021-10-26T13:58:12,666][INFO ][o.o.m.f.FsHealthServiceTests] [testFailsHealthOnHungIOBeyondHealthyTimeout] --> Disrupt file system
  1> [2021-10-26T13:58:14,751][INFO ][o.o.m.f.FsHealthServiceTests] [testFailsHealthOnHungIOBeyondHealthyTimeout] --> Fix file system disruption
  1> [2021-10-26T13:58:14,769][WARN ][o.o.m.f.FsHealthService  ] [org.opensearch.monitor.fs.FsHealthServiceTests] health check of [/var/CITOOL/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/search/server/build/testrun/test/temp/org.opensearch.monitor.fs.FsHealthServiceTests_58869CBBC128ED5C-001/tempDir-005/nodes/0] took [1430ms] which is above the warn threshold of [199ms]
  1> [2021-10-26T13:58:14,770][ERROR][o.o.m.f.FsHealthService  ] [org.opensearch.monitor.fs.FsHealthServiceTests] health check of [/var/CITOOL/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/search/server/build/testrun/test/temp/org.opensearch.monitor.fs.FsHealthServiceTests_58869CBBC128ED5C-001/tempDir-005/nodes/0] failed, took [1430ms] which is above the healthy threshold of [752ms]
  1> [2021-10-26T13:58:16,110][WARN ][o.o.m.f.FsHealthService  ] [org.opensearch.monitor.fs.FsHealthServiceTests] health check of [/var/CITOOL/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/search/server/build/testrun/test/temp/org.opensearch.monitor.fs.FsHealthServiceTests_58869CBBC128ED5C-001/tempDir-006/nodes/0] took [222ms] which is above the warn threshold of [199ms]
  1> [2021-10-26T13:58:16,228][INFO ][o.o.m.f.FsHealthServiceTests] [testFailsHealthOnHungIOBeyondHealthyTimeout] after test
  2> REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.monitor.fs.FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout" -Dtests.seed=58869CBBC128ED5C -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=uk-UA -Dtests.timezone=America/Kralendijk -Druntime.java=17
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([58869CBBC128ED5C:DAF2C2B2B5D49B14]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.opensearch.monitor.fs.FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout(FsHealthServiceTests.java:239)
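
The description suspects a timing race. One common way to take wall-clock timing out of a test like this is to inject a clock and let the simulated "hung" I/O advance it, so no thread ever sleeps. The sketch below is only an illustration of that idea under stated assumptions: `HungIoHealthSketch`, `HealthCheck`, and `checkWithSimulatedDelay` are hypothetical names, not the `FsHealthService` API, and the millisecond values are borrowed from the log above purely as sample inputs.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these types exist in OpenSearch.
class HungIoHealthSketch {

    enum Status { HEALTHY, UNHEALTHY }

    /** Stand-in health check that measures elapsed time with an injected clock. */
    static final class HealthCheck {
        private final LongSupplier clockMillis;
        private final long healthyTimeoutMillis;
        private Status status = Status.HEALTHY;

        HealthCheck(LongSupplier clockMillis, long healthyTimeoutMillis) {
            this.clockMillis = clockMillis;
            this.healthyTimeoutMillis = healthyTimeoutMillis;
        }

        /** Runs the I/O action and marks the check unhealthy if it took longer than the timeout. */
        void run(Runnable io) {
            long start = clockMillis.getAsLong();
            io.run();
            long elapsed = clockMillis.getAsLong() - start;
            status = elapsed > healthyTimeoutMillis ? Status.UNHEALTHY : Status.HEALTHY;
        }

        Status status() {
            return status;
        }
    }

    /** Simulates an I/O call that "takes" delayMillis by advancing a fake clock instead of sleeping. */
    static Status checkWithSimulatedDelay(long delayMillis, long healthyTimeoutMillis) {
        AtomicLong fakeNow = new AtomicLong();
        HealthCheck check = new HealthCheck(fakeNow::get, healthyTimeoutMillis);
        check.run(() -> fakeNow.addAndGet(delayMillis));
        return check.status();
    }

    public static void main(String[] args) {
        // Sample inputs taken from the log: a 1430ms check against a 752ms healthy threshold,
        // then a 222ms check against the same threshold.
        System.out.println(checkWithSimulatedDelay(1430, 752));
        System.out.println(checkWithSimulatedDelay(222, 752));
    }
}
```

Because the fake clock only moves when the simulated I/O moves it, the outcome depends on the chosen delay and threshold alone, not on scheduler or CI host load. Whether the real test can be restructured this way depends on how the service obtains its time source, which this issue does not show.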

Labels: >test-failure, bug, untriaged, v2.0.0