HDDS-1764. Fix hidden errors in acceptance tests #1059

Closed
wants to merge 3 commits into from

elek
Member

@elek elek commented Jul 4, 2019

[~bharatviswa] pinged me offline about a problem where the smoketest fails in some cases even though the reports are green:

All smoke tests are passed, but CI is showing as Failed.

https://ci.anzix.net/job/ozone/17284/RobotTests/log.html
#1048

The root cause is a few typos introduced after HDDS-1698, which are fixed by the uploaded PR.

What is the problem?

If any error occurs during test execution, the smoketest is marked as failed. In this case, because of typos in two docker-compose.yaml files, two of the tests can't be started.

But no separate robot test report is generated for them, so the error is visible only in the console.
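To illustrate the idea of making such failures visible, here is a minimal Python sketch (not the actual Ozone test scripts, which are shell based). It runs every suite even after a failure and prints an explicit error line for each suite that fails or cannot be started. The names `run_suites`, `run_command`, and the `suites` mapping are hypothetical and exist only for this example.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Sequence


def run_command(command: Sequence[str]) -> int:
    """Run one test command and return its exit code."""
    return subprocess.call(list(command))


def run_suites(
    suites: Dict[str, Sequence[str]],
    execute: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run every suite, even after a failure, and return the names that failed."""
    failed = []
    for name, command in suites.items():
        try:
            code = execute(command)
        except OSError as exc:
            # The command could not even be started (for example, a missing script).
            print(f"ERROR: {name} could not be started: {exc}", file=sys.stderr)
            failed.append(name)
            continue
        if code != 0:
            # Name the failing suite explicitly so it is easy to find in the console.
            print(
                f"ERROR: Test execution of {name} is FAILED (exit code {code})",
                file=sys.stderr,
            )
            failed.append(name)
    return failed
```

A caller would exit with a non-zero status when the returned list is not empty, so the overall result and the per-suite messages stay consistent.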

How did it happen?

The ACL work increased the intermittency of the acceptance tests. HDDS-1698 was committed because the acceptance tests were already failing with ACL errors, which hid the real error (the tests were red anyway).

 

See: https://issues.apache.org/jira/browse/HDDS-1764

@elek elek added the ozone label Jul 4, 2019
@elek
Member Author

elek commented Jul 4, 2019

The first commit makes the original problem more visible. After the first build, the next commit will fix the problem itself.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 75 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| 0 | shelldocs | 0 | Shelldocs was not available. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 518 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 960 | branch has no errors when building and testing our client artifacts. |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 489 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| +1 | shellcheck | 0 | There were no new shellcheck issues. |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 782 | patch has no errors when building and testing our client artifacts. |
| | | | _ Other Tests _ |
| +1 | unit | 110 | hadoop-hdds in the patch passed. |
| +1 | unit | 192 | hadoop-ozone in the patch passed. |
| +1 | asflicense | 47 | The patch does not generate ASF License warnings. |
| | | 3380 | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=18.09.5 Server=18.09.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/1/artifact/out/Dockerfile |
| GITHUB PR | #1059 |
| Optional Tests | dupname asflicense mvnsite unit shellcheck shelldocs |
| uname | Linux c7e62fbea230 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 1c254a8 |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/1/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 5500) |
| modules | C: hadoop-ozone/dist U: hadoop-ozone/dist |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/1/console |
| versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@elek
Member Author

elek commented Jul 4, 2019

/retest

@elek
Member Author

elek commented Jul 5, 2019

I don't know what the final answer is for avoiding similar problems (obviously, intermittent test failures are very dangerous).

For now, I have improved the error message (first commit). You can see the result in the acceptance tests of the first build:

 no such image: apache/ozone-runner::20190617-2: invalid reference format
ERROR: Test execution of /var/jenkins_home/workspace/ozone/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone-net-topology is FAILED!!!!

The compose files are fixed in the second commit.
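As an illustration of how a typo like the `::` in the error above could be caught before the containers are started, here is a small Python sketch. It is not part of this PR. The pattern is a deliberate simplification of the Docker image reference format (no registry ports, no digests), and it assumes unquoted `image:` values with variables already substituted, for example in the output of `docker-compose config`. The names `is_plausible_image_ref` and `find_bad_images` are hypothetical.

```python
import re

# Simplified pattern: a repository path followed by an optional single ":tag".
# Assumption: this is NOT the full Docker reference grammar.
IMAGE_REF = re.compile(
    r"[a-z0-9]+(?:[._/-][a-z0-9]+)*(?::[A-Za-z0-9_][A-Za-z0-9_.-]{0,127})?"
)


def is_plausible_image_ref(ref: str) -> bool:
    """Return True if ref looks like 'name' or 'name:tag'."""
    return IMAGE_REF.fullmatch(ref) is not None


def find_bad_images(compose_text: str) -> list:
    """Collect the values of 'image:' lines that do not look like valid references."""
    bad = []
    for line in compose_text.splitlines():
        match = re.match(r"^\s*image:\s*(\S+)", line)
        if match and not is_plausible_image_ref(match.group(1)):
            bad.append(match.group(1))
    return bad
```

With this check, `apache/ozone-runner::20190617-2` is reported as invalid, while `apache/ozone-runner:20190617-2` is accepted.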

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 30 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| 0 | yamllint | 0 | yamllint was not available. |
| 0 | shelldocs | 0 | Shelldocs was not available. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 502 | trunk passed |
| +1 | compile | 257 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 727 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 163 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 441 | the patch passed |
| +1 | compile | 285 | the patch passed |
| +1 | javac | 285 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| +1 | shellcheck | 0 | There were no new shellcheck issues. |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 682 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 166 | the patch passed |
| | | | _ Other Tests _ |
| -1 | unit | 190 | hadoop-hdds in the patch failed. |
| -1 | unit | 1881 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 43 | The patch does not generate ASF License warnings. |
| | | 5581 | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.ozone.container.ozoneimpl.TestOzoneContainer |
| | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
| | hadoop.ozone.client.rpc.TestOzoneRpcClient |
| | hadoop.ozone.client.rpc.TestBlockOutputStream |
| | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
| | hadoop.ozone.TestMiniChaosOzoneCluster |
| | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
| | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/2/artifact/out/Dockerfile |
| GITHUB PR | #1059 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient yamllint shellcheck shelldocs |
| uname | Linux feacd6ad17b8 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 96d0555 |
| Default Java | 1.8.0_212 |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/2/artifact/out/patch-unit-hadoop-hdds.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/2/artifact/out/patch-unit-hadoop-ozone.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/2/testReport/ |
| Max. process+thread count | 4242 (vs. ulimit of 5500) |
| modules | C: hadoop-ozone/dist U: hadoop-ozone/dist |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1059/2/console |
| versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@xiaoyuyao
Contributor

Thanks @elek for fixing this. The change LGTM. Can you resolve the conflicts? +1 after that.

@elek
Member Author

elek commented Jul 10, 2019

Thanks for the review, @xiaoyuyao. I rebased it and will merge it soon.

I cross-checked it with your PR (#1066) and found that I had missed ozonesecure-mr (which was also fixed in your patch). I added that one-line change as well.

(PS: once the other patch with the ozone-mr acceptance tests is committed, it can be copied to cover ozonesecure-mr and keep it stable.)

@elek elek closed this in 9382488 Jul 10, 2019
elek added a commit that referenced this pull request Jul 10, 2019
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
…sks original exception

Failures cleaning up the staging directory on another exception were masking the original exception making troubleshooting difficult. Add some logging and an extra try/catch around the cleanup.

Author: thunderstumpges <tstumpges@ntent.com>

Reviewers: Daniel Nishimura <dnishimura@linkedin.com>

Closes apache#1059 from thunderstumpges/try-catch-on-staging-cleanup
amahussein pushed a commit to amahussein/hadoop that referenced this pull request Oct 29, 2019
3 participants