-
Couldn't load subscription status.
- Fork 130
Description
Describe the bug
Note
Please first search in this page for the same failure before recording yours
This issue is to record flaky tests we are seeing. Most of time, your PR is not the reason of the test failure. So you can just record the flaky test failure here and we will dig into it later.
Report like this:
Refer to this comment #90 (comment)
You can download the log of failed run from the bottom of the summary of the workflow.
How to debug the flaky test
Flaky test is hard to understand at the first look, considering we may not have enough log. And it's also not easy to reproduce (race condition, environment dependent).
An efficient way to handle these:
- Go into the failed run, based on the test failures, you should be able to relate to the source code part.
- Download the log associated with the failed run, locating the logs in
integTest.logwhen this test was run. - If we don't have enough log to understand how it could happen, add more DEBUG log into the source code that would be enough for us to understand when this flaky test shows up next time. Change test log level to debugging like this. So we can see more info in the cluster log.
- You can also add log into the test code, so we can see more info in the test log.
Script to reproduce the flaky
This script works on Linux system
script ../flakyoutput.txt # use shell command script to record the terminal output into a file
for i in {1..10}; do
echo "=== Run time $i ==="
./gradlew clean && ./gradlew integTest --tests "*IndexManagementIndicesIT" --tests "*MetadataRegressionIT" --tests "*ActionRetryIT" -PnumNodes=3
retVal=$?
if [ $retVal -eq 1 ]; then
echo "Flaky test found in run time $i"
break
fi
done
ctrl+d # end shell command script