Improve GitHub Actions intermittent test timeouts #336
Problem
We've seen regular, but intermittent, test timeouts in the GitHub Actions CI.
The source of the problem appears to be `exec.Command`'s `Run` method (or at least the command we're passing to it, which, for the occurrences I've seen, I believe is a `cargo build` command).
Solution
- Update the `TestPublish` test code to log the stdout (this only prints when the test fails).
- Use `exec.CommandContext` so we can set a timeout around the subprocess.
Notes
I did some Googling and it seems there are a few open issues related to this.
One suggestion I stumbled across was to set `GO_TEST_TIMEOUT_SCALE`, as some architectures can have problems with the default value. But we've NOT done that here, because just increasing the timeout isn't really a solution: the entire test run normally takes only 4 minutes (well under the 10-minute limit), and the fact that the timeout occurs only occasionally suggests a problem we should properly identify.
Another suggestion was to split up the code into more granular packages to avoid timeouts, but that's not an appropriate solution. The tests aren't timing out because of a code logic error but because the command being executed (`cargo build` in this case) is taking too long (well, as far as I'm concerned, something is stalling randomly at the point of running the command).