Runner hangs since fails to notice a script is completed #3587
Description
Describe the bug
We are running a test using GitHub Actions, which is written in Rust. During the test execution, no logs are printed to the terminal, and the test runs for a long time.
In our test process, there is a significant chance that the job will hang and eventually fail due to a timeout. After logging into the runner to investigate, we discovered that our script actually completes execution after 10 minutes, but the job remains in a "waiting" state until the 30-minute timeout is reached, seemingly unaware that the script has already finished. As a result, the runner hangs while waiting for the script to exit.
The step that causes runner to hang:
name: Run compaction tests
shell: bash
run: |
if [[ "${{ inputs.longtime }}" == "true" ]]; then
# Run tests in longtime mode.
bash -c ".github/e2e-config/run-compaction-tests.sh longtime -c .github/e2e-config/e2e-local-config.toml"
else
# Run tests in normal mode.
bash -c ".github/e2e-config/run-compaction-tests.sh normal -c .github/e2e-config/e2e-local-config.toml"
fi
Out test script:
#!/bin/bash
# tests.sh longtime -c config.toml
set -euo pipefail
usage() {
echo "ERROR: expect [longtime|normal] -c config"
exit 2
}
check_config_path() {
if [ ! -f "$1" ]; then
echo "ERROR: config not exists ($1)"
exit 3
fi
}
check_exec_path() {
if [ ! -f "$1" ]; then
echo "ERROR: exec not exists ($1)"
exit 4
fi
}
exec_path="./target/debug/compact_testing"
check_exec_path $exec_path
if [ $# -ne 3 ]; then
usage
fi
is_normal=true
if [ "$1" == "longtime" ]; then
is_normal=false
elif [ "$1" == "normal" ]; then
is_normal=true
else
usage
fi
if [ "$2" != "-c" ]; then
usage
fi
config_path=$3
check_config_path $config_path
run_single() {
cmd="$exec_path -c $config_path $*"
$cmd
ret=`echo $?`
if [ $ret -ne 0 ]; then
echo "ERROR: $cmd"
exit $ret
else
echo "SUCCESS: $cmd"
fi
}
test_normal() {
run_single --thread 8 --ignore-read --mode 2 --nodes 2 --count 101000 --multi-compact --precision 1
run_single --thread 4 --ignore-read --mode 2 --nodes 1 --count 101000 --multi-compact --precision 2
run_single --thread 4 --ignore-read --mode 2 --nodes 4 --count 101000 --multi-compact --precision 3
run_single --thread 8 --mode 2 --nodes 2 --count 11000 --precision 3
}
test_longtime() {
run_single --thread 8 --mode 2 --count 110001000 --ignore-read
run_single --thread 8 --mode 2 --count 120001000
}
if [ $is_normal == "true" ]
then
echo "INFO: Run normal compaction test..."
test_normal
echo "INFO: End normal compaction test..."
else
echo "INFO: Run longtime compaction test..."
test_normal
test_longtime
echo "INFO: End longtime compaction test."
fi
Interestingly, we found that if the Rust program outputs logs intermittently to the terminal, the runner does not hang.
We suspect there may be an issue with how the GitHub runner waits for the script to complete.
To Reproduce
Sorry, we cannot provide a reproduction since our project is private.
Expected behavior
Github runner should always be notified that a script is completed
Runner Version and Platform
Version of your runner?
Unknown to me, sorry.
OS of the machine running the runner?
Linux