Skip to content

Runner hangs since fails to notice a script is completed #3587

Open
@niebayes

Description

Describe the bug

We are running a test using GitHub Actions, which is written in Rust. During the test execution, no logs are printed to the terminal, and the test runs for a long time.

In our test process, there is a significant chance that the job will hang and eventually fail due to a timeout. After logging into the runner to investigate, we discovered that our script actually completes execution after 10 minutes, but the job remains in a "waiting" state until the 30-minute timeout is reached, seemingly unaware that the script has already finished. As a result, the runner hangs while waiting for the script to exit.

The step that causes runner to hang:

name: Run compaction tests
shell: bash
run: |
        if [[ "${{ inputs.longtime }}" == "true" ]]; then
          # Run tests in longtime mode.
          bash -c ".github/e2e-config/run-compaction-tests.sh longtime -c .github/e2e-config/e2e-local-config.toml"
        else
          # Run tests in normal mode.
          bash -c ".github/e2e-config/run-compaction-tests.sh normal -c .github/e2e-config/e2e-local-config.toml"
        fi

Out test script:

#!/bin/bash
# tests.sh longtime -c config.toml

set -euo pipefail

usage() {
  echo "ERROR: expect [longtime|normal] -c config"
  exit 2
}
check_config_path() {
  if [ ! -f "$1" ]; then
    echo "ERROR: config not exists ($1)"
    exit 3
  fi
}
check_exec_path() {
  if [ ! -f "$1" ]; then
    echo "ERROR: exec not exists ($1)"
    exit 4
  fi
}

exec_path="./target/debug/compact_testing"
check_exec_path $exec_path

if [ $# -ne 3 ]; then
  usage
fi

is_normal=true
if [ "$1" == "longtime" ]; then
  is_normal=false
elif [ "$1" == "normal" ]; then
  is_normal=true
else
  usage
fi

if [ "$2" != "-c" ]; then
  usage
fi

config_path=$3
check_config_path $config_path

run_single() {
  cmd="$exec_path -c $config_path $*"
  $cmd
  ret=`echo $?`
  if [ $ret -ne 0 ]; then
    echo "ERROR: $cmd"
    exit $ret
  else
    echo "SUCCESS: $cmd"
  fi
}

test_normal() {
  run_single --thread 8 --ignore-read --mode 2 --nodes 2 --count 101000 --multi-compact --precision 1
  run_single --thread 4 --ignore-read --mode 2 --nodes 1 --count 101000 --multi-compact --precision 2
  run_single --thread 4 --ignore-read --mode 2 --nodes 4 --count 101000 --multi-compact --precision 3
  run_single --thread 8 --mode 2 --nodes 2 --count 11000 --precision 3
}

test_longtime() {
  run_single --thread 8 --mode 2 --count 110001000 --ignore-read
  run_single --thread 8 --mode 2 --count 120001000
}

if [ $is_normal == "true" ]
then
  echo "INFO: Run normal compaction test..."
  test_normal
  echo "INFO: End normal compaction test..."
else
  echo "INFO: Run longtime compaction test..."
  test_normal
  test_longtime
  echo "INFO: End longtime compaction test."
fi

Interestingly, we found that if the Rust program outputs logs intermittently to the terminal, the runner does not hang.

We suspect there may be an issue with how the GitHub runner waits for the script to complete.

To Reproduce
Sorry, we cannot provide a reproduction since our project is private.

Expected behavior

Github runner should always be notified that a script is completed

Runner Version and Platform

Version of your runner?
Unknown to me, sorry.

OS of the machine running the runner?
Linux

What's not working?

Job Log Output

Runner and Worker's Diagnostic Logs

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions