
Better handling of stray jobs that need terminating in CI #20194

Closed
@Trott

Description


This has recently started happening a lot on the macOS hosts in CI:

Some test times out. For example, in https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1010/17983/console, sequential/test-benchmark-http times out.

As a result, a stray subprocess is left running, and it causes subsequent jobs to fail. For example, https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx1010/17988/console:

# Clean up any leftover processes, error if found.
ps awwx | grep Release/node | grep -v grep | cat
79201   ??  R    145:42.29 /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/out/Release/node /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/benchmark/http/cluster.js c=1 len=1 type=asc benchmarker=test-double chunkedEnc=true chunks=0 dur=0.1 key="" method=write n=1 res=normal
make[1]: *** [test-ci] Error 1

To fix this, someone from the Build WG (in this specific case, me) logs in and runs kill -9 on the PID. In theory, the process should have been terminated by one of the xargs kill invocations in the Makefile. My guess (which I keep forgetting to test when this comes up) is that xargs kill needs to be xargs kill -9 to be effective in these cases on the macOS hosts: plain kill sends SIGTERM, which a process can catch or ignore, whereas SIGKILL (-9) cannot be caught or ignored.
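If the guess above is right, one option is to escalate rather than jump straight to kill -9: send SIGTERM first, wait briefly, then send SIGKILL to anything still alive. The following is a minimal sketch under stated assumptions, not the Makefile's actual recipe. The helper names kill_stray and matching_pids are hypothetical, the 5-second default grace period is arbitrary, and it swaps in pgrep -f (full command-line matching) for the ps awwx | grep pipeline shown in the log above.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the Node.js Makefile recipe: terminate leftover
# processes whose full command line matches a pattern, escalating from
# SIGTERM to SIGKILL (kill -9) if anything survives a grace period.

# Print PIDs whose full command line matches the regex in $1 (similar to
# `ps awwx | grep ...`), excluding the shell running this function.
# Caveat: if the pattern appears literally in the calling shell's command
# line (e.g. inside sh -c '...'), forked subshells can match as well; a
# bracket regex such as "[R]elease/node" avoids that self-match.
matching_pids() {
  pgrep -f "$1" | grep -v -x "$$"
}

kill_stray() {
  pattern="$1"        # e.g. "[R]elease/node"
  grace="${2:-5}"     # seconds to wait after SIGTERM (arbitrary default)

  pids=$(matching_pids "$pattern")
  [ -n "$pids" ] || return 0          # nothing left over

  # Plain kill sends SIGTERM, which a process can catch or ignore.
  kill $pids 2>/dev/null

  # Poll once per second until everything has exited or the grace period ends.
  i=0
  while [ "$i" -lt "$grace" ]; do
    [ -n "$(matching_pids "$pattern")" ] || return 0
    sleep 1
    i=$((i + 1))
  done

  # Survivors get SIGKILL (kill -9), which cannot be caught or ignored.
  pids=$(matching_pids "$pattern")
  [ -n "$pids" ] && kill -9 $pids 2>/dev/null
  return 0
}
```

For example, kill_stray "[R]elease/node" 5 would first ask any leftover out/Release/node processes to exit and only force-kill the ones still running five seconds later. The escalation keeps well-behaved processes on the graceful path while still guaranteeing that a process ignoring SIGTERM does not block the next job.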

Metadata


Labels

build: Issues and PRs related to build files or the CI.
flaky-test: Issues and PRs related to the tests with unstable failures on the CI.