Runner detect ACPI shutdown #1068

Merged Jul 4, 2022 (36 commits).
Commits:
62d1c82 Runner edge cases improvements (DavidGOrtega, May 29, 2022)
6b0e6b8 Merge branch 'master' of https://github.com/iterative/cml into runner… (DavidGOrtega, May 29, 2022)
7a5c7c8 merge master (DavidGOrtega, May 30, 2022)
6bbccbb fix parselog (DavidGOrtega, May 30, 2022)
99c4222 refactor ugly return (DavidGOrtega, May 30, 2022)
a3a0e3e fix test (DavidGOrtega, May 30, 2022)
8e6b2d6 idle check not needed (DavidGOrtega, May 30, 2022)
cfe5a31 fix tests (DavidGOrtega, May 30, 2022)
f9cade3 fix gl (DavidGOrtega, May 30, 2022)
d381dce Merge branch 'master' of https://github.com/iterative/cml into runner… (DavidGOrtega, Jun 2, 2022)
cb1e92d multi logs parser (DavidGOrtega, Jun 5, 2022)
fbec7b0 Merge branch 'master' of https://github.com/iterative/cml into runner… (DavidGOrtega, Jun 5, 2022)
b811adf feedback fixes (DavidGOrtega, Jun 9, 2022)
cc6576d Merge branch 'master' of https://github.com/iterative/cml into runner… (DavidGOrtega, Jun 9, 2022)
84c2d6e Merge branch 'master' into runner-no-special-cases (DavidGOrtega, Jun 10, 2022)
7a29f6a process lost (DavidGOrtega, Jun 10, 2022)
6fd8049 job not id (DavidGOrtega, Jun 10, 2022)
6e01f9f patterns not entities (DavidGOrtega, Jun 10, 2022)
45b93f5 Merge branch 'runner-no-special-cases' of https://github.com/iterativ… (DavidGOrtega, Jun 10, 2022)
9b0a217 Merge branch 'master' of https://github.com/iterative/cml into runner… (DavidGOrtega, Jun 10, 2022)
0d99565 Runner detect ACPI termination (DavidGOrtega, Jun 19, 2022)
2c6b392 try connect (DavidGOrtega, Jun 19, 2022)
8bd4d26 merge master (DavidGOrtega, Jun 20, 2022)
ebc75ed on error (DavidGOrtega, Jun 21, 2022)
2d76a03 package-lock (DavidGOrtega, Jun 21, 2022)
2bc7934 rerun workflow (DavidGOrtega, Jun 28, 2022)
a821ef3 merge master (DavidGOrtega, Jun 28, 2022)
150e598 merge master (DavidGOrtega, Jun 28, 2022)
6858295 remove console (DavidGOrtega, Jun 28, 2022)
333cfb5 remove unused (DavidGOrtega, Jun 28, 2022)
4f7d88a runner name exception (DavidGOrtega, Jun 28, 2022)
6afef9b Merge branch 'master' into runner/detect-terminate (dacbd, Jun 30, 2022)
cd5d7fe review suggestions (#1083) (dacbd, Jul 1, 2022)
f12abbb :see_no_evil: (dacbd, Jul 1, 2022)
bd50878 lock update (dacbd, Jul 1, 2022)
424d996 Merge branch 'master' into runner/detect-terminate (DavidGOrtega, Jul 4, 2022)
bin/cml/runner.js: 61 changes (34 additions, 27 deletions)
@@ -1,19 +1,20 @@
const { join } = require('path');
const { homedir } = require('os');
const fs = require('fs').promises;
const { SpotNotifier } = require('ec2-spot-notification');
const net = require('net');
const kebabcaseKeys = require('kebabcase-keys');
const timestring = require('timestring');
const winston = require('winston');

const CML = require('../../src/cml').default;
const { randid, sleep } = require('../../src/utils');
const tf = require('../../src/terraform');

let cml;
let RUNNER;
let RUNNER_JOBS_RUNNING = [];
let RUNNER_SHUTTING_DOWN = false;
let RUNNER_TIMER = 0;
const RUNNER_JOBS_RUNNING = [];
const GH_5_MIN_TIMEOUT = (72 * 60 - 5) * 60 * 1000;

const shutdown = async (opts) => {
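`GH_5_MIN_TIMEOUT` packs a unit conversion into one expression. It takes 72 hours expressed in minutes, subtracts 5 minutes, and converts the result to seconds and then to milliseconds. The name, together with `watcherSeventyTwo` in the last hunk, suggests the value is meant to sit five minutes short of a 72-hour limit. The arithmetic can be checked with a short snippet. `HOUR_MS`, `MINUTE_MS` and `explicit` are illustrative names and are not part of runner.js.

```javascript
// Same expression as in runner.js: (72 h in minutes - 5 min) -> seconds -> milliseconds.
const GH_5_MIN_TIMEOUT = (72 * 60 - 5) * 60 * 1000;

// Equivalent formulation (illustrative names) that spells out both parts.
const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;
const explicit = 72 * HOUR_MS - 5 * MINUTE_MS;

console.log(GH_5_MIN_TIMEOUT); // 258900000
console.log(GH_5_MIN_TIMEOUT === explicit); // true
```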
@@ -46,14 +47,15 @@ const shutdown = async (opts) => {

const retryWorkflows = async () => {
try {
if (!noRetry) {
if (RUNNER_JOBS_RUNNING.length > 0) {
await Promise.all(
RUNNER_JOBS_RUNNING.map(
async (job) => await cml.pipelineRestart({ jobId: job.id })
)
);
}
if (!noRetry && RUNNER_JOBS_RUNNING.length > 0) {
winston.info(`Still pending jobs, retrying workflow...`);

await Promise.all(
RUNNER_JOBS_RUNNING.map(
async (job) =>
await cml.pipelineRerun({ id: job.pipeline, jobId: job.id })
)
);
}
} catch (err) {
winston.error(err);
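After this change, `retryWorkflows` reruns the pipeline of every job that is still tracked. It passes both the pipeline id and the job id to `cml.pipelineRerun`, and it only does so when `noRetry` is unset and at least one job is pending. The sketch below reproduces the same control flow under a few assumptions. `retryPendingJobs` is a hypothetical helper. The CML client is replaced by an injected `rerun` callback, which is a stand-in for `cml.pipelineRerun`, so the code runs without a CI provider.

```javascript
// Hypothetical helper mirroring retryWorkflows; `rerun` stands in for cml.pipelineRerun.
async function retryPendingJobs(jobs, { noRetry = false, rerun, log = console }) {
  // Nothing to do if retries are disabled or no job is still running.
  if (noRetry || jobs.length === 0) return 0;

  log.info('Still pending jobs, retrying workflow...');
  try {
    // Request all reruns concurrently; Promise.all rejects on the first failure.
    await Promise.all(
      jobs.map((job) => rerun({ id: job.pipeline, jobId: job.id }))
    );
  } catch (err) {
    log.error(err);
  }
  // Number of reruns that were requested (not necessarily successful).
  return jobs.length;
}

// Usage with a fake client that only records its arguments.
const calls = [];
retryPendingJobs([{ id: 'job-1', pipeline: 'run-42', date: new Date() }], {
  rerun: async (args) => {
    calls.push(args);
  },
}).then((n) => console.log(n, calls));
```

`Promise.all` rejects as soon as one rerun fails. In that case a single failure is logged and the outcomes of the other requests are never inspected. `Promise.allSettled` would be an alternative if every outcome needs to be reported.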
@@ -240,21 +242,36 @@ const runLocal = async (opts) => {
await tf.saveTfState({ tfstate, path });
}

if (process.platform === 'linux') {
const acpiSock = net.connect('/var/run/acpid.socket');
acpiSock.on('connect', () => {
winston.info('Connected to acpid service.');
});
acpiSock.on('error', (err) => {
winston.warn(
`Error connecting to ACPI socket: ${err.message}. The acpid.service helps with instance termination detection.`
);
});
acpiSock.on('data', (buf) => {
const data = buf.toString().toLowerCase();
if (data.includes('power') && data.includes('button')) {
shutdown({ ...opts, reason: 'ACPI shutdown' });
}
});
}
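On Linux the runner connects to the acpid UNIX socket at `/var/run/acpid.socket` and calls `shutdown` when a received chunk contains both "power" and "button". acpid typically reports a power-button press as a line such as `button/power PBTN 00000080 00000000`. The device fields vary by hardware, so treat that sample as an assumption. A `data` event on a stream socket can carry several events or only part of one. One option, sketched here, is to buffer the incoming text and test only complete lines. This assumes acpid terminates each event with a newline. `isPowerButtonEvent` and `createAcpiHandler` are hypothetical names.

```javascript
// Hypothetical helper: same test as runner.js (both words, case-insensitive),
// applied to a single acpid event line.
function isPowerButtonEvent(line) {
  const text = line.toLowerCase();
  return text.includes('power') && text.includes('button');
}

// Hypothetical line-buffered 'data' handler. Chunks need not align with event
// boundaries, so an unfinished trailing line is kept until more data arrives.
function createAcpiHandler(onPowerButton) {
  let pending = '';
  return (chunk) => {
    pending += chunk.toString();
    const lines = pending.split('\n');
    pending = lines.pop(); // '' or an incomplete line
    for (const line of lines) {
      if (isPowerButtonEvent(line)) onPowerButton(line.trim());
    }
  };
}

// Possible wiring (Linux with acpid running; same socket path as runner.js):
//   const net = require('net');
//   const sock = net.connect('/var/run/acpid.socket');
//   sock.on('data', createAcpiHandler(() => shutdown({ ...opts, reason: 'ACPI shutdown' })));
```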

const dataHandler = async (data) => {
const logs = await cml.parseRunnerLog({ data });
const logs = await cml.parseRunnerLog({ data, name });
for (const log of logs) {
winston.info('runner status', log);

if (log.status === 'job_started') {
RUNNER_JOBS_RUNNING.push({ id: log.job, date: log.date });
const { job: id, pipeline, date } = log;
RUNNER_JOBS_RUNNING.push({ id, pipeline, date });
}

if (log.status === 'job_ended') {
const { job: jobId } = log;
RUNNER_JOBS_RUNNING = RUNNER_JOBS_RUNNING.filter(
(job) => job.id !== jobId
);

RUNNER_JOBS_RUNNING.pop();
if (single) await shutdown({ ...opts, reason: 'single job' });
}
}
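The `dataHandler` change fixes the job bookkeeping. The `const` to `let` change of `RUNNER_JOBS_RUNNING` indicates how this worked before. The previous code called `RUNNER_JOBS_RUNNING.pop()` on `job_ended`, which removes the most recently started job no matter which job actually finished. The new code filters out the entry whose id matches the ended job. It also stores `pipeline` with each job, and `retryWorkflows` now passes that value to `cml.pipelineRerun`. The sketch below restates the logic as a pure function. `applyRunnerLog` is a hypothetical name. The log shape `{ status, job, pipeline, date }` is assumed from the fields destructured in the diff.

```javascript
// Hypothetical pure version of the job tracking in dataHandler.
function applyRunnerLog(jobs, log) {
  if (log.status === 'job_started') {
    const { job: id, pipeline, date } = log;
    return [...jobs, { id, pipeline, date }];
  }
  if (log.status === 'job_ended') {
    // Remove the job that actually ended, not simply the last one added.
    return jobs.filter((job) => job.id !== log.job);
  }
  return jobs; // other statuses leave the list unchanged
}

let jobs = [];
jobs = applyRunnerLog(jobs, { status: 'job_started', job: 'a', pipeline: 'p1', date: 1 });
jobs = applyRunnerLog(jobs, { status: 'job_started', job: 'b', pipeline: 'p2', date: 2 });
jobs = applyRunnerLog(jobs, { status: 'job_ended', job: 'a' });
console.log(jobs.map((j) => j.id)); // [ 'b' ]  (pop() would have left [ 'a' ])
```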
@@ -295,16 +312,6 @@ const runLocal = async (opts) => {
}

if (!noRetry) {
try {
winston.info(`EC2 id ${await SpotNotifier.instanceId()}`);
SpotNotifier.on('termination', () =>
shutdown({ ...opts, reason: 'spot_termination' })
);
SpotNotifier.start();
} catch (err) {
winston.warn('SpotNotifier can not be started.');
}

if (cml.driver === 'github') {
const watcherSeventyTwo = setInterval(() => {
RUNNER_JOBS_RUNNING.forEach((job) => {
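The last hunk removes the EC2 `SpotNotifier` setup from the `!noRetry` branch, including the instance-id log line and the `termination` handler. The GitHub-only `watcherSeventyTwo` interval, which walks `RUNNER_JOBS_RUNNING`, remains, but its body is cut off in this view. The following is a hypothetical sketch only. It assumes each job's `date` is its start time and shows how a check against `GH_5_MIN_TIMEOUT` could look. `jobsNearTimeout` is not a function from runner.js.

```javascript
// Hypothetical 72-hour watchdog check; the real interval body is not shown here.
// Assumes job.date is the job start time (Date, timestamp, or ISO string).
const GH_5_MIN_TIMEOUT = (72 * 60 - 5) * 60 * 1000;

// Return the jobs whose elapsed time has reached the threshold.
function jobsNearTimeout(jobs, now = Date.now(), limitMs = GH_5_MIN_TIMEOUT) {
  return jobs.filter((job) => now - new Date(job.date).getTime() >= limitMs);
}

// A job started 72 hours ago is flagged; one started an hour ago is not.
const now = Date.now();
const jobs = [
  { id: 'old', date: new Date(now - 72 * 60 * 60 * 1000) },
  { id: 'new', date: new Date(now - 60 * 60 * 1000) },
];
console.log(jobsNearTimeout(jobs, now).map((j) => j.id)); // [ 'old' ]
```

Taking `now` as a parameter keeps the check deterministic. A `setInterval` callback could pass `Date.now()` on each tick.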