Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restarts Prometheus subprocess when it crashes #663

Merged
merged 10 commits into from
May 28, 2020
38 changes: 26 additions & 12 deletions src/shadowbox/infrastructure/prometheus_scraper.ts
Original file line number Diff line number Diff line change
Expand Up @@ -69,10 +69,20 @@ export class PrometheusClient {
}
}

export async function runPrometheusScraper(
args: string[], configFilename: string, configJson: {},
prometheusEndpoint: string): Promise<child_process.ChildProcess> {
mkdirp.sync(path.dirname(configFilename));
export async function startPrometheus(
prometheusConfigFilename: string, prometheusEndpoint: string, prometheusTsdbFilename: string,
prometheusLocation: string, configJson: {}, verbose: boolean) {
await writePrometheusConfigToDisk(prometheusConfigFilename, configJson);
const processArgs = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This library is intended to be a generic Prometheus client, but these are all application-specific parameters. Could you change it back to take a generic list of args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vini. That's interesting insight!

I usually recommend the opposite and put all the "opinions" into the client. This guarantees a consistent experience for all users of the client. And only allow optional customizations to be overridden. I like to think of it as analogous to a builder pattern with all required params without going full builder.


On a side note, I put the logic back into the caller.

Copy link
Collaborator

@fortuna fortuna May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you when we talk about the domain objects. But I see everything under infrastructure/ as a different bounded context, independent of our app, with their own models.

'--config.file', prometheusConfigFilename, '--storage.tsdb.retention', '31d',
mpmcroy marked this conversation as resolved.
Show resolved Hide resolved
'--storage.tsdb.path', prometheusTsdbFilename, '--web.listen-address', prometheusLocation,
'--log.level', verbose ? 'debug' : 'info'
];
await spawnPrometheusSubprocess(processArgs, prometheusEndpoint);
}

async function writePrometheusConfigToDisk(configFilename: string, configJson: {}) {
await mkdirp.sync(path.dirname(configFilename));
const ymlTxt = jsyaml.safeDump(configJson, {'sortKeys': true});
// Write the file asynchronously to prevent blocking the node thread.
await new Promise((resolve, reject) => {
Expand All @@ -84,15 +94,19 @@ export async function runPrometheusScraper(
}
});
});
const commandArguments = ['--config.file', configFilename];
commandArguments.push(...args);
const runProcess = child_process.spawn('/root/shadowbox/bin/prometheus', commandArguments);
}

async function spawnPrometheusSubprocess(
processArgs: string[], prometheusEndpoint: string): Promise<child_process.ChildProcess> {
logging.info('Starting Prometheus...');
mpmcroy marked this conversation as resolved.
Show resolved Hide resolved
const runProcess = child_process.spawn('/root/shadowbox/bin/prometheus', processArgs);
runProcess.on('error', (error) => {
logging.error(`Error spawning prometheus: ${error}`);
logging.error(`Error spawning Prometheus: ${error}`);
});
// TODO(fortuna): Add restart logic.
runProcess.on('exit', (code, signal) => {
logging.info(`prometheus has exited with error. Code: ${code}, Signal: ${signal}`);
logging.info(`Prometheus has exited with error. Code: ${code}, Signal: ${signal}`);
mpmcroy marked this conversation as resolved.
Show resolved Hide resolved
logging.info('Restarting Prometheus...');
spawnPrometheusSubprocess(processArgs, prometheusEndpoint);
});
// TODO(fortuna): Consider saving the output and expose it through the manager service.
runProcess.stdout.pipe(process.stdout);
Expand All @@ -102,11 +116,11 @@ export async function runPrometheusScraper(
}

async function waitForPrometheusReady(prometheusEndpoint: string) {
logging.debug('Waiting for Prometheus to be ready...');
logging.info('Waiting for Prometheus to be ready...');
mpmcroy marked this conversation as resolved.
Show resolved Hide resolved
while (!(await isHttpEndpointHealthy(prometheusEndpoint))) {
await new Promise((resolve) => setTimeout(resolve, 1000));
}
logging.debug('Prometheus is ready');
logging.info('Prometheus is ready!');
mpmcroy marked this conversation as resolved.
Show resolved Hide resolved
}

function isHttpEndpointHealthy(endpoint: string): Promise<boolean> {
Expand Down
17 changes: 8 additions & 9 deletions src/shadowbox/server/main.ts
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ import {RealClock} from '../infrastructure/clock';
import {PortProvider} from '../infrastructure/get_port';
import * as json_config from '../infrastructure/json_config';
import * as logging from '../infrastructure/logging';
import {PrometheusClient, runPrometheusScraper} from '../infrastructure/prometheus_scraper';
import {PrometheusClient, startPrometheus} from '../infrastructure/prometheus_scraper';
import {RolloutTracker} from '../infrastructure/rollout';
import {AccessKeyId} from '../model/access_key';

Expand Down Expand Up @@ -146,15 +146,14 @@ async function main() {
if (isReplayProtectionEnabled) {
shadowsocksServer.enableReplayProtection();
}

// Start Prometheus subprocess and wait for it to be up and running.
const prometheusConfigFilename = getPersistentFilename('prometheus/config.yml');
const prometheusTsdbFilename = getPersistentFilename('prometheus/data');
const prometheusEndpoint = `http://${prometheusLocation}`;
// Wait for Prometheus to be up and running.
await runPrometheusScraper(
[
'--storage.tsdb.retention', '31d', '--storage.tsdb.path',
getPersistentFilename('prometheus/data'), '--web.listen-address', prometheusLocation,
'--log.level', verbose ? 'debug' : 'info'
],
getPersistentFilename('prometheus/config.yml'), prometheusConfigJson, prometheusEndpoint);
await startPrometheus(
prometheusConfigFilename, prometheusEndpoint, prometheusTsdbFilename, prometheusLocation,
prometheusConfigJson, verbose);

const prometheusClient = new PrometheusClient(prometheusEndpoint);
if (!serverConfig.data().portForNewAccessKeys) {
Expand Down