Add plugin health monitoring to /health endpoint #1884
Draft
Proposed changes
The /health endpoint now detects when plugin processes have exited (e.g., due to OOM) and returns HTTP 503 with details. Previously, killed plugins went undetected, causing silent job execution failures.

Implementation:
- Keep plugin.Client instances during plugin discovery to enable health checks
- Check client.Exited() on each health request

API Response:
Healthy state returns HTTP 200:
{ "status": "healthy", "leader": true }
Unhealthy state returns HTTP 503:
{ "status": "unhealthy", "issues": ["plugin processor-files has exited"], "leader": true }
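The health check described under Proposed changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the exitChecker interface, the healthResponse type, the checkPlugins and healthHandler functions, the fakePlugin stand-in, and the "executor-shell" plugin name are all hypothetical. It assumes only that each retained plugin client exposes an Exited() bool method, as plugin.Client from hashicorp/go-plugin does, and it uses plain net/http rather than whatever router the project wires the endpoint into.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sort"
)

// exitChecker is a hypothetical interface for this sketch. Any client type
// with an Exited() bool method (such as go-plugin's *plugin.Client) fits it.
type exitChecker interface {
	Exited() bool
}

// healthResponse mirrors the JSON bodies shown in the PR description.
type healthResponse struct {
	Status string   `json:"status"`
	Issues []string `json:"issues,omitempty"`
	Leader bool     `json:"leader"`
}

// checkPlugins returns the HTTP status code and response body for a set of
// plugin clients that were kept around after discovery.
func checkPlugins(plugins map[string]exitChecker, leader bool) (int, healthResponse) {
	var issues []string
	for name, client := range plugins {
		if client.Exited() {
			issues = append(issues, fmt.Sprintf("plugin %s has exited", name))
		}
	}
	// Map iteration order is not stable, so sort for a deterministic body.
	sort.Strings(issues)

	if len(issues) > 0 {
		return http.StatusServiceUnavailable,
			healthResponse{Status: "unhealthy", Issues: issues, Leader: leader}
	}
	return http.StatusOK, healthResponse{Status: "healthy", Leader: leader}
}

// healthHandler runs the check on every request and writes the result as JSON.
func healthHandler(plugins map[string]exitChecker, isLeader func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		code, body := checkPlugins(plugins, isLeader())
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(body)
	}
}

// fakePlugin stands in for a real plugin client in this example.
type fakePlugin struct{ exited bool }

func (f fakePlugin) Exited() bool { return f.exited }

func main() {
	plugins := map[string]exitChecker{
		"processor-files": fakePlugin{exited: true},  // simulates a killed plugin
		"executor-shell":  fakePlugin{exited: false}, // hypothetical healthy plugin
	}
	code, body := checkPlugins(plugins, true)
	out, _ := json.Marshal(body)
	fmt.Println(code, string(out))
}
```

Keeping the check in a pure function separate from the handler makes the 200/503 decision testable without starting a server. A real handler would also need to guard the plugin map if discovery can run concurrently with health requests.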
Warning
Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules:
- set_my_env_var (dns block). Triggering command: /home/REDACTED/work/dkron/dkron/dkron-processor-fluent -pack /home/REDACTED/go/pkg/mod/github.com/go-jose/go-jose/v4@v4.1.3/asymmetric.go /home/REDACTED/go/pkg/mod/github.com/go-jose/go-jose/v4@v4.1.3/crypter.go
- stats.dkron.io (dns block). Triggering command: ./dkron-bin agent --server --bootstrap-expect=1 --data-dir=/tmp/dkron-test --log-level=info --node-name=test-node