Copilot AI commented Dec 20, 2025

Proposed changes

The /health endpoint now detects when plugin processes have exited (e.g., due to OOM) and returns HTTP 503 with details. Previously, killed plugins went undetected, causing silent job execution failures.

Implementation:

  • Track plugin.Client instances during plugin discovery to enable health checks
  • Check plugin process status via client.Exited() on each health request
  • Return HTTP 503 with issue details when any plugin has terminated
  • Include cluster leader status for server nodes
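A minimal sketch of how the steps above could fit together, not the code from this PR. It assumes the tracked clients are kept in a map keyed by plugin name and relies only on the `Exited() bool` method that go-plugin's `plugin.Client` provides. The names `exitChecker`, `healthResponse`, `checkHealth`, `healthHandler`, and `fakePlugin`, and the plugin name `executor-shell`, are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
)

// exitChecker is the only behaviour the health check needs.
// go-plugin's *plugin.Client has an Exited() bool method, so it would
// satisfy this interface; the interface itself is an assumption of this sketch.
type exitChecker interface {
	Exited() bool
}

// healthResponse mirrors the JSON bodies shown in the PR description.
// Omitting "issues" when empty is an assumption based on the healthy example.
type healthResponse struct {
	Status string   `json:"status"`
	Issues []string `json:"issues,omitempty"`
	Leader bool     `json:"leader"`
}

// checkHealth returns 503 if any tracked plugin has exited, otherwise 200.
func checkHealth(plugins map[string]exitChecker, leader bool) (int, healthResponse) {
	resp := healthResponse{Status: "healthy", Leader: leader}
	for name, c := range plugins {
		if c.Exited() {
			resp.Issues = append(resp.Issues, fmt.Sprintf("plugin %s has exited", name))
		}
	}
	// Map iteration order is random, so sort to keep the output stable.
	sort.Strings(resp.Issues)
	if len(resp.Issues) > 0 {
		resp.Status = "unhealthy"
		return http.StatusServiceUnavailable, resp
	}
	return http.StatusOK, resp
}

// healthHandler (hypothetical) re-checks every plugin on each request.
// The map is only read here; a real implementation would need to guard
// it if plugins can be added or removed concurrently.
func healthHandler(plugins map[string]exitChecker, isLeader func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		code, body := checkHealth(plugins, isLeader())
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(body)
	}
}

// fakePlugin stands in for a real plugin client in this demo.
type fakePlugin struct{ exited bool }

func (f fakePlugin) Exited() bool { return f.exited }

func main() {
	plugins := map[string]exitChecker{
		"executor-shell":  fakePlugin{exited: false},
		"processor-files": fakePlugin{exited: true},
	}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	healthHandler(plugins, func() bool { return true })(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Depending on a small interface rather than the concrete client type keeps the check testable without spawning real plugin processes.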

API Response:

Healthy state returns HTTP 200:

{
  "status": "healthy",
  "leader": true
}

Unhealthy state returns HTTP 503:

{
  "status": "unhealthy",
  "issues": ["plugin processor-files has exited"],
  "leader": true
}
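For a monitoring client, the two responses above could be interpreted roughly as follows. This is a sketch under assumptions: the field names come from the JSON examples in this description, while `healthBody` and `parseHealth` are hypothetical names, and treating anything other than HTTP 200 with `"status": "healthy"` as unhealthy is a design choice of the sketch rather than documented behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthBody matches the fields shown in the example responses.
type healthBody struct {
	Status string   `json:"status"`
	Issues []string `json:"issues"`
	Leader bool     `json:"leader"`
}

// parseHealth decodes a /health response body and reports whether the
// node should be considered healthy given the HTTP status code.
func parseHealth(statusCode int, body []byte) (healthBody, bool, error) {
	var hb healthBody
	if err := json.Unmarshal(body, &hb); err != nil {
		return hb, false, fmt.Errorf("decode health body: %w", err)
	}
	healthy := statusCode == 200 && hb.Status == "healthy"
	return hb, healthy, nil
}

func main() {
	// Body copied from the unhealthy example in the PR description.
	body := []byte(`{"status":"unhealthy","issues":["plugin processor-files has exited"],"leader":true}`)
	hb, healthy, err := parseHealth(503, body)
	if err != nil {
		panic(err)
	}
	fmt.Println("healthy:", healthy, "issues:", hb.Issues)
}
```

Checking both the status code and the `status` field guards against a proxy rewriting one of them; a simple load balancer probe would usually look at the status code alone.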

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Warning

I tried to connect to the following addresses, but was blocked by firewall rules:

  • set_my_env_var
    • Triggering command: /home/REDACTED/work/dkron/dkron/dkron-processor-fluent /home/REDACTED/work/dkron/dkron/dkron-processor-fluent -pack /home/REDACTED/go/pkg/mod/github.com/go-jose/go-jose/v4@v4.1.3/asymmetric.go /home/REDACTED/go/pkg/mod/github.com/go-jose/go-jose/v4@v4.1.3/crypter.go 4489 orest/autorest@v-p orest/autorest@vgithub.com/distribworks/dkron/v4/dkron (dns block)
  • stats.dkron.io
    • Triggering command: ./dkron-bin ./dkron-bin agent --server --bootstrap-expect=1 --data-dir=/tmp/dkron-test --log-level=info --node-name=test-node (dns block)

Original prompt

This section describes the original issue you should resolve.

<issue_title>Health endpoint improvement</issue_title>
<issue_description>Is your feature request related to a problem? Please describe.
We had a problem with a killed plugin process (due to OOM) which resulted in non-executing jobs.

Describe the solution you'd like
The health endpoint might be used to check whether all plugin processes are up and running.
In general, more health checks would be helpful, e.g. cluster health?

The endpoint could return a non-200 status code when the instance is not healthy!

</issue_description>

<agent_instructions>Health endpoint should check that all loaded plugins are running</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@vcastellm Already on the roadmap, will work on this.

@vcastellm @davidgengenbach This is not really the improvement you mention, but I think it's better to fail fast in case of a missing plugin. When running as a service, the OS supervisor will take care of restarting. This is the case with processor plugins.

coderabbitai bot commented Dec 20, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Copilot AI and others added 3 commits December 20, 2025 10:58
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Enhance health endpoint to monitor plugin processes" to "Add plugin health monitoring to /health endpoint" Dec 20, 2025
Copilot AI requested a review from vcastellm December 20, 2025 11:07

Development

Successfully merging this pull request may close these issues.

Health endpoint improvement