@humzam commented Jan 2, 2026

Context

We have noticed that the Stop Runner job occasionally fails with a vague "HttpError". Possible causes include a brief network outage, the runner having already been removed, or a GitHub outage (which has sadly been pretty common lately). Beyond the failed job itself, this can leave behind orphaned runners that are never cleaned up.

[Screenshot, 2025-12-18 9:52:31 AM]

Proposed Solution

To make the Stop Runner job more robust, this PR adds retry logic with exponential backoff to the runner removal step.
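For readers who want a picture of the idea, here is a minimal sketch of retry with exponential backoff. It is not the code from this PR's diff: the names `backoffDelayMs`, `sleep`, and `withRetry`, as well as the defaults (3 attempts, 1 second base delay, 30 second cap), are assumptions chosen for illustration.

```typescript
// Illustrative sketch only; names and defaults are hypothetical,
// not taken from this PR's diff.

// Delay before the next attempt: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
// `attempt` is 0-based (0 = the wait after the first failure).
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation` up to `maxAttempts` times, sleeping between failures.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

With these defaults a caller would wait roughly 1 second and then 2 seconds between the three attempts, which should ride out a brief network blip without stretching the job much.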

Additionally, a 404 response most likely means the runner was already removed, so in that case we exit successfully instead of failing.
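A rough sketch of the 404 handling, assuming the thrown error carries a numeric `status` field (Octokit's request errors do). The names `statusOf`, `isAlreadyRemoved`, and `removeRunnerTolerant` are hypothetical, and `removeRunner` stands in for whatever call actually deletes the runner.

```typescript
// Illustrative sketch only; function names are hypothetical.

// Extracts a numeric HTTP status from an unknown error value, if present.
function statusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// A 404 on removal is taken to mean the runner no longer exists.
function isAlreadyRemoved(error: unknown): boolean {
  return statusOf(error) === 404;
}

type RemovalOutcome = "removed" | "already-gone";

// Calls the supplied removal function and treats 404 as success;
// any other error is rethrown for the caller to handle.
async function removeRunnerTolerant(
  removeRunner: () => Promise<void>,
): Promise<RemovalOutcome> {
  try {
    await removeRunner();
    return "removed";
  } catch (error) {
    if (isAlreadyRemoved(error)) {
      return "already-gone";
    }
    throw error;
  }
}
```

Checking for 404 before any retry matters: retrying a request for a runner that no longer exists would only burn time and then fail anyway.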

Bonus: instead of just printing "HttpError", which is not very helpful, we now also log the exact HTTP status code and message when removal truly fails.
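As an illustration of the kind of message this produces, the sketch below formats an error with its status code when one is available. The function name `describeFailure` and the exact message format are assumptions, not the wording used in this PR.

```typescript
// Illustrative sketch only; the name and message format are hypothetical.

// Builds a log line that includes the HTTP status when the error has one,
// and falls back to the error's name and message otherwise.
function describeFailure(error: unknown): string {
  if (!(error instanceof Error)) {
    return String(error);
  }
  const status = (error as Error & { status?: unknown }).status;
  if (typeof status === "number") {
    return `HTTP ${status}: ${error.message}`;
  }
  return `${error.name}: ${error.message}`;
}
```

For example, an error with `status` 404 and message `Not Found` would be reported as `HTTP 404: Not Found` rather than a bare "HttpError".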
