
fix(webhooks): add scheduled task to unlock stale webhook request locks #3546

Open
cymulatereouven wants to merge 1 commit into postalserver:main from cymulatereouven:fix/tidy-stale-webhook-request-locks

@cymulatereouven

Summary

  • Problem: When a worker process crashes or is killed (common in Kubernetes during rolling updates, OOM kills, and pod evictions), locks held on webhook_requests rows are never released. Unlike QueuedMessage, which has TidyQueuedMessagesTask to clean up stale locks, WebhookRequest had no equivalent cleanup mechanism, so delivery of the affected webhook requests was permanently blocked.
  • Fix: Adds a TidyWebhookRequestsTask scheduled task (runs hourly) that finds webhook requests locked for more than 1 hour and unlocks them so they can be retried.
  • Adds a with_stale_lock scope on WebhookRequest, mirroring the existing pattern on QueuedMessage.
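As a rough illustration of the selection and unlock logic described above, here is a plain-Ruby sketch that runs without Rails. `WebhookRequestRow`, `STALE_LOCK_AGE`, and `unlock_stale!` are hypothetical names, not code from this PR, and the rows are in-memory structs rather than database records. In the real model the scope would presumably be an ActiveRecord `where` on `locked_at` against a cutoff such as `1.hour.ago`, but that is an assumption here.

```ruby
# Hypothetical in-memory stand-in for a webhook_requests row. Only the two
# lock columns named in this PR (locked_by, locked_at) are modeled.
WebhookRequestRow = Struct.new(:id, :locked_by, :locked_at, keyword_init: true)

STALE_LOCK_AGE = 60 * 60 # 1 hour, in seconds (threshold stated in the PR)

# Intended to match the rule described for the with_stale_lock scope:
# a lock is stale when locked_at is set and strictly older than the threshold.
def with_stale_lock(rows, now: Time.now, max_age: STALE_LOCK_AGE)
  cutoff = now - max_age
  rows.select { |row| row.locked_at && row.locked_at < cutoff }
end

# Intended to match the effect described for TidyWebhookRequestsTask: clear
# both lock columns on stale rows (rows are kept, not deleted).
# Returns the ids that were unlocked.
def unlock_stale!(rows, now: Time.now, max_age: STALE_LOCK_AGE)
  with_stale_lock(rows, now: now, max_age: max_age).map do |row|
    row.locked_by = nil
    row.locked_at = nil
    row.id
  end
end

now = Time.at(1_700_000_000)
rows = [
  WebhookRequestRow.new(id: 1, locked_by: "worker-a", locked_at: now - 2 * 3600), # holder crashed
  WebhookRequestRow.new(id: 2, locked_by: "worker-b", locked_at: now - 60),       # actively locked
  WebhookRequestRow.new(id: 3, locked_by: nil, locked_at: nil)                    # never locked
]
p unlock_stale!(rows, now: now) # prints [1]
```

Only the request whose lock is two hours old is released; the actively locked and never-locked rows are left alone.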

Changes

| File | Change |
| --- | --- |
| `app/models/webhook_request.rb` | Added `with_stale_lock` scope (locks older than 1 hour) |
| `app/scheduled_tasks/tidy_webhook_requests_task.rb` | New scheduled task to unlock stale webhook requests |
| `app/lib/worker/process.rb` | Registered `TidyWebhookRequestsTask` in the worker `TASKS` array |

Why unlock instead of destroy?

TidyQueuedMessagesTask destroys stale queued messages because they represent outbound email delivery attempts that are likely no longer relevant. Webhook requests, however, carry event notifications that downstream systems may still need, so we unlock them to allow a retry rather than silently dropping them.
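To make the trade-off concrete, the following hypothetical sketch applies both cleanup policies to the same stale request and then asks which requests a worker could still pick up. `PendingRequest`, `pickable`, and both `tidy_by_*` helpers are illustrative names; the pickup rule (only rows without a lock are eligible) is a simplifying assumption, not necessarily the exact filter ProcessWebhookRequestsJob applies.

```ruby
# Hypothetical in-memory model for comparing the two cleanup policies;
# these names are illustrative, not Postal's classes.
PendingRequest = Struct.new(:id, :event, :locked_at, keyword_init: true)

def stale?(row, cutoff)
  !row.locked_at.nil? && row.locked_at < cutoff
end

# Assumed pickup rule: a worker only considers rows that carry no lock.
def pickable(rows)
  rows.select { |r| r.locked_at.nil? }
end

# Policy this PR applies to webhook requests: keep the row, clear the lock.
def tidy_by_unlocking(rows, cutoff)
  rows.each { |r| r.locked_at = nil if stale?(r, cutoff) }
end

# Policy TidyQueuedMessagesTask applies to queued messages: delete the row.
def tidy_by_destroying(rows, cutoff)
  rows.reject { |r| stale?(r, cutoff) }
end

now = Time.at(1_700_000_000)
cutoff = now - 3600 # locks older than 1 hour count as stale
build = -> { [PendingRequest.new(id: 1, event: "event-1", locked_at: now - 7200)] }

p pickable(tidy_by_unlocking(build.call, cutoff)).map(&:event)  # prints ["event-1"]
p pickable(tidy_by_destroying(build.call, cutoff)).map(&:event) # prints []
```

One consequence of unlocking is at-least-once rather than at-most-once behavior: if a worker was merely slow rather than dead and still held a lock past the one-hour cutoff, the same request could be sent twice, so receivers that cannot tolerate duplicates would need to de-duplicate.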

Test plan

  • Verify WebhookRequest.with_stale_lock returns only records with locked_at older than 1 hour
  • Verify TidyWebhookRequestsTask unlocks stale records (sets locked_by and locked_at to nil)
  • Verify previously-stuck webhook requests are picked up again by ProcessWebhookRequestsJob after unlock
  • Confirm no impact on actively-locked (non-stale) webhook requests
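The test plan can also be expressed as a standalone plain-Ruby check script, one check per bullet. This is a hypothetical sketch, not the PR's test suite: the rows are in-memory structs, `stale_rows`, `tidy!`, and `pickup_candidates` are made-up helpers, and treating a lock exactly one hour old as not yet stale is an assumption about the boundary.

```ruby
# Hypothetical standalone checks for the four test-plan items, run against an
# in-memory model rather than Postal's ActiveRecord classes.
Row = Struct.new(:id, :locked_by, :locked_at, keyword_init: true)
MAX_LOCK_AGE = 3600 # seconds; "more than 1 hour" per the PR

# Stands in for the with_stale_lock scope: strictly older than MAX_LOCK_AGE.
def stale_rows(rows, now)
  rows.select { |r| r.locked_at && r.locked_at < now - MAX_LOCK_AGE }
end

# Hypothetical tidy step: clear both lock columns on stale rows only.
def tidy!(rows, now)
  stale_rows(rows, now).each do |r|
    r.locked_by = nil
    r.locked_at = nil
  end
end

# Assumed pickup rule: a worker only considers rows that carry no lock.
def pickup_candidates(rows)
  rows.select { |r| r.locked_at.nil? }
end

def check(label)
  raise "FAILED: #{label}" unless yield
  puts "ok: #{label}"
end

now = Time.at(1_700_000_000)
rows = [
  Row.new(id: 1, locked_by: "crashed-pod", locked_at: now - 7200), # 2 hours old
  Row.new(id: 2, locked_by: "live-pod", locked_at: now - 300),     # 5 minutes old
  Row.new(id: 3, locked_by: "live-pod", locked_at: now - 3600)     # exactly 1 hour old
]

check("with_stale_lock returns only locks older than 1 hour") do
  stale_rows(rows, now).map(&:id) == [1]
end

tidy!(rows, now)

check("stale row has locked_by and locked_at cleared") do
  rows[0].locked_by.nil? && rows[0].locked_at.nil?
end
check("unlocked row is a pickup candidate again") do
  pickup_candidates(rows).map(&:id) == [1]
end
check("non-stale locks are untouched") do
  rows.drop(1).all? { |r| r.locked_by == "live-pod" && !r.locked_at.nil? }
end
```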

🤖 Generated with Claude Code

When a worker process crashes or is killed (common in Kubernetes with
rolling updates, OOM kills, etc.), locks held on webhook_requests are
never released. Unlike QueuedMessage, which has TidyQueuedMessagesTask
to clean up stale locks, WebhookRequest had no equivalent mechanism,
causing webhook delivery to be permanently blocked.

This adds:
- A `with_stale_lock` scope on WebhookRequest (locks older than 1 hour)
- A TidyWebhookRequestsTask scheduled task that runs hourly to unlock
  stale webhook requests so they can be retried

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>