Description
As of #7277, rows in the `webhook_event`, `webhook_delivery`, and `webhook_delivery_attempt` tables are kept forever.[1] This is good, as it ensures that a failed delivery can always be resent, no matter how long the receiver was unavailable for. On the other hand, it's bad because, well, we keep those records forever. There should be a way to delete them, so that we don't use an ever-increasing amount of disk space storing webhook events from, potentially, years ago.
One obvious solution is to pick a retention period and add an RPW that deletes events, along with their associated deliveries, once they are older than that period. Events would only be deleted if they have been dispatched and have no in-progress delivery. Having a retention period means that we can only guarantee reliable delivery of events within that time window: if a receiver is unavailable for longer than the retention period, events dispatched to it may be deleted even if they were never successfully delivered. We could make the retention period configurable, so that the operator can choose their desired tradeoff between tolerating receiver downtime and reclaiming storage in a timely manner. We could also allow a more generous retention period for events that have not been delivered successfully to all of the receivers they were dispatched to, so that failed events that may still be resent are kept for longer than events that were delivered successfully to every receiver that cares about them.
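To make the eligibility rule concrete, here is a rough sketch of the check such an RPW might apply to each event. Everything in it is hypothetical: the `EventSummary` fields are not the real table columns, the two retention parameters are made-up names, and measuring age from creation time (rather than dispatch time) is just one possible choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EventSummary:
    """Hypothetical per-event view; not the actual table schema."""
    time_created: datetime
    time_dispatched: Optional[datetime]  # None until the event has been dispatched
    deliveries_in_progress: int          # deliveries currently being attempted
    receivers_undelivered: int           # dispatched receivers with no successful delivery


def eligible_for_deletion(
    event: EventSummary,
    now: datetime,
    retention: timedelta,
    failed_retention: timedelta,
) -> bool:
    """Return True if the event could be deleted under the sketched policy."""
    # Never delete an event that has not been dispatched yet.
    if event.time_dispatched is None:
        return False
    # Never delete an event while a delivery is still in progress.
    if event.deliveries_in_progress > 0:
        return False
    # Events with failed deliveries get the longer window, so they can
    # still be resent after fully delivered events are already gone.
    limit = failed_retention if event.receivers_undelivered > 0 else retention
    # Assumption: age is measured from creation time.
    return now - event.time_created > limit
```

With a 7-day normal window and a 30-day window for failed events, a fully delivered event from 30 days ago would be eligible, while one from 21 days ago that still has an undelivered receiver would not.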
An alternative approach is to just add an API endpoint for deleting a webhook event by ID. In theory, this feels a bit more comfortable to me than time-based automatic deletion, since stuff is only deleted when the receiver explicitly says "yeah, I got this one, I definitely won't be needing it again." However, when multiple receiver endpoints are in use and may receive the same event classes, the operator of those endpoints may need some way for the receivers to coordinate among themselves about which events may be deleted. We could help a bit by refusing to delete events that have not been successfully delivered to one or more of their subscribed receivers, either by default or as an option on the delete request. Another downside is that events no receiver was subscribed to would never be deleted: no receiver would ever have seen their UUID to say "yeah, I got this one", and they also won't show up in the list of events dispatched to a receiver. We would need to figure out some other way to clean up events that have no interested receivers.[2]
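The "refuse unless everyone got it" guard could look something like the following. This is only an illustration against an in-memory dict standing in for the event table; `delete_event`, `UndeliveredError`, and the `force` flag are invented names, not an existing API.

```python
class UndeliveredError(Exception):
    """Raised when an event still has receivers that never received it."""


def delete_event(events, event_id, force=False):
    """Delete one event by ID from an in-memory stand-in for the event table.

    `events` maps event ID -> {receiver ID: delivered successfully?}.
    By default the delete is refused while any subscribed receiver has not
    received the event; `force=True` (hypothetical option) overrides that.
    """
    deliveries = events[event_id]  # raises KeyError if the ID is unknown
    pending = sorted(rx for rx, delivered in deliveries.items() if not delivered)
    if pending and not force:
        raise UndeliveredError(
            f"event {event_id} not yet delivered to: {', '.join(pending)}"
        )
    del events[event_id]
```

Note that this does nothing for the no-subscriber case: an event with an empty delivery map would be trivially deletable here, but nothing would ever ask for it to be deleted.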
In between the first two options, we could also consider having a way to say "delete everything older than " explicitly. That way, we only delete stuff when asked to, but can do time-based deletion, including of events that no receiver wanted.
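As a sketch of this middle option, the bulk delete might reduce to something like the function below. The in-memory `events` map and the `delete_older_than` name are made up for illustration, and skipping events with an in-progress delivery is an assumption carried over from the retention-period option rather than something decided here.

```python
from datetime import datetime


def delete_older_than(events, cutoff):
    """Delete every event created before `cutoff`; return the deleted IDs.

    `events` maps event ID -> (time_created, has_in_progress_delivery).
    Events nobody subscribed to are included, since they have no deliveries
    at all and so are never in progress.
    """
    doomed = [
        event_id
        for event_id, (time_created, in_progress) in events.items()
        if time_created < cutoff and not in_progress
    ]
    for event_id in doomed:
        del events[event_id]
    return doomed
```

Nothing runs on a timer here: the caller supplies the cutoff each time, so deletion only happens when someone asks for it.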
Footnotes
1. With the `webhook_delivery` and `webhook_delivery_attempt` records being deleted only if the receiver those deliveries were sent to is deleted.
2. Since we won't currently send a newly-created receiver any events that match its subscriptions but occurred before the receiver was created, we could probably get away with deleting events no one was subscribed to more or less immediately. The `webhook-dispatcher` background task could just delete them immediately, I suppose...