Daily webhook unpredictability & double triggers post-mortem #22
EthanThatOneKid announced in Announcements
Summary
On September 5th, our daily webhook, scheduled for 5pm Pacific time (midnight GMT), behaved erratically. The issue has been resolved. This post-mortem describes the root cause and outlines preventive measures for the future.
Incident timeline
Recent workflow removal: A previous GitHub workflow cronjob was removed from production a few days before the incident to validate the new Cloudflare migration.
Unpredictable behavior: During this period, the webhook ran irregularly and skipped its scheduled execution on several days.
Double execution on September 5th: The webhook executed twice that day instead of once.
Resolution
The rogue Cloudflare cronjob has been identified and removed, restoring normal webhook functionality.
Preventive measures
To prevent future errors, we will consider the following suggestions:
Implement stricter controls for cronjob deployment, ensuring thorough review and removal procedures.
Establish monitoring and alerting mechanisms to detect webhook irregularities promptly.
Document and communicate best practices to all team members involved in workflow management.
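As a rough illustration of the monitoring idea above, the sketch below scans a log of webhook execution timestamps and reports every day on which the webhook did not run exactly once, which covers both skipped days and double triggers. It is a minimal sketch, not our actual tooling: the function name `find_irregular_days`, the sample timestamps, and the year are all hypothetical and chosen only to mirror the pattern described in the timeline.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def find_irregular_days(runs, start, end):
    """Return {day: run_count} for each UTC day in [start, end]
    on which the webhook did not run exactly once."""
    # Count executions per UTC calendar day.
    counts = Counter(run.astimezone(timezone.utc).date() for run in runs)
    irregular = {}
    day = start
    while day <= end:
        # Counter returns 0 for days with no recorded execution.
        if counts[day] != 1:
            irregular[day] = counts[day]
        day += timedelta(days=1)
    return irregular


# Hypothetical execution log (illustrative data, arbitrary year):
# two skipped days followed by a double execution.
runs = [
    datetime(2023, 9, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2023, 9, 4, 0, 0, tzinfo=timezone.utc),
    datetime(2023, 9, 5, 0, 0, tzinfo=timezone.utc),
    datetime(2023, 9, 5, 0, 1, tzinfo=timezone.utc),
]

report = find_irregular_days(runs, date(2023, 9, 1), date(2023, 9, 5))
for day, count in report.items():
    kind = "skipped" if count == 0 else "multiple executions"
    print(f"{day}: {count} run(s) ({kind})")
```

A check like this could run on its own schedule and send an alert whenever the report is non-empty. Where the execution log comes from, and how alerts are delivered, is left open here.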
This discussion post serves as a record of the incident, its resolution, and our commitment to preventing similar occurrences in the future.