-
Notifications
You must be signed in to change notification settings - Fork 21
fix: treat webhook connection errors as delivery errors (retry) instead of system errors #600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* test: destwebhook.config.custom_headers tests * feat: implement destwebhook.config.custom_headers * feat: port custom headers implementation to destwebhookstandard * docs: openapi.yaml * chore: remove unused func * feat: key value map portal ui * fix: Fix custom header display and form logic * chore: Remove left over console log --------- Co-authored-by: Alexandre Bouchard <alexbouchardd@gmail.com>
* feat: port simple-json-match from js to golang * feat: destination model filter field * feat: destination filter api * refactor: centralize destination event matching * test: publishmq filter * test: e2e filter test * docs: openapi filter unset documentation * test: e2e destination api with filter * feat: portal filter ui * chore: Filter UI display improvements and syntax guide --------- Co-authored-by: Alexandre Bouchard <alexbouchardd@gmail.com>
…ilter (#599) * feat: implement portal ui config * chore: setup eslint & prettier for portal * chore: gofmt * docs: generate config
* test: e2e suite testing destination disable * fix: check alert threshold larger or equal
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Can you clarify what the behavior is for say RabbitMQ if the queue is unavailable / has incorrect credentials? That should also retry. Anything that is outside of Outpost control really, should retry and not go to DLQ |
I can look into RabbitMQ later. What do you think if we move forward with this PR and have RabbitMQ as a follow-up? As a whole we need to be a bit more intentional about the error handling for every destination type. This PR focuses on webhook for now. If you prefer we handle everything in this PR, happy to do so as well. |
|
I'm just using RabbitMQ as an example of all other destination types here but no, I think that's fundamental design for the error handling where destination specific config shouldn't cause message to go to DQL and that's a generalized implementation rather then destination per destination |
|
If we don't have a connection to rabbitMQ (or any destination for that matter) it's a delivery error, not "pre delivery". It's not "Outpost" fault. The only things that would outpost fault would be unexpected data in the message, errors fetch the destination config (Redis calls), parsing or application logic errors |
|
Ultimately it's about who's responsible for the issue, the operator or the tenant |
|
I get that, I guess it doesn't quite click for me what's the simplest way to differentiate. For example, when we see an error as "DNS resolution" or "connection reset" error or "generic network error", is that 100% of the time destination error? |
|
Yes it is, or least we treat it that way since its transient |
|
okay noted, let me noodle on this and apply this idea to the other destinations. For now, I'd like to move forward with v0.10.0 without this PR, then focus on the CH implementation, then I'll get back to this. Do let me know if you prefer I prioritize this instead. |

fixes #571
Fix: Treat webhook connection errors as delivery errors (retry) instead of system errors (DLQ)
Problem
Connection errors (DNS failures, connection refused, etc.) were being sent to DLQ instead of being retried. This happened because these errors were classified as "pre-delivery errors" rather than "delivery errors".
Solution
Network errors are now treated as delivery errors, which means they will be acknowledged and scheduled for retry instead of being nacked and sent to DLQ.
Error Classification
dns_errorconnection_refusedconnection_resetnetwork_unreachabletimeouttls_errorredirect_errornetwork_errorcanceledThe fix applies to both
webhookandwebhook_standarddestination types.