fix: treat webhook connection errors as delivery errors (retry) instead of system errors #600

alexluong · 2025-12-16T12:54:33Z

fixes #571

Fix: Treat webhook connection errors as delivery errors (retry) instead of system errors (DLQ)

Problem

Connection errors (DNS failures, connection refused, etc.) were being sent to DLQ instead of being retried. This happened because these errors were classified as "pre-delivery errors" rather than "delivery errors".

Solution

Network errors are now treated as delivery errors, which means they will be acknowledged and scheduled for retry instead of being nacked and sent to DLQ.

Error Classification

Code	Cause	Type	Behavior
`dns_error`	Domain doesn't exist	Delivery Error	ack + retry
`connection_refused`	Server not running	Delivery Error	ack + retry
`connection_reset`	Connection dropped	Delivery Error	ack + retry
`network_unreachable`	Network unavailable	Delivery Error	ack + retry
`timeout`	I/O timeout or deadline	Delivery Error	ack + retry
`tls_error`	Certificate/TLS failure	Delivery Error	ack + retry
`redirect_error`	Too many redirects	Delivery Error	ack + retry
`network_error`	Other network issues	Delivery Error	ack + retry
`canceled`	Context canceled (shutdown)	System Error	nack + requeue

The fix applies to both webhook and webhook_standard destination types.

* test: destwebhook.config.custom_headers tests
* feat: implement destwebhook.config.custom_headers
* feat: port custom headers implementation to destwebhookstandard
* docs: openapi.yaml
* chore: remove unused func
* feat: key value map portal ui
* fix: Fix custom header display and form logic
* chore: Remove left over console log

---------

Co-authored-by: Alexandre Bouchard <alexbouchardd@gmail.com>

* feat: port simple-json-match from js to golang
* feat: destination model filter field
* feat: destination filter api
* refactor: centralize destination event matching
* test: publishmq filter
* test: e2e filter test
* docs: openapi filter unset documentation
* test: e2e destination api with filter
* feat: portal filter ui
* chore: Filter UI display improvements and syntax guide

---------

Co-authored-by: Alexandre Bouchard <alexbouchardd@gmail.com>



alexbouchardd · 2025-12-16T14:38:22Z

> The fix applies to both webhook and webhook_standard destination types.

Can you clarify what the behavior is for, say, RabbitMQ if the queue is unavailable or has incorrect credentials? That should also retry. Really, anything that is outside of Outpost's control should retry and not go to the DLQ.

alexluong · 2025-12-16T14:41:57Z

> Can you clarify what the behavior is for say RabbitMQ if the queue is unavailable / has incorrect credentials?

I can look into RabbitMQ later. What do you think about moving forward with this PR and handling RabbitMQ as a follow-up?

As a whole we need to be a bit more intentional about the error handling for every destination type. This PR focuses on webhook for now.

If you prefer we handle everything in this PR, happy to do so as well.

alexbouchardd · 2025-12-16T15:41:29Z

I'm just using RabbitMQ as an example of all other destination types here, but no. I think that's fundamental to the design of the error handling: destination-specific config shouldn't cause a message to go to the DLQ, and that should be a generalized implementation rather than a destination-by-destination one.

alexluong · 2025-12-16T15:49:32Z

Here's the delivery logic

deliverymq message handler:
  pre_delivery_err, delivery_err = deliver(destination, event)
  if pre_delivery_err -> nack
  if delivery_err -> schedule retry
  ack

It just happens that in webhook, the HTTP request is split into 2 steps. I also just now realized that this PR will basically ignore cases where the system loses network, or where its DNS resolver has an issue. That's why I said it's not that simple, at least from what I'm seeing.

For example, this is the step in RabbitMQ that would qualify as PreDelivery

This could be because of an auth issue, or because the RabbitMQ destination is down. But what if it's because of some system issue? How can we differentiate?

alexbouchardd · 2025-12-16T19:09:30Z

If we don't have a connection to RabbitMQ (or any destination, for that matter), it's a delivery error, not "pre-delivery". It's not Outpost's fault. The only things that would be Outpost's fault are unexpected data in the message, errors fetching the destination config (Redis calls), and parsing or application logic errors.

alexbouchardd · 2025-12-16T19:10:07Z

Ultimately it's about who's responsible for the issue, the operator or the tenant
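
A rough sketch of that rule in Go, assuming the failure sources named above. Every identifier here (`FailureSource`, `Responsible`, `shouldRetry`, and the constants) is hypothetical and exists only to illustrate the operator-versus-tenant split; none of it is taken from the Outpost codebase.

```go
package main

import "fmt"

// Responsible says who has to fix a failure.
type Responsible string

const (
	Operator Responsible = "operator" // whoever runs Outpost
	Tenant   Responsible = "tenant"   // whoever owns the destination
)

// FailureSource is a coarse label for where a failure came from.
type FailureSource string

const (
	MalformedMessage       FailureSource = "malformed_message"
	ConfigFetchFailed      FailureSource = "config_fetch_failed" // e.g. Redis calls
	InternalLogic          FailureSource = "internal_logic"      // parsing, application bugs
	DestinationUnreachable FailureSource = "destination_unreachable"
	DestinationAuthFailed  FailureSource = "destination_auth_failed"
)

// responsibleFor treats anything outside Outpost's control as the tenant's
// responsibility and everything else as the operator's.
func responsibleFor(src FailureSource) Responsible {
	switch src {
	case DestinationUnreachable, DestinationAuthFailed:
		return Tenant
	default:
		return Operator
	}
}

// shouldRetry: tenant-side failures are retried, operator-side ones are not.
func shouldRetry(src FailureSource) bool {
	return responsibleFor(src) == Tenant
}

func main() {
	for _, src := range []FailureSource{
		MalformedMessage, ConfigFetchFailed, InternalLogic,
		DestinationUnreachable, DestinationAuthFailed,
	} {
		fmt.Printf("%-24s responsible=%-8s retry=%v\n", src, responsibleFor(src), shouldRetry(src))
	}
}
```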

alexluong · 2025-12-16T19:23:12Z

I get that. I guess it doesn't quite click for me what the simplest way to differentiate is.

For example, when we see a "DNS resolution" error, a "connection reset" error, or a "generic network error", is that a destination error 100% of the time?

alexbouchardd · 2025-12-17T00:59:28Z

Yes it is, or at least we treat it that way since it's transient.

alexluong · 2025-12-17T07:57:00Z

okay noted, let me noodle on this and apply this idea to the other destinations.

For now, I'd like to move forward with v0.10.0 without this PR, then focus on the CH implementation, then I'll get back to this. Do let me know if you prefer I prioritize this instead.

alexluong and others added 7 commits December 16, 2025 15:06

chore: portal feature flag for webhook custom headers & destination f…

985df7f

…ilter (#599) * feat: implement portal ui config * chore: setup eslint & prettier for portal * chore: gofmt * docs: generate config

fix: check alert threshold larger or equal (#597)

6c43912

* test: e2e suite testing destination disable * fix: check alert threshold larger or equal

fix: improve destwebhook delivery error handling

10d96de

refactor: http request helpers

c367dd8

chore: update destwebhookstandard with same error handling logic

f1dbec3

alexluong changed the title ~~Dlq~~ fix: treat webhook connection errors as delivery errors (retry) instead of system errors Dec 16, 2025

alexluong changed the base branch from main to v0.10.0 December 16, 2025 12:55

Base automatically changed from v0.10.0 to main December 17, 2025 13:59
