Rudderstack becomes extremely slow when you have one destination down

**Describe the bug**
We run the data plane in k8s with 60 pods. When Grafana shows that one of our Webhook destinations is down, Rudderstack becomes extremely slow: webhook delivery time increases dramatically, the rt table count grows from 2 to ~30 per PostgreSQL pod, and webhook event sync lag goes from 10 seconds to almost one hour.

**Steps to reproduce the bug**

1. Configure multiple webhooks as destinations for one source
2. Generate load
3. Make one of the webhooks fail
4. Observe how the rt table count, webhook delivery time, and event sync lag grow.
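
For step 3, a stub webhook endpoint that can be switched between healthy and failing might make the reproduction easier to control. The following is only a minimal sketch and not our actual setup: `ToggleWebhook`, `start_webhook`, and the `failing` flag are hypothetical names, the choice of a 503 response to represent "destination down" is an assumption, and the bind address would need adjusting so the data plane can reach it.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToggleWebhook(BaseHTTPRequestHandler):
    """Hypothetical stub destination: answers 200, or 503 while `failing` is set."""

    failing = False  # flip to True to simulate the destination going down

    def do_POST(self):
        # Drain the request body so the client is not left waiting on the socket.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # Assumption: a 503 stands in for "destination is down".
        self.send_response(503 if type(self).failing else 200)
        self.end_headers()

    def log_message(self, *args):
        # Silence per-request logging to keep load-test output readable.
        pass


def start_webhook(host="127.0.0.1", port=0):
    """Start the stub in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), ToggleWebhook)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_webhook()` and pointing one webhook destination at the returned server's address, then setting `ToggleWebhook.failing = True` under load, should approximate the failure described above.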

**Expected behavior**
When a destination is down, the system should keep running fast, and other destinations should not be affected.

**Screenshots**
<img width="835" alt="image" src="https://github.com/user-attachments/assets/d1ef3b6c-f9de-4185-a64d-23f9cdfd11e9">
<img width="1687" alt="image" src="https://github.com/user-attachments/assets/9d191efc-8edc-45c8-bbed-684ae0b42ee9">

**Any additional context**
Rudderstack version is 1.28.1

Please tell us what to tweak so that Rudderstack keeps working normally when one destination goes down. It would also be appreciated if you could share how the retry logic actually works and why it affects other destinations.

Rudderstack becomes extremely slow when you have one destination down #4953