alextreichler

In certain situations a connection can remain ESTABLISHED even though, for some reason such as a network disconnect, the AWS service on the other end (for example SQS or DynamoDB) is no longer responding to the data being sent. In this case the kernel keeps retransmitting the data up to the number of times set by the tcp_retries2 kernel setting, typically 15, which means it takes roughly 15-16 minutes before the connection is terminated.

This can cause an input, such as a consumer, to stop consuming from a Redpanda topic/partition until that connection is killed and Redpanda Connect establishes a new one.

Instead of modifying a system-wide kernel setting, it is possible to pass a value at the application level using TCP_USER_TIMEOUT, which takes precedence over tcp_retries2. This way users don't have to touch the kernel configuration and get finer control specifically for Redpanda Connect in these cases.

@CLAassistant

CLAassistant commented Oct 1, 2025

CLA assistant check
All committers have signed the CLA.

@alextreichler alextreichler force-pushed the add-tcp-user-timeout-sqs-output branch 2 times, most recently from 52f0343 to 5ecff0b Compare October 2, 2025 00:42
Collaborator

@Jeffail Jeffail left a comment

Thanks @alextreichler, just a few comments; the docs look great. One thing we need to be careful of is the possibility of other fields also wanting to override the HTTP client in the future. We could be proactive here and refactor this setting into a more general-purpose HTTP customization function that other fields could contribute to, something like httpClientFromConfig(p *service.ParsedConfig), so it's clearer that all HTTP client fields go there.

@alextreichler alextreichler force-pushed the add-tcp-user-timeout-sqs-output branch from 3978c73 to 2b55138 Compare October 2, 2025 20:40
@alextreichler
Author

Following Ash's earlier suggestion, I created an http_client.go file that builds the custom HTTP client; it is called from the session.go file.

@alextreichler alextreichler force-pushed the add-tcp-user-timeout-sqs-output branch from 2b55138 to 9e7ab00 Compare October 2, 2025 21:08
@mmatczuk
Collaborator

mmatczuk commented Oct 3, 2025

Could we have it not limited to AWS? I wonder whether we could have a tcp section in the main config, and then a tcp config object in the components that support it for overriding. This would allow us to group and add more fields in the future instead of adding ad-hoc fields in multiple places.

We should have a dedicated library component that decorates / creates net.Dialer.

Possibly this should even be part of benthos?

@alextreichler alextreichler requested a review from Jeffail October 3, 2025 19:48
@alextreichler alextreichler force-pushed the add-tcp-user-timeout-sqs-output branch 2 times, most recently from 79f0fb7 to 0b199a4 Compare October 5, 2025 14:53
@alextreichler alextreichler marked this pull request as draft October 5, 2025 14:54
@alextreichler
Author

Based on Michal M.'s comment, I changed this PR to "draft" and opened a PR in the benthos framework that adds a new package called "netclient" for other components to use in their projects. I switched this one to draft because the other PR needs to be reviewed first.

redpanda-data/benthos#291

@alextreichler alextreichler force-pushed the add-tcp-user-timeout-sqs-output branch from 0b199a4 to 794b632 Compare October 8, 2025 21:14
Update internal/impl/aws/session.go

Co-authored-by: Ashley Jeffs <ash@jeffail.uk>
@alextreichler alextreichler force-pushed the add-tcp-user-timeout-sqs-output branch from 794b632 to 092b258 Compare October 8, 2025 21:52
@mmatczuk
Collaborator

Closing this as work moved to Benthos.

@mmatczuk mmatczuk closed this Oct 15, 2025