Make max connections and acquire timeout configurable on S3 sink client #4949

Closed
oeyh opened this issue Sep 16, 2024 · 1 comment · Fixed by #4959

oeyh commented Sep 16, 2024

Is your feature request related to a problem? Please describe.
During a load test on an RDS source pipeline, which uses the S3 sink to send data to an S3 buffer, I noticed this error:

2024-09-09T17:26:14.680 [sdk-async-response-5-20] ERROR org.opensearch.dataprepper.plugins.sink.s3.S3SinkService - Exception occurred while uploading records to s3 bucket: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.

The default value for max connections is 50 and acquire timeout is 10s.

Describe the solution you'd like
Make max connections and acquire timeout configurable in the pipeline config for the S3 sink client:

...
sink:
  - s3:
      s3_client:
        max_connections: 100
        acquire_timeout: 10s
...
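
As a rough illustration of how the two proposed settings could be read and validated before being handed to the SDK client, here is a minimal sketch. The class `S3ClientOptions` and its methods are hypothetical names made up for this example, not Data Prepper's actual code, and the duration parser only handles simple values such as `10s` or `500ms`. It assumes the sink uses the AWS SDK for Java v2 Netty async HTTP client, where these values would most likely map to `maxConcurrency` and `connectionAcquisitionTimeout` on the `NettyNioAsyncHttpClient` builder.

```java
import java.time.Duration;

/** Hypothetical holder for the two proposed settings; not Data Prepper's actual class. */
final class S3ClientOptions {
    // Defaults as quoted in this issue: 50 connections, 10s acquire timeout.
    static final int DEFAULT_MAX_CONNECTIONS = 50;
    static final Duration DEFAULT_ACQUIRE_TIMEOUT = Duration.ofSeconds(10);

    private final int maxConnections;
    private final Duration acquireTimeout;

    private S3ClientOptions(int maxConnections, Duration acquireTimeout) {
        if (maxConnections <= 0) {
            throw new IllegalArgumentException("max_connections must be positive");
        }
        if (acquireTimeout.isNegative() || acquireTimeout.isZero()) {
            throw new IllegalArgumentException("acquire_timeout must be positive");
        }
        this.maxConnections = maxConnections;
        this.acquireTimeout = acquireTimeout;
    }

    /** Builds options from raw config values; a null value falls back to the default. */
    static S3ClientOptions of(Integer maxConnections, String acquireTimeout) {
        return new S3ClientOptions(
                maxConnections != null ? maxConnections : DEFAULT_MAX_CONNECTIONS,
                acquireTimeout != null ? parseDuration(acquireTimeout) : DEFAULT_ACQUIRE_TIMEOUT);
    }

    /** Parses simple values such as "10s" or "500ms"; other formats are out of scope here. */
    static Duration parseDuration(String text) {
        String s = text.trim().toLowerCase();
        if (s.endsWith("ms")) { // check "ms" before "s", since "ms" also ends with "s"
            return Duration.ofMillis(Long.parseLong(s.substring(0, s.length() - 2).trim()));
        }
        if (s.endsWith("s")) {
            return Duration.ofSeconds(Long.parseLong(s.substring(0, s.length() - 1).trim()));
        }
        throw new IllegalArgumentException("Unsupported duration: " + text);
    }

    int maxConnections() {
        return maxConnections;
    }

    Duration acquireTimeout() {
        return acquireTimeout;
    }

    // Assumed wiring (requires the AWS SDK for Java v2 on the classpath, so it is left as a comment):
    //   NettyNioAsyncHttpClient.builder()
    //       .maxConcurrency(options.maxConnections())
    //       .connectionAcquisitionTimeout(options.acquireTimeout())
}
```

With the example config above, `S3ClientOptions.of(100, "10s")` would carry 100 connections and a 10 second acquire timeout, while `S3ClientOptions.of(null, null)` would keep the defaults.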

It would also be good to have the SDK metrics enabled on the client.

Describe alternatives you've considered (Optional)
N/A

Additional context
N/A

@oeyh oeyh added the untriaged label Sep 16, 2024
@dlvenable dlvenable added enhancement New feature or request and removed untriaged labels Sep 17, 2024
@dlvenable

Maybe we can just name this `client`?
