Limit S3 client connections when using virtual threads #6369
base: master
Conversation
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
✅ Deploy Preview for nextflow-docs-staging canceled.
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
I have removed the config options we discussed and updated the docs to show the desired state.
As commented last week, I got some [screenshot omitted]. As they are internal to the AWS SDK implementation, I think we cannot currently fix them. So I think the easier option would be to limit the number of publish transfers, as is done in the file porter. We could have the same issues with other providers. However, in case we don't want to limit it, another option that could work is moving the S3 sync to async, as well as adding a limit on the number of concurrent async operations submitted to the async client, to bound the number of AsyncHandlers.
Not sure it's worth optimising for virtual threads until they are properly supported by the AWS SDK.
@jorgee is it possible to use a sync client for S3 transfers? Then we could apply the semaphore to it and ensure that there are no more than a few hundred concurrent requests by limiting `maxConnections`.

Right now it seems that the async client is allocating thousands of request handlers even with a smaller max concurrency, is that right? It seems like we should move towards sync clients anyway so that we can just control them with virtual threads + semaphore.

It would be nice if the AWS SDK could provide the target throughput functionality using virtual threads, but we also need to find some kind of solution, since many customers have been asking for it and we have no control over the AWS timeline.
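For illustration, here is a minimal sketch of the pattern being discussed: a `Semaphore` sized to the connection limit gates blocking calls on a sync S3 client, so callers can run on cheap virtual threads while the number of in-flight requests stays bounded. The class and method names are hypothetical, not the actual Nextflow implementation.

```groovy
import java.nio.file.Path
import java.util.concurrent.Semaphore
import software.amazon.awssdk.core.sync.RequestBody
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.PutObjectRequest

// Hypothetical wrapper: no more than 'maxConnections' requests are in flight,
// no matter how many virtual threads call into it concurrently.
class LimitedS3Client {
    private final S3Client client
    private final Semaphore permits

    LimitedS3Client(S3Client client, int maxConnections) {
        this.client = client
        this.permits = new Semaphore(maxConnections)
    }

    void putObject(String bucket, String key, Path file) {
        permits.acquire()   // block the (virtual) thread until a connection slot is free
        try {
            def request = PutObjectRequest.builder().bucket(bucket).key(key).build()
            client.putObject(request, RequestBody.fromFile(file))
        }
        finally {
            permits.release()
        }
    }
}
```

With virtual threads, blocking on `acquire()` is cheap, so the semaphore effectively replaces a bounded platform-thread pool as the throttling mechanism.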
The async client is mandatory for the S3 transfer manager in AWS SDK v2, so we would have to manage the multi-part logic in Nextflow again. I think in SDK v1 we also handled some transfers with the S3 transfer manager, so we would have to rewrite those to not use the transfer manager as well.
Well, I don't know if we should go that far. I doubt we'll be able to manage the transfers as well as the transfer manager. I have an idea that I'd like to try. I'll push it if it works.
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
I managed to wrap the S3 transfer manager in a synchronous interface. It just calls the async client and immediately waits for the result. I also wrapped these calls in the semaphore limiter, so the async client is controlled directly by `maxConnections`. This should ensure that the async client doesn't create too many requests when using virtual threads.

I also kept the target throughput setting so that users can control the maximum throughput, but it will still be limited by `maxConnections`. If AWS ever improves the SDK to support virtual threads natively, we need only remove the wrapper class while the config options remain the same. This way we don't have to wait for AWS to give us a solution to customer issues.
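A rough sketch of the wrapper described above, assuming the SDK v2 `S3TransferManager`; the names are illustrative and not necessarily those used in this PR. Each call starts an async upload and immediately blocks on its completion future, with a semaphore keeping the number of in-flight transfers at or below the connection limit.

```groovy
import java.nio.file.Path
import java.util.concurrent.Semaphore
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest

// Hypothetical synchronous facade over the async S3 transfer manager.
class SyncTransferManager {
    private final S3TransferManager transferManager
    private final Semaphore permits

    SyncTransferManager(S3TransferManager transferManager, int maxConnections) {
        this.transferManager = transferManager
        this.permits = new Semaphore(maxConnections)
    }

    void uploadFile(String bucket, String key, Path source) {
        permits.acquire()   // bound the number of concurrent transfers
        try {
            def request = UploadFileRequest.builder()
                .putObjectRequest { it.bucket(bucket).key(key) }
                .source(source)
                .build()
            // start the async transfer and wait for it right away
            transferManager.uploadFile(request).completionFuture().join()
        }
        finally {
            permits.release()
        }
    }
}
```

Because the caller blocks until the transfer completes, the rest of the code can keep treating S3 transfers as synchronous operations, while the multi-part handling stays inside the transfer manager.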
@jorgee can you test the PR with your benchmark and see if the OOM errors are resolved?
Will do. I will also try to test with this AWS SDK v2 preview version: aws/aws-sdk-java-v2#6268
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Some updates regarding virtual threads.

From the heap memory usage, the proposed change from @bentsherman was limiting the memory used by the CRT handlers but not by the XMLInputFactory. I finally found a solution by changing the StAX implementation to Woodstox.

So the current implementation has solved the memory issues for uploads and copies. For downloads, we just need to merge PR #6402, because a few big downloads can increase the heap a lot. @bentsherman, @pditommaso do you want to keep them in separate PRs or merge both into one?
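For reference, one way to switch the StAX implementation to Woodstox (assuming the `woodstox-core` artifact is on the classpath) is the standard `javax.xml.stream.XMLInputFactory` system property; the PR may wire it in differently.

```groovy
import javax.xml.stream.XMLInputFactory

// Ask the JDK StAX lookup to return the Woodstox factory instead of the default
// implementation, for any code that calls XMLInputFactory.newInstance().
System.setProperty('javax.xml.stream.XMLInputFactory', 'com.ctc.wstx.stax.WstxInputFactory')

def factory = XMLInputFactory.newInstance()
assert factory.class.name == 'com.ctc.wstx.stax.WstxInputFactory'
```

Simply adding the Woodstox dependency is often enough on its own, since the StAX service-loader lookup picks it up automatically when no system property is set.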
@claude review |
A smaller, well-scoped PR is better IMO.
Claude finished @pditommaso's task. PR review: Limit S3 client connections when using virtual threads.

✅ Overall assessment: a well-architected solution. This PR successfully addresses the S3 timeout issues with virtual threads through a thoughtful architectural approach. The key insight is wrapping the async S3 transfer manager in a synchronous interface while using semaphores to limit concurrency. Key strengths: the semaphore implementation.
…t virtual threads Signed-off-by: jorgee <jorge.ejarque@seqera.io>
The invasion of bots
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Good catch, Cursor! Fixed.
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Added some suggestions. Overall the docs are great. These suggestions change the tense to make them active and present, but otherwise awesome.
You can use the `aws.client.maxConnections` config option to control the maximum number of concurrent HTTP connections to S3.
You can configure the concurrency and throughput of the S3 transfer manager manually using the `aws.client.maxConcurrency` and `aws.client.maxNativeMemory` configuration options. Alternatively, you can use the `aws.client.targetThroughputInGbps` option to set both values automatically based on a target throughput.

You can also use the `aws.client.targetThroughputInGbps` option to control the concurrency of S3 uploads and downloads specifically, based on the available network bandwidth. This setting is `10` by default, which means that Nextflow performs S3 transfers concurrently up to 10 Gbps of network throughput, up to the maximum connection limit.
Suggested change:
- You can also use the `aws.client.targetThroughputInGbps` option to control the concurrency of S3 uploads and downloads specifically, based on the available network bandwidth. This setting is `10` by default, which means that Nextflow performs S3 transfers concurrently up to 10 Gbps of network throughput, up to the maximum connection limit.
+ Use the `aws.client.targetThroughputInGbps` option to control the concurrency of S3 uploads and downloads based on the available network bandwidth. This setting defaults to `10`, which allows Nextflow to perform concurrent S3 transfers up to 10 Gbps of network throughput, limited by the maximum connection count.
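To make these options concrete, here is a hedged `nextflow.config` sketch combining the settings mentioned above; the values are arbitrary examples rather than recommendations, and the memory value syntax is an assumption.

```groovy
// Example nextflow.config snippet (illustrative values only)
aws {
    client {
        maxConnections         = 500   // upper bound on concurrent HTTP connections to S3
        targetThroughputInGbps = 10    // derive transfer-manager concurrency from a target throughput
        // ...or set the transfer manager limits explicitly instead:
        // maxConcurrency  = 200
        // maxNativeMemory = '2 GB'
    }
}
```

As the following paragraph notes, these settings matter most when virtual threads are enabled, which is done by setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true` in the launch environment.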
Use these settings with virtual threads to achieve optimal performance for your environment. Increasing these settings beyond their defaults may improve performance for large runs. You can enable virtual threads by setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`.
Suggested change:
- Use these settings with virtual threads to achieve optimal performance for your environment. Increasing these settings beyond their defaults may improve performance for large runs. You can enable virtual threads by setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`.
+ Use these settings with virtual threads to achieve optimal performance for your environment. Increasing these settings beyond their defaults may improve performance for large runs. To enable virtual threads, set the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`.
## Multi-part uploads
- Multi-part uploads are handled by the S3 transfer manager. You can use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options to control when and how multi-part uploads are performed.
+ Nextflow uploads large files to S3 as multi-part uploads. You can use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options to control when and how multi-part uploads are performed.
Suggested change:
- Nextflow uploads large files to S3 as multi-part uploads. You can use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options to control when and how multi-part uploads are performed.
+ Nextflow uploads large files to S3 as multi-part uploads. Use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` configuration options to control when and how multi-part uploads are performed.
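A hedged example of tuning the multi-part behaviour via the options named above; the values and the unit syntax are assumptions for illustration only.

```groovy
// Example nextflow.config snippet for multi-part uploads (illustrative values)
aws {
    client {
        multipartThreshold = '100 MB'  // files larger than this are uploaded in multiple parts
        minimumPartSize    = '10 MB'   // lower bound on the size of each part
    }
}
```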
close #4995
In SDK v1, S3 transfers are managed by the transfer manager with a sync client and a pool of threads that performs the transfers. In SDK v2, the transfer manager uses an async client, and the thread pool is only used for preliminary work, not for the transfers themselves. However, the steps that precede the transfer in publish dir (checking whether the target is a directory, etc.) use the sync client, producing the same timeout errors.
This PR makes the changes needed to limit resource usage in S3 transfers when using virtual threads. A semaphore with a number of permits equal to the client's max connections is set in Nextflow's S3 client, which prevents too many concurrent calls on the S3 client.
Tested with a pipeline of 15 tasks that each generate 1000 files of 25 MB. It produced timeout errors with both 25.04 (SDK v1) and master (SDK v2); they disappear with this PR.