refactor download to use tower #47

aajtodd · 2024-08-30T15:11:49Z

Issue #, if available:
n/a

Description of changes:

The primary change in this PR is to refactor the download internals from a fixed-size worker pool to tower. The motivation for using tower is to be able to build higher-level abstractions that would have been difficult with the simple channel/worker approach. Additionally, I cleaned some things up and added some new tests.

I benchmarked this on a c5n.18xlarge and saw no regression in download times for a single 30 GB object downloaded to RAM. Peak throughput was actually consistently a bit higher (closer to 77-78 Gbps vs. the previous ~72 Gbps).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

graebm · 2024-09-03T22:25:06Z

aws-s3-transfer-manager/src/operation/download/handle.rs

    pub async fn join(mut self) -> Result<(), crate::error::Error> {
+        self.body.close();
        while let Some(join_result) = self.tasks.join_next().await {
            join_result?;


I'm confused whether DownloadHandle.join() is supposed to stop work ASAP or not.

Tracing the close() calls down, it ultimately calls https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Receiver.html#method.close

pub(crate) struct UnorderedBody { chunks: Option<mpsc::Receiver<Result<ChunkResponse, crate::error::Error>>>,

Does this stop new parts being started? Or does it just ignore the results of all further parts?

aajtodd · 2024-09-04

> I'm confused whether DownloadHandle.join() is supposed to stop work ASAP or not.

Stop ASAP, no. join consumes the handle so it's invalid to access it after that. The semantics are to wait for all tasks to complete their work. If we want to cancel/abort we would want to add a method for that. I suspect we'll revisit some of this as we look at progress + cancellation.
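To illustrate the consume-and-wait semantics described here, a minimal std-only sketch follows. `Handle`, `spawn_parts`, and the thread-based tasks are hypothetical stand-ins and not the crate's `DownloadHandle` (which is async). The sketch only shows that taking `self` by value makes the handle unusable after `join`, and that `join` waits for every task rather than cancelling any.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a download handle; not the crate's real type.
struct Handle {
    tasks: Vec<JoinHandle<Result<usize, String>>>,
}

impl Handle {
    /// Takes `self` by value, so the handle cannot be touched after joining.
    /// Waits for every task; nothing is cancelled or aborted.
    fn join(self) -> Result<usize, String> {
        let mut total = 0;
        for task in self.tasks {
            // First `?` surfaces a panicked task, second `?` the task's own error.
            total += task.join().map_err(|_| "task panicked".to_string())??;
        }
        Ok(total)
    }
}

/// Spawns one thread per "part"; each simply reports its size (assumed workload).
fn spawn_parts(part_sizes: Vec<usize>) -> Handle {
    let tasks = part_sizes
        .into_iter()
        .map(|n| thread::spawn(move || -> Result<usize, String> { Ok(n) }))
        .collect();
    Handle { tasks }
}
```

Calling `spawn_parts(vec![1, 2, 3]).join()` returns the summed sizes once all three threads finish. Any later use of the handle is rejected at compile time because `join` moved it.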

> Does this stop new parts being started? Or does it just ignore the results of all further parts?

This stops any new parts being sent on the body channel. It does not stop in-progress/scheduled work; the results will just be ignored/dropped. I don't think we want to cancel or abort when invoking join, as a user could still be processing results off the channel (though I suppose closing the channel may be the wrong behavior as well 🤔).

Taking a step back, what should the semantics of join be? If we don't close the channel, join can hang forever when the body isn't drained, because the channel could be full and the remaining tasks would block on send (which is why I added close originally). We don't know if the caller is going to drain the body though.
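The hang described above can be reproduced in miniature with a bounded channel. This sketch uses `std::sync::mpsc::sync_channel` and threads as an assumed stand-in for the tokio channel and tasks, and dropping the receiver as a stand-in for `close()`. Unlike a drop, tokio's `Receiver::close` still lets already-buffered items be received, and std has no equivalent for that. `join_after_close` is a hypothetical helper, not part of the crate.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Spawns `workers` producers that each try to send one chunk on a bounded
/// channel, shuts the receiving side without draining it, then joins every
/// producer. Returns how many sends failed because the receiver was gone.
fn join_after_close(workers: usize, capacity: usize) -> usize {
    let (tx, rx) = sync_channel::<usize>(capacity);

    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let tx = tx.clone();
            // `send` blocks while the buffer is full and errors once the
            // receiver has been dropped.
            thread::spawn(move || tx.send(i).is_err())
        })
        .collect();
    drop(tx);

    // Stand-in for `self.body.close()`: nobody drains the channel, so without
    // this line the blocked producers would never wake up and the joins
    // below would hang.
    drop(rx);

    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|failed| *failed)
        .count()
}
```

With 4 producers and a capacity of 1, at most one send can land in the buffer, so at least 3 sends fail and the call returns. The results of those sends are simply dropped, which matches the "ignored/dropped" behavior discussed in the thread.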

graebm

Accidentally submitted my one question as a standalone comment, but looks good.

ysaito1001

LGTM!

ysaito1001 · 2024-09-03T17:02:39Z

aws-s3-transfer-manager/examples/cp.rs

@@ -291,10 +291,23 @@ async fn write_body(mut body: Body, mut dest: fs::File) -> Result<(), BoxError>
    Ok(())
 }

-async fn warmup(config: &SdkConfig) -> Result<(), BoxError> {
+async fn warmup(config: &SdkConfig, bucket: &str) -> Result<(), BoxError> {


nit: can we add an explanatory comment on what we're warming up here?

ysaito1001 · 2024-09-03T21:49:17Z

aws-s3-transfer-manager/src/operation/download/handle.rs

    /// Consume the handle and wait for download transfer to complete
    pub async fn join(mut self) -> Result<(), crate::error::Error> {
+        self.body.close();


Curious, was there an issue before because of not closing the body?

aajtodd · 2024-09-04

Yes, see #47 (comment) and https://github.com/awslabs/aws-s3-transfer-manager-rs/pull/47/files#diff-fab97b3f33e97628187121734da44ab5d296637253f69fa037f3f6483319e0b0R150

aajtodd added 8 commits August 27, 2024 15:48

initial tower based download

0a26a49

remove hedging for now

528a7fa

cleanup

e296ca7

add integration test for download

05a2cfe

fix join

fe1c712

fix docs and remove error for channel closed

f865de3

remove use of send_with for download

2c99413

cleanup and commonize download with tranfer context

18b22fb

aajtodd requested a review from a team as a code owner August 30, 2024 15:11

aajtodd added 2 commits August 30, 2024 11:44

update rust versions in CI to match SDK

25e08c2

assoc type bounds requires newer rustc

e14426e

Velfi approved these changes Aug 30, 2024

graebm reviewed Sep 3, 2024

graebm approved these changes Sep 3, 2024

ysaito1001 approved these changes Sep 3, 2024

aajtodd merged commit f57573c into main Sep 6, 2024
14 checks passed

aajtodd deleted the atodd/tower branch September 6, 2024 13:49

waahm7 mentioned this pull request Sep 13, 2024

Refactor Upload to use Tower #50

Merged
