Refactor Upload to use Tower #50

waahm7 · 2024-09-13T16:23:55Z

Description of changes:
This is my first Rust PR, so feel free to provide lots of feedback.

Similar to "Refactor download to use tower" (#47), this PR refactors the upload_part implementation from a fixed pool of workers to a tower service. This will make it easier to implement and test higher-level abstractions such as hedging for slow parts. I saw no performance difference from this change.
This also refactors the upload pipeline to distinguish between the read_body and upload_part phases. read_body still uses a fixed pool of workers because we need to support unknown content length, and I couldn't figure out how to implement an unknown amount of work in tower. The upload_part phase uses tower's concurrency_limit layer to manage the concurrency.
I used two separate JoinSets because their tasks have different return types, and upload_tasks requires a lock. If I combine them into one JoinSet, we run into a deadlock where we are still trying to read_body while also joining all the tasks. Both sides need the same lock: the join function needs it to join the tasks, and read_body needs it to read the body and spawn upload_tasks. Please suggest a better way to accomplish this if there is one.
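The lock contention described above can be reduced to a small std-only sketch. This is not the PR's code: TaskSet, try_spawn, and join_all are hypothetical names, a plain Vec behind std::sync::Mutex stands in for the lock-guarded JoinSet, and try_lock is used so that the blocked state shows up as a return value instead of a hang. It assumes the joiner holds the lock for the whole drain.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a lock-guarded JoinSet: a list of part numbers.
type TaskSet = Arc<Mutex<Vec<u32>>>;

/// What a read_body-style producer must do: take the lock to register a task.
/// Returns false when the lock is held elsewhere (real code would wait instead).
fn try_spawn(tasks: &TaskSet, part_number: u32) -> bool {
    match tasks.try_lock() {
        Ok(mut set) => {
            set.push(part_number);
            true
        }
        Err(_) => false,
    }
}

/// What a join-style consumer does in this model: hold the lock while draining.
/// `while_joining` runs with the lock held, modelling a producer that is still
/// reading the body. Returns (tasks joined, whether the producer made progress).
fn join_all(tasks: &TaskSet, while_joining: impl FnOnce() -> bool) -> (usize, bool) {
    let mut set = tasks.lock().unwrap();
    let producer_progressed = while_joining();
    let joined = set.len();
    set.clear();
    (joined, producer_progressed)
}
```

Calling join_all(&tasks, || try_spawn(&tasks, 2)) reports that the producer made no progress while the join held the lock. With a waiting lock instead of try_lock, and a joiner that only finishes once the producer is done, neither side can proceed, which is one way to read the deadlock above.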

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

aajtodd

A couple of minor things. I also think it would be nice to add some integration tests with a fake server for upload at this point, like we did for download. Yuki or I can help with structuring that if needed.

aajtodd · 2024-09-16T12:40:32Z

aws-s3-transfer-manager/src/operation/upload/service.rs

+       + Send {
+    let svc = service_fn(upload_part_handler);
+    ServiceBuilder::new()
+        .concurrency_limit(ctx.handle.num_workers())


Maybe add a TODO noting that this needs to be "globalized".

Thanks, I have added a FIXME.

aajtodd · 2024-09-16T12:45:28Z

aws-s3-transfer-manager/src/operation/upload/service.rs

+        };
+        let svc = svc.clone();
+        let task = async move { svc.oneshot(req).await }
+            .instrument(tracing::trace_span!("upload_part", worker = part_number));


fix: worker = part_number doesn't make sense here; either drop it or change it to part_number = part_number. (I'm not sure whether knowing the part number in the logs is helpful.)

Thanks, I have updated it to part_number = part_number since I think it will be useful to know the part_number in logs.

aajtodd · 2024-09-16T12:48:23Z

aws-s3-transfer-manager/src/operation/upload/handle.rs

 use tokio::task;

 /// Response type for a single upload object request.
 #[derive(Debug)]
 #[non_exhaustive]
 pub struct UploadHandle {
    /// All child multipart upload tasks spawned for this upload
-    pub(crate) tasks: task::JoinSet<Result<Vec<CompletedPart>, crate::error::Error>>,
+    pub(crate) upload_tasks: Arc<Mutex<task::JoinSet<Result<CompletedPart, crate::error::Error>>>>,


This can probably be a regular mutex from stdlib. See https://doc.servo.org/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use

I am not sure if we can make it a regular mutex. We do keep this lock across await points at https://github.com/awslabs/aws-s3-transfer-manager-rs/pull/50/files#diff-a98b4945e17362a1dcad7da7e15d7ef7af38ff5f88ae751261823a5f23bb3652R135.

Ahh I missed that.
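For context on this exchange: a std::sync::Mutex guard is not Send, so a future that keeps one alive across an .await is not Send either, and an async-aware mutex is the usual fix in that case. The std-only sketch below shows the opposite situation, where the guard is dropped before the await and a standard mutex is sufficient. All names here (record_part, some_io, assert_send, block_on) are hypothetical, and block_on is a demo-only polling loop, not a real runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for any real await point (for example, a network call).
async fn some_io() {}

/// Records a part number under a std mutex, releasing the guard before awaiting.
async fn record_part(parts: Arc<Mutex<Vec<u32>>>, part_number: u32) -> usize {
    let count = {
        let mut guard = parts.lock().unwrap();
        guard.push(part_number);
        guard.len()
    }; // guard dropped here, so it is never held across the await below
    some_io().await;
    count
}

/// Compiles only for values that can move between threads, which is what
/// spawning onto a multi-threaded runtime typically requires of a future.
fn assert_send<T: Send>(value: T) -> T {
    value
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal demo-only executor: polls the future until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

If the guard in record_part were kept alive past some_io().await, the assert_send call would no longer compile. The code linked in this thread keeps the lock across await points, which is why the async mutex stays.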

ysaito1001 · 2024-09-23T21:40:31Z

aws-s3-transfer-manager/src/operation/upload/service.rs

+    for i in 0..n_workers {
+        let worker = read_body(
+            part_reader.clone(),
+            handle.ctx.clone(),
+            svc.clone(),
+            handle.upload_tasks.clone(),
+        )
+        .instrument(tracing::debug_span!("read_body", worker = i));
+        handle.read_tasks.spawn(worker);
+    }


Just to better understand, from the PR description:

This also refactors the upload pipeline to distinguish between the read_body and upload_part phases. read_body still uses a fixed pool of workers because we need to support unknown content length, and I couldn't figure out how to implement an unknown amount of work in tower.

Do these sentences imply that upload_part is NOT restricted by the fixed pool of workers, since it only refers to read_body still using a fixed pool of workers?

Upload part is still restricted, but not by a pool of workers that we manage explicitly, where each worker reads a part body and then uploads that part. Instead, we let tower manage the concurrency using the concurrency_limit layer at https://github.com/awslabs/aws-s3-transfer-manager-rs/pull/50/files#diff-a6c023261dc31765237ede4502d30ba640bca1ef9be58cb92e48ccdf69c4768cR72, and the pool of read_body workers simply spawns N upload_part tasks.
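The split described in this reply can be sketched with threads and the standard library alone. This is an analogue under stated assumptions, not the PR's implementation and not tower's API: Limit, Shared, upload_part, read_body, and run_pipeline are hypothetical, a Mutex plus Condvar counter plays the role of the concurrency limit, and threads stand in for async tasks. The readers never look at the total length, which models the unknown content length.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Counting limit on in-flight work; a hypothetical stand-in for a
/// concurrency-limit layer, not tower's implementation.
struct Limit {
    permits: Mutex<usize>,
    freed: Condvar,
}

impl Limit {
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.freed.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

/// State shared by every reader and upload thread.
struct Shared {
    source: Mutex<std::vec::IntoIter<u32>>,
    limit: Limit,
    in_flight: AtomicUsize,
    peak: AtomicUsize,
    uploaded: Mutex<Vec<u32>>,
}

/// Simulated upload of one part, run under the concurrency limit.
fn upload_part(shared: &Shared, part_number: u32) {
    shared.limit.acquire();
    let now = shared.in_flight.fetch_add(1, Ordering::SeqCst) + 1;
    shared.peak.fetch_max(now, Ordering::SeqCst);
    shared.uploaded.lock().unwrap().push(part_number);
    shared.in_flight.fetch_sub(1, Ordering::SeqCst);
    shared.limit.release();
}

/// Reader loop: pull parts until the source is exhausted, one upload per part.
fn read_body(shared: Arc<Shared>) {
    let mut uploads = Vec::new();
    loop {
        // The guard is a temporary, so the source lock is released right away.
        let next = shared.source.lock().unwrap().next();
        match next {
            Some(part_number) => {
                let shared = Arc::clone(&shared);
                uploads.push(thread::spawn(move || upload_part(&shared, part_number)));
            }
            None => break,
        }
    }
    for upload in uploads {
        upload.join().unwrap();
    }
}

/// Returns the sorted uploaded part numbers and the peak concurrent uploads.
/// `max_uploads` must be at least 1, otherwise every upload waits forever.
fn run_pipeline(parts: Vec<u32>, n_readers: usize, max_uploads: usize) -> (Vec<u32>, usize) {
    let shared = Arc::new(Shared {
        source: Mutex::new(parts.into_iter()),
        limit: Limit { permits: Mutex::new(max_uploads), freed: Condvar::new() },
        in_flight: AtomicUsize::new(0),
        peak: AtomicUsize::new(0),
        uploaded: Mutex::new(Vec::new()),
    });
    let readers: Vec<_> = (0..n_readers)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || read_body(shared))
        })
        .collect();
    for reader in readers {
        reader.join().unwrap();
    }
    let mut uploaded = shared.uploaded.lock().unwrap().clone();
    uploaded.sort();
    (uploaded, shared.peak.load(Ordering::SeqCst))
}
```

For example, run_pipeline((1..=20).collect(), 4, 3) uses four readers but never has more than three uploads in flight. The number of readers and the upload limit are independent knobs, which is the separation the reply describes.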

waahm7 added 23 commits September 9, 2024 14:22

add service.rs

ea1a26c

initial wip for upload_parts

b2818e7

clean up upload_parts

70d505e

more cleanup

4ac47d7

cleanup and comments

fefb266

remove one use variables

07cf4af

add todo for reading the body

5e6de70

separate read body tasks

27f87ee

refactor

67508ac

import undo

91bb163

renames

667ee03

comments

12d19a1

fix doc

001926b

async issue point

7e0f0af

at least the compiler is not complaining now

b8c8832

back to two joinsets due to deadlock

ae930c7

more cleanup

fc96c1d

fmt

a064a51

comments

1c1c44a

add upload_test filed

3fe0f61

remove file

a90195f

comment update

347ce5f

cleanup

b5a1e3e

waahm7 requested a review from a team as a code owner September 13, 2024 16:23

aajtodd reviewed Sep 16, 2024

ysaito1001 reviewed Sep 18, 2024

PR Feedback

c754db6

aajtodd approved these changes Sep 23, 2024

ysaito1001 reviewed Sep 23, 2024

ysaito1001 approved these changes Sep 24, 2024

waahm7 merged commit e929fac into main Sep 24, 2024
13 checks passed

waahm7 deleted the upload-tower-multi-thread-body branch September 24, 2024 19:51
