[Storage] Refactor storage and fix data transfer service #1239

Michaelvll · 2022-10-12T20:46:42Z

Adopt changes for storage from #1152.

Tested:

tests/run_smoke_tests.sh TestStorageWithCredentials
The following command:

gsutil mb gs://sky-imagenet-bucket-gcp
python - <<EOF
from sky.data import data_transfer
data_transfer.s3_to_gcs('sky-imagenet-bucket', 'sky-imagenet-bucket-gcp') 
EOF

romilbhardwaj

Thanks for fixing this @Michaelvll!

sky/data/storage.py

sky/data/data_transfer.py

romilbhardwaj · 2022-10-14T00:26:31Z

sky/data/data_transfer.py

+    response = storagetransfer.transferJobs().create(
+        body=transfer_job).execute()
+    operation = storagetransfer.transferJobs().run(jobName=response['name'],
+                                                   body={
+                                                       'projectId': project_id
+                                                   }).execute()


Curious - are there any benefits other than readability to calling transferJobs().create() followed by transferJobs().run() instead of setting the schedule field to current time and only calling transferJobs().create() like we had before?

(Just to be clear, I prefer the former as we have now, but curious if there's some other reason to do it this way)

The main reason for starting the job explicitly is that the response of run().execute() gives us the name of the operation (this is different from the name of the submitted TransferJob, which is in the create() response). With that name, we don't have to list all the running transfer operations and work out which one was scheduled by the cloud, in L98 below. Also, since we only run the TransferJob once and in a blocking manner, I feel that having a schedule field in the specification would be a bit misleading.
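To make the create-then-run flow described above concrete, here is a minimal sketch and not the code merged in this PR. It assumes `storagetransfer` is a Storage Transfer Service client built with `googleapiclient.discovery.build('storagetransfer', 'v1')`. The helper name `run_transfer_job_blocking` and the `poll_seconds` parameter are hypothetical, and error handling (for example, an `error` field on the finished operation) is left out.

```python
import time


def run_transfer_job_blocking(storagetransfer, transfer_job, project_id,
                              poll_seconds=5.0):
    """Create a TransferJob, run it once, and block until that run finishes.

    Hypothetical helper for illustration only; `storagetransfer` is assumed
    to be a googleapiclient client for the Storage Transfer Service.
    """
    # create() returns the TransferJob resource; its 'name' identifies the job.
    job = storagetransfer.transferJobs().create(body=transfer_job).execute()

    # run() returns the long-running Operation for this specific run. Its
    # 'name' is different from the job name, so no listing is needed to find it.
    operation = storagetransfer.transferJobs().run(
        jobName=job['name'], body={'projectId': project_id}).execute()

    # Poll this one operation by name until the service marks it as done.
    while True:
        result = storagetransfer.transferOperations().get(
            name=operation['name']).execute()
        if result.get('done'):
            return result
        time.sleep(poll_seconds)
```

Because the operation name comes straight from the `run()` response, the polling loop only ever queries that one operation, and the job specification needs no `schedule` field.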

sky/data/data_transfer.py

romilbhardwaj

lgtm! Thanks for fixing this!


Michaelvll added 3 commits October 12, 2022 13:40

Refactor storage and fix data transfer service

39e6459

fix UX for the data transfer

7ac775f

UX fixes

63a0248

concretevitamin requested a review from romilbhardwaj October 12, 2022 22:59

romilbhardwaj reviewed Oct 14, 2022


Address comments

cd114b2

romilbhardwaj approved these changes Oct 16, 2022


Michaelvll merged commit 2d4fee3 into master Oct 16, 2022

Michaelvll deleted the data-transfer branch October 16, 2022 23:40

ewzeng pushed a commit to ewzeng/skypilot that referenced this pull request Oct 24, 2022

[Storage] Refactor storage and fix data transfer service (skypilot-or…

c513324

…g#1239) * Refactor storage and fix data transfer service * fix UX for the data transfer * UX fixes * Address comments

ewzeng pushed a commit to ewzeng/skypilot that referenced this pull request Oct 24, 2022

[Storage] Refactor storage and fix data transfer service (skypilot-or…

9eb90b4

…g#1239) * Refactor storage and fix data transfer service * fix UX for the data transfer * UX fixes * Address comments
