Make storage upload have no chunk size by default.#606
Make storage upload have no chunk size by default.#606dhermes wants to merge 1 commit intogoogleapis:masterfrom
Conversation
|
It hasn't been merged into the mainline yet, but I implemented a chunksize=None approach for uploads in gsutil's fork of apitools: https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/third_party/storage_apitools/transfer.py#L773 However, because you're using httplib2, there are a number of other issues that you'll need to solve to get this to work. https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/gcs_json_media.py has some examples. |
|
As for chunk size in general, the ideal thing is not to use chunks for uploads or downloads at all unless you have a specific reason to do so (for example, buffering a streaming transfer). |
|
Rather than passing in an |
|
Are we leaving the blob's own CHUNK_SIZE around, but just using it for downloads? Should we be making it an explicit parameter there, too? |
|
I plan on implementing the default I just wanted to make sure passing |
|
so i've finally got myself an official repo for apitools; i'm planning on finally doing a round-robin merge to get all the copies back in sync. in particular:
|
|
@craigcitro w00t! Thanks for the news. The main question is
which @thobrla indicates is only in upstream ( |
|
Ping @craigcitro
|
|
sorry, i missed this question earlier in the thread. i believe the code you're looking for is still upstream in gsutil, but i'm working on getting the various versions of apitools unified. |
|
@craigcitro Do you need help on the unification? |
|
UPDATE: Chatting with @craigcitro today (March 2, 2015) it seems the @tseaver Do you want to vendor in the latest updates from |
|
Yeah, I'll vendor it in. |
|
@craigcitro I've done the re-vendoring, but now lack context to get the last set of tests passing: ======================================================================
ERROR: test_upload_from_file_resumable (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 418, in test_upload_from_file_resumable
blob.upload_from_file(fh, rewind=True)
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/blob.py", line 354, in upload_from_file
finish_callback=lambda *args: None)
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/_gcloud_vendor/apitools/base/py/transfer.py", line 804, in StreamInChunks
additional_headers=additional_headers)
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/_gcloud_vendor/apitools/base/py/transfer.py", line 764, in __StreamMedia
'%d' % self.progress)
CommunicationError: Failed to transfer all bytes in chunk, upload paused at byte 4
======================================================================
FAIL: test_download_as_string (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 328, in test_download_as_string
self.assertEqual(fetched, b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef
======================================================================
FAIL: test_download_to_file (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 272, in test_download_to_file
self.assertEqual(fh.getvalue(), b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef
======================================================================
FAIL: test_download_to_filename (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 307, in test_download_to_filename
self.assertEqual(wrote, b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef
----------------------------------------------------------------------It looks as though code which deals with explicitly-chunked stuff from the server side is now broken. |
|
is the easiest way for me to play with it just to clone your branch? |
|
Likely so: I don't want to make it a PR here until things pass. Just run |
|
@tseaver @craigcitro I'm happy to take up the vendor-ing in process of |
|
i think he was hitting test failures on upload/download; i'm intending to try it, but i've been believing that for 3+ weeks. 😉 |
|
I can babysit you if you like. |
|
my issue is that it hasn't hit the top of my todo list -- currently heads-down on something else completely. gsutil is currently using the version of apitools at master, so i'm reasonably confident in that code. i just need to drop the new version in and see why it's failing for veneer. |
|
Cool |
|
#755 will lead to de-vendoring Once that lands, we can re-visit this one. |
|
I agree. After hacking on #754 and a late night chat with Craig, it seems that is the best way forward. |
|
Yes this PR is the one that made us realize #811 was "necessary" (more like was an option). |
|
Closing this PR and going to take the new approach of just making |
Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Source-Link: googleapis/synthtool@050953d Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-python:latest@sha256:65e656411895bff71cffcae97246966460160028f253c2e45b7a25d805a5b142 Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
* chore(deps): update all dependencies to v4 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * revert Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Fixes #546.
@tseaver @thobrla
I took a stab at the issue but am still unsure of a few things:
apitoolsactually attempt a full upload whenchunksize=None? (/cc @craigcitro)use_chunked=Falseand then just use the chunk size on theBlob?Also @thobrla I'm curious, RE: "We should allow the user to override the chunk size". Do we need a default chunk size for downloads? In the case that someone creates an object with a custom chunk size, should we use it in this method or still try the "all in one go" approach?