Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to set blob metadata #1185

Closed
remcohaszing opened this issue Oct 15, 2015 · 22 comments
Closed

Unable to set blob metadata #1185

remcohaszing opened this issue Oct 15, 2015 · 22 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release.

Comments

@remcohaszing
Copy link

I am trying to set custom metadata on a blob. However, this does not work. I'm using the following script:

#!/usr/bin/env python3
import os

from gcloud import storage

client = storage.Client('super secret id')
bucket = client.get_bucket('super secret bucket')

blob = bucket.blob('kirby.png')
blob.metadata = dict(Color='Pink')
with open(os.path.expanduser('~/Pictures/kirby.png'), 'rb') as img_data:
    blob.upload_from_file(img_data)

When retrieving the blob or inspecing it in the developer console, the metadata is still unset.

@tseaver
Copy link
Contributor

tseaver commented Oct 15, 2015

Setting the metadata locally does not cause the instance to make the API call to patch it on the server, and we don't send metadata along with the upload_from_file() call. You need to add a call to blob.patch() to persist the changes to blob.metadata.

@tseaver tseaver added the api: storage Issues related to the Cloud Storage API. label Oct 15, 2015
@remcohaszing
Copy link
Author

I modified the script:

#!/usr/bin/env python3
import os

from gcloud import storage

client = storage.Client('dc-app-api')
bucket = client.get_bucket('appsemble-dev')

blob = bucket.blob('kirby.png')
with open(os.path.expanduser('~/Pictures/kirby.png'), 'rb') as img_data:
    blob.upload_from_file(img_data)
blob.metadata = {'Color': 'Pink', 'Spam': 'Cheese'}
blob.metadata['Foo'] = 'Bar'
blob.patch()

This sets the Color and Spam metadata fields, but not the Foo field. There is no warning about this whatsoever.

Also there is no documentation on the Blob class on how to update metadata.

@dhermes
Copy link
Contributor

dhermes commented Oct 15, 2015

Thanks for reporting. Looking into it now.

@dhermes
Copy link
Contributor

dhermes commented Oct 15, 2015

Ahhh. This is because the Blob.metadata getter returns a copy.deepcopy of the dictionary stored, so doing

blob.metadata['Foo'] = 'Bar'

updates the copy, not the dictionary actually stored on the Blob. We should fix this.

@tseaver WDYT?

@tseaver
Copy link
Contributor

tseaver commented Oct 15, 2015

To allow mutation of the metadata in-place (as opposed to assigning to it wholesale), we would need a way to detect those mutations, in order to mark it as changed (for a subsequent call to blob.patch() to pick up).

@dhermes
Copy link
Contributor

dhermes commented Oct 15, 2015

Right. Custom dict subclass?

@tseaver
Copy link
Contributor

tseaver commented Oct 15, 2015

Right, overriding:

  • __delitem__
  • __setitem__
  • clear
  • update
  • setdefault
  • pop
  • popitem

@remcohaszing
Copy link
Author

A custom dict class could work for the examples I provided. This would still not make a dict behave as expected tough. Since dicts are mutable, I would expect the following to work:

metadata = {'Holy': 'Grail'}
blob.metadata = metadata
metadata['Ministry'] = 'Silly Walks'
blob.patch()

If the setter of blob.metadata would return this new class, this would no longer refer to the original dict that has been set.

@remcohaszing
Copy link
Author

I think this subclass could work if no setter is defined. An empty metadata object could be set in the Blob constructor. This exposes the metadata as if it is a regular dict. This would break with the current behaviour, where setting metadata is allowed.

@dhermes
Copy link
Contributor

dhermes commented Oct 15, 2015

The fact that the getter returns a copy is not set in stone. We did it to "protect" the integrity of the actual data own by the class, but didn't think through situations like this.

We don't need a getter, but do need a setter to track changes on the object. Can't have a setter without a getter, so that's just how it's gotta be. (Also, all properties are stored on a non-public attribute called _properties, another reason for a getter.)

@remcohaszing
Copy link
Author

I suppose it could look something like this:

class Metadata(dict):
    def __init__(self, blob):
        self._blob = blob

    def __setitem__(self, key, value):
        super(self, Metadata).__setitem__(key, value)
        self._blob._patch_property('metadata', self)

    # XXX Repeat pattern for other modifier functions


class Blob(_PropertyMixin):
    ...

    def __init__(self, ...)
        ...
        self._properties['metadata'] = Metadata(self)

    @property
    def metadata(self):
        return self._properties['metadata']

This marks the metadata as modified if it gets edited and ensures the metadata can't be overridden using a setter.

A side effect is that the metadata property is never None. I personally like that better anyway.

@saxenanurag
Copy link

blob.upload_from_file()
blob.metadata = {'luke': 'skywalker'}
blob.patch()

worked for me. I was able to upload a file and apply the meta data after the file upload.

@tseaver
Copy link
Contributor

tseaver commented Dec 16, 2015

@saxenanurag I'm in the middle of reviewing how we are using the ex-apitools code (now in gcloud.streaming), and can see how we should be able to send the patched metadata along with the upload request.

@dhermes
Copy link
Contributor

dhermes commented Dec 17, 2015

👍 What he said 😀

@saxenanurag
Copy link

Thanks.

@pdknsk
Copy link

pdknsk commented Mar 24, 2016

I think this is a duplicate of #754. The use of metadata is a bit ambiguous as metadata refers to both the metadata of an object (such as Cache-Control) and the metadata property, meant for custom metadata.

Anyway, to the issue. Any update? The problem with patch is that the object is online briefly with wrong metadata, or less briefly in case of a connection failure between two requests. And, it increases metageneration.

>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.upload_from_string('text')
>>> text.reload()
>>> text.cache_control
>>> text.metageneration
1
>>> text.cache_control = 'no-cache'
>>> text.patch()
>>> text.reload()
>>> text.cache_control
u'no-cache'
>>> text.metageneration
2

@pdknsk
Copy link

pdknsk commented Mar 19, 2017

I've updated a previous patch, hack perhaps, which makes it work. It's not tested extensively.

--- a/blob.py
+++ b/blob.py
@@ -494,6 +494,11 @@
 
         # Configure the upload request parameters.
         request = Request(upload_url, 'POST', headers)
+
+        if self._properties:
+            headers['content-type'] = 'application/json'
+            request.body = json.dumps(self._properties)
+
         upload.configure_request(upload_config, request, url_builder)
 
         # Configure final URL

The file google/cloud/storage/blob.py is located in your Python site-packages directory.

>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.metadata = {'custom': 'metadata'}
>>> text.upload_from_string('text')
>>> text.reload()
>>> text.cache_control
u'no-cache'
>>> text.metadata
{u'custom': u'metadata'}
>>> text.metageneration
1
>>> text.download_as_string()
'text'

@lukesneeringer lukesneeringer self-assigned this Mar 20, 2017
@lukesneeringer lukesneeringer added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@tseaver
Copy link
Contributor

tseaver commented Jun 5, 2017

#3362 passes along all non-ACL blob properties with the (initial or only) upload request.

@tseaver tseaver closed this as completed Jun 5, 2017
@gjkemnitz
Copy link

gjkemnitz commented Oct 20, 2018

You can work around this bug by creating a completely new dictionary, assigning it to blob.metadata, and doing blob.patch():

newdict={}
for x in blob.metadata:
    newdict[x] = blob.metadata[x]
    newdict['mynewproperty'] = 'awesome'
blob.metadata=newdict
blob.patch()

Your blob will now have the {'mynewproperty':'awesome'} metadata property in its stored metadata list on GCS.

@Prabakaran1410
Copy link

With patch() the metadata of a file is getting updated properly. But this has two operations,

  1. First upload a file with no metadata
  2. Then updating the metadata using patch()

Is there any option to upload the file along with its metadata? this should be a single step process

@tseaver
Copy link
Contributor

tseaver commented Nov 7, 2019

@Prabakaran1410 You don't need to call patch on the blob after upload: since PR #3362, all properties assigned to the blob before upload are passed through.

@paul121
Copy link

paul121 commented Jul 30, 2020

I had some trouble getting this to work because the metadata was always None (even though I manually added values via the Storage UI). Later I realized I was using bucket.blob("name") rather than bucket.get_blob("name") so the metadata hadn't been loaded. Might help someone else.

parthea pushed a commit that referenced this issue Jul 6, 2023
…loudPlatform/python-docs-samples#1185)

* Add Snippet for Listing All Subscriptions in a Project

* Fixed the failed tests on Kokoro

classify_text_tutorial_test.py::test_query_text
classify_text_tutorial_test.py::test_query_category
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release.
Projects
None yet
Development

No branches or pull requests

9 participants