
Add Google Cloud Storage backend using the gcloud-python library #236

Merged · 28 commits · Apr 19, 2017

Conversation

scjody
Contributor

@scjody scjody commented Dec 14, 2016

Based on #146

This uses the native gcloud-python library, which is recommended by Google for production setups and allows finer-grained authentication than current solutions (which only work when Google Cloud Storage is used in S3 compatibility mode).
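For context, wiring a backend like this into a Django project would look roughly as follows. This is an illustrative sketch only: the setting names (`GS_BUCKET_NAME`, the `storages.backends.gcloud` module path) are assumptions about how the merged backend will be configured, so check the final documentation.

```python
# settings.py -- illustrative sketch; the exact setting names are
# assumptions, not confirmed by this PR.
DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"

# Bucket to store files in; credentials are resolved from the
# environment (Application Default Credentials) by the library.
GS_BUCKET_NAME = "my-app-media"
```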

@codecov-io

codecov-io commented Dec 14, 2016

Codecov Report

Merging #236 into master will increase coverage by 2.44%.
The diff coverage is 96.23%.


@@            Coverage Diff             @@
##           master     #236      +/-   ##
==========================================
+ Coverage   72.87%   75.32%   +2.44%     
==========================================
  Files          10       11       +1     
  Lines        1379     1552     +173     
==========================================
+ Hits         1005     1169     +164     
- Misses        374      383       +9
Impacted Files Coverage Δ
storages/backends/gs.py 72.22% <100%> (+0.79%) ⬆️
storages/utils.py 100% <100%> (ø) ⬆️
storages/backends/s3boto.py 87.54% <100%> (-0.66%) ⬇️
storages/backends/gcloud.py 95.65% <95.65%> (ø)
storages/backends/apache_libcloud.py 0% <0%> (ø) ⬆️
storages/backends/s3boto3.py 85.13% <0%> (+0.13%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 304ac43...7ee116b.

self._file = None


class GoogleCloudStorage(Storage):
scjody (Contributor, Author) commented on this diff:

@eirsyl The existing backends all use the @deconstructible decorator here but I'm not sure why. Commit comments suggest it has something to do with migrations but I'm able to migrate to a FileField using this backend for storage with this code as is. Therefore I'm inclined to leave the decorator out and if someone needs it, they can add it. What do you think?

scjody (Contributor, Author) replied:

Never mind, I figured out why the decorator is needed and added it.
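For readers wondering what the decorator is for: a minimal sketch of the *idea* behind Django's `@deconstructible` (not the real implementation) is to record the constructor arguments so that `makemigrations` can serialize the instance back into source form. Names below (`deconstructible_sketch`, `DemoStorage`) are hypothetical.

```python
# Sketch of the mechanism behind Django's @deconstructible: capture the
# constructor arguments at instantiation time so the object can later
# report how to re-create itself.
def deconstructible_sketch(cls):
    def __new__(klass, *args, **kwargs):
        obj = object.__new__(klass)
        obj._constructor_args = (args, kwargs)  # captured for later
        return obj

    def deconstruct(self):
        # (dotted import path, positional args, keyword args) -- this
        # triple is what the migration writer serializes.
        path = "%s.%s" % (type(self).__module__, type(self).__name__)
        args, kwargs = self._constructor_args
        return path, list(args), kwargs

    cls.__new__ = __new__
    cls.deconstruct = deconstruct
    return cls

@deconstructible_sketch
class DemoStorage:
    def __init__(self, bucket_name="media"):
        self.bucket_name = bucket_name

s = DemoStorage(bucket_name="uploads")
path, args, kwargs = s.deconstruct()  # kwargs == {"bucket_name": "uploads"}
```

Without something like this, a storage instance passed to `FileField(storage=...)` cannot be written out by the migration serializer, which is the "Cannot Serialize" error mentioned in the commit message below.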

@scjody
Contributor Author

scjody commented Jan 5, 2017

I reviewed the code and made some minor changes. It looks good: it closely mirrors the S3 implementation, but then S3 and Google Cloud Storage are very similar to begin with.

Onto tests!

@scjody
Contributor Author

scjody commented Jan 11, 2017

@eirsyl @jschneier Does this look reasonable?

Next steps on our end are that we'll be using this in our staging environment within the next 2 weeks.

@eirsyl
Contributor

eirsyl commented Jan 14, 2017

Great work! Looks good to me 👍

@scjody
Contributor Author

scjody commented Jan 30, 2017

@jschneier We've been using this successfully in our staging environment for several days. Would you be able to merge it soon please, or let me know what's still missing?

@scjody
Contributor Author

scjody commented Feb 6, 2017

@jschneier Can you please take a look at this? The README says to bug you until it's merged, so that's what I'm doing 😸. It also said to add myself to AUTHORS so I added myself and @eirsyl. @eirsyl please let me know if you want a different name or don't want to be credited.

@scjody
Contributor Author

scjody commented Feb 10, 2017

@jschneier Will you have time to look at this sometime soon please?

@ZuluPro Is this something you could look at instead? (I can't tell if you have write access but it looks like you might.)

@jschneier
Owner

jschneier commented Feb 10, 2017 via email

@sherbondy

I think there might be a renewed interest in getting this PR merged given the events that unfolded today :/

@scjody
Contributor Author

scjody commented Mar 6, 2017

@jschneier Just a reminder: can you please review this soon? I've updated AUTHORS again to fix the merge conflict.

@pasiorovuo

This is great development, and I hope it gets merged soon. I'd like to present one wish though - please take #281 into account and make use of tempfile.SpooledTemporaryFile optional / configurable.
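For reference, `tempfile.SpooledTemporaryFile` buffers data in memory and only rolls over to a real temporary file once `max_size` is exceeded; #281 asks that this behaviour be made optional or configurable in the backend. A quick stdlib illustration:

```python
import tempfile

# Data stays in an in-memory buffer until it grows past max_size,
# at which point it is transparently spilled to a real temp file.
buf = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)
buf.write(b"hello world")
buf.seek(0)
data = buf.read()  # b"hello world"
buf.close()
```

The trade-off is memory use versus disk I/O: a large `max_size` avoids disk writes for small files but can be costly when many large files are open at once, which is why making it configurable is attractive.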

@stinky2nine

@jschneier Friendly ping.

@mhindery

mhindery commented Apr 6, 2017

Would also love this to be merged, as the current solution for GCS is far from ideal ...

@scjody
I have tried this out (by taking the master branch of this repo and applying the patch), but I get this error:

ImproperlyConfigured: Could not load Google Cloud Storage bindings.
See https://github.com/GoogleCloudPlatform/gcloud-python

It crashes on these imports:

[screenshot: traceback showing the failing imports, 2017-04-06]

However, I can run these three imports just fine in a Python shell.

@jschneier
Owner

Hi.

Can someone articulate what the issue is with the current gs.py binding? I'm happy to merge something like this since there is clearly demand for it. I also greatly appreciate that the code is similar to the other backends that exist, although I can't help but feel we can probably find a better abstraction, since quite a bit is shared (and the code quality of S3BotoStorage is showing its age a bit).

When we do merge this I'd like to deprecate gs.py.

@stinky2nine

@jschneier gs.py uses Google Cloud Storage's XML Interoperable API, which authenticates with keyed-hash message authentication codes (a.k.a. developer keys) linked to your Google account. The Interoperable API is really meant for migrating data to Google Cloud Storage. The biggest problem with developer keys is security and privacy: they should not be shared with anyone, as they can be used to gain access to the other Google Cloud Storage buckets linked to your Google account.

I believe that this pull request depends on gcloud.storage.client.Client, which uses OAuth 2.0 for Google Cloud Storage API authentication and authorization.

@mhindery

mhindery commented Apr 10, 2017

As an extra note on the explanation above:

  • If you work with multiple developers on a project, each of them has to enable interoperability mode and create keys, which then have to be picked up by your project config/settings. This means quite a lot of unnecessary work and configuration. Alternatively, you would have to distribute the credentials for one account to everybody, which is undesirable.
  • When deploying on Google Cloud instances or (Kubernetes) clusters, application default credentials (service accounts) are present by default for authentication against GCS. It is these service accounts that are used to manage roles and authentication everywhere on the Google Cloud platform. On developer machines (linked to your Google account), they are also present system-wide for all applications to pick up by default. Each program thus runs under the dev account locally and uses the instance/cluster's account when deployed, and this all happens automatically. They are the recommended way of authenticating and are used throughout the platform (https://developers.google.com/identity/protocols/application-default-credentials), available for multiple languages.
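The lookup order described above can be sketched as follows. This is a simplification: the real logic lives in `google.auth.default()`, and the well-known file path is an assumption about the gcloud SDK's default location.

```python
import os
from pathlib import Path

def find_default_credentials(env=os.environ):
    """Simplified sketch of the Application Default Credentials chain."""
    # 1. An explicit service-account key file, set per deployment.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return ("env", explicit)
    # 2. The gcloud SDK's well-known file on a developer machine
    #    (created by `gcloud auth application-default login`).
    well_known = (Path.home() / ".config" / "gcloud"
                  / "application_default_credentials.json")
    if well_known.exists():
        return ("gcloud_sdk", str(well_known))
    # 3. On GCE/GKE, fall back to the instance metadata server, which
    #    serves tokens for the attached service account.
    return ("metadata_server", None)
```

This is why the same code authenticates automatically both on a dev laptop and inside a cluster: the chain resolves to a different credential source in each environment without any project configuration.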

@scjody
Contributor Author

scjody commented Apr 10, 2017

@jschneier It's mostly about authentication as others have articulated above. (I've personally tested the new functionality using an instance's service account and with a developer's user account, but I expect any authentication type supported by the google-cloud-python library will also work.)

I also believe that since the new backend uses the official Google library directly and does not require a compatibility mode intended only for short term data migrations, it is more likely to be reliable in the long term.

@scjody
Contributor Author

scjody commented Apr 10, 2017

@mhindery Are you sure you're in the same virtualenv in both cases? Can you try debugging the try part of the code above using ipdb? Or removing the try/except to see what the exact error is?
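One way to surface the real error without hand-editing the backend: the try/except guard around the imports swallows the original ImportError, so chaining the exception (`raise ... from exc`) preserves it. A sketch of the guard pattern with chaining — `ImproperlyConfigured` is stubbed here so the snippet is self-contained, and `load_gcloud_bindings` is a hypothetical name, not a function from this PR:

```python
class ImproperlyConfigured(Exception):
    """Stand-in for django.core.exceptions.ImproperlyConfigured."""

def load_gcloud_bindings():
    try:
        from google.cloud import storage  # noqa: F401
    except ImportError as exc:
        # `from exc` chains the original traceback, so the user sees
        # *why* the import failed (e.g. a missing transitive dependency
        # or a wrong virtualenv) instead of only the generic message.
        raise ImproperlyConfigured(
            "Could not load Google Cloud Storage bindings.\n"
            "See https://github.com/GoogleCloudPlatform/gcloud-python"
        ) from exc
    return storage
```

Seeing the chained ImportError would quickly distinguish a genuinely missing package from the wrong-virtualenv case suggested above.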

eirsyl and others added 9 commits April 10, 2017 17:35
The "gcloud" Python module is now called "google.cloud", so use the new
name.  This means the name of this module needs to change too, since it
can no longer be "google".
This decorator is required to allow Django 1.7 migrations to serialize
instances of this class.  Without the decorator, running "makemigrations"
on code like the following will fail with a "Cannot Serialize" error.

from storages.backends.gcloud import GoogleCloudStorage

gcs = GoogleCloudStorage()

class SomeModel(models.Model):
    some_field = models.FileField(storage=gcs)
We want to use this in the Google Cloud Storage backend as well.

This is a separate commit so we can verify the clean_name tests in s3boto
still pass (before I move those as well).
This is used by s3boto to support the AWS_LOCATION setting, which isn't
needed by Google Cloud Storage.
9 participants