Make Bucket.delete() in storage more restricted / opinionated #564

Closed
dhermes opened this issue Jan 22, 2015 · 9 comments · Fixed by #581
Assignees
Labels
api: storage Issues related to the Cloud Storage API.

Comments

@dhermes
Contributor

dhermes commented Jan 22, 2015

UPDATED 1/26/15 by @dhermes

After discussion with @jgeewax the approach will be

  • Cap at 512 object deletes (or similar number) and cowardly refuse if more objects exist
  • Don't try to deal with retries or 409s, just raise the error and give info
  • RE: eventual consistency, just ignore (maybe log?) 404s for object deletes
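The agreed plan could be sketched roughly as follows. This uses stand-in classes, not the real google-cloud-python API: `NotFound`, `TooManyObjects`, `list_keys`, and `force_delete` are all assumed names for illustration.

```python
MAX_OBJECT_DELETES = 512

class NotFound(Exception):
    """Stand-in for an HTTP 404 from the storage API."""

class TooManyObjects(Exception):
    """Cowardly refusal: the bucket holds more objects than the cap."""

def force_delete(bucket, max_deletes=MAX_OBJECT_DELETES):
    # Fetch one more than the cap so "at the cap" and "over the cap"
    # can be told apart.
    keys = list(bucket.list_keys(max_results=max_deletes + 1))
    if len(keys) > max_deletes:
        raise TooManyObjects(
            "Refusing to delete more than %d objects; remove them "
            "explicitly or raise the cap." % max_deletes)
    for key in keys:
        try:
            key.delete()
        except NotFound:
            pass  # the listing was stale; the object is already gone
    bucket.delete()
```

Note that 409s are deliberately not retried; per the plan, the error propagates to the caller with its information intact.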

See @thobrla comment:

...If you provide this functionality, it must be made extremely clear that this is a
non-atomic operation that is not expected to succeed if there are concurrent writers to
the bucket. The danger here is providing the illusion of atomicity to the client, especially
because that illusion is likely to work at small scale and then fail cryptically at large scale.

@dhermes dhermes added the api: storage Issues related to the Cloud Storage API. label Jan 22, 2015
@jgeewax
Contributor

jgeewax commented Jan 24, 2015

So we're saying that we will not provide a way to delete a bucket... at all? That seems off to me.

Couldn't we provide the method along with more specific errors? I.e., say that this might raise a ConcurrencyError if we try to delete a bucket while someone else is writing to it?

@dhermes
Contributor Author

dhermes commented Jan 24, 2015

No, we'll still have Connection.delete_bucket, but the recursive helper to also delete all the objects is too big a promise to make (and too dangerous a code path to make so conveniently accessible). The complexity required to actually do it right is not worth the feature (bucket deletion as a percentage of requests that come from the library will be effectively 0).

This way, if people use Connection.delete_bucket accidentally, they'll just get a 409 Conflict and not lose the bucket name (from the global namespace).
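This failure mode can be illustrated with stand-in classes (not the real API): deleting a non-empty bucket fails with a 409 Conflict, so an accidental `Connection.delete_bucket` call cannot destroy objects or release the bucket name.

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 response."""

class Connection(object):
    def __init__(self, buckets):
        self._buckets = buckets  # bucket name -> list of object names

    def delete_bucket(self, name):
        # The storage API refuses to delete a non-empty bucket.
        if self._buckets[name]:
            raise Conflict("409 Conflict: bucket %r is not empty" % name)
        del self._buckets[name]

conn = Connection({'logs': ['2015-01-22.txt']})
try:
    conn.delete_bucket('logs')
except Conflict:
    pass  # the bucket and its objects survive; nothing is lost
```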

@jgeewax
Contributor

jgeewax commented Jan 24, 2015

So here's the argument I use for stuff like that....

If we don't have a Bucket.delete(force=True) method, people still need their buckets deleted... so...

If we're lucky, people will make this a utility method...

```python
def delete_bucket(bucket):
    for key in bucket:
        key.delete()
    bucket.delete()

# ... lots of other code

util.delete_bucket(bucket)
```

If we're unlucky, people will do things like...

```python
[key.delete() for key in bucket]
bucket.delete()
```

Now let's say GCS does create an atomic way of deleting a bucket (or a more efficient way of deleting a bucket, or any other major improvement).

I'm still using my utility method (if we're lucky, it's a utility method...), so I don't benefit from this improvement when I run `pip install -U gcloud`. If I'm not using a single utility method to do this, then I have to scour my code for those snippets and replace all of them (and in hacky start-ups, they could exist in a lot of places...).

If we have the method exposed, then we at least maintain control over it and can improve as the API improves. If we make the decision to toss it away now, people will still do it, and still not understand the consequences... but we lose the ability to control it, document the nuances and problems with it, and improve it as the API evolves.

I'm very much in favor of keeping the method, putting a big red warning label on the API docs, and having descriptive errors when things (that we explain at the start) go wrong.

/cc @GoogleCloudPlatform/cloud-storage

@dhermes
Contributor Author

dhermes commented Jan 24, 2015

RE: "scour my code for those snippets and replace all of them". Apps using GCS as a blob store (most) will not delete buckets in production code. I'd guess some meta-applications would create and destroy buckets as part of their running application, but I'd also guess they'd want their own custom cleanup code. (Again, the main issue is that `for key in bucket` is not atomic and only gives eventually consistent results.)

I'd like to hear what the storage folks have to say.

@jgeewax
Contributor

jgeewax commented Jan 24, 2015

You'd be surprised the scary things people will do in their production apps.... Regardless, if they're going to do it, I'd rather us be the ones controlling it...

@dhermes
Contributor Author

dhermes commented Jan 26, 2015

Deleting a bucket and all the objects in it would blow up most production apps that use it as a blob store. I agree I'd be surprised by zany things, but removing all blobs associated with an application is a whole different level of scary.

I agree with you that we should seek to be the ones doing the implementation for them, but I'm coming from the side of maintainability. It takes a lot of code to actually get a bucket delete right (as @thobrla mentioned). The complexity bump is huge, but the feature added is minuscule (a full bucket delete is so rare for a production application).

@thobrla

thobrla commented Jan 26, 2015

If you choose to implement recursive delete:

  1. It could take an unbounded amount of time depending on how many objects there are in the bucket.
  2. You'll need to handle concurrent writes and return a clear error condition in this case (optionally, implementing retries).
  3. Eventual listing consistency can cause errors even in the case of non-concurrent writes. In other words, if an application serially uploads an object, waits for success, then deletes the bucket, it's possible the object won't be present in the listing at the time you list, yet the bucket delete will still fail.

I don't have a strong opinion as to whether you should implement this or not; if you do, make sure you document the behavior clearly, as well as the errors you'll raise in the case of the above failures.
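The error surface thobrla asks for might be sketched like this, again with assumed exception names rather than the real library's: 404s on object deletes are treated as benign (point 3, eventual listing consistency), while a 409 on the bucket delete is surfaced as a clearly named concurrency error (point 2) without retrying.

```python
class NotFound(Exception):
    """Stand-in for an HTTP 404."""

class Conflict(Exception):
    """Stand-in for an HTTP 409."""

class ConcurrentWriters(Exception):
    """The bucket was written to while being deleted; the operation is
    non-atomic and is not retried."""

def delete_bucket_and_contents(bucket):
    for key in list(bucket):  # snapshot the (eventually consistent) listing
        try:
            key.delete()
        except NotFound:
            pass  # stale listing: the object is already gone
    try:
        bucket.delete()
    except Conflict:
        raise ConcurrentWriters(
            "bucket %r gained objects during deletion" % bucket.name)
```

Point 1 (unbounded running time) is what the cap in the updated issue description addresses.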

@dhermes dhermes changed the title Remove Bucket.delete() from storage Make Bucket.delete() in storage more restricted / opinionated Jan 26, 2015
@dhermes
Contributor Author

dhermes commented Jan 26, 2015

Updated the description and issue title after chatting with @jgeewax.

@jgeewax jgeewax added this to the Storage Stable milestone Jan 30, 2015
@dhermes dhermes self-assigned this Jan 30, 2015
@dhermes
Contributor Author

dhermes commented Jan 30, 2015

Working on this

dhermes added a commit to dhermes/google-cloud-python that referenced this issue Jan 30, 2015
dhermes added a commit to dhermes/google-cloud-python that referenced this issue Feb 3, 2015
parthea added a commit that referenced this issue Oct 21, 2023
* chore: remove obsolete workarounds in owlbot.py

* remove workarounds in owlbot.py

* restore workaround in owlbot.py

* remove test code
3 participants