Cloud NDB - Get operation ignores transient locking failure in memcache leading to cache inconsistency #652

Closed
justinkwaugh opened this issue May 17, 2021 · 3 comments
Assignees
Labels
api: datastore Issues related to the googleapis/python-ndb API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@justinkwaugh

justinkwaugh commented May 17, 2021

cloud-ndb v1.8.0 with Memcache global cache

A sequence of steps can leave the cache inconsistent when the reading thread fails to write its lock for transient reasons. The sequence is:

  1. Reader gets from memcache and finds nothing
  2. Writer writes lock value
  3. Reader has transient failure when attempting to lock the key
  4. Reader watches key
  5. Reader reads from db
  6. Writer updates db
  7. Writer fails to delete lock from memcache for whatever reason (currently, most likely a connection reset)
  8. Reader writes stale value using cas
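
The interleaving above can be replayed deterministically with a toy in-memory cache. This is only a sketch: `ToyCache`, the `LOCK` sentinel, and the version counter used to emulate watch/compare-and-swap are assumptions made for illustration, not the Cloud NDB or memcache client API.

```python
LOCK = b"0"  # hypothetical sentinel marking a key as locked


class ToyCache:
    """In-memory stand-in for a cache with watch / compare-and-swap."""

    def __init__(self):
        self.data = {}     # key -> value
        self.version = {}  # key -> counter bumped on every write or delete
        self.watched = {}  # (client, key) -> version seen at watch time

    def _bump(self, key):
        self.version[key] = self.version.get(key, 0) + 1

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        self._bump(key)

    def delete(self, key):
        self.data.pop(key, None)
        self._bump(key)

    def watch(self, client, key):
        self.watched[(client, key)] = self.version.get(key, 0)

    def compare_and_swap(self, client, key, value):
        # Succeeds only if nothing touched the key since this client's watch.
        if self.watched.pop((client, key), None) != self.version.get(key, 0):
            return False
        self.set(key, value)
        return True


cache = ToyCache()
db = {"k": "old"}

assert cache.get("k") is None   # 1. reader gets from cache, finds nothing
cache.set("k", LOCK)            # 2. writer writes lock value
# 3. reader's lock attempt hits a transient error, which is swallowed
cache.watch("reader", "k")      # 4. reader watches key
stale = db["k"]                 # 5. reader reads from db
db["k"] = "new"                 # 6. writer updates db
# 7. writer's delete of the lock fails, so the key is not touched again
swapped = cache.compare_and_swap("reader", "k", stale)  # 8. reader CAS

print(swapped, cache.get("k"), db["k"])  # True old new
```

Because the writer's delete in step 7 never reaches the cache, nothing invalidates the reader's watch, so the compare-and-swap in step 8 succeeds and the cache keeps the stale value while the db holds the new one.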

The problem is that exceptions from transient failures on cache operations during reads are swallowed. For most of the calls that is fine. For the lock call in _datastore_api.lookup() specifically, however, any exception needs to be treated as the key being locked, so that the read does not attempt to update memcache with a new value.
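
A minimal sketch of that behavior, assuming a plain dict as the cache and a caller-supplied `lock_fn`. The names `try_lock`, `lookup`, `flaky_lock`, and `TransientError` are hypothetical; this is not the actual `_datastore_api.lookup()` code, only an illustration of treating a failed lock call as "locked".

```python
class TransientError(Exception):
    """Hypothetical stand-in for a transient cache failure."""


def try_lock(lock_fn, key):
    """Return True only if the lock call definitely succeeded.

    Any exception is reported as False, so the caller behaves as if
    the key were already locked by someone else.
    """
    try:
        lock_fn(key)
        return True
    except Exception:
        return False


def lookup(key, db, cache, lock_fn):
    """Read through the cache, populating it only when the lock was taken."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    may_populate = try_lock(lock_fn, key)
    value = db[key]
    if may_populate:
        cache[key] = value
    return value


def flaky_lock(key):
    raise TransientError("connection reset")


db = {"k": "v"}

# Lock call fails: the value is still returned, but the cache is left alone.
cache = {}
result = lookup("k", db, cache, flaky_lock)

# Lock call succeeds: the cache is populated as usual.
healthy_cache = {}
lookup("k", db, healthy_cache, lambda key: None)
```

The read still returns the value from the db in both cases; the only difference is that a failed lock call skips the cache write instead of proceeding as though the lock had been acquired.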

@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/python-ndb API. label May 17, 2021
@crwilcox crwilcox added priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels May 17, 2021
@crwilcox
Collaborator

crwilcox commented May 17, 2021

This appears to be a port, or slight alteration, of an open issue on the legacy NDB implementation: GoogleCloudPlatform/datastore-ndb-python#84

To be clear, the difference is step 3: "Reader has transient failure when attempting to lock the key".

That issue and #651 speak to the case where the reader does lock the key and overwrites. This one presents the case where the lock fails due to transient issues.

@justinkwaugh
Author

Yes, the main point of this one is not the mechanism of locking, which is the focus of #651 and the legacy #84; it's more related to the newly implemented transient failure handling.

@chrisrossi
Contributor

Closed by #667
