Cloud NDB - Get operation ignores transient locking failure in memcache leading to cache inconsistency #652

Closed
@justinkwaugh

cloud-ndb v1.8.0 with Memcache global cache

There is a sequence of steps that can lead to cache inconsistency. It is caused by the reader failing to write a lock for transient reasons. The sequence is:

  1. Reader gets from memcache and finds nothing
  2. Writer writes lock value
  3. Reader has transient failure when attempting to lock the key
  4. Reader watches key
  5. Reader reads from db
  6. Writer updates db
  7. Writer fails to delete lock from memcache for whatever reason (currently most likely a connection reset)
  8. Reader writes stale value using cas
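
The steps above can be replayed against a toy in-memory cache to show why the final compare-and-swap succeeds. This is only a sketch: `ToyCache`, `LOCK`, and `reproduce` are hypothetical names made up for illustration and are not part of Cloud NDB or any memcache client. The version token stands in for the memcache cas id.

```python
LOCK = "LOCKED"  # hypothetical lock marker, not the value NDB actually writes


class ToyCache:
    """In-memory stand-in for memcache with watch / compare-and-swap."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._version = 0

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def set(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)

    def delete(self, key):
        self._data.pop(key, None)

    def watch(self, key):
        # Return a token identifying the current version of the key.
        entry = self._data.get(key)
        return entry[1] if entry else None

    def compare_and_swap(self, key, value, token):
        # Write only if the key has not changed since watch().
        entry = self._data.get(key)
        current = entry[1] if entry else None
        if token is None or current != token:
            return False
        self.set(key, value)
        return True


def reproduce():
    cache = ToyCache()
    db = {"k": "v1"}

    assert cache.get("k") is None   # 1. reader finds nothing in the cache
    cache.set("k", LOCK)            # 2. writer writes lock value
    # 3. reader's lock attempt fails transiently; the error is swallowed,
    #    so the reader carries on as if it were allowed to populate the cache
    token = cache.watch("k")        # 4. reader watches key (sees the writer's lock)
    stale = db["k"]                 # 5. reader reads from db
    db["k"] = "v2"                  # 6. writer updates db
    # 7. writer fails to delete its lock, so the cache entry is unchanged
    swapped = cache.compare_and_swap("k", stale, token)  # 8. reader's cas
    return swapped, cache.get("k"), db["k"]


print(reproduce())  # (True, 'v1', 'v2')
```

Because the lock entry never changed between steps 4 and 8, the cas has nothing to detect and the cache ends up holding `v1` while the database holds `v2`.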

The problem is that exceptions from transient failures on cache operations during reads are swallowed. For most of the calls that is fine. For the lock call in _datastore_api.lookup() specifically, however, any exception needs to be treated as the key being locked, so that the reader does not attempt to update memcache with a new value.
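
A minimal sketch of the behaviour being asked for, assuming a simplified read path. It is not the actual `_datastore_api.lookup()` code: `TransientCacheError`, `LOCK`, `add_lock`, and this `lookup` signature are hypothetical, the cache is a plain dict, and the watch/cas step is omitted for brevity.

```python
LOCK = "LOCKED"  # hypothetical lock marker


class TransientCacheError(Exception):
    """Hypothetical stand-in for a connection reset or timeout."""


def lookup(key, db, cache, add_lock):
    """Read-through lookup that never populates the cache after a failed lock.

    add_lock(key) is assumed to return True when the reader's lock was
    written and to raise TransientCacheError on a transient failure.
    """
    cached = cache.get(key)
    if cached is not None and cached != LOCK:
        return cached  # cache hit

    may_populate = False
    if cached is None:
        try:
            may_populate = add_lock(key)
        except TransientCacheError:
            # Treat a failed lock attempt exactly like "key is locked":
            # serve from the database but leave the cache alone.
            may_populate = False

    value = db[key]
    if may_populate:
        cache[key] = value
    return value


def failing_lock(key):
    raise TransientCacheError("connection reset")


cache = {}
print(lookup("k", {"k": "v1"}, cache, failing_lock), cache)  # v1 {}
```

The read still succeeds when the lock call fails. It just skips the cache write, which is the step that makes the stale value visible to later readers.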

Labels

api: datastore (Issues related to the googleapis/python-ndb API.)
priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
