-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clearing the global cache at the end of a transaction is unnecessary #657
Comments
Code that cleared the global cache at the end of a transaction was found to be unnecessary. Fixes googleapis#657
@chrisrossi @crwilcox I do not believe this is correct at all. This is the only correct way to clear the locked keys when in a transaction, because you want the unlocking to happen along the transaction boundary. For scenario 1 yes, it's not necessary and in fact in theory is bad. For scenario 2 your reasoning is that it is already deleting regardless of transactionality, but that is an actual bug as it is incorrect to be unlocking the keys mid transaction. That is a regression from legacy NDB, and can lead to cache inconsistencies. See #650. There is nothing wrong with these delete calls failing on put because there can never be stale values in the cache, there are only lock values (once #656 is fixed). Those are fine to leave in the cache as they will expire after 32s if the delete fails. I definitely agree that keys which were only looked up do not need to be deleted here, and in fact can possibly lead to cache inconsistencies. A list of keys whose database records were mutated during the transaction are the correct set of keys to be deleted. I have concerns that these changes keep being made which seem to indicate an incorrect understanding of how the library is trying to maintain cache consistency. |
Maybe it would help to layout the correct flow of memcache operations during a transaction with multiple records being modified:
This is how things should work. The most critically important thing is step 5 which is not correct right now due to bug #656. There are caveats to step 5 and 9 that currently are not enough to have perfect cache consistency (see #653) but it's close. It is also very important that the lock is held for the duration of the transaction, which is currently bug #650, and which is being broken by the change in this ticket. The tasklets for Step 10 are not being flushed so may not happen quickly which is bug #655. Additionally as mentioned, in step 10 only keys of mutated objects should be deleted which currently has no bug ticket. |
Now here is the correct flow for a read thread:
In this flow, step 4 is very important which currently is not correct due to bugs #651 and #652. It is the combination of the memcache add in step 4 with the memcache cas call in 8 that allows for cache consistency. Obviously this is all memcache centric, but the same applies for redis. Step 4 would be something like SETNX in redis terms. |
Hi @justinkwaugh , I think the point still stands that with the code in its current state, this code doesn't really do anything. It looks like we may have to do something similar to this for #650, but it wouldn't be exactly this, because this code deletes keys that were only read during the transaction. So, we'll deal with that in the scope of #650. This will be an iterative process. Thank you for bearing with us. |
@chrisrossi I hear what you are saying, however deleting immediately after put and not at the end of the transaction is in my opinion a worse incorrect state to be in than deleting the read only keys. The title of the ticket and the argument within is that clearing the global cache after the transaction is unnecessary because it is already being done. But that's exactly backwards from reality. Clearing the global cache after the transaction is the only time it is necessary in a transactional flow, it just needs to be more restrictive. So you are planning to remove a needed feature because a bug elsewhere exists allowing it to sort of work, so that later you can fix that bug and then add this feature back in. #650 is meant to be a narrowly focused one line bug fix to address a regression from legacy NDB. The deleting of read only keys is how the legacy NDB works, so it is a bug there as well. It makes more sense to fix #650 to bring it inline with legacy NDB and then to address a new ticket which would be to make sure that this clear_global_cache call which still has to be done in the place it is currently doing it is more restrictive about which keys it deletes. I'm all for iterative, but there is an order of criticality and likelihood of causing issues that needs to be looked at. For instance, #656 is actually causing cache inconsistencies on a regular basis for me in real usage causing breakages and the library with the bugs as-is is a showstopper for me. Cache consistency is also a holistic thing. I have made several bug tickets to point out specific bugs, but the solution needs to be looked at more broadly when thinking about how to prioritize and address the issues. An example is your argument "then we'd be in a pickle if those delete keys failed, as it would leave stale data in the cache". If bugs #656 and #651 are fixed, then the likelihood of that is limited to various edge cases as listed in some of the other lower priority cache inconsistency bugs, and goes to zero if all the rest are fixed. Your arguments have also stated that the cache clearing is leading to problematic cache inconsistencies when in reality it is one of the lowest of likelihood to do so. A more reasonable order of bug fixes which would actually iteratively improve the cache consistency would be: |
When in a transaction, keys should only be cleared from the global cache after transaction has been committed. Fixes googleapis#650 googleapis#657
Closed by #660 |
There is code which, after committing a transaction, clears keys from the global_cache if they are used during that transaction. After looking at this code a little more closely, I've concluded this is unnecessary.
The code in a question is the method,
google.cloud.ndb.context._Context._clear_global_cache
which is called fromgoogle.cloud.ndb._transaction._transaction_async
. What it does is look to see what keys are present in the local, in memory cache, which is only used for the transaction, and then deletes all of those keys in the global cache.Running through the possible scenarios, I'm not seeing a good reason to do this:
Scenario 1: Entity lookup. If an entity lookup is done during a transaction, the global cache is not used at all. It is neither queried nor written to. So there is no impact on the global cache and clearing these keys is unnecessary.
Scenario 2: Entity put. When an entity is written to Datastore, whether or not in a transaction, that entity's key is deleted from the global cache. Deleting the same key again in the call to
_clear_global_cache
is redundant.Scenario 3: Entity delete. This is essentially the same as Scenario 2. The key has already been deleted before the call to
_clear_global_cache
, so deleting it again is redundant.I believe these are all the scenarios.
What's the harm in calling
_clear_global_cache
anyway?Recently a spate of issues has had me looking at fault tolerance and how we handle transient errors. Specifically, in the work leading up to #654, I had to look at everywhere the cache was being used and reason about what should happen if a particular call to the cache fails. In the case of
_clear_global_cache
, I think that if weren't redundant code, then we'd be in a pickle if those delete keys failed, as it would leave stale data in the cache and we'd have to figure out how to mitigate the damage from that.The finding that this code is unnecessary in the first place is fortunate, because now we don't have to reason about fault tolerance for it.If we leave it in, we remain with the burden of maintaining it and reasoning about what should happen in the event of a communication failure with the cache. Luckily, it can be removed, which is a much better outcome.
The text was updated successfully, but these errors were encountered: