Skip read lock to improve concurrency #100

Merged: 1 commit, merged on Dec 1, 2021

Conversation

rayluo
Contributor

rayluo commented Nov 12, 2021

An attempt to bypass the locking behavior when reading the token cache. Most of the time, the token cache only serves read operations, so conceptually we only need to take the lock about once per hour. This would likely improve concurrency a lot.
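
For illustration, a minimal sketch of the skip-read-lock idea (not the actual msal-extensions code; the class name, file path, and helpers below are made up):

import json
import os

class ReadWithoutLockSketch:
    # Illustrative only: the read path takes no cross-process lock at all.

    def __init__(self, cache_file):
        self._cache_file = cache_file  # hypothetical persisted token cache (a JSON list)
        self._last_sync = 0            # mtime of the file when we last loaded it
        self._entries = []             # in-memory copy of the cache

    def _reload(self):
        with open(self._cache_file) as f:  # plain read, no lock taken
            self._entries = json.load(f)
        self._last_sync = os.path.getmtime(self._cache_file)

    def find(self, predicate):
        # Only re-read the file when it changed after our last sync. For a token
        # cache that happens roughly once per token lifetime, so most reads are
        # served from memory and never touch the lock.
        try:
            if os.path.getmtime(self._cache_file) > self._last_sync:
                self._reload()
        except OSError:
            pass  # cache file may not exist yet; serve the in-memory copy
        return [e for e in self._entries if predicate(e)]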

@jiasli
Contributor

jiasli commented Nov 18, 2021

I think this works great for find, but we still need to rethink modify. If 2 processes are writing to the same file, maybe we can let one process wait for a while, instead of failing hard with portalocker.LOCK_NB?
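
For example, something along these lines (just a sketch of the waiting behavior, assuming portalocker's Lock helper; the lock file name and timings are made up):

import portalocker

# Poll for the exclusive lock for up to 5 seconds instead of failing
# immediately the way a LOCK_EX | LOCK_NB attempt does today.
try:
    with portalocker.Lock("msal.cache.lockfile", timeout=5, check_interval=0.25):
        pass  # perform the modify + save while holding the lock
except portalocker.exceptions.LockException:
    pass  # still locked after waiting; surface a clearer error to the user here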

@rayluo
Contributor Author

rayluo commented Nov 18, 2021

I think this works great for find, but we still need to rethink modify. If 2 processes are writing to the same file, maybe we can let one process wait for a while, instead of failing hard with portalocker.LOCK_NB?

True. There is still room for improvement for modify, which will probably happen in a different PR.

That being said, the improvement on find would still indirectly help modify, because now there will be zero readers to contend for the lock.

@jiasli
Contributor

jiasli commented Nov 29, 2021

I think we'd better accelerate the fix. I have been tripped up by this issue myself several times when running Azure CLI in parallel.

Avoid the unnecessary sleep after the last retry
@jiasli
Contributor

jiasli commented Feb 18, 2022

I think there is still a very small chance that Azure/azure-cli#20273 can be triggered: when the token expires. If multiple instances refresh at the same time, the permission issue may still happen since there is no retry logic:

def modify(self, credential_type, old_entry, new_key_value_pairs=None):
    with CrossPlatLock(self._lock_location):
        self._reload_if_necessary()
        super(PersistedTokenCache, self).modify(
            credential_type,
            old_entry,
            new_key_value_pairs=new_key_value_pairs)
        self._persistence.save(self.serialize())
        self._last_sync = time.time()

This issue has been bugging Azure CLI for years (Azure/azure-cli#9427).

@rayluo
Contributor Author

rayluo commented Feb 18, 2022

I think there is still a very small chance that Azure/azure-cli#20273 can be triggered: when the token expires. If multiple instances refresh at the same time,

We consider "multiple app instances encountering token expiration at the same time" to be less frequent than "multiple app instances running az login at the same time". The reason is that multiple az login runs would presumably all start at time T, thus racing with each other, but multiple az foo bar job workers would likely proceed at slightly different paces. It is as if the job run time adds in some natural jitter.

Please continue to advocate that customers adopt a pattern of "run one az login first, followed by a bunch of az foo bar job workers". Or, even better, "run one az login and then one az foo bar to warm up (i.e. populate the token cache with a fresh AT for scope foo), and then follow with a bunch of az foo bar job workers".

the permission issue may still happen since there is no retry logic:

def modify(self, credential_type, old_entry, new_key_value_pairs=None):
    with CrossPlatLock(self._lock_location):
        self._reload_if_necessary()
        super(PersistedTokenCache, self).modify(
            credential_type,
            old_entry,
            new_key_value_pairs=new_key_value_pairs)
        self._persistence.save(self.serialize())
        self._last_sync = time.time()

Retry was considered unnecessary for persistence.save(), when/if the with CrossPlatLock(...) acquisition was successful. FWIW, with CrossPlatLock(...) already has its own retry logic.
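
For reference, lock acquisition with built-in retry typically looks something like this (an illustrative sketch, not the actual CrossPlatLock implementation; the function name and numbers are made up):

import time

import portalocker

def acquire_lock_with_retry(lock_path, attempts=5, delay=0.3):
    handle = open(lock_path, "a")
    for attempt in range(attempts):
        try:
            portalocker.lock(handle, portalocker.LOCK_EX | portalocker.LOCK_NB)
            return handle  # caller unlocks with portalocker.unlock() and closes the file
        except portalocker.exceptions.LockException:
            if attempt == attempts - 1:  # no pointless sleep after the last attempt
                handle.close()
                raise
            time.sleep(delay)  # wait a bit, then try to grab the lock again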

This issue has been bugging Azure CLI for years (Azure/azure-cli#9427).

True. All these "unable to obtain a write lock" errors are by design, to prevent the token cache file corruption issue (Azure/azure-cli#9427). In that sense, we are already in a better place than we were with token cache file corruption.
