
token refresh offset #12136

Merged
merged 33 commits into master on Jul 17, 2020

Conversation

xiangyan99
Member

No description provided.

@xiangyan99 xiangyan99 requested a review from chlowell June 19, 2020 20:30
@xiangyan99 xiangyan99 marked this pull request as ready for review June 22, 2020 20:59
@xiangyan99 xiangyan99 requested a review from schaabs as a code owner June 22, 2020 20:59
@xiangyan99
Member Author

/azp run python - identity - ci

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


def get_cached_access_token(self, scopes, query=None):
    # type: (Sequence[str], Optional[dict]) -> Optional[AccessToken]
    tokens = self._cache.find(TokenCache.CredentialType.ACCESS_TOKEN, target=list(scopes), query=query)
    for token in tokens:
        expires_on = int(token["expires_on"])
-       if expires_on - 300 > int(time.time()):
+       if expires_on - 30 > int(time.time()):
Member

Should this be

Suggested change
-       if expires_on - 30 > int(time.time()):
+       if expires_on - self._token_refresh_timeout > int(time.time()):

or is there some rationale for always using 30 seconds?

Member Author

It is not _token_refresh_timeout.

We don't have a clear design for this value, but it must be less than _token_refresh_offset (which defaults to 120). Otherwise it would defeat the auto-refresh feature.

The old value, 300, did not meet that requirement, so I updated it to 30.
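For illustration, the timing constraint being described might be sketched like this (the constant names below are illustrative, not the SDK's actual names):

import time

# Illustrative values only -- these names are not the SDK's actual constants.
TOKEN_REFRESH_OFFSET = 120   # callers start trying to refresh 120s before expiry
CACHE_RETURN_MARGIN = 30     # the cache stops returning a token 30s before expiry

# The margin must be smaller than the offset: between (expiry - 120s) and
# (expiry - 30s) a caller still gets the cached token back while it attempts a
# proactive refresh. With the old margin of 300s, the cache would go empty
# before the refresh window even opened, forcing a blocking re-acquisition.
assert CACHE_RETURN_MARGIN < TOKEN_REFRESH_OFFSET

def cache_would_return(expires_on):
    # type: (int) -> bool
    return expires_on - CACHE_RETURN_MARGIN > int(time.time())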

Member

I wonder whether we need an explicit margin here. The 1s margin in if expires_on > int(time.time()) seems okay to me. My reasoning:

  • functionally, this line served to hardcode token_refresh_offset=300
    • if all cached tokens would expire within 300 seconds, this method would return None, prompting the caller to acquire a new token
  • token_refresh_offset will now be observed by callers of this method
  • when a caller enters its refresh window, it should begin trying to acquire a new token
  • while trying to acquire a new token, the caller should return any valid token it has

One bad outcome that could follow is the caller using a token that expires in flight. That request will fail, but the caller's other option was to raise without sending the request at all, because it couldn't acquire a new token. It seems better to try the request, which could after all succeed.

What do you think?

Member Author

The difference shows up when we are still within the token_refresh_retry_timeout window.

Extreme case: the user gets a token from us that expires in 1 second. Because we are still within the token_refresh_retry_timeout window, it does not get refreshed.

vs.

They get None from us, which forces a refresh.

Member

But if the credential is waiting on the retry timeout, it won't try to get a new token, regardless of what it gets back from the cache. Returning None in that case only guarantees the current request will fail, no?

Member Author

No. If there is no valid token (the cache returns None), we will try to get one whether or not we are within the retry timeout window.

The retry timeout only applies when there is a valid token but it is within the refresh offset window.
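A rough sketch of that rule, using hypothetical attribute names (this is not the PR's actual implementation):

import time

class RefreshRuleSketch(object):
    """Illustrates the retry-timeout rule described above; names are hypothetical."""

    def __init__(self, refresh_offset=120, refresh_retry_timeout=30):
        self._token_refresh_offset = refresh_offset
        self._token_refresh_retry_timeout = refresh_retry_timeout
        self._last_refresh_attempt = 0

    def get_token(self, cached_token, acquire_new_token):
        now = int(time.time())
        if cached_token is None:
            # No valid token at all: acquire one immediately, no cool-down.
            return acquire_new_token()
        within_offset = cached_token.expires_on - self._token_refresh_offset <= now
        retry_allowed = now - self._last_refresh_attempt >= self._token_refresh_retry_timeout
        if within_offset and retry_allowed:
            # Valid token inside the refresh window: try to refresh, but fall
            # back to the still-valid cached token if the attempt fails.
            self._last_refresh_attempt = now
            try:
                return acquire_new_token()
            except Exception:  # pylint: disable=broad-except
                pass
        return cached_token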

Member

Ah, I overlooked this behavior. Credentials should observe the retry timeout when the cache is empty.

Member

I'm not sure this is the behavior we want. If we have no access_token and the first attempt to get one failed, do we really want to hold all requests for 30 seconds before attempting to get one? I think we need to clarify this more.

Member Author

My opinion is that if no token is available, then every time the user calls our library to get one, we should try to acquire it without a cool-down period.



@xiangyan99 xiangyan99 requested a review from chlowell June 24, 2020 22:42
Member

@chlowell chlowell left a comment

Credentials also need to expose the refresh offset in their public API, for the authentication policy.

sdk/identity/azure-identity/tests/test_aad_client.py (outdated comments, resolved)


@xiangyan99
Member Author

Credentials also need to expose the refresh offset in their public API, for the authentication policy.

Do you mean BearerTokenCredentialPolicy?

The refresh offset is only configurable in the credential constructor, and I don't see BearerTokenCredentialPolicy calling credential constructors.

@chlowell
Member

Yes, recall that BearerTokenCredentialPolicy doesn't call get_token on every request. It waits until the last token it received is about to expire, as determined by a hardcoded value.
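Roughly, that policy-side check looks something like the sketch below (a simplified illustration, not the exact azure-core implementation):

import time

class BearerPolicySketch(object):
    """Simplified illustration of a bearer-token policy's refresh check."""

    def __init__(self, credential, *scopes):
        self._credential = credential
        self._scopes = scopes
        self._token = None
        # Hardcoded margin today; the question here is whether this should
        # instead come from the credential's configurable refresh offset.
        self._refresh_margin = 300

    def _need_new_token(self):
        return self._token is None or self._token.expires_on - self._refresh_margin < time.time()

    def on_request(self, request):
        if self._need_new_token():
            self._token = self._credential.get_token(*self._scopes)
        request.http_request.headers["Authorization"] = "Bearer " + self._token.token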

@xiangyan99
Member Author

Yes, recall that BearerTokenCredentialPolicy doesn't call get_token on every request. It waits until the last token it received is about to expire, as determined by a hardcoded value.

Yes. And given that we implemented the caching functionality in get_token, I don't see a requirement to update the default value used by _need_new_token.

@chlowell
Member

What if I want to refresh tokens 6 minutes before they expire?

@xiangyan99
Member Author

xiangyan99 commented Jun 25, 2020

What if I want to refresh tokens 6 minutes before they expire?

This is a good point. Please open a separate issue to track it. We cannot fix both in the same PR, because that change needs core changes and, per our rules, core changes cannot be combined with other changes in the same PR.

@xiangyan99 xiangyan99 requested a review from chlowell June 29, 2020 16:02
def get_cached_token(self, scopes):
    # type: (Iterable[str]) -> Optional[AccessToken]
    tokens = self._cache.find(TokenCache.CredentialType.ACCESS_TOKEN, target=list(scopes))
    for token in tokens:
        expires_on = int(token["expires_on"])
-       if expires_on - 300 > int(time.time()):
+       if expires_on - 30 > int(time.time()):
Member

Should this be using the constant you've defined?

Member

As I commented on another instance of this, I'm not certain we need any margin here. We want credentials to return cached tokens as necessary so long as they're valid; doesn't that imply having the cache return a token right up until its expiry?

Member Author

Margin removed.

try:
    self._redeem_refresh_token(scopes, **kwargs)
except Exception:  # pylint: disable=broad-except
    pass
Member

Should we be logging refreshes which fail here? Is this already done in _redeem_refresh_token?

Member Author

This is a good question.

I am leaning towards not logging it because:

  • if there is a valid token available, the user will continue to use it, so there is no need to log the failed refresh.
  • if there is no valid token, the user cannot get one, and we already log that event.

Comment on lines 49 to +55
if not token:
    token = self._client.obtain_token_by_client_certificate(scopes, self._certificate, **kwargs)
elif self._client.should_refresh(token):
    try:
        self._client.obtain_token_by_client_certificate(scopes, self._certificate, **kwargs)
    except Exception:  # pylint: disable=broad-except
        pass
Member

This logic:

if not token:
    # get new token
elif should_refresh:
    try:
        # get new token
    except Exception:
        # swallow
        pass

seems to be present in most if not all of the credentials. Perhaps it could be moved into a base class or mixin, with each implementation providing a callback or an override for the # get new token functionality?

Member Author
@xiangyan99 xiangyan99 Jul 14, 2020

Agreed. But different credentials have different ways to refresh/redeem tokens. So I have not found a clean way to do it.

Member

What do you think of something like this:

import abc

class CredentialBase(abc.ABC):
    def __init__(self, **kwargs):
        self._client = AadClient(...)

    def _get_token_impl(self, *scopes, **kwargs):
        if not scopes:
            raise ValueError('"get_token" requires at least one scope')

        token = self._client.get_cached_access_token(scopes)
        if not token:
            token = self._request_token(*scopes, **kwargs)
        elif self._client.should_refresh(token):
            try:
                self._request_token(*scopes, **kwargs)
            except Exception:  # pylint:disable=broad-except
                pass
        return token

    @abc.abstractmethod
    def _request_token(self, *scopes, **kwargs):
        pass


class Credential(CredentialBase):
    def get_token(self, *scopes, **kwargs):
        """relevant user-facing docstring"""
        return self._get_token_impl(*scopes, **kwargs)

    def _request_token(self, *scopes, **kwargs):
        """get a new token according to this credential's personal idiom"""
        ...

Member Author

Do you mean we should make a shared credential base class?

I would like to defer that to a separate issue/PR as a code refactoring.

Member

Refactoring always has a lower priority than new features. Merging this code is an open-ended commitment to maintaining it as is, so it's worth investigating a better organization now. The one I sketched may have its own problems (e.g. multiple inheritance would require some care) but it seems workable. What do you think? Have you tried something similar already?

Member Author

I think when we do the refactoring and add a shared class for all credentials, we can go further than just this. But I don't want to rush it right before a release.

@xiangyan99 xiangyan99 requested a review from chlowell July 15, 2020 20:33
@chlowell chlowell added the Azure.Identity and Client (This issue points to a problem in the data-plane of the library.) labels on Jul 17, 2020
@xiangyan99 xiangyan99 requested a review from chlowell July 17, 2020 21:29
@xiangyan99 xiangyan99 merged commit 117a6f5 into master Jul 17, 2020
@xiangyan99 xiangyan99 deleted the identity_token_refresh_offset branch July 17, 2020 22:16
iscai-msft added a commit to iscai-msft/azure-sdk-for-python that referenced this pull request Jul 20, 2020
…into regenerate_keys

* 'master' of https://github.com/Azure/azure-sdk-for-python: (100 commits)
  replace aka link (Azure#12597)
  [ServiceBus] Message/ReceivedMessage Properties alignment with other languages (Azure#12451)
  Find list of installed packages using pkg_resources (Azure#12591)
  token refresh offset (Azure#12136)
  updates (Azure#12595)
  User authentication samples (Azure#11343)
  Remove unnecessary base class (Azure#12374)
  Sequence -> Iterable for scopes (Azure#12579)
  Disable apistubgen step until issue is fixed (Azure#12594)
  fix pylint issue (Azure#12578)
  fix name in example (Azure#12572)
  Update tests.md (Azure#12574)
  Add stress tests for max batch size/prefetch, and for unsettled message receipt.  Add capability to not auto-complete and adjust max_batch_size into the base stress tester. (Azure#12344)
  [formrecognizer] Capitalize enum values (Azure#12540)
  Update Pinned CI Packages (Azure#11586)
  remove async response hook policy (Azure#12529)
  update to target new warden version (Azure#12522)
  fix azure-storage-blob readme and samples issues (Azure#12511)
  code fence not formatted appropriately (Azure#12520)
  Fix documentation typo (Azure#12519)
  ...
iscai-msft added a commit to iscai-msft/azure-sdk-for-python that referenced this pull request Jul 20, 2020
…into regenerate_certs

iscai-msft added a commit to iscai-msft/azure-sdk-for-python that referenced this pull request Jul 21, 2020
…into ta_opinion_mining_sample

iscai-msft added a commit to iscai-msft/azure-sdk-for-python that referenced this pull request Jul 29, 2020
…into regenerate_secrets
