Board Review: Azure Communication Services (SPOOL) - Auth - Configurable Proactive Autorefresh Interval (.NET, JS/TS, Java & Python) #3712

Closed

Description

Contacts and Timeline

  • Responsible service team: ACS Auth
  • Main contacts: Petr Svihlik (@petrsvihlik), Aigerim Beishenbekova (@AikoBB)
  • Expected code complete date: Jan 2022
  • Expected release date: Jan 2022

About the Service

  • Link to documentation introducing/describing the service: N/A
  • Link to the service REST APIs: N/A
  • Link to GitHub issue for previous review sessions, if applicable: N/A

Note: The changes we are introducing are not related to any specific service. They reside in the common library (a shared lib used by all ACS modalities). The behavior we are modifying is called "proactive token autorefresh" and it's documented here: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/communication/communication-common#create-a-credential-with-proactive-refreshing

About the client library

  • Name of the client library: Azure.Communication.Common
  • Languages for this review: C#, JS/TS, Java, Python

Artifacts

.NET

Java

Python

  • APIView Link:
    1. There's no common lib in Python, so all of the changes in the _shared folder need to be replicated to all other packages (sms, chat…). Unfortunately, the rule that the _shared folders stay in sync was broken, so we drafted a PR fixing this, and it generated a couple of API views for various packages, all linked here.
    2. The proactive autorefresh capability was missing from the Python SDK completely, so we implemented it and distributed the implementation across the _shared folders. That generated this API view: https://apiview.dev/Assemblies/Review/c442ec17a8ee42f09cafffbe24438505?diffOnly=True&diffRevisionId=4c8766d9b6404fe29862dc5f0160f4e3 (changes in the kwargs of the __init__ method; please ignore all the other API changes, which will be mitigated once the PR from item 1 gets merged).

TypeScript

The change

  • Added a way to configure the proactive refresh time interval via the options bag
  • Changed the default value of the proactive refresh time interval

Champion Scenario

Token Refresh Flow for Custom Teams Endpoint

The change we're introducing relates to the credential's proactive autorefresh logic, which originally triggered 10 minutes before the expiry of a given token. We are making this time interval configurable because, as it turned out, the hardcoded value might not be suitable for all scenarios. To be more specific, here's a scenario that one of the private preview customers ran into when using autorefresh in combination with the Custom Teams Endpoint:

  • The developer:

    • configures the SDK to use AutoRefreshTokenCredential,
    • enables proactive refresh,
    • and implements the tokenRefresher callback function/method to get AAD tokens via MSAL and exchange them for Communication Identity tokens
  • During the app's runtime:

    • The initial Communication Identity access token is about to expire (i.e. it expires in 10 minutes or less)
    • Proactive autorefresh is triggered
    • A new AAD token is requested from MSAL
    • The returned token is an old (but still valid) one served from MSAL's cache
    • The AAD token is sent to be exchanged for a Communication Identity access token
    • Because the token exchange mechanism always returns a token with an expiration time equal to that of the AAD token, the new token is about to expire again ☹
    • The whole flow is triggered again and again in a loop until MSAL returns a fresh (uncached) token with a longer expiration

We want to provide the developer with an option to align the refresh interval with MSAL's cache times. We're also changing the default refresh interval to 4.5 minutes before token expiry. MSAL's offset is 5 minutes in all languages (JS, .NET, Java, Python), so the new default stays inside MSAL's renewal window, accounts for possible clock skew, and avoids unnecessary calls.
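To make the loop above concrete, here is a minimal, self-contained simulation. It does not use the real MSAL or ACS APIs: `FakeTokenCache`, `count_refresh_calls`, the one-hour token lifetime, and the 10-second retry cadence are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FakeTokenCache:
    """Hypothetical stand-in for an MSAL-like cache (not the MSAL API)."""
    lifetime_s: int = 3600      # assumed lifetime of a freshly issued token
    renew_offset_s: int = 300   # cache renews only when < 5 minutes remain
    expires_at: int = 3600      # expiry of the currently cached token

    def acquire(self, now: int) -> int:
        """Return the expiry time of the token handed to the caller."""
        if self.expires_at - now < self.renew_offset_s:
            self.expires_at = now + self.lifetime_s  # issue a fresh token
        return self.expires_at


def count_refresh_calls(refresh_before_expiry_s: int, retry_every_s: int = 10) -> int:
    """Count refresher invocations during the first hour of the simulation."""
    cache = FakeTokenCache()
    token_expires_at = cache.expires_at
    calls = 0
    for now in range(0, 3600, retry_every_s):
        if token_expires_at - now <= refresh_before_expiry_s:
            calls += 1
            # The exchanged token inherits the expiry of the cached token.
            token_expires_at = cache.acquire(now)
    return calls


print(count_refresh_calls(600))  # 10-minute threshold
print(count_refresh_calls(270))  # 4.5-minute threshold
```

Under these assumptions the 10-minute threshold invokes the refresher 32 times before a fresh token arrives, while the 4.5-minute threshold invokes it once. The exact counts depend on the assumed retry cadence; the point is only that a threshold inside the cache's renewal window avoids the futile attempts.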

Considerations:

The time offset values in MSAL and ACS SDKs

  • ACS SDKs have a hardcoded offset of 10 minutes before the token expiry to trigger the autorefresh
  • MSAL won't acquire a new token unless the cached token's remaining validity is below 5 minutes
    • This value is configurable and it can also be overridden by calling the acquireTokenSilent with forceRefresh==true
    • Please note that MSAL won't trigger the refresh automatically; it needs to be initiated by the consumer of that lib by calling any of the acquireToken* methods
  • As you can see, there is a 5-minute period during which the ACS SDK will be trying, unsuccessfully, to refresh the token
  • By making the interval in the ACS SDKs configurable, we expect customers to align it with MSAL's interval, or with that of any other 3rd-party lib (such as Auth0) that they don't control and that doesn't offer any of these workarounds
  • If we don't make it configurable, we force them to adjust MSAL's settings or to use forceRefresh.
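The 5-minute period mentioned above is just the difference between the two offsets. A small sketch of that arithmetic (the function name is hypothetical and not part of any SDK):

```python
def futile_refresh_window_minutes(sdk_offset_min: float, cache_offset_min: float) -> float:
    """Minutes during which the SDK asks for a new token but the upstream
    cache still returns the old one. Zero when the SDK offset is not larger
    than the cache's renewal offset."""
    return max(0.0, sdk_offset_min - cache_offset_min)


print(futile_refresh_window_minutes(10, 5))   # old hardcoded ACS offset vs. MSAL
print(futile_refresh_window_minutes(4.5, 5))  # new default vs. MSAL
```

With the old 10-minute offset the window is 5 minutes; with the 4.5-minute default it is zero.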

Default values

  • After the internal ACS review, we decided to also change the default refresh interval value (from 10 to 4.5 minutes).
  • Changing the default alone might seem like a good enough approach. However, it's important to stay aware of the fact that the autorefresh functionality is a general concept, not something CTE-specific. By changing the default value we're making assumptions about how the library will be used and hiding important details behind a single number. Also, if we decide to change the default again in the future, this detail might be unintentionally omitted and things could start falling apart.
  • We want to keep the way to configure it in case the customer already relies on the current behavior in an unexpected way or uses an authentication library other than MSAL (e.g. Auth0).
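As a rough sketch of how a default plus an override could look, the snippet below uses a hypothetical `RefreshOptions` type and `should_refresh` helper. These names are assumptions for illustration; the actual option names in each language are the ones in the linked API views.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RefreshOptions:
    """Hypothetical options bag; the default mirrors the proposed 4.5 minutes."""
    refresh_before_expiry: timedelta = timedelta(minutes=4, seconds=30)


def should_refresh(expires_at: datetime, now: datetime,
                   options: RefreshOptions = RefreshOptions()) -> bool:
    """True once the token's remaining validity is within the configured interval."""
    return expires_at - now <= options.refresh_before_expiry


now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
six_minutes_left = now + timedelta(minutes=6)

# Default interval: six minutes of validity is still outside the window.
print(should_refresh(six_minutes_left, now))
# A customer relying on the previous behavior can restore the 10-minute interval.
print(should_refresh(six_minutes_left, now, RefreshOptions(timedelta(minutes=10))))
```

Keeping the interval as an explicit option means a later change of the default does not silently alter behavior for customers who set it themselves.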

Exponential retry

  • We may consider adding exponential retry as a general extension of the autorefresh logic, improving performance in scenarios where the customer is not aware of repeated unsuccessful refresh attempts. However, we believe that for deterministic or predictable scenarios such as this one, the approaches suggested above are better suited.
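For reference, a minimal sketch of what a capped exponential retry schedule could look like. This is not part of the proposal, and the function name and parameters are hypothetical:

```python
from typing import List


def backoff_delays(base_s: float, factor: float, max_delay_s: float, attempts: int) -> List[float]:
    """Delay before each retry: base * factor**i, capped at max_delay_s."""
    return [min(max_delay_s, base_s * factor ** i) for i in range(attempts)]


# Seven attempts starting at 1 second, doubling, capped at 30 seconds.
print(backoff_delays(1.0, 2.0, 30.0, 7))
```

A schedule like this reduces the number of futile calls but still retries blindly, which is why aligning the interval with the upstream cache is preferred for the predictable case described here.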

Documentation update

  • Besides making the autorefresh configurable, we are also going to document all the MSAL-side options described above and make it clear that it's the customer's responsibility to return a token with a long enough validity

cc @jorgegarchirota
