Skip to content

Fix encryption key cache design for AKV provider #3464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

cheenamalhotra
Copy link
Member

@cheenamalhotra cheenamalhotra commented Jul 9, 2025

Description

This PR fixes a couple of issues:

  1. Concurrency issue with AKV cache: If parallel threads access the Local or Global CEK caches at the same time, they end up all making requests to Azure Key Vault to acquire key information. This causes unnecessary requests over the wire for unwrapping key which is already "supposedly" in cache.
    • This is addressed by usage of Semaphore Slim to limit cache access to one thread at a time.
  2. Sync over Async: As of now requests to cache Key are made asynchronously, causing thread starvation due to sync-over-async issue. Since AKV provider is purely sync, we should not be making any async calls synchronously.
    • This is addressed by calling sync APIs to fetch key information from AKV.

Issues

Fixes above issues, and additionally:

Testing

Repro can be found here: https://gist.github.com/cheenamalhotra/79f8ae709147e0b52239fd54c3049932

When registering AKV provider globally:

  • With current driver: The number of 'unwrap' key requests triggered by Azure Key Vault SDK are more than 1 for parallel operations tests.
  • With fix: Only 1 Unwrap call is made by the driver as Cached Encryption key is available.

Guidelines

Please review the contribution guidelines before submitting a pull request:

P.S. We might want to consider it for 6.1.0 as it's critically needed for a customer.

@Copilot Copilot AI review requested due to automatic review settings July 9, 2025 00:37
@cheenamalhotra cheenamalhotra requested a review from a team as a code owner July 9, 2025 00:37
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the Azure Key Vault (AKV) provider cache design to address cache TTL enforcement, remove sync-over-async calls, and prevent redundant cache fetches under concurrency.

  • Modernized SqlSymmetricKeyCache to use target-typed new, inline TryGetValue, and removed forced TTL reset.
  • Switched SqlConnection to C# 12 collection expressions and dropped global TTL zeroing.
  • Refactored SqlColumnEncryptionAzureKeyVaultProvider and AzureSqlKeyCryptographer from async task-based caching to synchronous patterns with a semaphore guard.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlSymmetricKeyCache.cs Modernized cache lookup and entry options; removed TTL override
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlConnection.cs Replaced new List<string>(…) with [ .. keys]; removed TTL zeroing
src/Microsoft.Data.SqlClient/add-ons/AzureKeyVaultProvider/SqlColumnEncryptionAzureKeyVaultProvider.cs Added SemaphoreSlim around local cache access
src/Microsoft.Data.SqlClient/add-ons/AzureKeyVaultProvider/LocalCache.cs Adjusted using order; removed unused Contains method
src/Microsoft.Data.SqlClient/add-ons/AzureKeyVaultProvider/AzureSqlKeyCryptographer.cs Replaced async fetch/dictionaries with sync KeyVaultKey cache
Comments suppressed due to low confidence (5)

src/Microsoft.Data.SqlClient/add-ons/AzureKeyVaultProvider/AzureSqlKeyCryptographer.cs:31

  • [nitpick] The _keyFetchTaskDictionary now stores KeyVaultKey instances rather than tasks. Consider renaming it to better reflect its contents, e.g. _keyCacheDictionary or _fetchedKeys.
        private readonly ConcurrentDictionary<string, KeyVaultKey> _keyFetchTaskDictionary = new();

src/Microsoft.Data.SqlClient/add-ons/AzureKeyVaultProvider/AzureSqlKeyCryptographer.cs:175

  • FetchKeyFromKeyVault retrieves keyClient but never initializes it. You need to call CreateKeyClient(vaultUri) before accessing _keyClientDictionary or handle the case where keyClient is null.
            return keyClient?.GetKey(keyName, keyVersion);

src/Microsoft.Data.SqlClient/add-ons/AzureKeyVaultProvider/AzureSqlKeyCryptographer.cs:81

  • With the async/task-based guard removed, parallel calls for a missing key will all invoke FetchKeyFromKeyVault. Consider adding a lock or double-checked locking to prevent N+1 fetches under concurrency.
            if (_keyFetchTaskDictionary.TryGetValue(keyIdentifierUri, out KeyVaultKey keyVaultKey))

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlConnection.cs:279

  • The method signature returns List<string>, but the collection expression [...] produces an array by default. This will cause a type mismatch. Either construct a List<string> explicitly or verify that C# 12 collection expressions target the declared return type.
                return [.. s_systemColumnEncryptionKeyStoreProviders.Keys];

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlConnection.cs:294

  • Same as above: returning a collection expression for a method declared to return List<string> may not compile. Consider new List<string>(...) or confirm the collection expression creates a List<string>.
                return [.. _customColumnEncryptionKeyStoreProviders.Keys];

@cheenamalhotra cheenamalhotra added this to the 6.1.0 milestone Jul 9, 2025
@cheenamalhotra cheenamalhotra requested a review from a team July 10, 2025 06:01
<value>Internal error. Empty '{0}' specified.</value>
</data>
<data name="GetKeyFailed" xml:space="preserve">
<value>Fetching the key failed: '{0}'.</value>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<value>Fetching the key failed: '{0}'.</value>
<value>Failed to fetch Key from Azure Key Vault. Key: {0}</value>

Comment on lines +58 to +62
try
{
ParseAKVPath(keyIdentifierUri, out Uri vaultUri, out string keyName, out string keyVersion);
CreateKeyClient(vaultUri);
FetchKey(vaultUri, keyName, keyVersion, keyIdentifierUri);
}
// Allow only one thread to proceed to ensure thread safety
// as we will need to fetch key information from Azure Key Vault if the key is not found in cache.
_keyDictionarySemaphore.Wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move _keyDictionarySemaphore.Wait(); to before the try. We don't want to execute the finally if Wait() throws an exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if GetOrAdd() could be used here as well, and avoid the semaphore? Pass a lambda that fetches the key, and let ConcurrentDictionary block all other calls until the fetch + add is complete. Would that work?

Comment on lines +416 to +419
try
{
// Allow only one thread to access the cache at a time.
_cacheSemaphore.Wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move _cacheSemaphore.Wait(); to before the try. We don't want to execute the finally if Wait() throws an exception.

Comment on lines +56 to +59
try
{
Debug.Assert(SqlConnection.ColumnEncryptionTrustedMasterKeyPaths is not null, @"SqlConnection.ColumnEncryptionTrustedMasterKeyPaths should not be null");

SqlSecurityUtility.ThrowIfKeyPathIsNotTrustedForServer(serverName, keyInfo.keyPath);
// Acquire the lock to ensure thread safety when accessing the cache
_cacheLock.Wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move _cacheLock.Wait(); to before the try. We don't want to execute the finally if Wait() throws an exception.

Copy link
Contributor

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A variety of comments.

/// </summary>
private readonly ConcurrentDictionary<string, KeyVaultKey> _keyDictionary = new();
private SemaphoreSlim _keyDictionarySemaphore = new(1, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SemaphoreSlim is IDisposable. Do we need to dispose of it?

{
Task<Azure.Response<KeyVaultKey>> fetchKeyTask = FetchKeyFromKeyVault(vaultUri, keyName, keyVersion);
_keyFetchTaskDictionary.AddOrUpdate(keyResourceUri, fetchKeyTask, (k, v) => fetchKeyTask);
AKVEventSource.Log.TryTraceEvent("Fetching requested master key: {0}", keyName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Master key, or column encryption key?

}
throw ADP.GetKeyFailed(keyName);
}
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary else block.

_keyClientDictionary.TryAdd(vaultUri, keyClient);
}

return keyClient;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If TryAdd() fails (because another thread just added a KeyClient for this vaultUri), then we're returning a different KeyClient instance than has been stored in the dictionary. Is this a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could replace this function body with:

return _keyClientDictionary.GetOrAdd(vaultUri, (_) => new KeyClient(vaultUri, TokenCredential));

Comment on lines +58 to +62
try
{
ParseAKVPath(keyIdentifierUri, out Uri vaultUri, out string keyName, out string keyVersion);
CreateKeyClient(vaultUri);
FetchKey(vaultUri, keyName, keyVersion, keyIdentifierUri);
}
// Allow only one thread to proceed to ensure thread safety
// as we will need to fetch key information from Azure Key Vault if the key is not found in cache.
_keyDictionarySemaphore.Wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if GetOrAdd() could be used here as well, and avoid the semaphore? Pass a lambda that fetches the key, and let ConcurrentDictionary block all other calls until the fetch + add is complete. Would that work?

}

/// <summary>
/// Validates that a key is of type RSA
/// </summary>
/// <param name="key"></param>
/// <returns></returns>
private KeyVaultKey ValidateRsaKey(KeyVaultKey key)
private static KeyVaultKey ValidateRsaKey(KeyVaultKey key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this function return KeyVaultKey? Not sure what the intention is here, since neither the return value nor exceptions are documented.

@@ -397,14 +400,7 @@ private byte[] CompileMasterKeyMetadata(string masterKeyPath, bool allowEnclaveC
/// Produces a string of hexadecimal character pairs preceded with "0x", where each pair represents the corresponding element in value; for example, "0x7F2C4A00".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value -> source

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Always encrypted: 401 errors from Azure Key Vault is cached forever
3 participants