Session Token Management APIs #36971
sdk/cosmos/azure-cosmos/azure/cosmos/_change_feed/feed_range_internal.py
remove_session_tokens = [session_tokens[i], session_tokens[j]]
for token in remove_session_tokens:
    session_tokens.remove(token)
i = -1
Should it be i--?
No. I want to reset i to the beginning of the list so that the merged session token I just created is included: it could still be merged again with another session token, and the indexing gets messed up after removing items from the list. Essentially:
- Merge the session tokens with the same pkrangeid
- Then remove the old session tokens
- Then reset to the beginning
- Repeat until no pkrangeids are shared
Alternative approach:
- Make lists of session tokens with common pkrangeids
- Merge all the session tokens in each of those lists
- Combine the remaining session tokens
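The alternative approach listed above could look roughly like the following. This is a minimal sketch, not the PR's code: `merge_by_pk_range_id`, `pk_range_id`, and `keep_higher_global_lsn` are hypothetical names, the token strings are made up, and the placeholder merge compares only the GlobalLSN field.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def pk_range_id(token: str) -> str:
    """Return the PKRangeId prefix of a token such as '0:1#42#3=40'."""
    return token.split(":", 1)[0]


def merge_by_pk_range_id(tokens: List[str],
                         merge: Callable[[str, str], str]) -> List[str]:
    # Step 1: bucket tokens by PKRangeId (dicts preserve insertion order).
    groups: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        groups[pk_range_id(token)].append(token)

    # Step 2: fold each bucket down to a single token.
    # Step 3: buckets holding one token pass through unchanged.
    merged = []
    for bucket in groups.values():
        result = bucket[0]
        for other in bucket[1:]:
            result = merge(result, other)
        merged.append(result)
    return merged


def keep_higher_global_lsn(a: str, b: str) -> str:
    # Placeholder merge (assumption): keeps the token with the higher GlobalLSN
    # and ignores the version and local LSN fields.
    def global_lsn(token: str) -> int:
        return int(token.split(":", 1)[1].split("#")[1])
    return a if global_lsn(a) >= global_lsn(b) else b


tokens = ["0:1#10#3=8", "1:1#5#3=5", "0:1#12#3=9", "0:1#11#3=9"]
result = merge_by_pk_range_id(tokens, keep_higher_global_lsn)
print(result)
```

Grouping first means the list is never mutated while it is being indexed, so no index reset is needed.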
for partitionKeyRange in partition_key_ranges]

def get_updated_session_token(self,
                              feed_ranges_to_session_tokens: List[Tuple[FeedRange, str]],
Parameter names shouldn't be type descriptions. Where would I get this pairing of feed ranges and session tokens? Or how would I build it myself?
Without seeing the full workflow here, it's difficult to know the best way to design this method's parameters. Could I see a customer sample?
The PR description has a sample of how it could be used. There are two main scenarios: the customer keys their session token cache either by feed ranges that represent logical partitions or by feed ranges that represent physical partitions. To store session tokens by logical partition, they would use the feed_range_from_partition_key API I exposed, get the session token from the response, and build this list of tuples. To store session tokens by physical partition, they would use the read_feed_ranges API to get the feed ranges, then use the is_feed_range_subset API to figure out which partition key is in each range, and then combine that with the session token from the response.
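The physical-partition workflow described here can be sketched without the SDK. In this sketch a feed range is modelled as a hypothetical half-open `(min, max)` tuple of strings rather than the SDK's `FeedRange` type, `is_subset` and `find_containing_range` are illustrative stand-ins for the APIs named above, and the range bounds and token value are made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a feed range: a half-open [min, max) interval
# over effective partition key strings. Not the SDK's FeedRange type.
Range = Tuple[str, str]


def is_subset(parent: Range, child: Range) -> bool:
    """True when the child interval lies entirely inside the parent interval."""
    return parent[0] <= child[0] and child[1] <= parent[1]


def find_containing_range(physical_ranges: List[Range],
                          child: Range) -> Optional[Range]:
    """Return the first physical range that contains the child range, if any."""
    for candidate in physical_ranges:
        if is_subset(candidate, child):
            return candidate
    return None


# Made-up physical partition ranges, as a read_feed_ranges-style call might return.
physical_ranges = [("00", "55"), ("55", "AA"), ("AA", "FF")]

# Made-up range for one logical partition key and the token from its response.
logical_range = ("60", "61")
response_session_token = "1:1#17#3=17"

# Cache keyed by physical range: one entry per physical partition.
cache: Dict[Range, str] = {}
owner = find_containing_range(physical_ranges, logical_range)
if owner is not None:
    cache[owner] = response_session_token
print(cache)
```

A real cache would merge the new token with any token already stored under the same range instead of overwriting it.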
LGTM, thanks
/azp run python - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Problem Statement
Customers who manage their own session tokens need a way to obtain the most up-to-date session token. For example, a customer who uses multiple clients and keeps session tokens in a shared cache can hit race conditions when updating that cache. A customer with a high cardinality of logical partition keys would also have to store many session tokens. Addresses #36286.
Changes
APIs
Container.py
def get_latest_session_token(feed_ranges_to_session_tokens: List, target_feed_range: str) -> str
- Requires no metadata calls
def feed_range_for_logical_partition(pk: PartitionKey) -> FeedRange
- There could be metadata calls for the collection properties, but the result is cached
def is_feed_range_subset(parent_feed_range: str, child_feed_range: str) -> bool
- No metadata calls are necessary for this
def read_feed_ranges(num_of_ranges: int) -> List
- This would be out of scope for this PR. It would require metadata calls to set up the pkrange cache
Samples
Tradeoffs to Storing Session Token by Logical Partition Key vs Physical Partition vs Artificial Feed Ranges
Storing session tokens by logical partition keys has the benefit of requiring fewer updates. This approach minimizes the number of concurrent updates, which can be advantageous in terms of performance. Additionally, during a failover, the availability impact is reduced because there are fewer updates to the session tokens. For example, if Region A fails over and the client has a session token with a global LSN of 42, the next request would go to Region B, where the LSN on the replicas might be 32 due to replication lag. This discrepancy would trigger the 404 / 1002 exception (Read Session Not Available) for any requests with this session token.
On the other hand, storing session tokens by physical partition or by artificial feed range involves an optimistic get from the cache, because the number of concurrent updates increases significantly. The benefit of this approach is that far fewer session tokens are stored, which can simplify management and reduce overhead. It also means a bigger blast radius during a failover, because the scenario described above would be more common.
Implementation
Glossary
Session Token Format: PKRangeId:VersionNumber#GlobalLSN#RegionId1=LocalLSN1#RegionId2=LocalLSN2...
Compound session token: Comma separated session tokens
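As an illustration of the two formats above, a parser might look like the following. It is a sketch only: `SessionToken`, `parse_session_token`, and `parse_compound_session_token` are hypothetical names, the sample token is made up, and region IDs are assumed to be integers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SessionToken:
    pk_range_id: str
    version: int
    global_lsn: int
    local_lsns: Dict[int, int]  # RegionId -> LocalLSN (integer IDs are an assumption)


def parse_session_token(token: str) -> SessionToken:
    """Parse 'PKRangeId:VersionNumber#GlobalLSN#RegionId1=LocalLSN1#...'."""
    pk_range_id, vector = token.split(":", 1)
    fields = vector.split("#")
    version, global_lsn = int(fields[0]), int(fields[1])
    local_lsns = {}
    for pair in fields[2:]:
        region_id, local_lsn = pair.split("=")
        local_lsns[int(region_id)] = int(local_lsn)
    return SessionToken(pk_range_id, version, global_lsn, local_lsns)


def parse_compound_session_token(compound: str) -> List[SessionToken]:
    """Split a comma-separated compound token and parse each part."""
    return [parse_session_token(part) for part in compound.split(",") if part]


parsed = parse_compound_session_token("0:1#42#3=40#4=38,1:1#17#3=17")
for token in parsed:
    print(token)
```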
Some Scenarios
Flow
For the def is_feed_range_subset(parent_feed_range: str, child_feed_range: str) -> bool API, the implementation will follow the .NET implementation in this PR: https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4566/files. Session tokens will be merged the same way the session container merges them: the merged token takes the higher version number, the higher global LSN, and the higher local LSNs.
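Read literally, the merge rule stated above amounts to a field-by-field maximum. The sketch below follows that sentence only and is not the session container's code: `VectorToken` and `merge_tokens` are hypothetical names, the values are made up, and carrying over a region that appears in only one token is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VectorToken:
    version: int
    global_lsn: int
    local_lsns: Dict[int, int]  # RegionId -> LocalLSN


def merge_tokens(a: VectorToken, b: VectorToken) -> VectorToken:
    """Take the higher version, higher global LSN, and higher local LSN per region."""
    regions = sorted(set(a.local_lsns) | set(b.local_lsns))
    return VectorToken(
        version=max(a.version, b.version),
        global_lsn=max(a.global_lsn, b.global_lsn),
        # Assumption: a region missing from one token is taken from the other.
        local_lsns={
            region: max(a.local_lsns.get(region, -1), b.local_lsns.get(region, -1))
            for region in regions
        },
    )


older = VectorToken(version=1, global_lsn=42, local_lsns={3: 40, 4: 35})
newer = VectorToken(version=2, global_lsn=40, local_lsns={3: 38, 4: 39})
merged = merge_tokens(older, newer)
print(merged)
```

The merge is commutative and idempotent, so the order in which tokens arrive does not change the result.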