Session Token Management APIs #36971

Open
wants to merge 67 commits into
base: main

tvaron3
Member

@tvaron3 tvaron3 commented Aug 21, 2024

Problem Statement

Customers who maintain their own session tokens need a way to obtain the most up-to-date session token. For example, a customer who uses multiple clients and keeps session tokens in a shared cache could hit race conditions when updating the cache. A customer with a high cardinality of logical partition keys would also have to store many session tokens. Addresses #36286.

Changes

  • The changes will ship as part of a preview package.
  • Added a new method that returns the most up-to-date session token, for customers who keep track of their own session tokens.
  • Added a new API for converting a logical partition key to a feed range.
  • Added a new API for checking whether a feed range is a subset of another feed range.
  • Fixed a bug with overlapping ranges and added test coverage.
  • In the future, the ability to get artificial feed ranges for operations could be added.

APIs

Container.py
def get_latest_session_token(feed_ranges_to_session_tokens: List, target_feed_range: str) -> str  # requires no metadata calls
def feed_range_for_logical_partition(pk: PartitionKey) -> FeedRange  # may make metadata calls for the collection properties, but the result is cached
def is_feed_range_subset(parent_feed_range: str, child_feed_range: str) -> bool  # requires no metadata calls
def read_feed_ranges(num_of_ranges: int) -> List  # out of scope for this PR; would require metadata calls to set up the pkrange cache

Samples

# This would be happening through different clients 
# Using physical partition model for read operations
cache = {}
session_token = ""
feed_range = container.feed_range_for_logical_partition(logical_pk)
# Get the correct session token from the cache
for stored_feed_range, stored_session_token in cache.items():
    if container.is_feed_range_subset(stored_feed_range, feed_range):
        session_token = stored_session_token
read_item = container.read_item(doc_to_read, logical_pk, session_token=session_token)

logical_pk_feed_range = container.feed_range_for_logical_partition(logical_pk)
# We recommend using the response hook to get the session token; for this example
# it is simply read from the last response headers
session_token = container.client_connection.last_response_headers["x-ms-session-token"]
feed_ranges_and_session_tokens = []

# Get feed ranges for physical partitions
container_feed_ranges = container.read_feed_ranges()
target_feed_range = ""

# which feed range maps to the logical pk from the operation
for feed_range in container_feed_ranges:
    if container.is_feed_range_subset(feed_range, logical_pk_feed_range):
        target_feed_range = feed_range
        break 
for cached_feed_range, cached_session_token in cache.items():
    feed_ranges_and_session_tokens.append((cached_feed_range, cached_session_token))
# Add the target feed range and session token from the operation
feed_ranges_and_session_tokens.append((target_feed_range, session_token))
cache[target_feed_range] = container.get_latest_session_token(feed_ranges_and_session_tokens, target_feed_range)



# Different ways of storing the session token and how to get most updated session token

# ---------------------1. using logical partition key ---------------------------------------------------
# could also use the one stored from the response headers
target_feed_range = container.feed_range_for_logical_partition(logical_pk)
updated_session_token = container.get_latest_session_token(feed_ranges_and_session_tokens, target_feed_range)
# ---------------------2. using artificial feed range ----------------------------------------------------
# Get four artificial feed ranges
container_feed_ranges = container.read_feed_ranges(4)

pk_feed_range = container.feed_range_for_logical_partition(logical_pk)
target_feed_range = ""
# which feed range maps to the logical pk from the operation
for feed_range in container_feed_ranges:
    if container.is_feed_range_subset(feed_range, pk_feed_range):
        target_feed_range = feed_range
        break 

updated_session_token = container.get_latest_session_token(feed_ranges_and_session_tokens, target_feed_range)
# ---------------------3. using physical partitions -----------------------------------------------------
# Get feed ranges for physical partitions
container_feed_ranges = container.read_feed_ranges()

pk_feed_range = container.feed_range_for_logical_partition(logical_pk)
target_feed_range = ""
# which feed range maps to the logical pk from the operation
for feed_range in container_feed_ranges:
    if container.is_feed_range_subset(feed_range, pk_feed_range):
        target_feed_range = feed_range
        break 

updated_session_token = container.get_latest_session_token(feed_ranges_and_session_tokens, target_feed_range)
# ------------------------------------------------------------------------------------------------------

Tradeoffs to Storing Session Token by Logical Partition Key vs Physical Partition vs Artificial Feed Ranges

Storing session tokens by logical partition key has the benefit of requiring fewer updates. This approach minimizes the number of concurrent updates, which can be advantageous for performance. Additionally, the availability impact during a failover is reduced because there are fewer updates to the session tokens. For example, if Region A fails over and the client holds a session token with a global LSN of 42, the next request goes to Region B, where the LSN on the replicas might be 32 because of replication lag. This discrepancy triggers the 404/1002 exception (Read Session Not Available) for any request that uses this session token.

On the other hand, using physical partitions or artificial feed ranges involves an optimistic get from the cache, because the number of concurrent updates increases significantly. The benefit of this approach is that the cardinality of the stored session tokens is significantly lower, which can simplify management and reduce overhead. It also means a bigger blast radius during a failover, because the scenario described above becomes more common.

Implementation

Glossary

Session Token Format: PKRangeId:VersionNumber#GlobalLSN#RegionId1=LocalLSN1#RegionId2=LocalLSN2...
Compound session token: Comma separated session tokens
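
To make the glossary concrete, here is a minimal sketch of how a token in this format could be split into its fields. The names `ParsedSessionToken`, `parse_session_token`, and `split_compound_session_token` are hypothetical and are not part of the SDK; the sketch only assumes the format described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParsedSessionToken:
    """Hypothetical holder for the fields named in the glossary."""
    pk_range_id: str
    version: int
    global_lsn: int
    local_lsns: Dict[int, int]  # region id -> local LSN


def parse_session_token(token: str) -> ParsedSessionToken:
    """Parse 'PKRangeId:VersionNumber#GlobalLSN#RegionId=LocalLSN#...'."""
    pk_range_id, _, vector = token.strip().partition(":")
    segments = vector.split("#")
    if not pk_range_id or len(segments) < 2:
        raise ValueError(f"Malformed session token: {token!r}")
    local_lsns: Dict[int, int] = {}
    # Everything after the version and global LSN is a RegionId=LocalLSN pair.
    for segment in segments[2:]:
        region_id, _, local_lsn = segment.partition("=")
        local_lsns[int(region_id)] = int(local_lsn)
    return ParsedSessionToken(pk_range_id, int(segments[0]), int(segments[1]), local_lsns)


def split_compound_session_token(compound: str) -> List[ParsedSessionToken]:
    """A compound session token is a comma-separated list of session tokens."""
    return [parse_session_token(part) for part in compound.split(",") if part.strip()]
```

For example, `parse_session_token("0:1#54#3=50")` yields pkrangeid `"0"`, version 1, global LSN 54, and a local LSN of 50 for region 3.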

Some Scenarios

| Scenario | Input | Output |
| --- | --- | --- |
| Normal Case | [("AA-BB", "0:1#54#3=50"), ("AA-BB","0:1#51#3=52")], "AA-BB" | "0:1#54#3=52" |
| Physical Partition Split with Both Children | [("AA-DD", "0:1#51#3=52"), ("AA-BB","1:1#55#3=52"), ("BB-DD","2:1#54#3=52")], "AA-DD" | "1:1#55#3=52, 2:1#54#3=52" |
| Physical Partition Split with One Child | [("AA-DD", "0:1#51#3=52"), ("AA-BB","1:1#55#3=52")], "AA-DD" | "0:1#51#3=52, 1:1#55#3=52" |
| Physical Partition Merge | [("AA-DD", "0:1#55#3=52"), ("AA-BB","1:1#51#3=52")], "AA-DD" | "0:1#55#3=52" |
| Compound Session Token | [("AA-DD", "2:1#54#3=52, 1:1#55#3=52"), ("AA-BB","0:1#51#3=52")], "AA-BB" | "2:1#54#3=52, 1:1#55#3=52, 0:1#51#3=52" |
| Several Compound Session Token | [("AA-DD", "2:1#57#3=52, 1:1#57#3=52"), ("AA-DD","2:1#56#3=52, 1:1#58#3=52")], "AA-DD" | "2:1#57#3=52, 1:1#58#3=52" |
| Overlapping Ranges | [("AA-CC", "0:1#54#3=52"), ("BB-FF","2:1#51#3=52")], "AA-EE" | "0:1#54#3=52,2:1#51#3=52" |
| No Relevant Feed Ranges | [("CC-DD", "0:1#54#3=52"), ("EE-FF","0:1#51")], "AA-BB" | throw illegal argument exception |

Flow

For the is_feed_range_subset(parent_feed_range: str, child_feed_range: str) -> bool API, the implementation will follow the .NET implementation in this PR: https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4566/files. Session tokens will be merged the same way the session container merges them: the merged token takes the higher version number, the higher global LSN, and the higher local LSN for each region.
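
The merge rule can be illustrated with a small sketch. This is a simplified reading of the rule stated above, not the SDK's session container code: `merge_session_tokens` is a hypothetical helper, it only handles two non-compound tokens with the same pkrangeid, and the real implementation may treat tokens with different version numbers differently.

```python
from typing import Dict, Tuple


def _parse(token: str) -> Tuple[str, int, int, Dict[str, int]]:
    """Split 'PKRangeId:Version#GlobalLSN#Region=LocalLSN...' into its fields."""
    pk_range_id, _, vector = token.strip().partition(":")
    version, global_lsn, *regions = vector.split("#")
    local_lsns: Dict[str, int] = {}
    for region in regions:
        region_id, _, lsn = region.partition("=")
        local_lsns[region_id] = int(lsn)
    return pk_range_id, int(version), int(global_lsn), local_lsns


def merge_session_tokens(token_a: str, token_b: str) -> str:
    """Take the higher version, higher global LSN, and higher local LSN per region."""
    id_a, version_a, global_a, locals_a = _parse(token_a)
    id_b, version_b, global_b, locals_b = _parse(token_b)
    if id_a != id_b:
        raise ValueError("Only tokens with the same pkrangeid can be merged")
    merged_locals = dict(locals_a)
    for region_id, lsn in locals_b.items():
        merged_locals[region_id] = max(lsn, merged_locals.get(region_id, lsn))
    parts = [str(max(version_a, version_b)), str(max(global_a, global_b))]
    parts += [f"{region_id}={lsn}" for region_id, lsn in merged_locals.items()]
    return f"{id_a}:" + "#".join(parts)
```

Applied to the "Normal Case" row of the scenario table, `merge_session_tokens("0:1#54#3=50", "0:1#51#3=52")` gives `"0:1#54#3=52"`.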

---
title: Merge Session Tokens
---
flowchart TB
A["[(feed_range, session_token), ...],  target_feed_range"] --> B[filter all tuples with feed_range overlapping with target_feed_range]
B --> C{'Is there a feed_range that is a superset of some of the other feed_ranges excluding tuples with compound session tokens?}
C -- Yes and Superset Feed Range has Higher LSN --> F["merge and take the pkrangeid(s) of the higher session token(s)"]
C -- Yes and Superset has Lower LSN --> I{Are there feed_ranges that can be combined to be equal or larger than the super set feed range?}
I -- yes --> F
I -- No --> Z[compound the session tokens]
Z --> C
F --> C
C-- no --> H[compound the session tokens]
H --> E[Merge any session tokens with same pkrangeids]
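
The first step of the flowchart (filtering tuples whose feed range overlaps the target) can be sketched as follows. This assumes the simplified "MIN-MAX" string notation used in the scenario table, with an inclusive minimum and an exclusive maximum; real feed ranges are SDK objects, and every function name here is hypothetical.

```python
from typing import List, Tuple


def parse_range(feed_range: str) -> Tuple[str, str]:
    """Split an illustrative 'MIN-MAX' range such as 'AA-BB'."""
    minimum, _, maximum = feed_range.partition("-")
    return minimum, maximum


def is_subset(parent: str, child: str) -> bool:
    """True when the child range lies entirely inside the parent range."""
    parent_min, parent_max = parse_range(parent)
    child_min, child_max = parse_range(child)
    return parent_min <= child_min and child_max <= parent_max


def overlaps(left: str, right: str) -> bool:
    """True when two half-open ranges share at least one point."""
    left_min, left_max = parse_range(left)
    right_min, right_max = parse_range(right)
    return left_min < right_max and right_min < left_max


def filter_overlapping(
    pairs: List[Tuple[str, str]], target_feed_range: str
) -> List[Tuple[str, str]]:
    """Keep only (feed_range, session_token) tuples relevant to the target."""
    relevant = [(fr, token) for fr, token in pairs if overlaps(fr, target_feed_range)]
    if not relevant:
        # Mirrors the "No Relevant Feed Ranges" scenario.
        raise ValueError("No feed range overlaps the target feed range")
    return relevant
```

With the "Overlapping Ranges" input, both tuples survive the filter for target "AA-EE"; with the "No Relevant Feed Ranges" input, the sketch raises `ValueError`.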

@azure-sdk
Collaborator

API change check

APIView has identified API level changes in this PR and created following API reviews.

azure-cosmos

Azure Pipelines successfully started running 1 pipeline(s).

remove_session_tokens = [session_tokens[i], session_tokens[j]]
for token in remove_session_tokens:
    session_tokens.remove(token)
i = -1
Member

should it be i--?

Member Author

@tvaron3 tvaron3 Oct 15, 2024

No, I want to reset i to the beginning of the list. The merged session token I just created has to be included, because it could still be merged again with another session token, and the indexing gets messed up after removing items from the list. Essentially:

  1. Merge the session tokens with the same pkrangeid.
  2. Remove the old session tokens.
  3. Reset to the beginning.
  4. Repeat until no pkrangeids are shared.

Alternative approach:

  1. Make lists of session tokens with common pkrangeids.
  2. Merge all the session tokens in those lists.
  3. Combine the remaining session tokens.
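
The alternative approach could look roughly like the sketch below, which groups tokens by pkrangeid in one pass instead of restarting the scan after each merge. It is an illustration under the glossary's token format, not the code in this PR; `merge_by_pk_range_id` and its helpers are hypothetical names.

```python
from typing import Dict, List, Tuple

Fields = Tuple[int, int, Dict[str, int]]  # (version, global LSN, local LSNs)


def _parse(token: str) -> Tuple[str, Fields]:
    pk_range_id, _, vector = token.partition(":")
    version, global_lsn, *regions = vector.split("#")
    local_lsns = {r.partition("=")[0]: int(r.partition("=")[2]) for r in regions}
    return pk_range_id, (int(version), int(global_lsn), local_lsns)


def _format(pk_range_id: str, fields: Fields) -> str:
    version, global_lsn, local_lsns = fields
    parts = [str(version), str(global_lsn)]
    parts += [f"{region}={lsn}" for region, lsn in local_lsns.items()]
    return f"{pk_range_id}:" + "#".join(parts)


def merge_by_pk_range_id(session_tokens: List[str]) -> str:
    """Merge tokens sharing a pkrangeid, then join the results into a compound token."""
    merged: Dict[str, Fields] = {}  # dicts keep insertion order
    for compound in session_tokens:
        for token in (part.strip() for part in compound.split(",")):
            if not token:
                continue
            pk_range_id, (version, global_lsn, local_lsns) = _parse(token)
            if pk_range_id not in merged:
                merged[pk_range_id] = (version, global_lsn, local_lsns)
                continue
            old_version, old_global, old_locals = merged[pk_range_id]
            combined = dict(old_locals)
            for region, lsn in local_lsns.items():
                combined[region] = max(lsn, combined.get(region, lsn))
            merged[pk_range_id] = (
                max(old_version, version),
                max(old_global, global_lsn),
                combined,
            )
    return ",".join(_format(pk, fields) for pk, fields in merged.items())
```

Because each token is visited once, no index reset is needed; the trade-off is an extra dictionary keyed by pkrangeid.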

@tvaron3 tvaron3 self-assigned this Oct 15, 2024
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py (outdated, resolved)
for partitionKeyRange in partition_key_ranges]

def get_updated_session_token(self,
feed_ranges_to_session_tokens: List[Tuple[FeedRange, str]],
Member

Parameter names shouldn't be type descriptions. Where would I get this pairing of feed ranges and session tokens? Or how would I build it myself?
Without seeing the full workflow here, it's difficult to know the best way to design this method's parameters. Could I see a customer sample?

Member Author

@tvaron3 tvaron3 Oct 16, 2024

The PR description has a sample of how it could be used. There are two main scenarios: the customer keys their session token cache either by a feed range representing a logical partition or by a feed range representing a physical partition. To store session tokens by logical partition, they would use the feed_range_from_partition_key API I exposed, get the session token from the response, and create this list of tuples. To store session tokens by physical partition, they would use the read_feed_ranges API to get the feed ranges, then use the is_feed_range_subset API to figure out which feed range contains the partition key, and then combine that with the session token from the response.
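
As a rough illustration of how such a list of tuples might be assembled, the sketch below captures the session token from response headers and pairs it with a feed range. `SessionTokenRecorder` and `build_token_pairs` are hypothetical helpers, and feed ranges are shown as plain strings. The sketch assumes the recorder would be passed as the `response_hook` keyword on a container operation, where it is called with the response headers and the result; that wiring is not exercised here.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple

SESSION_TOKEN_HEADER = "x-ms-session-token"


class SessionTokenRecorder:
    """Callable that remembers the session token of the last response it saw."""

    def __init__(self) -> None:
        self.last_session_token: Optional[str] = None

    def __call__(self, headers: Mapping[str, Any], result: Any) -> None:
        token = headers.get(SESSION_TOKEN_HEADER)
        if token:
            self.last_session_token = token


def build_token_pairs(
    cache: Dict[str, str], feed_range: str, session_token: str
) -> List[Tuple[str, str]]:
    """Combine cached (feed_range, session_token) entries with the newest pair."""
    pairs = list(cache.items())
    pairs.append((feed_range, session_token))
    return pairs


# Simulated response headers stand in for a real operation, e.g. (assumed usage):
# container.read_item(item, partition_key, response_hook=recorder)
recorder = SessionTokenRecorder()
recorder({SESSION_TOKEN_HEADER: "0:1#54#3=50"}, {})
pairs = build_token_pairs({"AA-BB": "0:1#51#3=52"}, "AA-BB", recorder.last_session_token)
```

The resulting `pairs` list has the shape that the `feed_ranges_to_session_tokens` parameter discussed above expects: one tuple per cached entry plus one for the latest operation.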

@annatisch annatisch dismissed their stale review October 16, 2024 18:21

Can discuss further during review

Member

@xinlian12 xinlian12 left a comment

LGTM, thanks

@tvaron3
Member Author

tvaron3 commented Oct 17, 2024

/azp run python - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@tvaron3
Member Author

tvaron3 commented Oct 25, 2024

/azp run python - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
