Description
Opened on Nov 10, 2023
- Package Name: @azure/cosmos
- Package Version: 3.15.1 (also tried 4.0.0)
- Operating system:
- nodejs
  - version: 18.18.2
- browser
  - name/version:
- typescript
  - version:
Describe the bug
This is more of a question than a bug. Regarding session consistency, I was under the impression that within the context of a single client, i.e. one instance of CosmosClient, session tokens would be handled automatically, so that, for example, we would get read-after-write consistency. That doesn't seem to be the case for us: a read issued right after a write doesn't always return the updated results. Are we mistaken, and do we have to manage session tokens ourselves?
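If the client isn't tracking session tokens the way I expected, one workaround would be to forward the token from each write to the read that follows it. Below is a minimal sketch. It assumes the x-ms-session-token response header and the sessionToken request option. The names ContainerLike, sessionTokenFrom, and writeThenRead are hypothetical, and the interfaces only mirror the small part of the @azure/cosmos Container API used here, so the sketch runs without the SDK or a live account.

```typescript
// Sketch: carry the session token from a write into the read that follows it.
// ContainerLike and friends are hypothetical structural stand-ins for the
// small part of the @azure/cosmos Container API used here.

interface SessionOptions {
  sessionToken?: string;
}

interface ResponseLike<T> {
  resource?: T;
  headers: Record<string, unknown>;
}

interface ContainerLike {
  items: {
    create<T>(body: T, options?: SessionOptions): Promise<ResponseLike<T>>;
  };
  item(
    id: string,
    partitionKeyValue?: string,
  ): { read<T>(options?: SessionOptions): Promise<ResponseLike<T>> };
}

// Response header that carries the session token in Cosmos DB.
const SESSION_TOKEN_HEADER = "x-ms-session-token";

// Pull the session token out of a response's headers, if there is one.
function sessionTokenFrom(headers: Record<string, unknown>): string | undefined {
  const token = headers[SESSION_TOKEN_HEADER];
  return typeof token === "string" ? token : undefined;
}

// Create `doc`, then read it back, passing the write's session token
// explicitly instead of relying on the client's automatic tracking.
async function writeThenRead<T extends { id: string }>(
  container: ContainerLike,
  doc: T,
  partitionKeyValue: string,
): Promise<T | undefined> {
  const written = await container.items.create(doc);
  const sessionToken = sessionTokenFrom(written.headers);
  const read = await container
    .item(doc.id, partitionKeyValue)
    .read<T>(sessionToken !== undefined ? { sessionToken } : undefined);
  return read.resource;
}
```

With a real client, the same calls would be container.items.create(doc) followed by container.item(id, partitionKeyValue).read({ sessionToken }). The structural types only keep the example self-contained.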
As background, we have a database account configured with 'Session' as the default consistency level. It is replicated to a second, read-only region. We have an API that uses @azure/cosmos to talk to the database. It uses a singleton client, and there is only one instance of the API (it does not scale out to multiple instances). We also have a test suite that calls the API to verify its behavior. This is what we see:
- When our CI pipeline (Bitbucket) runs the test suite, it almost always fails. The failing tests vary from run to run, but every failure is a read-after-write check: a read issued after a write does not see that write.
- Tests always succeed when the default consistency is 'Bounded Staleness'.
- Tests always succeed when we remove the second read-only region (i.e. only a single read-write region) while keeping the default 'Session' consistency.
- Tests always succeed when run from my local machine, even when the database account is configured for 'Session' consistency and data is replicated to a second region.
I'm fairly confident the consistency issues we're seeing are not caused by requests going to different regions. As far as I can tell, @azure/cosmos always defaults to the global endpoint/primary region. Just to be sure, we tried setting the database hostname to the regional endpoint ([accountname]-[region].documents.azure.us), and nothing changed, so presumably the issues come from reading a stale replica within the primary region. I also understand why switching to 'Bounded Staleness' would work (though we'd like to stick with 'Session' consistency if possible and avoid the additional RU cost of 'Bounded Staleness'). What is perplexing is:
- Why does removing the second region cause the tests to succeed if the API service is always going to the primary region?
- Why would the tests succeed when run locally but not when run from our CI pipeline?
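Since I'm assuming every request goes to the primary region, one way to check that assumption in CI versus locally would be to log the endpoints each response reports. Below is a minimal sketch. It assumes version 4.x, where, as far as I know, responses expose diagnostics.clientSideRequestStatistics.locationEndpointsContacted. The interfaces and the helper names endpointsContacted and usedUnexpectedEndpoint are hypothetical stand-ins, not SDK types.

```typescript
// Sketch: list which endpoints served a request, based on the diagnostics
// that @azure/cosmos 4.x is assumed to attach to responses. The interfaces
// are hypothetical structural stand-ins mirroring the field path
// response.diagnostics.clientSideRequestStatistics.locationEndpointsContacted.

interface DiagnosticsLike {
  clientSideRequestStatistics?: {
    locationEndpointsContacted?: string[];
  };
}

interface ResponseWithDiagnostics {
  diagnostics?: DiagnosticsLike;
}

// Distinct endpoints contacted for one response, in first-seen order.
function endpointsContacted(response: ResponseWithDiagnostics): string[] {
  const all =
    response.diagnostics?.clientSideRequestStatistics?.locationEndpointsContacted ?? [];
  return all.filter((endpoint, index) => all.indexOf(endpoint) === index);
}

// True when any contacted endpoint differs from the one we expect
// every request to use.
function usedUnexpectedEndpoint(
  response: ResponseWithDiagnostics,
  expectedEndpoint: string,
): boolean {
  return endpointsContacted(response).some((endpoint) => endpoint !== expectedEndpoint);
}
```

In the API, the result of endpointsContacted could be logged next to each write and read during a test run, which would show whether the CI runs touch a different endpoint than local runs do.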
I'd appreciate any help in figuring out what may be going wrong.