Skip to content

Route SELECT at SERIAL/LOCAL_SERIAL consistency like LWT#227

Open
sylwiaszunejko wants to merge 4 commits into
scylladb:masterfrom
sylwiaszunejko:route-select-lwt
Open

Route SELECT at SERIAL/LOCAL_SERIAL consistency like LWT#227
sylwiaszunejko wants to merge 4 commits into
scylladb:masterfrom
sylwiaszunejko:route-select-lwt

Conversation

@sylwiaszunejko
Copy link
Copy Markdown

The server cannot mark SELECT statements as LWT during PREPARE because the consistency level is only provided at EXECUTE time. This PR adds client-side detection so that any statement executed with regular consistency SERIAL or LOCAL_SERIAL is routed like an LWT (to a single replica owner).

Fixes: https://scylladb.atlassian.net/browse/DRIVER-617

Comment on lines +545 to +555
internal static RetryDecisionWithReason GuardSerialConsistencyDowngrade(RetryDecisionWithReason result, ConsistencyLevel originalCl)
{
if (originalCl.IsSerialConsistencyLevel()
&& result.Decision.DecisionType == RetryDecision.RetryDecisionType.Retry
&& result.Decision.RetryConsistencyLevel.HasValue
&& !result.Decision.RetryConsistencyLevel.Value.IsSerialConsistencyLevel())
{
return new RetryDecisionWithReason(RetryDecision.Rethrow(), result.Reason);
}
return result;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two competing constraints in LWT case:

  1. We really really don't want to retry on antoher host
  2. There are cases when node is gone, so we have no choice

So, probably we need to decide based on case:

  1. If node is up and running we should make sure that cl is not degraded and we don't retry on next node
  2. For critical cases, when say taget node is down (at least UnavailableException),then we should throw

Drivers rely on server-provided prepared metadata to detect LWT statements.
However, the server cannot set the LWT flag for SELECT statements during
PREPARE because the consistency level is only known at EXECUTE time.

This change ensures that any statement executed with regular consistency
SERIAL or LOCAL_SERIAL is treated as LWT-equivalent for routing.
When a statement is executed with SERIAL or LOCAL_SERIAL as regular
consistency (serial/Paxos read), retry policies must not downgrade it
to a non-serial consistency level like ONE or QUORUM, as that would
break the serial read guarantees.

Add GuardSerialConsistencyDowngrade() which intercepts retry decisions
for ReadTimeout, WriteTimeout, and Unavailable errors: if the original
CL was serial and the policy suggests a non-serial CL, the decision is
overridden to Rethrow.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds client-side handling so that statements executed with regular consistency SERIAL / LOCAL_SERIAL are routed like LWT requests (i.e., deterministically to a single replica owner), compensating for the fact that the server can’t mark these as LWT during PREPARE because consistency is only known at EXECUTE time.

Changes:

  • Treat ConsistencyLevel.Serial / LocalSerial as “LWT-like” for routing purposes (including effective consistency coming from request options/execution profiles).
  • Add retry-guard logic intended to prevent unsafe retry behaviors for serial-consistency requests.
  • Add unit/integration tests around serial-consistency LWT detection and routing behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/Cassandra/Statement.cs Changes default IsLwt() behavior to treat serial regular CL as LWT-like.
src/Cassandra/BoundStatement.cs Extends LWT detection to include serial regular consistency on bound statements.
src/Cassandra/Requests/RequestHandler.cs Uses effective consistency (statement vs request options) and decorates statements to force LWT-like routing when needed.
src/Cassandra/Requests/RequestExecution.cs Wraps retry decisions with a new guard for serial-consistency retry safety.
src/Cassandra.Tests/SerialConsistencyLwtTests.cs Adds unit tests for serial-consistency LWT detection and guarded retry behavior.
src/Cassandra.IntegrationTests/ScyllaLwtTests.cs Adds Scylla real-cluster tests asserting serial-consistency SELECTs stick to a single coordinator.
src/Cassandra.IntegrationTests/OpenTelemetry/OpenTelemetryTests.cs Gates one OTEL integration test behind a Scylla version attribute.
src/Cassandra.IntegrationTests/Mapping/Tests/Update.cs Gates UpdateIf_Applied_Test behind a Scylla version attribute.
src/Cassandra.IntegrationTests/Mapping/Tests/InsertTests.cs Gates InsertIfNotExists_Applied_Test behind a Scylla version attribute.
src/Cassandra.IntegrationTests/Mapping/Tests/Delete.cs Gates DeleteIf_Applied_Test behind a Scylla version attribute.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +542 to +545
/// Guards LWT (serial consistency) requests against unsafe retry behavior.
/// For LWT requests:
/// - UnavailableException (node is down): always rethrow, since retrying on another node is unsafe
/// - ReadTimeout/WriteTimeout (node is up): prevent CL downgrade and force retry on the same host
Comment on lines 213 to 215
[Category(TestCategory.RealCluster)]
[Test]
[Test, TestScyllaVersion(2026, 1)]
public async Task AddOpenTelemetry_MapperAndMapperAsync_SessionRequestIsParentOfNodeRequest()
Comment on lines +103 to 105
[Test, TestScyllaVersion(2026, 1)]
public void UpdateIf_Applied_Test()
{
Comment on lines +396 to 398
[Test, TestScyllaVersion(2026, 1)]
public void InsertIfNotExists_Applied_Test()
{
Comment on lines +165 to 167
[Test, TestScyllaVersion(2026, 1)]
public void DeleteIf_Applied_Test()
{
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants