CASSJAVA-116: Retry or Speculative Execution with RequestIdGenerator throws "Duplicate Key" #2060
base: 4.x
Conversation
```java
default Statement<?> getDecoratedStatement(
    @NonNull Statement<?> statement, @NonNull String requestId) {
  // in case of retry or speculative execution
  if (statement.getCustomPayload().containsKey(getCustomPayloadKey())) {
```
Should we not override the value instead of early-exiting? traceparent key will contain node request ID and IMHO it should be unique per attempt.
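To illustrate the suggestion, here is a minimal sketch of overwriting the payload entry instead of early-exiting, so each attempt gets a fresh value. The helper name and key are illustrative, not the driver's actual API:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class PayloadOverride {
    // Hypothetical helper: always (re)write the request-id entry,
    // so a retry or speculative execution replaces the previous value
    // rather than colliding with it.
    static Map<String, ByteBuffer> withRequestId(
            Map<String, ByteBuffer> payload, String key, String requestId) {
        Map<String, ByteBuffer> copy = new HashMap<>(payload);
        // put() overwrites any prior value for the key: no duplicate-key error
        copy.put(key, ByteBuffer.wrap(requestId.getBytes(StandardCharsets.UTF_8)));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> payload = new HashMap<>();
        payload = withRequestId(payload, "request-id", "attempt-1");
        payload = withRequestId(payload, "request-id", "attempt-2"); // retry overrides
        System.out.println(
            new String(payload.get("request-id").array(), StandardCharsets.UTF_8));
        // prints "attempt-2"
    }
}
```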
Retries or speculative execution attempts are ideally supposed to be the same (unmodified) content that went out the first time... kinda makes sense given the role those two features play in the driver. With that in mind I assume you'd want to have the same ID here so that the retry or speculative execution attempt could be more easily identified as what they are and not mistaken for other follow-up requests from the driver.
@SiyaoIsHiding can confirm if that's what she was after here or not but that seems right to me.
I think I agree, I would want to retain the same request ID as the fact that the query was speculated/retried is useful context when tracing a request.
We should override instead of using the same ID for the node requests under the same session request.
TL;DR: In all of our ID generator implementations, the node request ID already contains the session request ID in it.
So let's say the session request ID is "4bf92f3577b34da6a3ce929d0e0e4736". If we search `nodeRequestID.contains("4bf92f3577b34da6a3ce929d0e0e4736")`, we will get all the retry/speculative executions for this session request, e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01" or "00-4bf92f3577b34da6a3ce929d0e0e4736-0e4736a3ce929d0e-01".
This aligns with the W3C trace context spec: the trace ID is the same across the transaction, but the span ID should always be unique.
This is also documented in our implementations.
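For illustration, a sketch of the correlation described above: because every node request ID embeds the session request ID, a substring match recovers all attempts for one session request. The IDs below are the examples from this thread:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CorrelateAttempts {
    public static void main(String[] args) {
        String sessionRequestId = "4bf92f3577b34da6a3ce929d0e0e4736";
        List<String> nodeRequestIds = List.of(
            "00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01", // first attempt
            "00-4bf92f3577b34da6a3ce929d0e0e4736-0e4736a3ce929d0e-01", // retry / speculative execution
            "00-ffffffffffffffffffffffffffffffff-1234567890abcdef-01"); // unrelated session request
        // All attempts for one session request contain its ID as a substring.
        List<String> attempts = nodeRequestIds.stream()
            .filter(id -> id.contains(sessionRequestId))
            .collect(Collectors.toList());
        System.out.println(attempts.size()); // prints 2
    }
}
```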
Lines 35 to 40 in f631081:
```java
/**
 * {session-request-id}-{random-uuid} All node requests for a session request will have the same
 * session request id
 */
@Override
public String getNodeRequestId(@NonNull Request statement, @NonNull String parentId) {
```
Lines 50 to 57 in f631081:
```java
/**
 * Following the format of W3C "traceparent" spec,
 * https://www.w3.org/TR/trace-context/#traceparent-header-field-values e.g.
 * "00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01" All node requests in the same session
 * request share the same "trace-id" field value
 */
@Override
public String getNodeRequestId(@NonNull Request statement, @NonNull String parentId) {
```
Good catch from @lukasz-antoniak 🙏
Maybe I'm missing something, but as I understand it the question isn't whether the node request ID contains the session request ID or not. The question is whether we should update the node request ID (regardless of its structure) when we need to send retries or speculative executions. In other contexts we go to considerable lengths to make sure we're sending the exact same messages we sent earlier... so it seems a bit weird not to do so here.
The W3C docs linked above indicate that trace-id SHOULD be unique but they don't require it. Furthermore they're talking about uniqueness across individual requests; this spec is silent on things like retrying a request. And at least some sources I'm finding seem to indicate that the same trace-id should be used if a transaction is retried.
This value in the custom payload, what we call "request-id", is not the trace ID, it's the "traceparent" for context propagation.
| Driver concept | W3C concept | Example |
|---|---|---|
| Request ID value in the custom payload | traceparent (for context propagation) | 00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01 |
| Session request ID | trace ID | 4bf92f3577b34da6a3ce929d0e0e4736 |
| The 8 bytes that we add to a session request ID to form the node request ID | span ID | a3ce929d0e0e4736 |
And the traceparent context propagation format is:
`version "-" trace-id "-" span-id "-" trace-flags`
Trace ID needs to be the same across all spans in a transaction.
Span ID is always unique. Otherwise, say the driver sends out two requests, including a retry, both with the same node request ID: on the server side you see two identical requests, and you can't tell which one is the first and which is the retry.
Span ID is an identifier that always needs to be unique, to identify each individual operation. Trace ID is how we correlate retries and speculative executions under one session request. The value in the custom payload needs both, so that we can both correlate and separate different operations in one transaction.
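A sketch of that scheme: keep the trace-id fixed for the session request and mint a fresh span-id for every node-level attempt. The generation logic here is illustrative, not the driver's implementation:

```java
import java.util.concurrent.ThreadLocalRandom;

public class TraceparentSketch {
    // traceparent = version "-" trace-id "-" span-id "-" trace-flags
    static String traceparent(String traceId, String spanId) {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    // Illustrative only: a random 16-hex-digit span ID per attempt.
    static String freshSpanId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    public static void main(String[] args) {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736"; // shared by all attempts
        String first = traceparent(traceId, freshSpanId());
        String retry = traceparent(traceId, freshSpanId()); // same trace-id, new span-id
        // The trace-id field (chars 3..34) matches across attempts,
        // while the span-id field differs with overwhelming probability.
        System.out.println(first.substring(3, 35).equals(retry.substring(3, 35)));
        // prints "true"
    }
}
```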
```java
Map<String, ByteBuffer> existing = statement.getCustomPayload();
String key = getCustomPayloadKey();

NullAllowingImmutableMap.Builder<String, ByteBuffer> builder =
```
Just wondering, at this point what value is NullAllowingImmutableMap providing?
It is what imposes the duplicate-key restriction, and its main value is allowing null keys and values, but we would never set a null key/value in this context (correct me if I'm wrong?).
As far as I understand it, it's just an ImmutableMap without the null key/value restriction. Why wouldn't we just use a regular Map implementation and wrap it in Collections.unmodifiableMap?
I did some spelunking and it looks like @emerkle826 made that mod way back in the day. I've already pinged him ~~demanding that he explain himself~~ asking very politely if we could provide the relevant context 🧌
re: what map to use.
It seems like the protocol allows null values in the custom payload (source).
So the custom payload provided by the user can already have null values in it, and we need to support that.
But a normal ImmutableMap does not allow a null value. I'm guessing that's why we were using a NullAllowingImmutableMap. Pending confirmation from @olim7t and @emerkle826.
However, Collections.unmodifiableMap is also immutable and allows null values. I cannot quite see why we don't use that.
The code here will be cleaner if we use Collections.unmodifiableMap. If no objections, I think we can go that way.
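A minimal sketch of that alternative, using only the JDK (this is an illustration, not the driver's code): a plain HashMap accepts null values, matching what the protocol allows in the custom payload, and Collections.unmodifiableMap makes the view read-only:

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class UnmodifiablePayload {
    public static void main(String[] args) {
        Map<String, ByteBuffer> payload = new HashMap<>();
        payload.put("request-id", ByteBuffer.wrap(new byte[] {1}));
        payload.put("user-key", null); // null values allowed, as the protocol permits

        Map<String, ByteBuffer> view = Collections.unmodifiableMap(payload);
        try {
            view.put("other", null); // the unmodifiable view rejects mutation
            System.out.println("mutated");
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");
        }
        // The null value survives in the read-only view.
        System.out.println(view.containsKey("user-key") && view.get("user-key") == null);
        // prints "immutable" then "true"
    }
}
```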