-
Notifications
You must be signed in to change notification settings - Fork 6.9k
Fix Statement.cancel() #37472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix Statement.cancel() #37472
Conversation
|
@terrymanu Sorry , the previous pr was accidently closed. |
|
@terrymanu Please review this. |
terrymanu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current PR has blocking issues and shouldn’t be merged:
- Root cause not fixed: The handshake exposes a 32-bit connection ID while Process stores a 64-bit one. On MySQL, ProcessRegistry matches the full 64-bit ID and MySQLKillProcessExecutor relies on node bits, so the client’s 32-bit ID never matches and may be treated as
“remote,” causing cancel/KILL to fail. On PostgreSQL, BackendKeyData only carries a 32-bit processId, but Process stores only a 64-bit connectionId and never sets postgresqlProcessId; CancelRequest looks up by 32-bit, so in a cluster it can’t find the target. - Cluster forwarding gap: KillProcessHandler parses IDs with Integer.parseInt; 64-bit IDs with node bits are dropped and never forwarded to the right node.
- Persistence gap: YamlProcess/swapper only store mysqlThreadId/postgresqlProcessId and omit the 64-bit connection ID and secretKey, so cross-node cancel/kill lacks critical data.
- Change surface too large: 71 files touched, with many unrelated format/alignment tweaks (e.g., Firebird/OpenGauss frontend), which inflates the diff and review cost.
Suggested direction: (1) Align the handshake identifier with what Process stores—either make the handshake ID reconstruct node bits or map the 32-bit handshake ID back to the 64-bit stored value. (2) PostgreSQL should store both 32-bit postgresqlProcessId and 64-bit
postgresqlConnectionId; Cancel should match on 32-bit, and both the 64-bit ID and secretKey should be persisted. (3) KillProcessHandler must accept and forward 64-bit connection IDs. (4) Extend YamlProcess/swapper to include 64-bit connection IDs and secretKey. (5) Trim
this PR to only the necessary changes, remove unrelated formatting tweaks, and add tests covering handshake → process bind → cancel/KILL for local and cross-node paths (32/64-bit, secretKey mismatch).
In its current form, the PR should not be merged.
Thankyou for the review and i will surely look into this |
|
hi,@terrymanu i have an doubt ,connection id is pass through firebird/opengaus , and the function written in these are in int |
|
Currently, with an auto-increment ID scheme, there might be a scenario where some clients use connection pools while others connect directly. Suppose the auto-increment ID overflows but some original connections haven’t been released yet. In that case, duplicate IDs could occur. |
I will open a new issue to discuss this problem. In the meantime, I think we should pause any further work on this PR. |
Please give me some time ,I am working on this issue |
Thanks for pointing this out. I agree this scenario can lead to duplicate IDs when auto-increment overflows and some connections are still active (especially with mixed pooled and direct connections). To handle this safely, I used a BitSet to track allocated connection IDs up to the maximum allowed connections. On allocation, we pick the next clear bit, and on release, we clear it. This ensures an ID is reused only after the original connection is fully released, even after overflow. This avoids collisions, works well with connection pools, and keeps the behavior deterministic. |
|
@terrymanu what are you thoughts about this |
|
We’ve overcomplicated this. The real goal is just to make Kill/Cancel hit the running process. The chatter exploded because the scope ballooned: we mixed in 64-bit global IDs, cross-node forwarding, persistence, and even Firebird/OpenGauss while the client protocol still exposes 32-bit IDs, and we never nailed a clear mapping. On top of that, we touched 70+ files before agreeing on the design, so formatting/compat issues drowned out the core fix. Let’s simplify and get back on target:
First make the core path reliable; discuss global IDs/bitset allocation separately afterward. |
Thanks for the clarification. I agree the scope grew too large and diluted the core fix. I’ll reset this PR to focus only on the essential path: Plan:
I’ll update the PR with a minimal, clean diff following this approach. |
4420755 to
0d59385
Compare
|
I’ve reduced the PR to only the process ID binding during authentication, removing all unrelated changes to keep the scope minimal and focused. |
|
This change fixes the MySQL handshake ID mismatch on the executing node, which was the root cause of KILL / Statement.cancel() failing in cluster mode. Cross-node routing is intentionally not addressed here and can be discussed separately. |
|
@terrymanu please review the pr. So i can work on the cross node routing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the effort—good direction, but the current patch doesn’t solve the root cause and introduces new issues. Please refine and resubmit:
- Design/abstraction issue: Process is a cross-protocol infra class. Adding mysqlThreadId and MySQL-specific lookup into Process/ProcessRegistry bakes dialect details into shared code. Please revert these changes and handle MySQL handshake thread id binding at the MySQL protocol/session layer, or introduce a protocol-agnostic “native connection identifier” abstraction instead.
- Root cause still unsolved: The handshake thread id remains node-local and not cluster-unique; processId is still not aligned with the handshake id. During execution, ProcessEngine.executeSQL/completeSQLExecution recreates Process with the same processId, dropping the thread id you set at handshake time, so KILL can still fail to find the running process. Please first ensure the handshake id is cluster-unique (or can route to the right node) and keep it associated with the processId across the full lifecycle before wiring cancel/kill.
- Newly introduced risks: The numeric thread-id scan in ProcessRegistry.get may kill the wrong session if thread ids collide across nodes; the new test sets threadId manually and doesn’t cover the real handshake→execute→kill flow, and it leaves the singleton ProcessRegistry state dirty. Please fix these or roll them back.
- Unrelated/incorrect changes to revert: Remove the MySQL-specific fields and lookup logic from the infra Process/ProcessRegistry layer; redo the binding and routing in the MySQL protocol layer.
I encourage you to continue: make the handshake connection/thread id cluster-unique, keep it consistently mapped to the processId through the execution lifecycle, and add an end-to-end test covering handshake → execute → KILL/cancel. Looking forward to the updated version.
Fixes #37404
Changes proposed in this pull request:
In Apache ShardingSphere Proxy using the MySQL protocol, cancelling a running SQL statement does not correctly interrupt
the underlying JDBC Statement in a cluster environment.
Although a Process instance is created for the execution, the MySQL thread ID is not bound to the connection session. As a
result, KILL QUERY / Statement.cancel() cannot locate or terminate the actual running MySQL query.
This pull request binds the MySQL thread ID to the connection session after authentication, ensuring that cancellation signals
are correctly propagated to the JDBC layer.
Expected behavior
Actual behavior
Environment
ShardingSphere version: 5.5.3-SNAPSHOT
OS: Windows 11
Database: MySQL
Mode: Cluster mode
Before committing this PR, I'm sure that I have checked the following options: