Commit 884d84d

[yugabyte#19855] DocDB: Handle blocking shutdown issue on drop table due to active write requests
Summary:
As part of commit yugabyte@4d360c7, we changed `TabletPeer::GetRaftConsensus` to return `IllegalState` on shutdown instead of `NotFound`. This caused a regression in org.yb.cql.TestIndex.testDropDuringWrite.

As part of AssembleDocWriteBatch, the stuck write query requests the status of a transaction and sees the ABORTED state. It then tries to wait for the safe time returned by the coordinator, giving a large enough window for actually committed transactions to apply at this participant. `TransactionParticipant::WaitForSafeTime` eventually calls `Tablet::DoGetSafeTime`, which tries to access `TabletPeer::GetRaftConsensus()`. But since the shutdown request comes in and sets the flags in the meantime, the tablet peer now returns `IllegalState` instead of the `NotFound` it returned prior to the above commit.

Earlier, this `NotFound` was being streamed back. But after the above commit, we instead got into a state where we execute `mvcc_.SafeTimeForFollower`, which ends up blocking until the request deadline.
```
Result<HybridTime> Tablet::DoGetSafeTime(...) {
  ...
  if (require_lease == RequireLease::kFallbackToFollower &&
      ht_lease_result.status().IsIllegalState()) {
    return CheckSafeTime(mvcc_.SafeTimeForFollower(min_allowed, deadline), min_allowed);
  }
  ...
}
```
The above snippet was introduced in yugabyte#7729 and is required for correctness.

This diff addresses the regression by returning a retryable error at the participant when it is in the shutdown state. Since the `WriteQuery` would eventually not be processed because consensus is already shut down, we can fail the `WaitForSafeTime` request early with a retryable error status.
Jira: DB-8799

Test Plan: Jenkins
./yb_build.sh fastdebug --java-test org.yb.cql.TestIndex#testDropDuringWrite -n 20 --tp 1

Reviewers: sergei, arybochkin, rsami

Reviewed By: arybochkin, rsami

Subscribers: bogdan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D31617
1 parent d70e123 commit 884d84d

1 file changed, +11 -0 lines changed

src/yb/tablet/transaction_participant.cc

Lines changed: 11 additions & 0 deletions
@@ -1237,6 +1237,17 @@ class TransactionParticipant::Impl
   }

   Result<HybridTime> WaitForSafeTime(HybridTime safe_time, CoarseTimePoint deadline) {
+    // Once a WriteQuery passes conflict resolution, it performs all the required read operations
+    // as part of docdb::AssembleDocWriteBatch. While iterating over the relevant intents, it
+    // requests statuses of the corresponding transactions as of the picked read time. On seeing
+    // an ABORTED status, it decides to wait until the coordinator returned safe time so as to not
+    // wrongly interpret a COMMITTED transaction as aborted. A shutdown request could arrive in
+    // the meanwhile and change the state of the tablet peer. If so, return a retryable error
+    // instead of invoking the downstream code which eventually does a blocking wait until the
+    // deadline passes.
+    if (Closing()) {
+      return STATUS_FORMAT(IllegalState, "$0Transaction Participant is shutting down", LogPrefix());
+    }
     return participant_context_.WaitForSafeTime(safe_time, deadline);
   }
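
To illustrate the fail-fast pattern this change relies on, here is a minimal standalone sketch. It is not YugabyteDB code: `ToyParticipant`, `Error`, `SafeTimeResult`, `StartShutdown`, `AdvanceSafeTime` and `DemoShutdownFailsFast` are hypothetical names, safe time is modelled as a plain integer, and a condition variable stands in for the downstream blocking wait. It only shows the idea of checking a closing flag before starting a wait that could otherwise last until the deadline.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <variant>

// Hypothetical stand-ins for illustration; not YugabyteDB's real types.
using Clock = std::chrono::steady_clock;

struct Error {
  enum class Code { kIllegalState, kTimedOut } code;
  std::string message;
};

// Either a safe time (modelled as an integer) or an error.
using SafeTimeResult = std::variant<long, Error>;

class ToyParticipant {
 public:
  // Shutdown path: only flips the flag, as in the scenario from the commit message.
  void StartShutdown() { closing_.store(true, std::memory_order_release); }

  // Moves the safe time forward and wakes any waiters.
  void AdvanceSafeTime(long t) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (t > safe_time_) safe_time_ = t;
    }
    cv_.notify_all();
  }

  SafeTimeResult WaitForSafeTime(long min_allowed, Clock::time_point deadline) {
    // Fail fast: once closing, do not enter a wait that can only end at the deadline.
    if (closing_.load(std::memory_order_acquire)) {
      return Error{Error::Code::kIllegalState, "Transaction Participant is shutting down"};
    }
    std::unique_lock<std::mutex> lock(mu_);
    if (!cv_.wait_until(lock, deadline, [&] { return safe_time_ >= min_allowed; })) {
      return Error{Error::Code::kTimedOut, "Timed out waiting for safe time"};
    }
    return safe_time_;
  }

 private:
  std::atomic<bool> closing_{false};
  std::mutex mu_;
  std::condition_variable cv_;
  long safe_time_ = 0;
};

// A wait issued after shutdown returns an error well before its 5 second deadline.
void DemoShutdownFailsFast() {
  ToyParticipant participant;
  participant.StartShutdown();
  const auto start = Clock::now();
  SafeTimeResult result = participant.WaitForSafeTime(100, start + std::chrono::seconds(5));
  const Error* error = std::get_if<Error>(&result);
  assert(error != nullptr && error->code == Error::Code::kIllegalState);
  assert(Clock::now() - start < std::chrono::seconds(1));
}
```

In this sketch the flag is checked only on entry, which matches the shape of the `Closing()` check in the diff; a wait that has already started is not interrupted by `StartShutdown()`.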
