Skip to content

Commit

Permalink
archival: Add failsafe for stale archival STM
Browse files Browse the repository at this point in the history
The 'replicate' method of the archival_metadata_stm using the following
algorithm:
1. Acquire the lock.
2. Create the promise.
3. Replicate the configuration batch and get its offset.
4. Wait until the offset is applied.
5. Wait until the future associated with the created promise is set.
6. Return the result and release the lock.

The background fiber of the archival STM is applying every record batch
to the in-memory state. If the promise is created the background loop
will
- set it to 'errc::success' when the batch is applied and reset it
- set it to error value if the batch can't be applied

Note that the promise is used as a one time mailbox. The replicate
method creates a promise and replicates the batch and then the
background fiber applies this record batch and sends result to the
mailbox (our promise).

This is correct under assumption that the in-memory state of the STM is
up to date (sync method was called before calling repliate). This
assumption is not always correct.

IF between the 'sync' and 'replciate' calls another fiber invoked
'replicate' there is no error because the 'replicate' is waiting until
all changes are applied to the STM. The only problem that we may have is
when the caller invokes 'replicate' without calling 'sync' first. In
this case some old batch may trigger the promise and we will get
incorrect error code in the 'replicate' method.

To avoid this this commit adds extra step to the 'replicate' method. If
'insync' offset of the manifest is lower than 'committed' offset of the
Raft group it will call 'do_sync'. After that it will proceed to step 2
in the algorithm described above. The final algorithm looks like this:

1. Acquire the lock.
2. Call sync if in-sync offset is less than committed offset.
3. Create the promise.
4. Replicate the configuration batch and get its offset.
5. Wait until the offset is applied.
6. Wait until the future associated with the created promise is set.
7. Return the result and release the lock.
  • Loading branch information
Lazin committed Mar 22, 2024
1 parent ac97144 commit a07f8fb
Showing 1 changed file with 20 additions and 0 deletions.
20 changes: 20 additions & 0 deletions src/v/archival/archival_metadata_stm.cc
Original file line number Diff line number Diff line change
Expand Up @@ -802,6 +802,26 @@ ss::future<std::error_code> archival_metadata_stm::do_replicate_commands(

const auto current_term = _insync_term;

// If the caller didn't invoke `sync` before calling `replicate` we
// might have some batches which are not applied to the STM yet. These
// batches could potentially set _active_operation_res and therefore
// mask the actual failure.
{
auto commit = _raft->committed_offset();
auto insync = _manifest->get_insync_offset();
if (insync < commit) {
vlog(_logger.debug, "Replicate is called while STM is catching up");
auto sync_res = co_await do_sync(
config::shard_local_cfg()
.cloud_storage_metadata_sync_timeout_ms.value(),
&as);
if (!sync_res) {
vlog(_logger.warn, "Failed to catch up");
co_return errc::timeout;
}
}
}

// Create a promise to deliver the result of the batch application
_active_operation_res.emplace();
auto broken_promise_to_shutdown = [](const ss::broken_promise&) {
Expand Down

0 comments on commit a07f8fb

Please sign in to comment.