archival: Add failsafe for stale archival STM
The 'replicate' method of the archival_metadata_stm uses the following algorithm:

1. Acquire the lock.
2. Create the promise.
3. Replicate the configuration batch and get its offset.
4. Wait until the offset is applied.
5. Wait until the future associated with the created promise is set.
6. Return the result and release the lock.

The background fiber of the archival STM applies every record batch to the in-memory state. If the promise has been created, the background loop will
- set it to 'errc::success' when the batch is applied, and reset it
- set it to an error value if the batch can't be applied

Note that the promise is used as a one-time mailbox. The 'replicate' method creates a promise and replicates the batch; the background fiber then applies this record batch and sends the result to the mailbox (our promise).

This is correct under the assumption that the in-memory state of the STM is up to date (the 'sync' method was called before calling 'replicate'). This assumption does not always hold. If another fiber invokes 'replicate' between the 'sync' and 'replicate' calls, there is no error, because 'replicate' waits until all changes are applied to the STM. The only problem we may have is when the caller invokes 'replicate' without calling 'sync' first. In this case some old batch may trigger the promise, and we will get an incorrect error code in the 'replicate' method.

To avoid this, this commit adds an extra step to the 'replicate' method. If the 'insync' offset of the manifest is lower than the 'committed' offset of the Raft group, it calls 'do_sync'. After that it proceeds to step 2 of the algorithm described above. The final algorithm looks like this:

1. Acquire the lock.
2. Call sync if in-sync offset is less than committed offset.
3. Create the promise.
4. Replicate the configuration batch and get its offset.
5. Wait until the offset is applied.
6. Wait until the future associated with the created promise is set.
7. Return the result and release the lock.
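The failure mode and the failsafe can be illustrated with a small single-threaded model. This is only a sketch, not the actual archival_metadata_stm code: 'toy_stm', 'batch::valid', 'append_unapplied' and 'run_scenario' are hypothetical names, the lock is omitted, and the promise is modelled as a plain flag plus a stored result rather than a real future.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class errc { success, failure };

// Hypothetical batch: 'valid' decides whether applying it succeeds.
struct batch { bool valid; };

class toy_stm {
public:
    explicit toy_stm(bool failsafe) : _failsafe(failsafe) {}

    // Simulates a batch that is committed in the log but not yet
    // applied to the in-memory state (the caller skipped 'sync').
    void append_unapplied(batch b) { _log.push_back(b); }

    errc replicate(batch b) {
        // New step: catch up first if in-sync offset < committed offset.
        if (_failsafe && _insync < _log.size()) {
            do_sync();
        }
        _armed = true;      // "create the promise"
        _log.push_back(b);  // "replicate the batch"
        do_sync();          // "wait until the offset is applied"
        return _result;     // "wait for the future and return"
    }

private:
    // Stand-in for the background fiber: apply one batch and, if a
    // promise is armed, deliver the outcome to it exactly once.
    void apply_next() {
        assert(_insync < _log.size());
        const batch& b = _log[_insync++];
        const errc r = b.valid ? errc::success : errc::failure;
        if (_armed) {
            _result = r;
            _armed = false;  // one-time mailbox
        }
    }

    void do_sync() {
        while (_insync < _log.size()) {
            apply_next();
        }
    }

    bool _failsafe;
    bool _armed = false;
    errc _result = errc::success;
    std::size_t _insync = 0;  // number of batches applied so far
    std::vector<batch> _log;  // its size plays the role of the committed offset
};

// One stale, failing batch sits in the log; then a valid batch is replicated.
errc run_scenario(bool failsafe) {
    toy_stm stm(failsafe);
    stm.append_unapplied(batch{false});
    return stm.replicate(batch{true});
}
```

In this model, run_scenario(false) returns errc::failure because the stale batch fills the mailbox before the new batch is applied, while run_scenario(true) returns errc::success because the stale batch is applied before the promise is created.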