Commit
pageserver: startup: ensure local disk state is durable (#8835)
refs #6989

Problem
-------

After unclean shutdown, we get restarted, start reading the local filesystem, and make decisions based on those reads. However, some of the data might not yet have been fsynced when the unclean shutdown completed.

Durability matters even though Pageservers are conceptually just a cache of state in S3. For example:
- the cloud control plane is not a control loop => pageserver responses to tenant attachment, etc., need to be durable.
- the storage controller does not rely on this (as much?)
- we don't have layer file checksumming, so downloaded+renamed but not fsynced layer files are technically not to be trusted
- #2683

Solution
--------

`syncfs` the tenants directory during startup, before we start reading from it.

This is a bit overkill because we do remove some temp files (InMemoryLayer!) later during startup. Further, these temp files are particularly likely to be dirty in the kernel page cache. However, we don't want to refactor that cleanup code right now, and the amount of dirty data on pageservers is generally not that high. Last, with [direct IO](#8130) we're going to have near-zero kernel page cache anyway quite soon.
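For illustration, the `syncfs` step described in the Solution could look roughly like the sketch below. This is not the code from this commit: the helper name `syncfs_dir` is hypothetical, the sketch assumes a Linux target (where libc exports `syncfs(2)`) and a toolchain that accepts `unsafe extern` blocks, and it declares the libc symbol directly rather than going through whatever wrapper the pageserver actually uses.

```rust
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::Path;

unsafe extern "C" {
    // Linux-specific syscall wrapper from libc: flushes all dirty data and
    // metadata of the filesystem that contains the open file `fd`.
    fn syncfs(fd: c_int) -> c_int;
}

/// Hypothetical helper: flush the filesystem that contains `dir`.
/// Intended to run at startup, before anything under `dir` is read.
pub fn syncfs_dir(dir: &Path) -> io::Result<()> {
    // On Linux, opening a directory read-only yields a usable file descriptor.
    let handle = File::open(dir)?;
    // SAFETY: `handle` stays open for the duration of the call, so the raw
    // file descriptor passed to `syncfs` is valid.
    let rc = unsafe { syncfs(handle.as_raw_fd()) };
    if rc == 0 {
        Ok(())
    } else {
        // `syncfs` returns -1 and sets errno on failure.
        Err(io::Error::last_os_error())
    }
}
```

A caller would invoke something like `syncfs_dir(&tenants_dir)?` once, before scanning the directory, where `tenants_dir` stands in for the real tenants path. Note that `syncfs` flushes the entire filesystem, not just the given directory, which is the "overkill" the message mentions: short-lived temp files get flushed too.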
9724177
3861 tests run: 3745 passed, 0 failed, 116 skipped (full report)
Code coverage* (full report)
- functions: 32.2% (7257 of 22572 functions)
- lines: 50.3% (58796 of 117003 lines)

* collected from Rust tests only
9724177 at 2024-08-26T18:03:56.252Z :recycle: