
pageserver: startup: ensure local disk state is durable #8835

Merged: 1 commit, Aug 26, 2024

Commits on Aug 26, 2024

  1. pageserver: ensure local disk state is durable during startup

    refs #6989
    
    Problem
    -------
    
    After an unclean shutdown, we get restarted, read the local filesystem,
    and make decisions based on those reads. Some of that data might not yet
    have been fsynced at the time of the unclean shutdown.
    
    Durability matters even though Pageservers are conceptually just a cache
    of state in S3. For example:
    - the cloud control plane is not a control loop => pageserver responses
      to tenant attachment, etc., need to be durable.
      - the storage controller does not rely on this (as much?)
    - we don't have layer file checksumming, so downloaded+renamed but not
      fsynced layer files are technically not to be trusted
      - #2683
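    To illustrate the layer-file point: a downloaded file is only trustworthy after a crash if its data is fsynced before the rename and the parent directory is fsynced after it. Below is a minimal Rust sketch of that write, fsync, rename, fsync-directory sequence using only `std`. The function name `durable_write_and_rename` is hypothetical and this is not the pageserver's actual download code. The sketch assumes Linux, where opening a directory read-only and calling `fsync` on it is supported.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not from the pageserver): write `contents` to
/// `final_path` so that, once it returns Ok, both the file data and the
/// rename survive a crash. `tmp_path` must be on the same filesystem as
/// `final_path` so that the rename is atomic.
fn durable_write_and_rename(tmp_path: &Path, final_path: &Path, contents: &[u8]) -> io::Result<()> {
    // 1. Write the data under a temporary name.
    let mut tmp = File::create(tmp_path)?;
    tmp.write_all(contents)?;
    // 2. fsync the file: its data must be on disk before it becomes
    //    visible under the final name.
    tmp.sync_all()?;
    drop(tmp);
    // 3. Atomically move it into place.
    fs::rename(tmp_path, final_path)?;
    // 4. fsync the parent directory so the new directory entry is durable.
    //    A bare relative file name has an empty parent; fall back to ".".
    let parent = match final_path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(parent)?.sync_all()?;
    Ok(())
}
```

    Skipping step 2 is the "renamed but not fsynced" case: after a crash the file can exist under its final name with incomplete contents. Skipping step 4 can lose the rename itself.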
    
    Solution
    --------
    
    `syncfs` the tenants directory during startup, before we start reading from it.
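    A minimal sketch of that startup step, assuming Linux: `syncfs(2)` takes a file descriptor and flushes all dirty data and metadata of the filesystem containing that file, so one call on a descriptor for the tenants directory covers every file under it. The helper name `syncfs_dir` is hypothetical, and the libc symbol is declared by hand to keep the example free of crates. The actual change may call it through a wrapper crate instead.

```rust
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Linux-only: syncfs(2) is provided by glibc (since 2.14).
// Declared by hand for this sketch instead of depending on the `libc` crate.
unsafe extern "C" {
    fn syncfs(fd: c_int) -> c_int;
}

/// Hypothetical helper: flush all dirty data and metadata of the
/// filesystem that contains `dir`, so later reads from `dir` see
/// durable state.
fn syncfs_dir(dir: &Path) -> io::Result<()> {
    // Any open fd on the filesystem works; opening the directory is simplest.
    let dir_file = File::open(dir)?;
    // SAFETY: `dir_file` keeps the descriptor open for the duration of the call.
    let rc = unsafe { syncfs(dir_file.as_raw_fd()) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
```

    Because `syncfs` works at filesystem granularity, it also flushes data that startup does not need to be durable. The trade-off is a single syscall instead of walking the tree and fsyncing each file and directory individually.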
    
    This is a bit overkill: we also flush some temp files (InMemoryLayer!) that
    we remove later during startup anyway, and these temp files are particularly
    likely to be dirty in the kernel page cache. However, we don't want to refactor
    that cleanup code right now, and the amount of dirty data on pageservers is
    generally low. Lastly, with [direct IO](#8130) we're going to
    have a near-zero kernel page cache quite soon anyway.
    problame committed Aug 26, 2024 (307ace2)