Epic: pageserver: layer visibility

Currently, we make eviction decisions based on access time, and have a "hot set" based on what ends up resident as a result of that atime-driven eviction.

This sometimes results in sub-optimal outcomes:
- A delta that has been covered by an image layer is usually not useful unless some branchpoint can still read it, but we keep these layers on disk anyway until their atime hits a threshold.
- Secondary locations download such deltas even though they're not going to be needed when we fail over a tenant to the secondary location.
- We lack a clear metric for how much data we would like to have on disk for a tenant in order to satisfy read performance goals (i.e. fast reads for the data visible to existing branches): resident size can be either an over-estimate or an under-estimate.

In this epic, we add a concept of "visibility" to layers, where a layer is visible if we might need it to service a getpage request. Visibility is a heuristic, so it does not always need to be accurate, but it must have some properties we can rely on:
- Once a layer is read, we mark it visible
- When a layer is covered during compaction, we update its visibility immediately to make it a priority target for eviction
- Across a restart, we should recover an accurate view of visibility so that we don't do things like thrashing secondary locations' ideas of visible sets
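
To make the first two properties concrete, here is a minimal sketch, not the pageserver's actual types. The names `Layer`, `Visibility`, `record_access`, `mark_covered` and `eviction_order` are hypothetical and chosen only for illustration. It assumes visibility is a per-layer flag that defaults to visible, is set again on any read, and is cleared when compaction covers the layer, so that covered layers sort first as eviction candidates.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical visibility hint; the names here are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    Visible,
    Covered,
}

/// Hypothetical stand-in for a layer, reduced to a name and a flag.
struct Layer {
    name: &'static str,
    visible: AtomicBool,
}

impl Layer {
    /// Layers start out visible by default.
    fn new(name: &'static str) -> Self {
        Layer { name, visible: AtomicBool::new(true) }
    }

    /// Any read marks the layer visible again.
    fn record_access(&self) {
        self.visible.store(true, Ordering::Relaxed);
    }

    /// Called when compaction covers this layer with an image layer.
    fn mark_covered(&self) {
        self.visible.store(false, Ordering::Relaxed);
    }

    fn visibility(&self) -> Visibility {
        if self.visible.load(Ordering::Relaxed) {
            Visibility::Visible
        } else {
            Visibility::Covered
        }
    }
}

/// Order eviction candidates so that covered layers come first.
/// The sort is stable, so layers with equal visibility keep their order.
fn eviction_order(layers: &[Layer]) -> Vec<&'static str> {
    let mut refs: Vec<&Layer> = layers.iter().collect();
    // `false` sorts before `true`, so covered layers are placed first.
    refs.sort_by_key(|l| l.visibility() == Visibility::Visible);
    refs.iter().map(|l| l.name).collect()
}
```

In this sketch a stale flag is harmless in one direction: a covered layer that is read again simply becomes visible and drops back in eviction priority.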

When we implement timeline archival, archived timelines' branchpoints should not contribute to visibility of layers.
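
The recalculation on restart, and the effect of archival, can be sketched as a pure function over the layer map. This is a deliberately simplified model rather than the real algorithm: it ignores key ranges and looks only at LSNs, and `LayerDesc`, `Kind`, `ChildBranch` and `visible_layers` are hypothetical names. It assumes a read at LSN `r` needs the newest image at or below `r` plus every delta holding records between that image and `r`, and that the read points are the timeline head and the branchpoints of non-archived children.

```rust
/// Hypothetical, simplified layer kinds: LSN ranges only, no key ranges.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    /// Image of page state as of `lsn`.
    Image { lsn: u64 },
    /// Delta holding records in the half-open range `[start, end)`.
    Delta { start: u64, end: u64 },
}

struct LayerDesc {
    name: &'static str,
    kind: Kind,
}

/// Hypothetical child timeline branching off at `branch_lsn`.
struct ChildBranch {
    branch_lsn: u64,
    archived: bool,
}

/// Recompute which layers are visible from scratch.
fn visible_layers(
    layers: &[LayerDesc],
    head_lsn: u64,
    children: &[ChildBranch],
) -> Vec<&'static str> {
    // Read points: the head, plus branchpoints of non-archived children.
    let mut read_points = vec![head_lsn];
    read_points.extend(children.iter().filter(|c| !c.archived).map(|c| c.branch_lsn));

    let mut visible = Vec::new();
    for layer in layers {
        let needed = read_points.iter().any(|&r| {
            // Newest image usable by a read at `r`, if there is one.
            let base = layers
                .iter()
                .filter_map(|l| match l.kind {
                    Kind::Image { lsn } if lsn <= r => Some(lsn),
                    _ => None,
                })
                .max();
            match layer.kind {
                Kind::Image { lsn } => Some(lsn) == base,
                // A delta is needed if it starts at or below `r` and holds
                // records newer than the base image (or there is no image).
                Kind::Delta { start, end } => start <= r && base.map_or(true, |b| end > b),
            }
        });
        if needed {
            visible.push(layer.name);
        }
    }
    visible
}
```

Because the result depends only on the layer map and the set of branchpoints, running it at startup gives the same answer before and after a restart, which is the stability that secondary locations rely on in this sketch.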


```[tasklist]
### Tasks
- [x] Add visibility type+state to Layer.  Make them visible by default, and set to visible any time they are accessed
- [x] In compaction, when we cover some deltas with an image layer, update their visibility
- [x] On startup, globally recalculate visibility based on image layer coverage and branch points
- [x] Expose a 'visible size' metric for tenants (or timelines)
- [ ] https://github.com/neondatabase/neon/pull/8616
- [ ] https://github.com/neondatabase/neon/pull/8617
- [ ] https://github.com/neondatabase/neon/pull/8679
- [ ] Optimization: on branch creation, update visibility rather than waiting for client reads to mark things visible
- [ ] Optimization: on branch deletion, update visibility to mark no-longer-visible layers as such.
- [ ] Optimization: transmit covered ranges from branches to their parents so that the parents don't need to mark all data at the branch point as visible.
```




Epic: pageserver: layer visibility #8398

jcsp opened on Jul 16, 2024
