bypass PageCache for `InMemoryLayer::get_values_reconstruct_data`

part of epic https://github.com/neondatabase/neon/issues/7386

bit of prior discussion in https://neondb.slack.com/archives/C033RQ5SPDH/p1719411245662839

---

`InMemoryLayer::get_values_reconstruct_data` uses `read_blob`, which internally uses the PageCache for block access.

Switch it to vectored reads that bypass the PageCache.

However, the new code should perform as well as the current code in the case where a single call reads multiple blobs from the same 8 KiB EphemeralFile page.

Strategy for this (planned together with @VladLazar):

1. Store the blob lengths in the in-memory btree.
  - Avoid consuming more memory by using u32 instead of u64 for the offset. u32 is enough if we cap EphemeralFile at 4 GiB, which is far larger than we want it to grow anyway.
2. Get rid of the whole blob_io business for InMemoryLayer; we don't need it if we store offset and length in the in-memory index.
3. For `get_values_reconstruct_data`, feed the `(offset, length)` pairs directly into the `VectoredReadBuilder` (after sorting them in offset order, so the builder can merge adjacent blob reads as needed)
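
To make the plan above concrete, here is a minimal sketch of the idea, not the actual pageserver code. `BlobRef`, `PlannedRead`, `plan_reads`, and the `max_read_size` parameter are hypothetical names made up for illustration; the real `VectoredReadBuilder` has its own API and merge rules, which may differ (for example, this sketch only merges reads that are exactly adjacent).

```rust
/// Hypothetical index entry: where a blob lives in the ephemeral file.
/// Two u32 fields take the same 8 bytes as a single u64 offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlobRef {
    pub offset: u32,
    pub len: u32,
}

/// Hypothetical planned read: one contiguous byte range covering one or
/// more blobs. Bounds are u64 so `offset + len` cannot overflow near 4 GiB.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PlannedRead {
    pub start: u64,
    pub end: u64, // exclusive
    pub blobs: Vec<BlobRef>,
}

/// Sort the blobs by offset, then merge blobs that sit back to back into a
/// single read, as long as the merged read stays within `max_read_size`.
pub fn plan_reads(mut blobs: Vec<BlobRef>, max_read_size: u64) -> Vec<PlannedRead> {
    blobs.sort_by_key(|b| b.offset);

    let mut reads: Vec<PlannedRead> = Vec::new();
    for blob in blobs {
        let start = u64::from(blob.offset);
        let end = start + u64::from(blob.len);

        if let Some(last) = reads.last_mut() {
            // Assumption of this sketch: only exactly adjacent blobs merge.
            if last.end == start && end - last.start <= max_read_size {
                last.end = end;
                last.blobs.push(blob);
                continue;
            }
        }
        reads.push(PlannedRead { start, end, blobs: vec![blob] });
    }
    reads
}
```

With this shape, several blobs written back to back within one 8 KiB page collapse into a single planned read, which is the case where the current PageCache-based code is cheap.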

```[tasklist]
### Tasks
- [ ] https://github.com/neondatabase/neon/pull/8717
- [ ] https://github.com/neondatabase/neon/pull/8537
- [x] extraordinary rollout to pre-prod & observe benchmark results
- [ ] rollout to prod & observe page cache dashboard
```



bypass PageCache for `InMemoryLayer::get_values_reconstruct_data` #8183

problame opened on Jun 27, 2024
