pageserver: handle decompression outside vectored `read_blobs` #8942

yliang412 · 2024-09-06T12:12:19Z

Part of #8130.

Problem

Currently, decompression is performed inside the read_blobs implementation, and the decompressed blob is appended to the end of the BytesMut buffer. We will lose the ability to extend the buffer once we switch to our own dio-aligned buffer (WIP in #8730). To ease adoption of the aligned buffer, we need to refactor the code so that decompression happens outside read_blobs.
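To make the problem concrete, here is a minimal sketch of the append-to-buffer pattern described above. It is not the pageserver code: `Vec<u8>` stands in for BytesMut, and `toy_decompress` and `decompress_and_append` are hypothetical names, with a toy run-length decoder standing in for the real decompression routine. The point is only that the pattern relies on the buffer being able to grow.

```rust
use std::ops::Range;

/// Toy run-length decoder used as a placeholder for the real decompression
/// routine (an assumption for illustration). Input is a sequence of
/// (count, byte) pairs.
fn toy_decompress(input: &[u8]) -> Vec<u8> {
    input
        .chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Simplified pre-refactor shape: decompress the blob stored at `blob` inside
/// `buf` and append the result to the end of the same buffer. The returned
/// range points at the appended, decompressed bytes.
fn decompress_and_append(buf: &mut Vec<u8>, blob: Range<usize>) -> Range<usize> {
    let decompressed = toy_decompress(&buf[blob]);
    let start = buf.len();
    // This step needs a growable buffer; a fixed-size aligned buffer
    // has no room for it.
    buf.extend_from_slice(&decompressed);
    start..buf.len()
}
```

With a fixed-capacity buffer sized for the on-disk read, the `extend_from_slice` step has nowhere to put its output, which is why decompression has to move out of the read path.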

Summary of changes

VectoredBlobReader::read_blobs now returns VectoredBlob without decompressing it or appending the decompressed blob to the buffer. Decompressing the buffer becomes the caller's responsibility.
Added a new BufView type that functions as Cow<Bytes, &[u8]>.
Decompression is performed within VectoredBlob::read, so that callers don't have to think explicitly about compression when using the reader interface.
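The shape described in this list can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `Vec<u8>` replaces Bytes so the sketch builds with the standard library only, the field layout of `VectoredBlob` and the `new` constructor are hypothetical, the real read path may have a different signature, and `toy_decompress` is a toy run-length decoder standing in for the real decompression.

```rust
use std::ops::Deref;

/// Stand-in for the PR's BufView: either owned bytes or a borrowed slice.
/// `Vec<u8>` replaces `Bytes` here (an assumption for illustration).
#[derive(Debug)]
pub enum BufView<'a> {
    /// Owned data, for example the output of decompression.
    Owned(Vec<u8>),
    /// Borrowed view into the shared read buffer (uncompressed blob).
    Slice(&'a [u8]),
}

impl Deref for BufView<'_> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match self {
            BufView::Owned(v) => v.as_slice(),
            BufView::Slice(s) => *s,
        }
    }
}

impl BufView<'_> {
    /// Explicit conversion to an owned buffer. The borrowed variant copies
    /// here, so the copy is visible at the call site.
    pub fn into_bytes(self) -> Vec<u8> {
        match self {
            BufView::Owned(v) => v,
            BufView::Slice(s) => s.to_vec(),
        }
    }
}

/// Toy run-length decoder over (count, byte) pairs; a placeholder for the
/// real decompression routine.
fn toy_decompress(input: &[u8]) -> Vec<u8> {
    input
        .chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Hypothetical blob descriptor: a byte range in the read buffer plus a
/// compression flag. Fields are private so the bytes are reachable only
/// through `read`.
pub struct VectoredBlob {
    start: usize,
    end: usize,
    compressed: bool,
}

impl VectoredBlob {
    pub fn new(start: usize, end: usize, compressed: bool) -> Self {
        Self { start, end, compressed }
    }

    /// Returns the blob contents, decompressing only when needed.
    pub fn read<'a>(&self, buf: &'a [u8]) -> BufView<'a> {
        let raw = &buf[self.start..self.end];
        if self.compressed {
            BufView::Owned(toy_decompress(raw))
        } else {
            BufView::Slice(raw)
        }
    }
}
```

A caller would write something like `let view = blob.read(&buf);` and use `&*view` as a byte slice, calling `into_bytes()` only when it needs ownership. Uncompressed blobs are then served without a copy, and the read buffer itself is never extended.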

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.
Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

Do not forget to reformat commit message to not include the above checklist

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

…-read-blobs

pageserver/src/tenant/vectored_blob_io.rs

pageserver/src/tenant/storage_layer/delta_layer.rs

github-actions · 2024-09-06T14:47:08Z

4986 tests run: 4822 passed, 0 failed, 164 skipped (full report)

Flaky tests (6)

Postgres 17

test_pageserver_compaction_smoke: release-arm64
test_subscriber_synchronous_commit: release-arm64
test_pg_regress[4]: debug-x86-64
test_readonly_node_gc: debug-x86-64

Postgres 16

test_ondemand_wal_download_in_replication_slot_funcs: release-x86-64

Postgres 15

test_ondemand_wal_download_in_replication_slot_funcs: release-x86-64

Code coverage* (full report)

functions: 32.1% (7475 of 23255 functions)
lines: 50.0% (60227 of 120574 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results (67bd614 at 2024-09-24T16:56:24.080Z).

pageserver/src/tenant/storage_layer/image_layer.rs

problame

Regarding @koivunej's #8942 (comment): I think if decompress took a Cow<Bytes> instead of the Option<Bytes> it would make the consuming code paths more concise.
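A rough illustration of why a Cow-like result can make call sites shorter than an Option. Everything here is hypothetical: `std::borrow::Cow<[u8]>` and `Vec<u8>` stand in for the Bytes-based types discussed in the PR, the function names are invented, and `toy_decompress` is a toy run-length decoder used as a placeholder.

```rust
use std::borrow::Cow;

/// Toy run-length decoder over (count, byte) pairs; a placeholder for the
/// real decompression routine.
fn toy_decompress(input: &[u8]) -> Vec<u8> {
    input
        .chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Option-based shape: `None` means "not compressed, keep using `raw`".
fn decompress_opt(raw: &[u8], compressed: bool) -> Option<Vec<u8>> {
    compressed.then(|| toy_decompress(raw))
}

/// Cow-based shape: the result is always usable as a byte slice.
fn decompress_cow(raw: &[u8], compressed: bool) -> Cow<'_, [u8]> {
    if compressed {
        Cow::Owned(toy_decompress(raw))
    } else {
        Cow::Borrowed(raw)
    }
}

/// Consumer of the Option shape: has to pick between two sources itself.
fn first_byte_opt(raw: &[u8], compressed: bool) -> Option<u8> {
    let decompressed = decompress_opt(raw, compressed);
    let data: &[u8] = match &decompressed {
        Some(v) => v.as_slice(),
        None => raw,
    };
    data.first().copied()
}

/// Consumer of the Cow shape: one expression, no branching on compression.
fn first_byte_cow(raw: &[u8], compressed: bool) -> Option<u8> {
    decompress_cow(raw, compressed).first().copied()
}
```

Both consumers compute the same result. The difference is that the Option version makes every caller repeat the "decompressed or original" selection.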

pageserver/src/tenant/vectored_blob_io.rs

pageserver/src/tenant/storage_layer/image_layer.rs

pageserver/src/tenant/storage_layer/delta_layer.rs

arpad-m

I chose appending to the buffer in read_blobs because I didn't want to make compression something one has to explicitly think about when using the reader interface.

If it is something one has to explicitly think about, mistakes can happen and people can forget about it, resulting in the code working on compressed data. It is also more cumbersome to work with the API.

Therefore, I agree with Christian's suggestion to add a function to VectoredBlob to allow reading of the blobs, and returning something like Cow<Bytes, &[u8]> (or a custom enum if that doesn't work).

Ideally one would make the members of VectoredBlob private so that one can't mistakenly forget about calling the decompress function on it.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

orca-security-us

Orca Security Scan Summary

Status	Check	Issues by priority
Passed	Infrastructure as Code	0 0 0 0	View in Orca
Passed	Secrets	0 0 0 0	View in Orca
Passed	Vulnerabilities	0 0 0 0	View in Orca

arpad-m

Much better now, thanks for adjusting it. It's well done work.

pageserver/src/tenant/vectored_blob_io.rs

lift compression to be handled outside vectored read_blobs

5f5653a

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

yliang412 force-pushed the yuchen/handle-decompression-outside-vectored-read-blobs branch from 44793b7 to 5f5653a Compare September 6, 2024 13:05

yliang412 added the c/storage/pageserver Component: storage: pageserver label Sep 6, 2024

yliang412 self-assigned this Sep 6, 2024

yliang412 requested review from problame and arpad-m September 6, 2024 13:19

Merge branch 'main' into yuchen/handle-decompression-outside-vectored…

0ebb99f

…-read-blobs

yliang412 marked this pull request as ready for review September 6, 2024 13:28

yliang412 requested a review from a team as a code owner September 6, 2024 13:28

koivunej reviewed Sep 6, 2024

pageserver/src/tenant/vectored_blob_io.rs (outdated)

koivunej reviewed Sep 6, 2024

pageserver/src/tenant/storage_layer/delta_layer.rs (outdated)

koivunej reviewed Sep 6, 2024

pageserver/src/tenant/storage_layer/image_layer.rs (outdated)

problame approved these changes Sep 6, 2024

pageserver/src/tenant/vectored_blob_io.rs (outdated)

pageserver/src/tenant/vectored_blob_io.rs (outdated)

problame reviewed Sep 6, 2024

pageserver/src/tenant/vectored_blob_io.rs

problame reviewed Sep 6, 2024

pageserver/src/tenant/storage_layer/image_layer.rs

problame reviewed Sep 6, 2024

pageserver/src/tenant/storage_layer/delta_layer.rs

arpad-m requested changes Sep 6, 2024

use custom view type

32284e5

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

orca-security-us bot reviewed Sep 9, 2024

arpad-m approved these changes Sep 10, 2024

pageserver/src/tenant/vectored_blob_io.rs (outdated)

pageserver/src/tenant/vectored_blob_io.rs (outdated)

yliang412 mentioned this pull request Sep 16, 2024

pageserver: direct I/O #8130

Open

yliang412 and others added 6 commits September 23, 2024 14:01

Merge branch 'main' into yuchen/handle-decompression-outside-vectored…

6b54afd

…-read-blobs

review: update VectoredBlob doc comments

cb7fc93

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

review: use explicit into_bytes function to avoid hiding copy

1ce757c

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

review: rename VectoredBlobBufView to BufView

1ca519f

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

review: declare decompressed_vec in the block that needs it

bb07ab0

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

Merge branch 'main' into yuchen/handle-decompression-outside-vectored…

ac80a44

…-read-blobs

yliang412 mentioned this pull request Sep 24, 2024

[WIP]: dedup if-err-then-call-on_key_error-and-set-ignore_key_with_err-and-continue logic #9123

Closed

5 tasks

fix clippy

67bd614

Signed-off-by: Yuchen Liang <yuchen@neon.tech>

yliang412 enabled auto-merge (squash) September 24, 2024 14:54

yliang412 merged commit 4f67b02 into main Sep 24, 2024
75 checks passed

yliang412 deleted the yuchen/handle-decompression-outside-vectored-read-blobs branch September 24, 2024 16:41
