@mamcx
Contributor

@mamcx mamcx commented Dec 3, 2024

Description of Changes

Compress the snapshot using Zstd.

Closes #1592 & #1594.

API and ABI breaking changes

Existing uncompressed data can still be read: the reader uses magic bytes to detect which compression algorithm, if any, was used.

Because the files are wrapped with CompressReader / CompressWriter, the change is transparent to the rest of the engine.
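
For readers unfamiliar with magic-byte detection, here is a minimal sketch of the idea using only the standard library. It is not the code from this PR: `CompressType`, `detect_compression` and `ZSTD_MAGIC` are hypothetical names chosen for illustration, and only the Zstandard frame magic (`28 B5 2F FD`) is checked. The function peeks at the first four bytes, classifies the stream, and hands back a reader that replays those bytes so the caller still sees the whole file.

```rust
use std::io::{self, Chain, Cursor, Read};

/// Zstandard frame magic number (0xFD2FB528, stored little-endian on disk).
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];

/// Hypothetical classification of a file; not the PR's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressType {
    None,
    Zstd,
}

/// Peek at up to four leading bytes of `reader` and classify the stream.
/// The returned reader yields the peeked bytes first, then the rest,
/// so no data is lost by the detection step.
pub fn detect_compression<R: Read>(
    mut reader: R,
) -> io::Result<(CompressType, Chain<Cursor<Vec<u8>>, R>)> {
    let mut head = [0u8; 4];
    let mut filled = 0;
    // `read` may return fewer bytes than requested, so loop until the
    // buffer is full or the stream ends.
    while filled < head.len() {
        let n = reader.read(&mut head[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    let ty = if filled == head.len() && head == ZSTD_MAGIC {
        CompressType::Zstd
    } else {
        // Short files and unknown prefixes are treated as uncompressed.
        CompressType::None
    };
    let replay = Cursor::new(head[..filled].to_vec()).chain(reader);
    Ok((ty, replay))
}
```

A caller would then pick a decoder based on the returned `CompressType` and read from the replaying reader, which is what lets old uncompressed files and new compressed files share one code path.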

Expected complexity level and risk

1

Testing

  • Backwards compatibility: can replay both compressed and uncompressed segment files in the same snapshot
  • Added a simple test that opens and writes the snapshot
  • Included an [ignored] test that uses the environment variables SNAPSHOT="path" and IDENTITY=hex to check the compression ratio on existing data
  • Added benches for both cases
  • Manually checked with standalone & private that data gets compressed

Bench

[crates/bench/benches/special.rs:234:9] &size = SnapshotSize {
    compressed_type: None,
    object_count   : 9,
    file_size      :      525 bytes,
    object_size    :   592145 bytes,
    total_size     :   592670 bytes,
}
[crates/bench/benches/special.rs:234:9] &size = SnapshotSize {
    compressed_type: Zstd,
    object_count   : 9,
    file_size      :      420 bytes,
    object_size    :    17896 bytes,
    total_size     :    18316 bytes,
}
[crates/bench/benches/special.rs:234:9] &size = SnapshotSize {
    compressed_type: Lz4,
    object_count   : 9,
    file_size      :      442 bytes,
    object_size    :    35989 bytes,
    total_size     :    36431 bytes,
}
[crates/bench/benches/special.rs:234:9] &size = SnapshotSize {
    compressed_type: Snap,
    object_count   : 9,
    file_size      :      447 bytes,
    object_size    :    53486 bytes,
    total_size     :    53933 bytes,
}
special/snapshot/synthetic/save_compression_None
                        time:   [3.4299 ms 3.4705 ms 3.5146 ms]
                        thrpt:  [160.82 MiB/s 162.86 MiB/s 164.79 MiB/s]
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
special/snapshot/synthetic/open_compression_None
                        time:   [670.31 µs 671.53 µs 672.81 µs]
                        thrpt:  [840.07 MiB/s 841.68 MiB/s 843.21 MiB/s]
special/snapshot/synthetic/save_compression_Zstd
                        time:   [4.3128 ms 4.3491 ms 4.3853 ms]
                        thrpt:  [128.89 MiB/s 129.96 MiB/s 131.06 MiB/s]
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
special/snapshot/synthetic/open_compression_Zstd
                        time:   [1.0121 ms 1.0257 ms 1.0401 ms]
                        thrpt:  [543.40 MiB/s 551.04 MiB/s 558.47 MiB/s]
special/snapshot/synthetic/save_compression_Lz4
                        time:   [3.8175 ms 3.8567 ms 3.8971 ms]
                        thrpt:  [145.03 MiB/s 146.56 MiB/s 148.06 MiB/s]
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) low mild
  1 (2.00%) high severe
special/snapshot/synthetic/open_compression_Lz4
                        time:   [1.3209 ms 1.3221 ms 1.3236 ms]
                        thrpt:  [427.02 MiB/s 427.52 MiB/s 427.90 MiB/s]
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high severe
special/snapshot/synthetic/save_compression_Snap
                        time:   [4.2330 ms 4.3074 ms 4.3845 ms]
                        thrpt:  [128.91 MiB/s 131.22 MiB/s 133.53 MiB/s]
Found 7 outliers among 50 measurements (14.00%)
  3 (6.00%) low mild
  3 (6.00%) high mild
  1 (2.00%) high severe
special/snapshot/synthetic/open_compression_Snap
                        time:   [1.0540 ms 1.0563 ms 1.0590 ms]
                        thrpt:  [533.74 MiB/s 535.08 MiB/s 536.25 MiB/s]

special/snapshot/synthetic/save_compression_None #2
                        time:   [3.4767 ms 3.5054 ms 3.5347 ms]
                        thrpt:  [159.91 MiB/s 161.24 MiB/s 162.57 MiB/s]
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
special/snapshot/synthetic/open_compression_None #2
                        time:   [673.88 µs 674.12 µs 674.37 µs]
                        thrpt:  [838.14 MiB/s 838.45 MiB/s 838.75 MiB/s]
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
special/snapshot/synthetic/save_compression_Zstd #2
                        time:   [4.2698 ms 4.3035 ms 4.3393 ms]
                        thrpt:  [130.25 MiB/s 131.34 MiB/s 132.37 MiB/s]
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high severe
special/snapshot/synthetic/open_compression_Zstd #2
                        time:   [1.0497 ms 1.0531 ms 1.0569 ms]
                        thrpt:  [534.78 MiB/s 536.72 MiB/s 538.46 MiB/s]
Found 4 outliers among 50 measurements (8.00%)
  3 (6.00%) high mild
  1 (2.00%) high severe
special/snapshot/synthetic/save_compression_Lz4 #2
                        time:   [3.7103 ms 3.7474 ms 3.7875 ms]
                        thrpt:  [149.23 MiB/s 150.83 MiB/s 152.34 MiB/s]
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) high mild
  1 (2.00%) high severe
special/snapshot/synthetic/open_compression_Lz4 #2
                        time:   [1.3267 ms 1.3529 ms 1.3953 ms]
                        thrpt:  [405.09 MiB/s 417.78 MiB/s 426.02 MiB/s]
Found 6 outliers among 50 measurements (12.00%)
  2 (4.00%) high mild
  4 (8.00%) high severe
special/snapshot/synthetic/save_compression_Snap #2
                        time:   [4.0390 ms 4.0786 ms 4.1180 ms]
                        thrpt:  [137.25 MiB/s 138.58 MiB/s 139.94 MiB/s]
Found 2 outliers among 50 measurements (4.00%)
  1 (2.00%) low mild
  1 (2.00%) high mild
special/snapshot/synthetic/open_compression_Snap #2
                        time:   [1.0285 ms 1.0347 ms 1.0424 ms]
                        thrpt:  [542.25 MiB/s 546.26 MiB/s 549.56 MiB/s]
Found 7 outliers among 50 measurements (14.00%)
  1 (2.00%) high mild
  6 (12.00%) high severe
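
To make the size figures above easier to compare, the following sketch computes the compression ratio of each algorithm relative to the uncompressed snapshot. The `total_size` values are copied from the `SnapshotSize` output above; the helper names `compression_ratio` and `ratios` are illustrative only and do not come from the PR.

```rust
/// `total_size` in bytes for each compression type, copied from the bench output.
const TOTALS: [(&str, u64); 4] = [
    ("None", 592_670),
    ("Zstd", 18_316),
    ("Lz4", 36_431),
    ("Snap", 53_933),
];

/// Ratio of uncompressed size to compressed size (higher is better).
fn compression_ratio(uncompressed: u64, compressed: u64) -> f64 {
    uncompressed as f64 / compressed as f64
}

/// Ratio for every entry, using the uncompressed ("None") total as baseline.
fn ratios() -> Vec<(&'static str, f64)> {
    let baseline = TOTALS[0].1;
    TOTALS
        .iter()
        .map(|&(name, size)| (name, compression_ratio(baseline, size)))
        .collect()
}
```

On this synthetic data set that works out to roughly 32x for Zstd, 16x for Lz4 and 11x for Snap.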

@mamcx mamcx added the Do not merge and release-1.0 labels Dec 3, 2024
@mamcx mamcx self-assigned this Dec 3, 2024
@mamcx mamcx requested a review from kim December 3, 2024 18:17
@mamcx mamcx force-pushed the mamcx/snapshot-compress branch from 48b1fe2 to 61d708a Compare January 8, 2025 16:00
@mamcx mamcx force-pushed the mamcx/snapshot-compress branch from 61d708a to e31dca4 Compare March 17, 2025 20:45
@mamcx mamcx changed the title Compress the snapshot & commit log Compress the snapshot Mar 20, 2025
@mamcx mamcx force-pushed the mamcx/snapshot-compress branch from e31dca4 to f011e18 Compare March 26, 2025 16:04
@joshua-spacetime joshua-spacetime linked an issue Mar 26, 2025 that may be closed by this pull request
@coolreader18 coolreader18 mentioned this pull request Mar 27, 2025
2 tasks
@mamcx mamcx marked this pull request as ready for review March 28, 2025 17:04
@bfops bfops added the release-any label Mar 31, 2025
@bfops
Collaborator

bfops commented Mar 31, 2025

should this still be labeled Do not merge?

@mamcx mamcx removed the Do not merge label Mar 31, 2025
@bfops bfops linked an issue Apr 4, 2025 that may be closed by this pull request
@mamcx mamcx force-pushed the mamcx/snapshot-compress branch 3 times, most recently from 0af5cab to 025b8e5 Compare April 8, 2025 19:05
@mamcx mamcx requested review from joshua-spacetime and kim April 8, 2025 22:59
@mamcx mamcx force-pushed the mamcx/snapshot-compress branch 2 times, most recently from 031edd4 to 9ef66ae Compare April 9, 2025 21:51
Contributor

@kim kim left a comment

Looks good!

@mamcx mamcx force-pushed the mamcx/snapshot-compress branch from 9ef66ae to c27ac79 Compare April 10, 2025 17:33
@mamcx mamcx added this pull request to the merge queue Apr 10, 2025
@mamcx mamcx removed this pull request from the merge queue due to a manual request Apr 10, 2025
@mamcx mamcx enabled auto-merge April 11, 2025 14:56
@mamcx mamcx added this pull request to the merge queue Apr 11, 2025
Merged via the queue into master with commit 3fd7820 Apr 11, 2025
13 of 14 checks passed

Labels

release-any To be landed in any release window

Development

Successfully merging this pull request may close these issues.

Snapshot compression
[STORAGE USE REDUCTION] Snapshot page compression
[STORAGE USE REDUCTION] Commitlog segment compression

6 participants