[FEATURE] Support for more compression levels for Zstandard codecs #115

Open
sarthakaggarwal97 opened this issue Feb 14, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@sarthakaggarwal97
Collaborator

Is your feature request related to a problem?

Currently, custom-codecs supports 6 compression levels (1 to 6) for its Zstandard compression codecs. The ZSTD library supports compression levels from 1 to 22.

What solution would you like?

We should look into widening the range of compression levels we currently support.

@sarthakaggarwal97 sarthakaggarwal97 added enhancement New feature or request untriaged labels Feb 14, 2024
@sarthakaggarwal97 sarthakaggarwal97 changed the title [FEATURE] [FEATURE] Support more compression levels for Zstandard codecs Feb 14, 2024
@sarthakaggarwal97 sarthakaggarwal97 changed the title [FEATURE] Support more compression levels for Zstandard codecs [FEATURE] Support for more compression levels for Zstandard codecs Feb 14, 2024
@mgodwan
Member

mgodwan commented Feb 15, 2024

> ZSTD library supports compression levels from 1 to 22.

The ZSTD algorithm should support levels from -7 to 22. Does the library not expose negative levels?

@sarthakaggarwal97
Collaborator Author

@mgodwan we should be able to set negative levels as well. In my previous experiments, the compression ratio at negative levels was close to that of lz4, and so was the performance.
Also, in Zstd, the compression parameters for the negative levels are exactly the same, so it would be worth noting the exact difference.

We can run fresh experiments and expand the levels as much as possible.
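
One possible shape for such an experiment loop is sketched below. It is only an illustration: `benchmark_levels` is a hypothetical helper, and `zlib` from the Python standard library is used as a stand-in codec (it is not zstd), so the printed numbers say nothing about Zstandard itself. The codec is passed in as a pair of callables so the same loop could be pointed at a zstd binding.

```python
import time
import zlib


def benchmark_levels(data: bytes, levels, compress, decompress):
    """Measure compression ratio and throughput for each level.

    `compress(data, level)` and `decompress(blob)` are supplied by the caller,
    so the same loop can be pointed at any codec binding.
    """
    results = []
    for level in levels:
        start = time.perf_counter()
        blob = compress(data, level)
        compress_secs = time.perf_counter() - start

        start = time.perf_counter()
        restored = decompress(blob)
        decompress_secs = time.perf_counter() - start

        if restored != data:
            raise ValueError(f"round trip failed at level {level}")

        mb = len(data) / 1e6
        results.append({
            "level": level,
            "ratio": len(data) / len(blob),
            # Guard against a zero interval on very small inputs.
            "compress_mb_s": mb / max(compress_secs, 1e-9),
            "decompress_mb_s": mb / max(decompress_secs, 1e-9),
        })
    return results


if __name__ == "__main__":
    # Stand-in codec: zlib levels 1..9 (NOT zstd), synthetic JSON-like input.
    sample = b'{"field": "value", "count": 12345}\n' * 20000
    for row in benchmark_levels(sample, range(1, 10), zlib.compress, zlib.decompress):
        print(f'{row["level"]:>2}: x{row["ratio"]:.2f}, '
              f'{row["compress_mb_s"]:.1f} MB/s, {row["decompress_mb_s"]:.1f} MB/s')
```

A single timed pass per level is noisy; a real run would repeat each level several times and use representative index data rather than a synthetic sample.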

@mgodwan
Member

mgodwan commented Feb 15, 2024

One thing to note is that higher levels can achieve better compression, but at a significant cost in throughput and memory overhead. For example, zstd treats levels above 19 as "ultra" because of the significant memory overhead.
While providing knobs is generally a good idea, I think we need to be careful about how much we expose for this kind of system. For example, increasing the level to 22 will yield a better ratio, but speed will drop a lot, which may not be suitable for a use case like OpenSearch, given that compute has cost significantly more than storage in recent times.

I'd say that it may be good to hear some feedback before adding support for more levels, so that operators don't fall into the trap of attempting to over-optimize this setting (1 to 6 seems to be a good exposed range imho).
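
To make the idea of a bounded knob concrete, here is a minimal sketch of a range guard. Everything in it is hypothetical: the constant names, the `validate_compression_level` function, and the `allow_extended` opt-in flag are illustrations, not the plugin's actual settings code. The bounds 1 to 6 and 22 are the figures quoted in this thread.

```python
# Hypothetical bounds: 1..6 is the range exposed today per this thread;
# 22 is the library maximum mentioned above.
EXPOSED_MIN_LEVEL = 1
EXPOSED_MAX_LEVEL = 6
LIBRARY_MAX_LEVEL = 22


def validate_compression_level(level: int, allow_extended: bool = False) -> int:
    """Return `level` if it is within the permitted range, else raise ValueError.

    `allow_extended` is an illustrative opt-in flag, not an existing setting.
    """
    upper = LIBRARY_MAX_LEVEL if allow_extended else EXPOSED_MAX_LEVEL
    if not EXPOSED_MIN_LEVEL <= level <= upper:
        raise ValueError(
            f"compression level {level} is outside [{EXPOSED_MIN_LEVEL}, {upper}]"
        )
    return level
```

Keeping the wider range behind an explicit opt-in is one way to leave the default safe while still letting operators who have measured their workload go further.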

$ zstd -e22 -b1 -S test_zstd.json
 1#test_zstd.json    :  72953608 ->   2275485 (x32.06), 1427.0 MB/s, 3129.1 MB/s
 2#test_zstd.json    :  72953608 ->   2297113 (x31.76), 1467.0 MB/s, 3026.4 MB/s
 3#test_zstd.json    :  72953608 ->   2324678 (x31.38), 1246.2 MB/s, 3067.8 MB/s
 4#test_zstd.json    :  72953608 ->   2322891 (x31.41), 1233.0 MB/s, 3084.5 MB/s
 5#test_zstd.json    :  72953608 ->   1881209 (x38.78),  319.3 MB/s, 3435.6 MB/s
 6#test_zstd.json    :  72953608 ->   1705027 (x42.79),  218.5 MB/s, 4170.6 MB/s
 7#test_zstd.json    :  72953608 ->   1620936 (x45.01),  198.6 MB/s, 4391.5 MB/s
 8#test_zstd.json    :  72953608 ->   1549335 (x47.09),  166.4 MB/s, 4451.8 MB/s
 9#test_zstd.json    :  72953608 ->   1542736 (x47.29),  154.0 MB/s, 4368.0 MB/s
10#test_zstd.json    :  72953608 ->   1475187 (x49.45),  114.2 MB/s, 4524.6 MB/s
11#test_zstd.json    :  72953608 ->   1426412 (x51.14),   76.3 MB/s, 4654.2 MB/s
12#test_zstd.json    :  72953608 ->   1426178 (x51.15),   71.7 MB/s, 4666.8 MB/s
13#test_zstd.json    :  72953608 ->   1368801 (x53.30),   57.9 MB/s, 4812.5 MB/s
14#test_zstd.json    :  72953608 ->   1271762 (x57.36),   40.2 MB/s, 4997.8 MB/s
15#test_zstd.json    :  72953608 ->   1222523 (x59.67),   32.2 MB/s, 4993.7 MB/s
16#test_zstd.json    :  72953608 ->   1410705 (x51.71),   5.61 MB/s, 4532.0 MB/s
17#test_zstd.json    :  72953608 ->   1370144 (x53.25),   5.29 MB/s, 4379.3 MB/s
18#test_zstd.json    :  72953608 ->   1432867 (x50.91),   5.14 MB/s, 4350.4 MB/s
19#test_zstd.json    :  72953608 ->   1242815 (x58.70),   2.29 MB/s, 4416.5 MB/s
20#test_zstd.json    :  72953608 ->   1226806 (x59.47),   2.24 MB/s, 3776.9 MB/s
21#test_zstd.json    :  72953608 ->   1131111 (x64.50),   0.98 MB/s, 4314.5 MB/s
22#test_zstd.json    :  72953608 ->   1121897 (x65.03),   0.71 MB/s, 4252.4 MB/s
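
As a reading aid for the output above, the sketch below parses rows in this format and expresses each level relative to a chosen baseline: how much smaller the output is, and how many times slower compression is. The helper names `parse_bench` and `compare` are hypothetical, and only three rows are copied in to keep the example short.

```python
import re

# Three rows copied from the benchmark output above.
BENCH_OUTPUT = """\
 3#test_zstd.json    :  72953608 ->   2324678 (x31.38), 1246.2 MB/s, 3067.8 MB/s
 6#test_zstd.json    :  72953608 ->   1705027 (x42.79),  218.5 MB/s, 4170.6 MB/s
22#test_zstd.json    :  72953608 ->   1121897 (x65.03),   0.71 MB/s, 4252.4 MB/s
"""

# level#file : original -> compressed (xratio), compress MB/s, decompress MB/s
ROW = re.compile(
    r"^\s*(?P<level>-?\d+)#\S+\s*:\s*(?P<orig>\d+)\s*->\s*(?P<comp>\d+)\s*"
    r"\(x(?P<ratio>[\d.]+)\),?\s*(?P<c_speed>[\d.]+) MB/s,?\s*(?P<d_speed>[\d.]+) MB/s"
)


def parse_bench(text):
    """Turn benchmark lines into dicts, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "level": int(m["level"]),
                "compressed_bytes": int(m["comp"]),
                "ratio": float(m["ratio"]),
                "compress_mb_s": float(m["c_speed"]),
                "decompress_mb_s": float(m["d_speed"]),
            })
    return rows


def compare(rows, baseline_level):
    """Report size saving (%) and compression slowdown relative to a baseline level."""
    base = next(r for r in rows if r["level"] == baseline_level)
    out = []
    for r in rows:
        out.append({
            "level": r["level"],
            "size_saving_pct": 100.0 * (1 - r["compressed_bytes"] / base["compressed_bytes"]),
            "compress_slowdown": base["compress_mb_s"] / r["compress_mb_s"],
        })
    return out
```

With level 6 as the baseline, `compare(parse_bench(BENCH_OUTPUT), 6)` shows level 22 producing output about 34% smaller while compressing roughly 308 times slower on this file.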

@dblock
Member

dblock commented Jun 17, 2024

Catch All Triage - 1 2 3 4 5

@dblock dblock removed the untriaged label Jun 17, 2024