deflate: Improve level 5+6 compression #367

Merged: 1 commit, Apr 26, 2021

klauspost (Owner):
Improve deflate level 5+6 compression by checking an additional hash when the best found match ends.

This improves compression in most cases at an acceptable speed loss, which brings it well in line with the surrounding parameters.

Level 5 tries one additional hash when the match length is < 30; level 6 tries two at all lengths.

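Before/after pairs (in each pair, the first row is before this change and the second is after):

| file | out | level | run | insize | outsize | millis | mb/s |
|---|---|---|---|---|---|---|---|
| github-june-2days-2019.json | gzkp | 5 | before | 6273951764 | 963122453 | 31498 | 189.96 |
| github-june-2days-2019.json | gzkp | 5 | after | 6273951764 | 947567306 | 32795 | 182.45 |
| github-june-2days-2019.json | gzkp | 6 | before | 6273951764 | 949824639 | 34851 | 171.68 |
| github-june-2days-2019.json | gzkp | 6 | after | 6273951764 | 930428507 | 37312 | 160.35 |
| nyc-taxi-data-10M.csv | gzkp | 5 | before | 3325605752 | 785784479 | 24729 | 128.25 |
| nyc-taxi-data-10M.csv | gzkp | 5 | after | 3325605752 | 779343831 | 27189 | 116.65 |
| nyc-taxi-data-10M.csv | gzkp | 6 | before | 3325605752 | 775719630 | 26690 | 118.83 |
| nyc-taxi-data-10M.csv | gzkp | 6 | after | 3325605752 | 768153050 | 29905 | 106.05 |
| enwik9 | gzkp | 5 | before | 1000000000 | 338823570 | 10477 | 91.02 |
| enwik9 | gzkp | 5 | after | 1000000000 | 337489137 | 11353 | 84.00 |
| enwik9 | gzkp | 6 | before | 1000000000 | 336549505 | 10791 | 88.37 |
| enwik9 | gzkp | 6 | after | 1000000000 | 334933748 | 11961 | 79.73 |
| gob-stream | gzkp | 5 | before | 1911399616 | 309832207 | 8596 | 212.03 |
| gob-stream | gzkp | 5 | after | 1911399616 | 307765377 | 9101 | 200.28 |
| gob-stream | gzkp | 6 | before | 1911399616 | 308962175 | 9626 | 189.35 |
| gob-stream | gzkp | 6 | after | 1911399616 | 301609641 | 10305 | 176.88 |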
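To put the size/speed trade-off in relative terms, the pairs above can be turned into percentage changes. The following Go sketch does that arithmetic; the `pair` type and `pctChange` helper are illustrative names, and the numbers are copied from the benchmark rows (output size in bytes and the reported mb/s).

```go
package main

import "fmt"

// pair holds the before/after output size and throughput for one file and level.
type pair struct {
	file                string
	level               int
	outBefore, outAfter int64
	mbsBefore, mbsAfter float64
}

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	pairs := []pair{
		{"github-june-2days-2019.json", 5, 963122453, 947567306, 189.96, 182.45},
		{"github-june-2days-2019.json", 6, 949824639, 930428507, 171.68, 160.35},
		{"nyc-taxi-data-10M.csv", 5, 785784479, 779343831, 128.25, 116.65},
		{"nyc-taxi-data-10M.csv", 6, 775719630, 768153050, 118.83, 106.05},
		{"enwik9", 5, 338823570, 337489137, 91.02, 84.00},
		{"enwik9", 6, 336549505, 334933748, 88.37, 79.73},
		{"gob-stream", 5, 309832207, 307765377, 212.03, 200.28},
		{"gob-stream", 6, 308962175, 301609641, 189.35, 176.88},
	}
	for _, p := range pairs {
		fmt.Printf("%-28s level %d  size %+.2f%%  speed %+.2f%%\n",
			p.file, p.level,
			pctChange(float64(p.outBefore), float64(p.outAfter)),
			pctChange(p.mbsBefore, p.mbsAfter))
	}
}
```

A negative size change means smaller output; a negative speed change means lower throughput.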

@klauspost klauspost merged commit ba2263c into master Apr 26, 2021
@klauspost klauspost deleted the deflate-improve-default branch April 26, 2021 11:06