zstd: Improve best compression's match selection (#705)
The best encoder selects matches based on the criterion

    a.est+(a.s-b.s)*bitsPerByte>>10 < b.est+(b.s-a.s)*bitsPerByte>>10

If this were computed on the reals, it would be equivalent to
a.est < b.est, so the added terms only capture round-off error (this is
also why CSE doesn't eliminate them). Changing the formula to

    a.est-b.est+(a.s-b.s)*bitsPerByte>>10 < 0

captures the intention better, I think, and improves compression:

    file              old size   new size   delta
    enwik9            260989017  259699309  -0.4942%
    silesia/dickens     3233958    3222189  -0.3639%
    silesia/mozilla    16980973   16912341  -0.4042%
    silesia/mr          3505223    3505553   0.0094%
    silesia/nci         2313702    2289871  -1.0300%
    silesia/ooffice     2915199    2896410  -0.6445%
    silesia/osdb        3364752    3390871   0.7763%
    silesia/reymont     1658404    1656006  -0.1446%
    silesia/samba       4330660    4326783  -0.0895%
    silesia/sao         5399736    5416932   0.3185%
    silesia/webster     9987784    9966351  -0.2146%
    silesia/xml          542081     538378  -0.6831%
    silesia/x-ray       5756210    5733061  -0.4022%

... as well as throughput:

    name                              old speed      new speed      delta
    Encoder_EncodeAllSimple/best-8    12.1MB/s ± 1%  12.2MB/s ± 1%  +1.17%  (p=0.000 n=18+20)
    Encoder_EncodeAllSimple4K/best-8  10.4MB/s ± 1%  10.5MB/s ± 1%  +0.82%  (p=0.000 n=20+20)
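To make the difference between the two criteria concrete, here is a minimal Go sketch that evaluates both on the same pair of candidates. The `match` struct, its `int32` field types, and the value of `bitsPerByte` are assumptions made for illustration only; the encoder's real type and constant may differ. Only the two comparison expressions are taken from the message above.

```go
package main

import "fmt"

// match is a hypothetical stand-in for the encoder's candidate match:
// est is the estimated encoded cost, s is the match start position.
type match struct {
	est int32
	s   int32
}

// bitsPerByte is an assumed fixed-point weight (scaled by 1<<10).
// The value is chosen so the shift is exact; the real constant may differ.
const bitsPerByte int32 = 1024

// oldBetter reports whether a is preferred over b under the previous criterion.
// In Go, * and >> share a precedence level above +, so each product is
// shifted before it is added to est.
func oldBetter(a, b match) bool {
	return a.est+(a.s-b.s)*bitsPerByte>>10 < b.est+(b.s-a.s)*bitsPerByte>>10
}

// newBetter reports whether a is preferred over b under the new criterion.
func newBetter(a, b match) bool {
	return a.est-b.est+(a.s-b.s)*bitsPerByte>>10 < 0
}

func main() {
	a := match{est: 10, s: 5} // cheaper estimate, starts one byte later
	b := match{est: 12, s: 4}

	// old: 10+1 < 12-1 is false; new: 10-12+1 < 0 is true.
	fmt.Println(oldBetter(a, b), newBetter(a, b)) // prints "false true"
}
```

With these illustrative inputs the old criterion keeps b while the new one selects a, so the two predicates are not interchangeable in integer arithmetic.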