Reduce LZMA dictionary size for small baskets #194


Closed

Conversation

dan131riley
LZMA by default creates very large hash tables for its dictionaries; e.g., at compression level 4 the hash table has 4 Mi 4-byte entries, 16 MiB total. The hash table must be zeroed before use, so it is allocated via calloc(), which means all of its pages have to be allocated, mapped, and written. ROOT baskets are often much smaller than the default LZMA dictionaries; for small baskets the large dictionary provides very little compression benefit, while zeroing the hash table can cost more than the compression itself.

Since R__zipLZMA() is actually compressing a buffer of known size, not a stream, we can use the buffer size to estimate an appropriate dictionary size. This PR uses a slightly more advanced part of the LZMA API to set the dictionary size to 1/4 of the input buffer size whenever that is smaller than the default for the selected preset compression level. In tests with CMS data, this results in less than a 1% increase in output size and (in one test job) a 25% reduction in total job run time, with LZMA compression time reduced by 80% (nearly all of that time was being spent in memset() zeroing the hash table).
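The idea can be sketched with Python's stdlib lzma module (liblzma underneath): a filter chain lets you override dict_size while keeping the rest of the preset. The input data and the 4 KiB floor here are illustrative assumptions, not the actual R__zipLZMA() code:

```python
import lzma

# Stand-in for a small ROOT basket (~12 KiB of repetitive serialized data).
data = b"track momentum eta phi charge " * 400

def compress_capped(buf, preset=4):
    # Cap the dictionary at 1/4 of the input size (liblzma's minimum
    # dict_size is 4 KiB) instead of the preset default (4 MiB at preset 4).
    dict_size = max(len(buf) // 4, 4096)
    filters = [{"id": lzma.FILTER_LZMA2, "preset": preset, "dict_size": dict_size}]
    return lzma.compress(buf, format=lzma.FORMAT_XZ, filters=filters)

default_out = lzma.compress(data, preset=4)
capped_out = compress_capped(data)
print(len(default_out), len(capped_out))   # sizes stay very close for small inputs
assert lzma.decompress(capped_out) == data  # capped stream is still a valid .xz
```

Since the dictionary only needs to cover data the encoder can actually reference, shrinking it below the input size trades a bounded amount of match-finding reach for far less memory to allocate and zero.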

I also tested this with the "Event" test program with Brian's changes from #59. With the same test parameters as Brian ("./Event 4000 6 99 1 1000 2"), I get:

ZLIB level-6: 14.4 MB/s
Original LZMA level-6: 2.3 MB/s
Modified LZMA level-6: 3.0 MB/s

With 100 tracks per event (and hence smaller baskets) the improvement is from 2.2 MB/s to 3.9 MB/s.

This change should be fully transparent and backwards compatible.

@bbockelm
Contributor

@dan131riley - is there a similar interface for zlib?

@dan131riley
Author

@bbockelm

deflateInit2() can be used to set the size of the window and hash table for zlib:

http://www.zlib.net/manual.html#Advanced

The default values for zlib give 128 KiB for the sliding window and 128 KiB for the hash table, much more modest than the LZMA sizes. zlib memory initialization doesn't show up in igprof output for cmsRun, so tuning it isn't likely to make much difference.
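For comparison, zlib's equivalent knobs are deflateInit2()'s windowBits and memLevel parameters, which Python's stdlib zlib module exposes directly. The sizes below are illustrative choices, not a recommendation from this thread:

```python
import zlib

data = b"track momentum eta phi charge " * 400

# Default deflate state: wbits=15 (32 KiB window), memLevel=8.
default_c = zlib.compressobj(6)
default_out = default_c.compress(data) + default_c.flush()

# Shrunken state: wbits=12 (4 KiB window), memLevel=5 (smaller hash tables),
# roughly analogous to capping the LZMA dictionary for small buffers.
small_c = zlib.compressobj(6, zlib.DEFLATED, 12, 5)
small_out = small_c.compress(data) + small_c.flush()

print(len(default_out), len(small_out))
# A default inflater accepts streams produced with a smaller window.
assert zlib.decompress(small_out) == data
```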

@bbockelm
Contributor

(Probably worth at least the test!)

@bbockelm
Contributor

Oh - what are the details of the job that improved? Was it simply reading & rewriting the file? Or part of a fixed workflow?

@dan131riley
Author

The job was full RECO with 8 threads, reading raw data and writing RECO, AOD, MINIAOD and DQMIO. Details are in Slava's slides from yesterday, but it is a generic enough RECO job that the effect should show up in any standard RECO workflow writing AOD and MINIAOD. The effect on cmsRun timing was definitely exaggerated by having 8 threads, as the occasional long pauses from LZMA sometimes blocked the other threads too (the job-time speedups I quoted were wall clock, not CPU).

@bbockelm
Contributor

Yeah, that makes sense. There's a reasonable chance that a zlib improvement would show up in the ROOT-based benchmarks but not in CMSSW.

FWIW - Chris and I have gone over various mechanisms to allow other threads to progress while the call to deflate is in progress. Haven't yet figured out anything that doesn't cause deadlocks...

@pcanal
Member

pcanal commented Jul 13, 2016

Pushed into the master. Thanks.

@pcanal pcanal closed this Jul 13, 2016