-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FST#Compiler allocates too much memory #12598
Comments
I did an experiment: Index 5M random BytesRefs and count the byte usage when
While the bytesrefs are random, it may share little prefix and suffix, I tried to mock some common prefix/suffix for them like:
And here is the result:
We will allocate a 32kb while 99% cases only need 5kb. These results somewhat matches the allocation profile that we rarely need a second block in |
I get similar statistics for wikimediumall and here are the results when
It seems 1k bytes per block is enough here. 99% cases can be covered by single block and we at most need 600+ blocks for single FST. |
Thanks @gf2121 -- this is a great discovery (and thank you https://blunders.io for the awesome integrated profiling in Lucene's nightly benchmarks!).
It's always dangerous to test on random data (even random data that attempts to simulate realistic patterns): you'll draw random conclusions yet act on them as if they were trustworthy! Better to test by indexing true terms that show up in a real-world corpus. Still, I think this issue is indeed compelling -- most FSTs in practice are likely smaller than 32 KB. But one issue to be aware of is shrinking the block size will worsen this issue, where performance of building & immediately using (in HEAP, not via save/load) an FST has poor performance when the FST needs more than one block. Really we need to get away from FST attempting to re-implement an in-memory filesystem, badly! We've seen many problems from this over the years... like poor performance reading bytes in reverse, this issue (over-allocating blocks), the above linked issue (poor performance using a just-built FST). The FST compiler really ought to stream its bytes directly to But until we fix this "correctly" (stop trying to be a |
FWIW I was looking into this a bit when I saw this issue come in. Specifically on Solr 8.11, but as far as I can tell the changes in #12604 apply to 8.x as well. In a 30s async profiler memory allocation profile, can see that ~10% (4% on left and then ~6% near the middle) of total is this same FST byte array. I put up a backport PR for lucene-solr branch_8_11 - apache/lucene-solr#2677 |
Description
https://blunders.io/jfr-demo/indexing-4kb-2023.09.25.18.03.36/allocations-drill-down
The allocation profile for nightly indexing benchmark shows that
FST#ByteStore
occupies a rather high ratio because we always construct aFST#Compiler
each time callingLucene90BlockTreeTermsWriter$TermsWriter#writeBlocks
, which will allocate a new 32KBbyte[]
.As the profile shows, most of memory was allocated from
FST#init
, which meansBytesStore
rarely needs another page to store bytes. So I suspect we are using a too large page here. Maybe we should decrease the page size to 8k or even smaller ?The text was updated successfully, but these errors were encountered: