Reduce FST block size for BlockTreeTermsWriter #12604

gf2121 · 2023-09-28T15:27:55Z

Description

https://blunders.io/jfr-demo/indexing-4kb-2023.09.25.18.03.36/allocations-drill-down

The nightly benchmark shows that FSTCompiler#init accounts for most of the memory allocated during indexing. This is because FSTCompiler#init always allocates 32k bytes, since the bytesPageBits parameter defaults to 15. I counted the BytesStore usage (getPosition() at the time BytesStore#finish is called) during wikimediumall indexing, and the result shows that 99% of FSTs use no more than 1k bytes.

BytesStore#finish called: 1000000 times

min: 1
mid: 16
avg: 64.555987
pct75: 28
pct90: 57
pct99: 525
pct999: 4957
pct9999: 29124
max: 631700
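
To put these figures in proportion, here is a rough back-of-the-envelope sketch. It assumes each FST reserves exactly one page of 2^15 bytes up front (the largest FSTs need more than one page, so this is a lower bound). The class and method names are made up for illustration and are not part of Lucene.

```java
// Back-of-the-envelope arithmetic using only the figures quoted above.
// FstBlockWaste and upFrontBytes are hypothetical names, not Lucene APIs.
final class FstBlockWaste {
  /** Bytes reserved when one page of 2^pageBits bytes is allocated per FST. */
  static long upFrontBytes(long fstCount, int pageBits) {
    return fstCount * (1L << pageBits);
  }

  public static void main(String[] args) {
    long fstCount = 1_000_000L;       // "BytesStore#finish called: 1000000 times"
    double avgBytesUsed = 64.555987;  // "avg" from the measurement above
    long allocated = upFrontBytes(fstCount, 15); // bytesPageBits defaults to 15
    double used = fstCount * avgBytesUsed;
    System.out.printf("allocated up front: %d bytes%n", allocated);
    System.out.printf("actually used:      %.0f bytes%n", used);
    System.out.printf("ratio:              %.0fx%n", allocated / used);
  }
}
```

Under that assumption, the sampled FSTs reserve about 32.8 GB in total while writing only about 65 MB.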

This PR proposes to reduce the block size of FST in Lucene90BlockTreeTermsWriter.

closes #12598

gf2121 · 2023-09-29T15:32:46Z

Here are the young GC statistics and allocation profiles after indexing wikimedium10m (without facets and doc values):

	main	patch	diff
Time in Young Generation GC	1245	864	-30.60%
Collections	525	321	-38.86%

Baseline Allocation Profile

PERCENT       HEAP SAMPLES  STACK
39.26%        125353M       org.apache.lucene.util.fst.BytesStore#writeByte()
5.33%         17008M        org.apache.lucene.codecs.lucene90.Lucene90PostingsWriter#newTermState()
5.07%         16188M        java.util.Arrays#copyOfRange()
4.33%         13832M        java.lang.StringUTF16#compress()
3.97%         12688M        java.util.HashMap#newNode()
3.43%         10936M        org.apache.lucene.util.ByteBlockPool$DirectTrackingAllocator#getByteBlock()
3.28%         10461M        org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#<init>()
2.70%         8635M         java.util.Arrays#copyOf()
2.60%         8307M         org.apache.lucene.index.ParallelPostingsArray#<init>()
2.43%         7743M         org.apache.lucene.util.BytesRef#<init>()
2.02%         6435M         org.apache.lucene.util.LongHeap#<init>()
1.94%         6202M         java.lang.String#<init>()
1.71%         5451M         org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$PendingTerm#<init>()
1.54%         4905M         org.apache.lucene.codecs.lucene90.Lucene90NormsProducer#getNorms()
1.50%         4779M         org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter#write()
1.26%         4025M         java.util.ArrayList#grow()
1.20%         3835M         org.apache.lucene.util.TimSorter#<init>()
0.99%         3163M         java.util.HashMap#resize()
0.97%         3089M         org.apache.lucene.util.fst.FSTCompiler#<init>()
0.77%         2464M         org.apache.lucene.util.ArrayUtil#growExact()
0.72%         2292M         org.apache.lucene.util.fst.FSTCompiler$UnCompiledNode#<init>()
0.72%         2286M         org.apache.lucene.util.fst.FSTEnum#getArc()
0.63%         1998M         org.apache.lucene.util.BytesRefHash#rehash()
0.59%         1887M         java.util.regex.Matcher#<init>()
0.58%         1848M         org.apache.lucene.codecs.lucene90.PForUtil#encode()
0.55%         1758M         java.text.CalendarBuilder#<init>()
0.52%         1674M         java.text.SimpleDateFormat#subParse()
0.42%         1334M         org.apache.lucene.codecs.CompetitiveImpactAccumulator#getCompetitiveFreqNormPairs()
0.39%         1252M         java.lang.StringBuffer#toString()
0.39%         1251M         java.text.DecimalFormat#parse()

Candidate Allocation Profile

PERCENT       HEAP SAMPLES  STACK
8.56%         16863M        org.apache.lucene.codecs.lucene90.Lucene90PostingsWriter#newTermState()
8.27%         16280M        java.util.Arrays#copyOfRange()
6.89%         13577M        java.lang.StringUTF16#compress()
6.31%         12423M        java.util.HashMap#newNode()
5.59%         11007M        org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#<init>()
5.40%         10633M        org.apache.lucene.util.ByteBlockPool$DirectTrackingAllocator#getByteBlock()
4.39%         8637M         org.apache.lucene.index.ParallelPostingsArray#<init>()
4.13%         8129M         java.util.Arrays#copyOf()
3.61%         7117M         org.apache.lucene.util.BytesRef#<init>()
3.55%         6985M         org.apache.lucene.util.LongHeap#<init>()
3.17%         6236M         java.lang.String#<init>()
2.84%         5584M         org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$PendingTerm#<init>()
2.59%         5101M         org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter#write()
2.52%         4970M         org.apache.lucene.codecs.lucene90.Lucene90NormsProducer#getNorms()
2.10%         4131M         org.apache.lucene.util.fst.BytesStore#writeByte()
1.92%         3788M         org.apache.lucene.util.TimSorter#<init>()
1.72%         3382M         java.util.HashMap#resize()
1.54%         3041M         org.apache.lucene.util.fst.FSTCompiler#<init>()
1.31%         2574M         org.apache.lucene.util.ArrayUtil#growExact()
1.21%         2374M         org.apache.lucene.util.fst.FSTCompiler$UnCompiledNode#<init>()
1.04%         2049M         org.apache.lucene.util.fst.FSTEnum#getArc()
0.98%         1929M         java.text.CalendarBuilder#<init>()
0.98%         1926M         java.util.ArrayList#grow()
0.98%         1926M         org.apache.lucene.util.BytesRefHash#rehash()
0.85%         1677M         org.apache.lucene.codecs.lucene90.PForUtil#encode()
0.84%         1647M         java.text.SimpleDateFormat#subParse()
0.78%         1543M         java.util.regex.Matcher#<init>()
0.71%         1402M         java.text.DecimalFormat#parse()
0.70%         1376M         org.apache.lucene.codecs.CompetitiveImpactAccumulator#getCompetitiveFreqNormPairs()
0.68%         1338M         sun.util.locale.provider.DateFormatSymbolsProviderImpl#getInstance()

gf2121 · 2023-10-02T09:14:46Z

Hi @jpountz ! Would you please take a look at this PR when you have time? Looking forward to getting your suggestions on this topic ~

jpountz · 2023-10-02T09:21:42Z

Oh, interesting find. It makes sense to me, but I'm not the most familiar with this piece of code. @mikemccand or @s1monw what do you think?

s1monw · 2023-10-02T13:15:45Z

This change makes sense to me. @mikemccand WDYT

mikemccand

The only possible risk I see is that when writing truly massive segments with many terms in a given field, a large FST will now require 32X the number of blocks to hold it. This larger array will add GC cost, slow down writing a bit, etc.

But I think that's an OK tradeoff: writing such large segments is already massively more costly than writing tiny segments, so in proportion that added cost is acceptable.

We are not really at risk of exhausting the max size of a java array -- that'd happen if one tried to build a ~2 TB FST, which is not even really feasible today since the FST is still fully heap resident at write time.
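
The "32X" and "~2 TB" figures can be checked with a little arithmetic. The sketch below assumes the patch under review used 1 KiB blocks (pageBits = 10); the thread does not state that value explicitly, but it is consistent with both numbers. The names are hypothetical and only serve the illustration.

```java
// Hypothetical helper, not Lucene code: block counts for fixed-size pages.
final class BlockCountSketch {
  /** Number of fixed-size blocks of 2^pageBits bytes needed to hold totalBytes. */
  static long blocksNeeded(long totalBytes, int pageBits) {
    long blockSize = 1L << pageBits;
    return (totalBytes + blockSize - 1) / blockSize; // ceiling division
  }

  /** Largest store addressable when the block list is backed by one Java array. */
  static long maxStoreBytes(int pageBits) {
    return (long) Integer.MAX_VALUE << pageBits;
  }

  public static void main(String[] args) {
    long oneGiB = 1L << 30;
    System.out.println(blocksNeeded(oneGiB, 15)); // 32 KiB blocks (old default)
    System.out.println(blocksNeeded(oneGiB, 10)); // 1 KiB blocks (assumed patch value)
    System.out.println(maxStoreBytes(10));        // just under 2^41 bytes, about 2 TiB
  }
}
```

With 1 KiB blocks, a 1 GiB FST needs 32 times as many blocks as with 32 KiB blocks, and the array limit is only reached at roughly 2 TiB.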

Thanks @gf2121!

mikemccand · 2023-10-03T10:07:10Z

lucene/CHANGES.txt

@@ -163,6 +163,8 @@ Optimizations
 * GITHUB#12382: Faster top-level conjunctions on term queries when sorting by
  descending score. (Adrien Grand)

+* GITHUB#12604: Reduce block size of FST BytesStore in BlockTreeTermsWriter. (Guo Feng)


Could you add some words that the end user could understand -- maybe reducing GC load during indexing or so?

s1monw · 2023-10-03T10:23:14Z

@mikemccand maybe we can make a tradeoff here between segments we write the first time (i.e. through IW) and segments we write because of a merge? It might mitigate your concerns.

gf2121 · 2023-10-03T16:44:03Z

Thanks for all the reviews and suggestions here!

> @mikemccand maybe we can make a tradeoff here between segments we write the first time (i.e. through IW) and segments we write because of a merge? It might mitigate your concerns.

Thanks @s1monw , I really like the idea that we can estimate the page size before building the FST!

A small concern is that we could still build a big FST if IW has a large flush buffer, or a small FST when tiny segments merge. This commit tries to estimate the size of the FST more accurately.

I repeated the BytesStore usage count for wikimediumall:

FST built 1000000 times

min ProfileInfo{bytesUsed=1, estimateSize=0, pageBits=6, pageNum=1}
pct50 ProfileInfo{bytesUsed=16, estimateSize=5, pageBits=6, pageNum=1}
pct75 ProfileInfo{bytesUsed=23, estimateSize=17, pageBits=6, pageNum=1}
pct90 ProfileInfo{bytesUsed=43, estimateSize=44, pageBits=6, pageNum=1}
pct99 ProfileInfo{bytesUsed=539, estimateSize=563, pageBits=10, pageNum=1}
pct999 ProfileInfo{bytesUsed=5026, estimateSize=4641, pageBits=13, pageNum=1}
pct9999 ProfileInfo{bytesUsed=32524, estimateSize=31522, pageBits=15, pageNum=1}
max ProfileInfo{bytesUsed=630865, estimateSize=610855, pageBits=15, pageNum=20}

I also collected the percentiles of pageNum. They show that we use <= 3 pages for 99.99% of FSTs, and at most 20 pages for the largest FST. We are now doing as well as before for large BytesStores :)

FST built 1000000 times

min ProfileInfo{bytesUsed=10, estimateSize=3, pageBits=6, pageNum=1}
pct50 ProfileInfo{bytesUsed=347, estimateSize=292, pageBits=9, pageNum=1}
pct75 ProfileInfo{bytesUsed=21, estimateSize=4, pageBits=6, pageNum=1}
pct90 ProfileInfo{bytesUsed=22, estimateSize=8, pageBits=6, pageNum=1}
pct99 ProfileInfo{bytesUsed=37, estimateSize=37, pageBits=6, pageNum=1}
pct999 ProfileInfo{bytesUsed=71, estimateSize=61, pageBits=6, pageNum=2}
pct9999 ProfileInfo{bytesUsed=130, estimateSize=62, pageBits=6, pageNum=3}
max ProfileInfo{bytesUsed=630865, estimateSize=610855, pageBits=15, pageNum=20}
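
The profile rows above are consistent with choosing pageBits as the bit length of the estimated size, clamped to the range [6, 15]. The sketch below is inferred from those rows, not copied from the patch; the class name, constants, and the pageNum formula are assumptions made for illustration.

```java
// Inferred from the ProfileInfo rows above; names and bounds are assumptions.
final class PageBitsSketch {
  static final int MIN_PAGE_BITS = 6;  // smallest pageBits seen in the profile
  static final int MAX_PAGE_BITS = 15; // the previous default

  /** Bit length of the estimate, clamped to [MIN_PAGE_BITS, MAX_PAGE_BITS]. */
  static int pageBits(long estimateSize) {
    int bitsRequired = Math.max(1, 64 - Long.numberOfLeadingZeros(estimateSize));
    return Math.min(MAX_PAGE_BITS, Math.max(MIN_PAGE_BITS, bitsRequired));
  }

  /** Pages of 2^pageBits bytes needed to hold bytesUsed (at least one). */
  static long pageNum(long bytesUsed, int pageBits) {
    long pageSize = 1L << pageBits;
    return Math.max(1, (bytesUsed + pageSize - 1) / pageSize);
  }

  public static void main(String[] args) {
    long[][] rows = {{16, 5}, {539, 563}, {5026, 4641}, {630865, 610855}};
    for (long[] row : rows) { // {bytesUsed, estimateSize} pairs from the profile
      int bits = pageBits(row[1]);
      System.out.println("estimateSize=" + row[1] + " pageBits=" + bits
          + " pageNum=" + pageNum(row[0], bits));
    }
  }
}
```

A good estimate keeps most FSTs in a single small page, while the cap at 15 bits means a large FST uses the same number of pages as before the change.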

mikemccand

I like this approach! I had forgotten that BlockTree creates little baby FSTs all the way up the prefix tree, so it's creating (and discarding) many FSTs in order to build the final FST, making this change all the more important!

I just left a tiny comment about the CHANGES entry and a few "later" comments -- maybe open follow-on issues for those ideas?

mikemccand · 2023-10-03T19:31:10Z

lucene/CHANGES.txt

@@ -163,6 +163,9 @@ Optimizations
 * GITHUB#12382: Faster top-level conjunctions on term queries when sorting by
  descending score. (Adrien Grand)

+* GITHUB#12604: Estimate the block size of FST BytesStore in BlockTreeTermsWriter
+  to reducing GC load during indexing. (Guo Feng)


reducing -> reduce

mikemccand · 2023-10-03T19:39:28Z

.../core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java

@@ -490,10 +491,22 @@ public void compileIndex(
        }
      }

+      long estimateSize = prefix.length;


Too bad we don't have a writer that uses tiny (like 8 bytes) block at first, but doubles size for each new block (16 bytes, 32 bytes next, etc.). Then we would naturally use log(size) number of blocks without over-allocating.

But then reading bytes is a bit tricky because we'd need to take discrete log (base 2) of the address. Maybe it wouldn't be so bad -- we could do this with Long.numberOfLeadingZeros maybe? But that's a bigger change ... we can do this separately/later.
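
A minimal sketch of that idea, assuming blocks of 8, 16, 32, ... bytes: block k then starts at address 8 * (2^k - 1), so the block index is floor(log2(address / 8 + 1)), which Long.numberOfLeadingZeros computes directly. DoublingByteStore and its methods are hypothetical names, not an existing Lucene class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a byte store whose blocks double in size.
final class DoublingByteStore {
  private static final int FIRST_BLOCK_BITS = 3; // first block holds 8 bytes
  private final List<byte[]> blocks = new ArrayList<>();
  private long size;

  /** Index of the block containing the address: floor(log2(address / 8 + 1)). */
  static int blockIndex(long address) {
    return 63 - Long.numberOfLeadingZeros((address >>> FIRST_BLOCK_BITS) + 1);
  }

  /** Offset of the address inside its block. */
  static int offsetInBlock(long address) {
    long blockStart = ((1L << blockIndex(address)) - 1) << FIRST_BLOCK_BITS;
    return (int) (address - blockStart);
  }

  void writeByte(byte b) {
    int block = blockIndex(size);
    if (block == blocks.size()) {
      // Each new block is twice as large as the previous one.
      blocks.add(new byte[1 << (FIRST_BLOCK_BITS + block)]);
    }
    blocks.get(block)[offsetInBlock(size)] = b;
    size++;
  }

  byte readByte(long address) {
    return blocks.get(blockIndex(address))[offsetInBlock(address)];
  }

  int blockCount() {
    return blocks.size();
  }

  public static void main(String[] args) {
    DoublingByteStore store = new DoublingByteStore();
    for (int i = 0; i < 1000; i++) {
      store.writeByte((byte) i);
    }
    System.out.println("blocks used for 1000 bytes: " + store.blockCount());
  }
}
```

The number of blocks grows with the logarithm of the stored size, and a read costs one numberOfLeadingZeros call plus a shift and a subtraction.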

mikemccand · 2023-10-03T19:40:40Z

.../core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java

@@ -490,10 +491,22 @@ public void compileIndex(
        }
      }

+      long estimateSize = prefix.length;
+      for (PendingBlock block : blocks) {
+        if (block.subIndices != null) {


We also should really explore the TODO above to write vLong in opposite byte order -- this might save quite a bit of storage in the FST since outputs would share more prefixes. Again, separate issue 😀
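
To illustrate why byte order matters for prefix sharing, here is a small sketch that encodes two nearby values both ways and compares how many leading bytes they share. It is not the Lucene encoding or the change tracked in the follow-up issue; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration: 7-bit variable-length encoding in both byte orders.
final class VLongOrderSketch {
  /** Conventional order: least significant 7-bit group first. */
  static byte[] lsbFirst(long v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80)); // continuation bit set
      v >>>= 7;
    }
    out.write((int) v);
    return out.toByteArray();
  }

  /** Same 7-bit groups, emitted most significant group first. */
  static byte[] msbFirst(long v) {
    byte[] lsb = lsbFirst(v);
    byte[] msb = new byte[lsb.length];
    for (int i = 0; i < lsb.length; i++) {
      int group = lsb[lsb.length - 1 - i] & 0x7F;
      // Continuation bit on every byte except the last one written.
      msb[i] = (byte) (i < lsb.length - 1 ? group | 0x80 : group);
    }
    return msb;
  }

  /** Number of leading bytes the two arrays have in common. */
  static int sharedPrefix(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    long a = 1_000_000L, b = 1_000_001L;
    System.out.println("LSB-first shared bytes: " + sharedPrefix(lsbFirst(a), lsbFirst(b)));
    System.out.println("MSB-first shared bytes: " + sharedPrefix(msbFirst(a), msbFirst(b)));
  }
}
```

Nearby values differ in their low-order bits, so putting the high-order groups first lets their encodings start with the same bytes.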

s1monw

LGTM too

jpountz · 2023-10-10T08:59:42Z

It looks like there's a bit less Young GC in nightly benchmarks since this change was merged, from 6-8 seconds, to consistently below 6s. I pushed an annotation.

gf2121 · 2023-10-10T09:16:15Z

@jpountz Thanks for annotating!

I also checked blunders.io for more details:

GC pause time: 6.38% -> 5.91%
Allocation Rate: 3.7 GiB/s -> 2.6 GiB/s
Far fewer FST#init allocations in the allocation flame graph :)

Before Patch https://blunders.io/jfr-demo/indexing-4kb-2023.10.03.18.03.47/allocations-drill-down
After Patch https://blunders.io/jfr-demo/indexing-4kb-2023.10.04.18.03.40/allocations-drill-down

gf2121 added 3 commits September 28, 2023 23:15

reduce fst block size

edfd968

add CHANGES

43e67cc

fix

521d693

gf2121 requested a review from jpountz October 2, 2023 09:15

mikemccand approved these changes Oct 3, 2023

estimate bit

1eb6a99

update changes

a4a4964

gf2121 requested a review from mikemccand October 3, 2023 16:51

mikemccand approved these changes Oct 3, 2023

s1monw approved these changes Oct 3, 2023

fix change

21b6858

This was referenced Oct 4, 2023

Make FST BytesStore grow smoothly #12619

Open

Write VLong in opposite order for better outputs sharing in the FST #12620

Closed

gf2121 merged commit 9605289 into apache:main Oct 4, 2023
4 checks passed

asfgit pushed a commit that referenced this pull request Oct 4, 2023

Reduce FST block size for BlockTreeTermsWriter (#12604)

cccaa7e

This was referenced Oct 4, 2023

Reduce FST block size for BlockTreeTermsWriter (#12604) apache/lucene-solr#2677

Merged

FST#Compiler allocates too much memory #12598

Closed

s1monw pushed a commit to s1monw/lucene that referenced this pull request Oct 10, 2023

Reduce FST block size for BlockTreeTermsWriter (apache#12604)

edfa815

gf2121 added this to the 9.9.0 milestone Oct 18, 2023
