Closed
Description
[Spinoff from this comment inspired by Tantivy's FST implementation]
Building an FST is inherently streamable: the way the FST freezes states as it processes inputs is a write-once, roughly append-only operation. Today, Lucene holds this growing byte[] entirely in RAM and, once done, writes the whole thing to disk.
Yet at search time, Lucene searches the FST off-heap, doing nearly random backwards IO through IndexInput.
Could we fix Lucene to stream the FST byte[] directly to IndexOutput? This would make the RAM required to build an FST constant, regardless of how large the FST is or how many input/output pairs are added to it.
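To make the idea concrete, here is a minimal sketch of the append-only pattern, using only java.io rather than Lucene's classes. StreamingStateWriter, freeze, and the "address equals start offset" convention are all hypothetical names and simplifications for illustration; this is not Lucene's actual state encoding or addressing, and a plain OutputStream stands in for IndexOutput.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch: each frozen state is appended to a stream as soon as
 * it is frozen, so the writer itself holds only a running byte count instead
 * of the whole growing byte[].
 */
public class StreamingStateWriter {
    private final OutputStream out; // stand-in for IndexOutput (assumption)
    private long bytesWritten = 0;

    public StreamingStateWriter(OutputStream out) {
        this.out = out;
    }

    /** Appends one already-encoded frozen state; returns the offset of its first byte. */
    public long freeze(byte[] encodedState) throws IOException {
        long address = bytesWritten;
        out.write(encodedState);
        bytesWritten += encodedState.length;
        return address;
    }

    /** Total number of bytes streamed so far. */
    public long size() {
        return bytesWritten;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory sink is used here only so the demo is self-contained.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamingStateWriter writer = new StreamingStateWriter(sink);
        long first = writer.freeze(new byte[] {1, 2, 3});
        long second = writer.freeze(new byte[] {4, 5});
        System.out.println(first + " " + second + " " + writer.size());
    }
}
```

Because states are never rewritten after freeze returns, the sink can be a file-backed stream, and the only state kept on heap by this writer is the byte counter.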