FSTCompiler.Builder should have an option to stream the FST bytes directly to Directory #12543

Closed
@mikemccand

Description

[Spinoff from this comment inspired by Tantivy's FST implementation]

The building of an FST is inherently streamable: the way the FST freezes states as it processes inputs is a write-once, roughly append-only operation. Today, Lucene holds this growing byte[] entirely in RAM, and once done, writes the whole thing to disk.

Yet at search time, Lucene searches the FST off-heap, doing nearly random backwards IO through IndexInput.

Let's fix Lucene to stream the FST bytes directly to an IndexOutput as they are produced, instead of accumulating them in a byte[]. This would make the RAM required to build an FST constant, regardless of how large the FST is or how many input/output pairs you add to it.
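To make the idea concrete, here is a rough sketch of what "append-only" buys us. This is not Lucene's FSTCompiler API: `StreamingNodeWriter` and `freezeNode` are hypothetical names, and a plain `java.io.OutputStream` stands in for IndexOutput. The point is only that once a node is frozen, its bytes can be written out immediately, and the only state the builder needs to keep on heap is the running byte count, which doubles as the address of the next node.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch, not Lucene code: writes frozen node bytes straight to a stream. */
final class StreamingNodeWriter {
  private final OutputStream out; // stand-in for Lucene's IndexOutput (assumption)
  private long position;          // bytes written so far; the only growing state is this counter

  StreamingNodeWriter(OutputStream out) {
    this.out = out;
  }

  /** Appends one frozen node and returns the offset at which its bytes start. */
  long freezeNode(byte[] nodeBytes) throws IOException {
    long address = position;
    out.write(nodeBytes);           // bytes leave the heap right away
    position += nodeBytes.length;   // never rewound: write-once, append-only
    return address;
  }

  long bytesWritten() {
    return position;
  }
}

final class StreamingNodeWriterDemo {
  public static void main(String[] args) throws IOException {
    // In-memory sink only so the demo is self-contained; a real build would target a file.
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    StreamingNodeWriter writer = new StreamingNodeWriter(sink);

    long first = writer.freezeNode(new byte[] {1, 2, 3});
    long second = writer.freezeNode(new byte[] {4, 5});

    System.out.println(first);                 // prints 0
    System.out.println(second);                // prints 3
    System.out.println(writer.bytesWritten()); // prints 5
  }
}
```

The sketch ignores everything that makes the real thing harder, such as any step during building that needs to read back or patch bytes that were already written. Those cases are where the "roughly" in "roughly append-only" would need attention.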
