
FSTCompiler.Builder should have an option to stream the FST bytes directly to Directory #12543

@mikemccand

Description

[Spinoff from this comment inspired by Tantivy's FST implementation]

Building an FST is inherently streamable: the way the FST freezes states as it processes inputs is a write-once, roughly append-only operation. Today, Lucene holds this growing byte[] entirely in RAM and, once done, writes the whole thing to disk.
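
As a toy illustration of why freezing is append-only, here is a minimal Java sketch. All names (`AppendOnlyFreeze`, `Node`, `freeze`) are hypothetical and the byte encoding is made up. It assumes nodes are frozen in post-order, children before parents, which simplifies how the real compiler freezes states as sorted inputs arrive. Each node's address is simply the current end of the output, and a node only ever points at smaller, already-written addresses.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of FST freezing (hypothetical names, not Lucene's API).
 * Nodes are frozen in post-order: a node is appended only after all of its
 * children, so every address it stores points at bytes that already exist.
 * Nothing already written is revisited, so the output is append-only.
 */
public class AppendOnlyFreeze {

    /** A node still under construction; it gets an address once frozen. */
    public static final class Node {
        final List<Node> children = new ArrayList<>();
        long address = -1; // -1 means "not frozen yet"

        public Node add(Node child) {
            children.add(child);
            return this;
        }

        public long getAddress() {
            return address;
        }
    }

    // Models today's approach: every frozen byte stays in a growing in-RAM buffer.
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

    /** Freezes unfrozen children first, then appends this node; returns its start address. */
    public long freeze(Node node) {
        for (Node child : node.children) {
            if (child.address < 0) {
                freeze(child);
            }
        }
        long start = bytes.size(); // next append position becomes this node's address
        bytes.write(node.children.size()); // toy encoding: 1-byte child count...
        for (Node child : node.children) {
            writeLong(child.address); // ...then one 8-byte address per child
        }
        node.address = start;
        return start;
    }

    public long size() {
        return bytes.size();
    }

    private void writeLong(long v) {
        for (int shift = 56; shift >= 0; shift -= 8) {
            bytes.write((int) (v >>> shift)); // big-endian, one byte at a time
        }
    }

    public static void main(String[] args) {
        Node leaf = new Node();
        Node mid = new Node().add(leaf);
        Node root = new Node().add(mid).add(leaf); // leaf is referenced twice
        AppendOnlyFreeze fst = new AppendOnlyFreeze();
        long rootAddr = fst.freeze(root);
        System.out.println("leaf=" + leaf.getAddress() + " mid=" + mid.getAddress() + " root=" + rootAddr);
    }
}
```

Running `main` prints `leaf=0 mid=1 root=10`: each node's address is larger than the addresses of the nodes it references, and no byte is rewritten after it is appended. The `ByteArrayOutputStream` stands in for the growing in-RAM byte[] described above.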

Yet at search time, Lucene searches the FST off-heap, doing nearly random backwards IO through IndexInput.
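
To make "backwards IO" concrete, the following toy sketch assumes (for illustration only; this is not claimed to be Lucene's exact on-disk layout) that each node's bytes are reversed right after being appended, and that a node's address is the offset of its last byte. Decoding a node then means starting at its address and stepping toward lower offsets, while following an arc jumps to some earlier node. `BackwardsReads`, `appendReversed` and `ReverseReader` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

/**
 * Toy model of backwards reads (hypothetical, not Lucene's actual encoding):
 * each node's bytes are reversed as they are appended, and the node's address
 * is the offset of its LAST byte, so a reader decodes it by moving backwards.
 */
public class BackwardsReads {

    /** Appends a node's fields in reversed order; returns the offset of its last byte. */
    public static long appendReversed(ByteArrayOutputStream out, byte[] nodeFields) {
        byte[] reversed = new byte[nodeFields.length];
        for (int i = 0; i < nodeFields.length; i++) {
            reversed[i] = nodeFields[nodeFields.length - 1 - i];
        }
        out.write(reversed, 0, reversed.length);
        return out.size() - 1;
    }

    /** Minimal reverse reader over a byte[]: each readByte() moves one position backwards. */
    public static final class ReverseReader {
        private final byte[] data;
        private int pos;

        public ReverseReader(byte[] data, long address) {
            this.data = data;
            this.pos = (int) address;
        }

        public byte readByte() {
            return data[pos--];
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long first = appendReversed(out, new byte[] {1, 2, 3}); // stored as 3,2,1 at offsets 0..2
        long second = appendReversed(out, new byte[] {7, 8});   // stored as 8,7 at offsets 3..4
        byte[] file = out.toByteArray();

        ReverseReader reader = new ReverseReader(file, second);
        System.out.println(reader.readByte() + "," + reader.readByte()); // 7,8
        reader = new ReverseReader(file, first);
        System.out.println(reader.readByte() + "," + reader.readByte() + "," + reader.readByte()); // 1,2,3
    }
}
```

The reversal only touches the node that was just written, so in this toy scheme it needs a buffer the size of one node, not of the whole FST.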

Let's fix Lucene to stream the FST byte[] directly to an IndexOutput. This would make the RAM required to build an FST constant, regardless of how large the FST is or how many input/output pairs you add to it.
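
A minimal sketch of the constant-RAM idea, assuming the builder needs only two things from its output: the ability to append bytes, and the current length to use as the next node's address. `StreamingSink` is a hypothetical class, not Lucene's API; a real implementation would presumably write to an IndexOutput, whose `getFilePointer()` plays the role of `position()` here. Only the fixed-size buffer lives in RAM, no matter how many bytes pass through.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Sketch of a constant-RAM, append-only sink (hypothetical class, not Lucene's API).
 * It holds a fixed-size buffer plus a running byte count; that count is what a
 * builder would use as the next node's address.
 */
public class StreamingSink implements AutoCloseable {
    private final OutputStream out;
    private final byte[] buffer;
    private int upto;     // bytes currently held in the buffer
    private long written; // total bytes accepted so far == next append address

    public StreamingSink(OutputStream out, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.out = out;
        this.buffer = new byte[bufferSize];
    }

    /** Address at which the next byte will land; it never decreases. */
    public long position() {
        return written;
    }

    /** Bytes currently buffered in RAM; never exceeds the buffer size. */
    public int bufferedBytes() {
        return upto;
    }

    public void writeByte(byte b) throws IOException {
        if (upto == buffer.length) {
            flush(); // buffer full: push it to the destination and reuse it
        }
        buffer[upto++] = b;
        written++;
    }

    public void writeBytes(byte[] b, int off, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            writeByte(b[off + i]);
        }
    }

    public void flush() throws IOException {
        out.write(buffer, 0, upto);
        upto = 0;
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream dest = new ByteArrayOutputStream(); // stand-in for a file
        try (StreamingSink sink = new StreamingSink(dest, 64)) {
            for (int i = 0; i < 10_000; i++) {
                sink.writeByte((byte) i);
            }
            System.out.println("position=" + sink.position()); // position=10000
        }
        System.out.println("bytes on 'disk'=" + dest.size()); // bytes on 'disk'=10000
    }
}
```

The `ByteArrayOutputStream` is only a stand-in for a file; with a file-backed stream, the sink's own memory stays at its 64-byte buffer while `position()` keeps growing. This models only the output bytes; any other state a builder keeps in RAM is outside this sketch.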
