Support memory-mapped on-disk Indices #4

Open
@asg017

The underlying Faiss indices are stored in SQLite shadow tables, which can't be mmapped with the IO_FLAG_MMAP flag.

One solution: introduce a new option that stores a vss0 column's index on disk, allowing mmapped indices for larger-than-memory datasets.

create virtual table articles using vss0(
  headline_embedding(1024) factory="..." on_disk=True,
  description_embedding(1024) factory="..." on_disk=True,
);

Then, your directory would look like:

$ tree .
.
├── my_data.db
├── my_data.db.vss0.articles.description_embedding.faissindex
└── my_data.db.vss0.articles.headline_embedding.faissindex

sqlite3_db_filename() would be useful here for deriving the index file paths from the database's own path.
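A rough sketch of how those file names could be derived, assuming the `<db file>.vss0.<table>.<column>.faissindex` pattern from the tree listing above. It is written in Python for brevity and uses `PRAGMA database_list` as a stand-in for the C-level sqlite3_db_filename(); the `index_path` helper is hypothetical and not part of any existing API.

```python
import sqlite3


def index_path(conn, table, column, schema="main"):
    """Return the hypothetical on-disk Faiss index path for one vss0 column."""
    # PRAGMA database_list yields (seq, name, file) rows. `file` is the path
    # of the database file, or an empty string for in-memory/temp databases.
    for _seq, name, file in conn.execute("PRAGMA database_list"):
        if name == schema:
            if not file:
                # Without a database file there is nothing to name the
                # index file after.
                raise ValueError("on_disk indices need a file-backed database")
            # Naming pattern assumed from the directory listing in this issue.
            return f"{file}.vss0.{table}.{column}.faissindex"
    raise ValueError(f"unknown schema: {schema}")
```

For a connection opened on `my_data.db`, `index_path(conn, "articles", "headline_embedding")` would end in `my_data.db.vss0.articles.headline_embedding.faissindex`. An in-memory database has no file name, so the sketch raises rather than guessing a location.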

One problem: it's kind of nice to have all Faiss indices stored in one file, inside the SQLite database, and this config option would mean users have to move multiple files around instead of a single SQLite file. But since this is an "optimization" feature that's not enabled by default, I think it'll be OK.
