Consider allowing TreeFile to use external blob storage #46

@ecton

Currently, TreeFiles store blobs/chunks in the same file that nodes are written to. When compacting a database, all of the blobs that are alive must be transferred to the new file.

Over time, this wastes a lot of IOPS if your application never deletes data, because every compaction rewrites blobs that have not changed. In this day and age, a common way to operate is to "store everything" and only delete once it becomes a problem.

The main idea of this issue is simple:

  • Change all of the tree file operations to use a new trait ChunkStorage to write non-node chunks. This may require adding a new parameter to each operation.
  • Allow specifying a ChunkStorage implementation when creating a TreeFile/Roots instance.
  • If no ChunkStorage is specified, chunks should be written in-line like they are today.
  • The ChunkStorage implementation can use 63 bits of information to note where the chunk is stored. The 64th bit will be used by Nebari to note that the chunk is stored externally.
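A minimal sketch of what the points above might look like. The method names, signatures, the `ChunkLocation` enum, and the in-memory `MemoryChunkStorage` are all hypothetical and are not part of Nebari's API. The sketch assumes the "64th bit" is the highest bit of a `u64` position.

```rust
use std::io;

/// Highest bit of a 64-bit chunk position. Assumption: this is the bit
/// Nebari would reserve to mark a chunk as externally stored.
const EXTERNAL_BIT: u64 = 1 << 63;

/// Hypothetical trait; the names and signatures are illustrative only.
pub trait ChunkStorage {
    /// Stores `data` and returns a locator that must fit in 63 bits.
    fn write_chunk(&mut self, data: &[u8]) -> io::Result<u64>;
    /// Reads the chunk for a locator previously returned by `write_chunk`.
    fn read_chunk(&self, locator: u64) -> io::Result<Vec<u8>>;
}

/// Where a stored chunk position points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkLocation {
    /// Offset inside the tree file itself (today's behavior).
    Inline(u64),
    /// 63-bit locator owned by the `ChunkStorage` implementation.
    External(u64),
}

/// Packs a location into a single u64. Returns `None` if the value
/// already uses the reserved high bit.
pub fn encode(location: ChunkLocation) -> Option<u64> {
    match location {
        ChunkLocation::Inline(offset) if offset & EXTERNAL_BIT == 0 => Some(offset),
        ChunkLocation::External(locator) if locator & EXTERNAL_BIT == 0 => {
            Some(locator | EXTERNAL_BIT)
        }
        _ => None,
    }
}

/// Unpacks a stored position by checking the reserved high bit.
pub fn decode(position: u64) -> ChunkLocation {
    if position & EXTERNAL_BIT != 0 {
        ChunkLocation::External(position & !EXTERNAL_BIT)
    } else {
        ChunkLocation::Inline(position)
    }
}

/// In-memory storage, for illustration only. The locator is the index
/// of the chunk in the vector.
#[derive(Default)]
pub struct MemoryChunkStorage {
    chunks: Vec<Vec<u8>>,
}

impl ChunkStorage for MemoryChunkStorage {
    fn write_chunk(&mut self, data: &[u8]) -> io::Result<u64> {
        self.chunks.push(data.to_vec());
        Ok(self.chunks.len() as u64 - 1)
    }

    fn read_chunk(&self, locator: u64) -> io::Result<Vec<u8>> {
        usize::try_from(locator)
            .ok()
            .and_then(|index| self.chunks.get(index))
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown chunk"))
    }
}
```

With this layout, existing in-line positions decode unchanged as long as file offsets never reach 2^63, which is one way the "no ChunkStorage specified" case could stay compatible.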

The hard part will be compaction. Nebari doesn't keep track of chunks. The way compaction works currently is that data is copied when it's referenced; otherwise, it's skipped. To achieve the goal of "not rewriting everything", the ChunkStorage implementation needs to receive enough information to determine on its own how to compact itself, or to opt not to. At this time, I'm not sure of a good way to solve this.
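To illustrate the "copied when referenced, otherwise skipped" behavior, here is a toy model, not Nebari's actual compaction code. It assumes a file can be modeled as a list of chunks addressed by index, and that `referenced` is the sequence of positions found while walking the live tree nodes. The `Compacted` struct and `compact` function are hypothetical names.

```rust
use std::collections::HashMap;

/// Result of a toy compaction pass.
pub struct Compacted {
    /// Chunks written to the new file, in the order they were copied.
    pub chunks: Vec<Vec<u8>>,
    /// Maps each copied chunk's old position to its new position.
    pub relocated: HashMap<usize, usize>,
}

/// Copies every chunk that is referenced by a live node into a new file.
/// Chunks that are never referenced are simply not copied; nothing
/// records them as dead.
pub fn compact(old_chunks: &[Vec<u8>], referenced: &[usize]) -> Compacted {
    let mut chunks = Vec::new();
    let mut relocated = HashMap::new();
    for &old_position in referenced {
        // A chunk referenced more than once is copied only once.
        if relocated.contains_key(&old_position) {
            continue;
        }
        if let Some(data) = old_chunks.get(old_position) {
            relocated.insert(old_position, chunks.len());
            chunks.push(data.clone());
        }
    }
    Compacted { chunks, relocated }
}
```

In this model the set of dead chunks is only known implicitly, as whatever was not visited. That is the information an external ChunkStorage would be missing unless the compaction pass reported it somehow.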

More intelligent compaction can be achieved by using TreeFile to implement ChunkStorage. While this causes extra overhead, the TreeFile could return unique, stable "chunk IDs" while remaining free to move the actual location on disk. This is where the idea of "tiered" storage comes in, as this TreeFile could do many things, including:

  • Embed statistics about the read frequency of each key, allowing compaction to group frequently used data closer together or move infrequently accessed keys to slower storage.
  • Subdivide storage into segments that can be defragmented independently.
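The stable-ID idea can be sketched as an indirection table. This is an illustration under assumptions: a `BTreeMap` stands in for the TreeFile that would hold the mapping, and `ChunkIndex`, `Location`, and the segment/offset layout are hypothetical names rather than anything in Nebari.

```rust
use std::collections::BTreeMap;

/// Physical position of a chunk. The segment/offset split is an
/// assumption that mirrors the "segments" bullet above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Location {
    pub segment: u32,
    pub offset: u64,
}

/// Maps stable chunk IDs to their current physical location.
#[derive(Default)]
pub struct ChunkIndex {
    next_id: u64,
    locations: BTreeMap<u64, Location>,
}

impl ChunkIndex {
    /// Registers a newly written chunk and returns its stable ID.
    pub fn insert(&mut self, location: Location) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.locations.insert(id, location);
        id
    }

    /// Looks up where a chunk currently lives.
    pub fn locate(&self, id: u64) -> Option<Location> {
        self.locations.get(&id).copied()
    }

    /// Records that a chunk was moved. The ID held by the referencing
    /// tree does not change. Returns false if the ID is unknown.
    pub fn relocate(&mut self, id: u64, new_location: Location) -> bool {
        match self.locations.get_mut(&id) {
            Some(location) => {
                *location = new_location;
                true
            }
            None => false,
        }
    }
}
```

Because the main tree would store only the ID, a segment could be defragmented or a cold chunk moved to slower storage by calling `relocate`, without rewriting any node that references the chunk. The cost is one extra lookup per chunk read, which is the overhead mentioned above.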
