Epic: Move-Stable Row Ids #2307

@wjones127

Motivation

When we compact data files, the row ids change. This forces us to update the index files whenever we compact. Updating the index files invalidates them in the cache, which degrades query performance. If row ids were stable when rows were moved, this would not happen.

Scope

This epic makes row ids stable after moving. It does not make them stable after updates. Rows that are updated will be deleted and appended under new ids.

A future epic will cover "primary keys", which will be the point at which row ids become stable after updates in addition to moves. This is kept out of scope for now to keep the workload of this epic manageable.

Design

In very simple terms:

  1. Add row ids as auto-incrementing u64 ids. The manifest will track max_row_id and assign new ids in a process similar to how fragment ids are assigned during the commit loop.
  2. Each fragment's metadata will contain a small row id index that maps from row id to row address. (Row address is what we currently call _rowid.) In most cases, such as after an append, this will be a simple range of values: (max_row_id + 1)..(physical_rows + max_row_id + 1).
  3. Deletion files will be superseded by tombstones contained in the row id index. This cuts down on the total number of files to manage.
  4. A new feature flag will be introduced to make sure older readers don't try to interpret these new row ids.
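To make points 1 to 3 concrete, here is a rough sketch of how id assignment, the per-fragment index, and tombstones could fit together. This is illustrative only and not the planned implementation: `Manifest`, `RowIdIndex`, `RowAddress`, `assign_row_ids`, and `compact` are hypothetical names, the (fragment id, offset) shape of a row address is an assumption, and the index is a plain vector rather than a compact range encoding.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the manifest; only tracks the highest id handed out.
struct Manifest {
    max_row_id: u64,
}

impl Manifest {
    /// Reserve ids for newly appended rows, using the range from the design:
    /// (max_row_id + 1)..(physical_rows + max_row_id + 1).
    fn assign_row_ids(&mut self, physical_rows: u64) -> Range<u64> {
        let start = self.max_row_id + 1;
        let end = start + physical_rows;
        if physical_rows > 0 {
            self.max_row_id = end - 1;
        }
        start..end
    }
}

/// Assumed shape of a row address: which fragment, and which physical row in it.
#[derive(Debug, PartialEq)]
struct RowAddress {
    fragment_id: u32,
    offset: u32,
}

/// Hypothetical per-fragment row id index.
/// Slot `i` describes physical row `i`: `Some(id)` if live, `None` if tombstoned.
struct RowIdIndex {
    fragment_id: u32,
    slots: Vec<Option<u64>>,
}

impl RowIdIndex {
    /// The common case after an append: a contiguous range of ids.
    fn from_range(fragment_id: u32, ids: Range<u64>) -> Self {
        RowIdIndex { fragment_id, slots: ids.map(Some).collect() }
    }

    /// Mark a row as deleted by writing a tombstone instead of a deletion file.
    fn delete(&mut self, row_id: u64) -> bool {
        match self.slots.iter_mut().find(|s| **s == Some(row_id)) {
            Some(slot) => {
                *slot = None;
                true
            }
            None => false,
        }
    }

    /// Map a row id to its current row address, if the row is live in this fragment.
    fn lookup(&self, row_id: u64) -> Option<RowAddress> {
        self.slots
            .iter()
            .position(|s| *s == Some(row_id))
            .map(|offset| RowAddress { fragment_id: self.fragment_id, offset: offset as u32 })
    }
}

/// Compaction moves live rows into a new fragment. Row addresses change,
/// but the row ids carried in the index do not.
fn compact(new_fragment_id: u32, old: &[RowIdIndex]) -> RowIdIndex {
    let slots = old
        .iter()
        .flat_map(|f| f.slots.iter().copied())
        .filter(|s| s.is_some())
        .collect();
    RowIdIndex { fragment_id: new_fragment_id, slots }
}
```

In this sketch, a row keeps its id when `compact` rewrites it into a new fragment, so an index keyed on row id would not need to be rewritten; only the row id to row address mapping changes.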

Plan

The following tasks have been moved into the Primary Keys epic:

  • Follow ups for stabilization
    • Replace custom bitmap implementation
    • Finalize serialization format
    • Optimize row id access given real benchmarks
  • External files and cleanup
    • Write out external files if large enough
    • Cleanup implementation

Week of August 12
