Skip to content

Vortex Layouts #1676

Closed
Closed
@gatesn

Description

[edit] I've hijacked the "Refactor VortexFileWriter" issue to cover the broader work towards vortex layouts.

  • Add better Debug impl for LayoutData that shows the full tree. Requires a visitor.
  • Expression splitting (so we can push-down expressions to struct fields and recombine the results)
  • Layout statistics for a given row mask, including whether the results are exact / inexact.
  • Change the segment read/write APIs to use aligned ByteBuffer. Have ArrayParts choose a reasonable alignment.
  • Configure the writer to include/exclude padding
  • Make it easy to run segment reader against memory mapped file
  • Cache segments
  • Coalesce segments
  • Add option to dedupe segments at write time
  • Add option to LZ4 compress flatbuffers

The description here is WIP

Vortex Layouts

Vortex Layouts should be seen as symmetrical to Vortex Arrays, except that the data is stored externally and they can be lazily queried with push-down pruning. They are fully independent of the storage system, requiring only a key(u32)/value(bytes) API.

A secondary benefit of separating the layout logic from storage, is that we remove all I/O from this code path allowing us to have much better control over when, where and how we interact with I/O (as well as removing all traces of async Rust from the layout logic).

Built-in Layouts

Our initial prototype ships with three layouts:

  • Flat - an atomic array (of any DType). The serialized form does not include the dtype (to prevent duplication).
  • Chunked - row-based partitioning of an array.
  • Struct - col-based partitioning of a struct array.

Additional layouts for the future:

  • Dictionary - pull out a shared dictionary across another child layout, e.g. store values as flat layout, store codes in a chunked layout to use one dictionary across all chunks of a column.
  • FlatStruct - same as Struct, except with flattened nested fields.

As with everything in Vortex, these will be extensible by consumers of the Rust library.

Layout Strategies

The power of layouts is in how we choose them. I think by default Vortex will ship with a StructOfChunks layout strategy that first splits into columns, then each column is independently chunked based on a target byte size (rather than row-count). This can be pretty useful when storing columns with large values. It means the returned batches are sized based on the data that's scanned, and doesn't suffer the write-time skew that e.g. Parquet might introduce with its "ChunksOfStruct" (row-groups of columns) strategy.

A LayoutStrategy is fed a series of ArrayData batches that represent a row-wise chunk of an array. Each strategy may manipulate, buffer, split, this batch however it likes and pass it to some child strategies, eventually culminating in a FlatStrategy that simply serializes the data. At the end of this process, the layout strategy returns the LayoutData (remember the analogy to ArrayData?).

Layout Scan

Scanning a layout is sort of analogous to an array compute function. It's probably the only "compute function" we support on layouts for a while (maybe we support sum/min/max etc?).

From a LayoutData, we construct a LayoutScan, and from this we construct a RangeScan per horizontal "split" of the file that we wish to read. This two-level approach is so for a single scan operator layouts can share some state across ranges, e.g. DictLayout would want to share the canonicalized values array.


The old issue:

Tracking a bunch of breaking changes to the format and refactoring of the writer

  • VortexFileWriter to pass chunks into an abstract LayoutWriter.
    • We ship a default LayoutWriter that is struct-of-chunked.
  • Impl LayoutWriter for struct, chunked, flat, and later dict layouts.
  • Layout flat buffer to hold BlockIds instead of offset/length. This allows Layout to be abstract over the storage format and not assume linear bytes (e.g. can store in an arbitrary block device).
  • Postscript to point to DType + FileLayout and not assume contiguous bytes.
  • Ensure we don't write extra padding / framing, e.g. we sometimes use IPCMessage and MessageWriter for writing to the file, even though this adds framing for a streaming format.
  • Allow configurable padding in the file to support mem-mapping with zero-copy.
  • Allow configurable compression for buffers.

Metadata

Assignees

Labels

epicwire-breakIncludes a break to the serialized IPC or file format

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions