[Feature Request] Star Tree File Formats #14837

Closed
@sarthakaggarwal97

Description

Is your feature request related to a problem? Please describe

This issue discusses the file formats used to store star trees and their associated metadata. A composite index can have multiple implementations, the star tree being one of them.

Required Files:

  1. Composite Index Metadata (.cim) ~ This file stores the metadata for the Composite Index. It is primarily used to initialize the star tree metadata and to provide the offsets needed to read the respective star tree.
  2. Composite Index Data (.cid) ~ This file stores the serialized Star Tree data structure.
  3. Composite Index Data Doc Values (.cidvd) ~ This file stores the doc values of the star tree dimensions and metrics.
  4. Composite Index Metadata Doc Values (.cidvm) ~ This file stores the doc values metadata.

Note: These files are designed to be extensible and are not limited to the star tree. If a new data structure based on the composite index is introduced, it can be stored in the same files.

Composite Index Metadata (.cim)

Header

  1. Composite Index Marker
  2. Version
  3. Composite Index Field Name
  4. Composite Index Field Type (here Star Tree)

Metadata

  1. Number of dimensions
  2. Dimension Field Names
  3. Number of metric entries (field - metric pairs)
  4. Metric Entries
  5. Segment Aggregated Document Count
  6. Max Leaf Docs
  7. Number of skip star node creation dimensions
  8. Skip star node creation dimensions
  9. Star Tree Build Mode (OnHeap / OffHeap)
  10. Data File Pointer (where respective star-tree data is stored)
  11. Data Length (length of the star tree)
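
To make the proposed layout concrete, here is a minimal Python sketch that writes the header and metadata fields in the order listed above and reads them back. The issue does not specify field widths, string encoding, or marker and version values, so everything about the byte layout here is an assumption: big-endian 32-bit counts, length-prefixed UTF-8 strings, 64-bit file pointer and length, and the hypothetical names `COMPOSITE_MARKER`, `StarTreeMeta`, `write_meta`, and `read_meta`.

```python
import io
import struct
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical values: the issue names a marker and a version but gives no numbers.
COMPOSITE_MARKER = 0x43494D31
VERSION = 1


@dataclass
class StarTreeMeta:
    field_name: str
    field_type: str                  # e.g. "star_tree"
    dimensions: List[str]
    metrics: List[Tuple[str, str]]   # (field, metric) pairs
    segment_doc_count: int
    max_leaf_docs: int
    skip_star_dims: List[str]
    build_mode: str                  # "onheap" or "offheap"
    data_file_pointer: int           # offset of the tree in the .cid file
    data_length: int                 # length of the tree in bytes


def _write_str(out, s):
    data = s.encode("utf-8")
    out.write(struct.pack(">I", len(data)))  # assumed 32-bit length prefix
    out.write(data)


def _read_str(inp):
    (n,) = struct.unpack(">I", inp.read(4))
    return inp.read(n).decode("utf-8")


def write_meta(meta: StarTreeMeta) -> bytes:
    out = io.BytesIO()
    # Header: marker, version, field name, field type.
    out.write(struct.pack(">II", COMPOSITE_MARKER, VERSION))
    _write_str(out, meta.field_name)
    _write_str(out, meta.field_type)
    # Metadata, in the order listed in the issue.
    out.write(struct.pack(">I", len(meta.dimensions)))
    for dim in meta.dimensions:
        _write_str(out, dim)
    out.write(struct.pack(">I", len(meta.metrics)))
    for field, metric in meta.metrics:
        _write_str(out, field)
        _write_str(out, metric)
    out.write(struct.pack(">II", meta.segment_doc_count, meta.max_leaf_docs))
    out.write(struct.pack(">I", len(meta.skip_star_dims)))
    for dim in meta.skip_star_dims:
        _write_str(out, dim)
    _write_str(out, meta.build_mode)
    out.write(struct.pack(">qq", meta.data_file_pointer, meta.data_length))
    return out.getvalue()


def read_meta(buf: bytes) -> StarTreeMeta:
    inp = io.BytesIO(buf)
    marker, version = struct.unpack(">II", inp.read(8))
    if marker != COMPOSITE_MARKER:
        raise ValueError("not a composite index metadata entry")
    if version != VERSION:
        raise ValueError("unsupported version")
    field_name = _read_str(inp)
    field_type = _read_str(inp)
    (n_dims,) = struct.unpack(">I", inp.read(4))
    dimensions = [_read_str(inp) for _ in range(n_dims)]
    (n_metrics,) = struct.unpack(">I", inp.read(4))
    metrics = []
    for _ in range(n_metrics):
        field = _read_str(inp)
        metric = _read_str(inp)
        metrics.append((field, metric))
    doc_count, max_leaf = struct.unpack(">II", inp.read(8))
    (n_skip,) = struct.unpack(">I", inp.read(4))
    skip_dims = [_read_str(inp) for _ in range(n_skip)]
    build_mode = _read_str(inp)
    pointer, length = struct.unpack(">qq", inp.read(16))
    return StarTreeMeta(field_name, field_type, dimensions, metrics, doc_count,
                        max_leaf, skip_dims, build_mode, pointer, length)
```

Under these assumptions, a reader would take `data_file_pointer` and `data_length` from the decoded entry and use them to slice the matching star tree out of the .cid file.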

Composite Index Data (.cid)

Header

  1. Composite Index Marker
  2. Version
  3. Number of nodes

Star Node:

  1. dimension_id
  2. dimension_value
  3. start_doc_id
  4. end_doc_id
  5. aggregate_doc_id
  6. is_star
  7. first_child
  8. last_child
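
One way to read the node layout above is as a fixed-width record, which allows a node to be located by index without parsing the nodes before it. The Python sketch below illustrates that idea. The field widths (32-bit ids, 64-bit `dimension_value`, one byte for `is_star`), the marker value, the use of `-1` for "no children", and the interpretation of `first_child` / `last_child` as an inclusive range of node indices are all assumptions made for illustration and are not stated in the issue.

```python
import struct
from typing import List, NamedTuple

# Assumed layouts (big-endian, no padding).
HEADER = struct.Struct(">IIi")       # marker, version, number of nodes
NODE = struct.Struct(">iqiiibii")    # the eight node fields, in listed order

MARKER = 0x43494431                  # hypothetical marker value
VERSION = 1


class StarNode(NamedTuple):
    dimension_id: int
    dimension_value: int
    start_doc_id: int
    end_doc_id: int
    aggregate_doc_id: int
    is_star: int
    first_child: int                 # assumed: index of first child, -1 if leaf
    last_child: int                  # assumed: index of last child, -1 if leaf


def write_tree(nodes: List[StarNode]) -> bytes:
    """Serialize the header followed by one fixed-width record per node."""
    parts = [HEADER.pack(MARKER, VERSION, len(nodes))]
    parts.extend(NODE.pack(*node) for node in nodes)
    return b"".join(parts)


def read_node(buf: bytes, index: int) -> StarNode:
    """Random access: node i starts at HEADER.size + i * NODE.size."""
    marker, _version, count = HEADER.unpack_from(buf, 0)
    if marker != MARKER:
        raise ValueError("not a composite index data file")
    if not 0 <= index < count:
        raise IndexError(index)
    return StarNode(*NODE.unpack_from(buf, HEADER.size + index * NODE.size))


def children(buf: bytes, node: StarNode) -> List[StarNode]:
    """Return the child nodes, assuming they are stored contiguously."""
    if node.first_child < 0:
        return []
    return [read_node(buf, i)
            for i in range(node.first_child, node.last_child + 1)]


# A three-node example: a root with one star child and one regular child.
example = [
    StarNode(-1, -1, 0, 4, -1, 0, 1, 2),   # root
    StarNode(0, -1, 0, 4, 5, 1, -1, -1),   # star node for dimension 0
    StarNode(0, 7, 0, 4, 6, 0, -1, -1),    # dimension 0 == 7
]
buf = write_tree(example)
root = read_node(buf, 0)
kids = children(buf, root)
```

With fixed-width records the size of the data section is simply the node count times the record size, which is what the Data Length field in the metadata would describe under this layout.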
