Add documentation for star tree index feature #8598
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,199 @@ | ||
| --- | ||
| layout: default | ||
| title: Star-tree | ||
| nav_order: 61 | ||
| parent: Supported field types | ||
| --- | ||
|
|
||
| # Star-tree field type | ||
|
|
||
| This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/). | ||
| {: .warning} | ||
|
|
||
| A [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index) precomputes aggregations, accelerating the performance of aggregation queries. | ||
| If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time. | ||
|
|
||
| OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters. | ||
|
|
||
| For more information, see [Star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/). | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| To use a star-tree index, follow the instructions in [Enabling a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index#enabling-a-star-tree-index). | ||
|
|
||
| ## Limitations | ||
|
|
||
| The star-tree index feature has the following limitations: | ||
|
|
||
| - A star-tree index should only be enabled on indexes whose data is not updated or deleted because standard updates and deletions are not accounted for in a star-tree index. | ||
| - Currently, only one star-tree index can be created per index. Support for multiple star-tree indexes will be added in a future version. | ||
|
|
||
| ## Examples | ||
|
|
||
| The following examples show how to use a star-tree index. | ||
|
|
||
| ### Star-tree index mappings | ||
|
|
||
|
||
| Define star-tree index mappings in the `composite` section in `mappings`. | ||
|
|
||
| The following example API request creates a star-tree index named `request_aggs`. To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings: | ||
|
How about the following: The following example API request creates a corresponding star-tree index configuration under "all
||
|
|
||
| ```json | ||
| PUT logs | ||
| { | ||
| "settings": { | ||
| "index.number_of_shards": 1, | ||
| "index.number_of_replicas": 0, | ||
| "index.composite_index": true | ||
| }, | ||
| "mappings": { | ||
| "composite": { | ||
| "request_aggs": { | ||
| "type": "star_tree", | ||
| "config": { | ||
| "max_leaf_docs": 10000, | ||
| "skip_star_node_creation_for_dimensions": [ | ||
| "port" | ||
| ], | ||
| "ordered_dimensions": [ | ||
| { | ||
| "name": "status" | ||
| }, | ||
| { | ||
| "name": "port" | ||
| } | ||
| ], | ||
| "metrics": [ | ||
| { | ||
| "name": "request_size", | ||
| "stats": [ | ||
| "sum", | ||
| "value_count", | ||
| "min", | ||
| "max" | ||
| ] | ||
| }, | ||
| { | ||
| "name": "latency", | ||
| "stats": [ | ||
| "sum", | ||
| "value_count", | ||
| "min", | ||
| "max" | ||
| ] | ||
| } | ||
| ] | ||
| } | ||
| } | ||
| }, | ||
| "properties": { | ||
| "status": { | ||
| "type": "integer" | ||
| }, | ||
| "port": { | ||
| "type": "integer" | ||
| }, | ||
| "request_size": { | ||
| "type": "integer" | ||
| }, | ||
| "latency": { | ||
| "type": "scaled_float", | ||
| "scaling_factor": 10 | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
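As a rough sketch of how the mapping above might be exercised, the following requests index one document and then run an aggregation that filters on a dimension field (`status`) and aggregates a metric field (`request_size`). The document values and the aggregation name `sum_request_size` are made up for illustration. The search request uses standard query syntax, and whether it is actually served by the star-tree index depends on the queries and aggregations supported in your OpenSearch version:

```json
POST logs/_doc
{
  "status": 200,
  "port": 8080,
  "request_size": 1024,
  "latency": 12.5
}

POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 200
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```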
|
|
||
|
|
||
|
|
||
| ## Star-tree mapping parameters | ||
|
|
||
| Specify any star-tree configuration mapping options in the `config` section. Parameters cannot be modified without reindexing documents. | ||
|
|
||
| The star-tree `config` section supports the following property. | ||
|
|
||
| | Parameter | Required/Optional | Description | | ||
| | :--- | :--- | :--- | | ||
| | `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. | | ||
|
Confused on this - Under config, user can specify
I'll remove this. We have the definitions for
||
|
|
||
| ### Ordered dimensions | ||
|
|
||
|
||
| The `ordered_dimensions` parameter lists the fields by which metrics are aggregated in a star-tree index. The star-tree index is selected for a query only if all the fields in the query are part of `ordered_dimensions`. | ||
|
|
||
|
||
| When using the `ordered_dimensions` parameter, follow these best practices: | ||
|
|
||
| - The order of dimensions matters. Define the dimensions in order from highest to lowest cardinality for efficient storage and query pruning. | ||
| - Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance. | ||
| - Currently, fields supported by the `ordered_dimensions` parameter are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231). | ||
| - Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232). | ||
| - A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index. | ||
|
|
||
|
||
| The `ordered_dimensions` parameter supports the following property. | ||
|
|
||
| | Parameter | Required/Optional | Description | | ||
| | :--- | :--- | :--- | | ||
| | `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. | | ||
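The ordering guidance above can be sketched as a configuration fragment. The field names `datacenter_id` and `http_method_code` are hypothetical numeric fields, and the assumption that `datacenter_id` has more distinct values than `http_method_code` is made purely for illustration. This is only the `ordered_dimensions` portion of a `config` section, not a complete request:

```json
{
  "ordered_dimensions": [
    {
      "name": "datacenter_id"
    },
    {
      "name": "http_method_code"
    }
  ]
}
```

Both fields would also need to be declared in the `properties` section of the index mapping with a supported numeric type.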
|
|
||
|
|
||
| ### Metrics | ||
|
|
||
| Configure any metric fields on which you need to perform aggregations. The `metrics` parameter is required as part of a star-tree configuration. | ||
|
|
||
| When using `metrics`, follow these best practices: | ||
|
|
||
| - Currently, fields supported by `metrics` are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231). | ||
| - Supported metric aggregations include `min`, `max`, `sum`, `avg`, and `value_count`. | ||
| - `avg` is a derived metric based on `sum` and `value_count`. It is not indexed and is instead computed when a query is run. The remaining base metrics are indexed. | ||
| - A maximum of `100` base metrics are supported per star-tree index. | ||
|
||
|
|
||
| If `min`, `max`, `sum`, and `value_count` are defined as `metrics` for each field, then up to 25 such fields can be configured (4 base metrics × 25 fields = 100 base metrics), as shown in the following example: | ||
|
|
||
| ```json | ||
| { | ||
| "metrics": [ | ||
| { | ||
| "name": "field1", | ||
| "stats": [ | ||
| "sum", | ||
| "value_count", | ||
| "min", | ||
| "max" | ||
| ] | ||
| }, | ||
| ..., | ||
| ..., | ||
| { | ||
| "name": "field25", | ||
| "stats": [ | ||
| "sum", | ||
| "value_count", | ||
| "min", | ||
| "max" | ||
| ] | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
|
|
||
| #### Properties | ||
|
|
||
| The `metrics` parameter supports the following properties. | ||
|
|
||
| | Parameter | Required/Optional | Description | | ||
| | :--- | :--- | :--- | | ||
| | `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. | | ||
| | `stats` | Optional | A list of metric aggregations computed for each field. You can choose between `min`, `max`, `sum`, `avg`, and `value_count`.<br/>Default is `sum` and `value_count`.<br/>`avg` is a derived metric statistic that will automatically be supported in queries if `sum` and `value_count` are present as part of metric `stats`. | | ||
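As a minimal sketch of the defaults described in the table above, the following fragment lists a metric field by `name` only. Assuming the defaults apply as documented when `stats` is omitted, `sum` and `value_count` would be computed for `latency`, which in turn would allow `avg` to be derived in queries. This shows only the `metrics` portion of a `config` section:

```json
{
  "metrics": [
    {
      "name": "latency"
    }
  ]
}
```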
|
|
||
| ### Star-tree configuration parameters | ||
|
|
||
| The following parameters are optional and cannot be modified following index creation. | ||
|
|
||
| | Parameter | Description | | ||
| | :--- | :--- | | ||
|
||
| | `max_leaf_docs` | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, the nodes will be split based on the value of the next dimension. Default is `10000`. A lower value will use more storage but result in faster query performance. Inversely, a higher value will use less storage but result in slower query performance. For more information, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). | | ||
|
||
| | `skip_star_node_creation_for_dimensions` | A list of dimensions for which a star-tree index will skip star node creation. Skipping star node creation for a dimension reduces storage size at the expense of query performance. By default, star nodes are created for all dimensions. For more information about star nodes, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). | | ||
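To make the trade-offs in the table above concrete, the following fragment sketches a `config` section that lowers `max_leaf_docs` below the default and skips star node creation for the `port` dimension. The value `1000` is an arbitrary illustration, not a recommendation, and the required `ordered_dimensions` and `metrics` entries are omitted for brevity, so this is not a complete configuration:

```json
{
  "config": {
    "max_leaf_docs": 1000,
    "skip_star_node_creation_for_dimensions": [
      "port"
    ]
  }
}
```

Per the descriptions above, the lower `max_leaf_docs` value would be expected to use more storage in exchange for faster queries, while skipping star nodes for `port` would be expected to reduce storage at some cost to query performance.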
|
|
||
| ## Supported queries and aggregations | ||
|
|
||
| For more information about supported queries and aggregations, see [Supported queries and aggregations for a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-queries-and-aggregations). | ||
|
|
||
|
||