Commit 10da98b

bharath-techie, Naarcha-AWS, and natebower authored and committed
Add documentation for star tree index feature (opensearch-project#8598)
* Adding documentation for star tree index feature
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* fixes and addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* fixing json
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* fixing json
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments
  Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* Add edits for star tree field page
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Add index edit
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update improving-search-performance.md
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update star-tree-index.md
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update star-tree.md
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update _field-types/supported-field-types/star-tree.md
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update star-tree-index.md
  Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Eric Pugh <epugh@opensourceconnections.com>
1 parent c7d715a commit 10da98b

5 files changed, +393 -1 lines changed

_field-types/supported-field-types/index.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
 k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
 Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
 Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
+Star-tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Precomputes aggregations and stores them in a [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index), accelerating the performance of aggregation queries.
 
 ## Arrays
 
Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
---
layout: default
title: Star-tree
nav_order: 61
parent: Supported field types
---

# Star-tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index) precomputes aggregations, accelerating the performance of aggregation queries.
If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.

OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.

For more information, see [Star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).

## Prerequisites

To use a star-tree index, follow the instructions in [Enabling a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index#enabling-a-star-tree-index).

## Limitations

The star-tree index feature has the following limitations:

- A star-tree index should only be enabled on indexes whose data is not updated or deleted because standard updates and deletions are not accounted for in a star-tree index.
- Currently, only one star-tree index can be created per index. Support for multiple star-trees will be added in a future version.

## Examples

The following examples show how to use a star-tree index.

### Star-tree index mappings

Define star-tree index mappings in the `composite` section in `mappings`.

The following example API request creates a corresponding star-tree index named `request_aggs`. To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "request_aggs": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```
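As a rough sketch of the kind of request the mapping above is meant to accelerate, the following search filters on the `status` dimension and sums the `request_size` metric. The aggregation name `total_request_size` and the status value `200` are illustrative assumptions, not part of the original example. The request uses ordinary search syntax; whether the star-tree index actually serves it depends on the queries and aggregations supported in your version.

```json
POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 200
    }
  },
  "aggs": {
    "total_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```

Because the star-tree index is applied automatically, the same request should return the same results on an index without a star-tree mapping, only without the precomputed aggregations.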
## Star-tree mapping parameters

Specify any star-tree configuration mapping options in the `config` section. Parameters cannot be modified without reindexing documents.

The star-tree `config` section supports the following property.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |

### Ordered dimensions

The `ordered_dimensions` parameter lists the fields by which metrics are aggregated in a star-tree index. The star-tree index will be selected for querying only if all the fields in the query are part of the `ordered_dimensions`.

When using the `ordered_dimensions` parameter, follow these best practices:

- The order of dimensions matters. Define the dimensions in order from highest to lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
- Currently, the `ordered_dimensions` parameter supports all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/) except `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
- Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232).
- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.

The `ordered_dimensions` parameter supports the following property.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |

### Metrics

Configure any metric fields on which you need to perform aggregations. The `metrics` parameter is required as part of a star-tree configuration.

When using `metrics`, follow these best practices:

- Currently, `metrics` supports all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/) except `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
- Supported metric aggregations include `min`, `max`, `sum`, `avg`, and `value_count`.
- `avg` is a derived metric based on `sum` and `value_count`. It is not indexed but is computed when a query is run. The remaining base metrics are indexed.
- A maximum of `100` base metrics are supported per star-tree index.

If `min`, `max`, `sum`, and `value_count` are defined as `metrics` for each field, then up to 25 such fields can be configured (4 base metrics × 25 fields = 100 base metrics), as shown in the following example:

```json
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
```
#### Properties

The `metrics` parameter supports the following properties.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
| `stats` | Optional | A list of metric aggregations computed for each field. You can choose between `min`, `max`, `sum`, `avg`, and `value_count`.<br/>Default is `sum` and `value_count`.<br/>`avg` is a derived metric statistic that will automatically be supported in queries if `sum` and `value_count` are present as part of metric `stats`. |

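To illustrate the `stats` default described in the table above, here is a minimal sketch of a `metrics` entry that omits `stats`. Going by the stated defaults, it would be expected to behave as if `sum` and `value_count` were listed, which in turn is what allows `avg` to be derived. The `latency` field name is borrowed from the earlier mapping example and is only illustrative.

```json
{
  "metrics": [
    {
      "name": "latency"
    }
  ]
}
```

If `min` or `max` aggregations are also needed for the field, they would have to be listed explicitly in `stats`, as in the full mapping example.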
### Star-tree configuration parameters

The following parameters are optional and cannot be modified following index creation.

| Parameter | Description |
| :--- | :--- |
| `max_leaf_docs` | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, the nodes will be split based on the value of the next dimension. Default is `10000`. A lower value will use more storage but result in faster query performance. Inversely, a higher value will use less storage but result in slower query performance. For more information, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
| `skip_star_node_creation_for_dimensions` | A list of dimensions for which a star-tree index will skip star node creation. Skipping star node creation for a dimension reduces storage size at the expense of query performance. By default, star nodes are created for all dimensions. For more information about star nodes, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |

## Supported queries and aggregations

For more information about supported queries and aggregations, see [Supported queries and aggregations for a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-queries-and-aggregations).


_search-plugins/improving-search-performance.md

Lines changed: 3 additions & 1 deletion
@@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:
 
 - Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).
 
-- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
+- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
+
+- Improve aggregation performance using a [star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).
