“fix#829-PreAgg-field_doc_count” #839
Merged

5 commits by cwillum, each titled “fix#829-PreAgg-field_doc_count”:
* f3541d3
* 882ffb6
* 9e6ea0b
* 055af42
* 9af08d2
@@ -74,6 +74,98 @@ The `terms` aggregation requests each shard for its top 3 unique terms. The coor

This is especially true if `size` is set to a low number. Because the default size is 10, an error is unlikely to happen. If you don’t need high accuracy and want to improve performance, you can reduce the size.

### Account for pre-aggregated data

While the `doc_count` field represents the number of individual documents aggregated in a bucket, the field by itself does not account for documents that store pre-aggregated data, such as `histogram` fields. To account for pre-aggregated data and accurately calculate the number of documents in a bucket, use the `_doc_count` field to record how many documents a single summary document represents. When a document includes the `_doc_count` field, all bucket aggregations increase the bucket `doc_count` by the value of `_doc_count`. Keep these considerations in mind when using the `_doc_count` field:

* The field does not support nested arrays; it accepts only a single positive integer per document.
* If a document does not contain the `_doc_count` field, aggregations count it as one document, increasing the bucket `doc_count` by 1.
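
The counting rules above can be modeled with a short Python sketch. It is a conceptual model only, not the OpenSearch implementation: the helper names `effective_doc_count` and `terms_bucket_counts` and the sample documents are hypothetical, and the ordering of buckets with equal counts is not modeled faithfully.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable, Mapping


def effective_doc_count(doc: Mapping[str, Any]) -> int:
    """Hypothetical helper: how many documents `doc` stands for.

    A missing `_doc_count` means the document counts once; otherwise the value
    must be a single positive integer (arrays and other types are rejected).
    """
    value = doc.get("_doc_count", 1)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"_doc_count must be a single positive integer, got {value!r}")
    return value


def terms_bucket_counts(docs: Iterable[Mapping[str, Any]], field: str) -> list[tuple[Any, int]]:
    """Hypothetical helper: model bucket doc_count values of a terms aggregation on `field`."""
    counts: Counter = Counter()
    for doc in docs:
        if field in doc:  # documents without the field land in no bucket
            # Each bucket grows by the document's _doc_count, or by 1 if it has none.
            counts[doc[field]] += effective_doc_count(doc)
    # Largest buckets first; equal counts keep first-seen order here, which may differ from OpenSearch.
    return counts.most_common()


# Hypothetical sample: one summary document standing for 5 documents, plus two plain documents.
docs = [
    {"example_text": "summary", "_doc_count": 5},
    {"example_text": "summary"},
    {"example_text": "raw"},
]
print(terms_bucket_counts(docs, "example_text"))  # [('summary', 6), ('raw', 1)]
```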

OpenSearch features that rely on an accurate document count illustrate the importance of using the `_doc_count` field. For information on how this field supports other search functionality, see [Index rollups](https://opensearch.org/docs/latest/im-plugin/index-rollups/index/), an OpenSearch feature of the Index Management (IM) plugin that stores documents with pre-aggregated data in rollup indexes.
{: .tip}

Change "To get a better sense for" to "For information on" (or something similar). Change "Index Management" to "Index Management (IM)".

### Example

Use the [Create index](https://opensearch.org/docs/latest/opensearch/rest-api/index-apis/create-index/) API to create an index with mappings that include the following fields:
* `example_histogram`, which stores histogram data as percentages
* `example_text`, which stores the title of the histogram

```json
PUT example_index
{
  "mappings" : {
    "properties" : {
      "example_histogram" : {
        "type" : "histogram"
      },
      "example_text" : {
        "type" : "keyword"
      }
    }
  }
}
```

Then use the [index](https://opensearch.org/docs/latest/opensearch/index-data/) API to store pre-aggregated data for `histogram_1` and `histogram_2`:

```json
PUT example_index/_doc/1
{
  "example_text" : "histogram_1",
  "example_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [4, 8, 22, 14, 5]
  },
  "_doc_count": 47
}

PUT example_index/_doc/2
{
  "example_text" : "histogram_2",
  "example_histogram" : {
    "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts" : [9, 19, 6, 5, 8, 3]
  },
  "_doc_count": 71
}
```

Run a `terms` aggregation on `example_index`:

```json
GET example_index/_search
{
  "aggs" : {
    "histogram_titles" : {
      "terms" : { "field" : "example_text" }
    }
  }
}
```

The request provides the following response:

```json
{
  ...
  "aggregations" : {
    "histogram_titles" : {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets" : [
        {
          "key" : "histogram_2",
          "doc_count" : 71
        },
        {
          "key" : "histogram_1",
          "doc_count" : 47
        }
      ]
    }
  }
}
```
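
In this response, each bucket `doc_count` matches the document's `_doc_count` (47 and 71) rather than the sum of its histogram `counts` array (53 and 50). The following Python sketch checks that reading against the example data. It assumes a trimmed response containing only the fields the check needs, copies each histogram `counts` array to a top-level key for brevity, and uses a hypothetical helper name, `bucket_counts`.

```python
import json

# The two documents indexed above, reduced to the fields this check uses.
# "counts" is copied from example_histogram.counts for brevity.
docs = [
    {"example_text": "histogram_1", "counts": [4, 8, 22, 14, 5], "_doc_count": 47},
    {"example_text": "histogram_2", "counts": [9, 19, 6, 5, 8, 3], "_doc_count": 71},
]

# The "aggregations" part of the response above.
response = json.loads("""
{
  "aggregations": {
    "histogram_titles": {
      "buckets": [
        {"key": "histogram_2", "doc_count": 71},
        {"key": "histogram_1", "doc_count": 47}
      ]
    }
  }
}
""")


def bucket_counts(resp: dict, agg_name: str) -> dict:
    """Hypothetical helper: map each bucket key to its doc_count."""
    return {b["key"]: b["doc_count"] for b in resp["aggregations"][agg_name]["buckets"]}


observed = bucket_counts(response, "histogram_titles")
from_doc_count = {d["example_text"]: d["_doc_count"] for d in docs}
from_histogram = {d["example_text"]: sum(d["counts"]) for d in docs}

print(observed == from_doc_count)  # True
print(from_histogram)              # {'histogram_1': 53, 'histogram_2': 50}
```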

## Multi-terms

Similar to the `terms` bucket aggregation, you can use the `multi_terms` aggregation to search for multiple terms. Multi-terms aggregations are useful when you need to sort by document count, or by a metric aggregation on a composite key, and get the top `n` results. For example, you could search for a specific number of documents (for example, 1000) and the number of servers per location that show CPU usage greater than 90%. The multi-terms query would then return the top results.
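
The composite-key idea can be sketched in Python: each bucket key is a tuple of field values, buckets are ranked by document count, and only the top `n` are kept. This is a conceptual model rather than the OpenSearch implementation; the helper `multi_terms_top_n`, the fields `region`, `host`, and `cpu`, and the sample readings are all hypothetical, and the CPU filter stands in for a query that would run before the aggregation.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable, Mapping, Sequence


def multi_terms_top_n(
    docs: Iterable[Mapping[str, Any]], fields: Sequence[str], n: int
) -> list[tuple[tuple[Any, ...], int]]:
    """Hypothetical helper: count documents per composite key built from `fields`, keep the top `n`.

    Documents missing any of the fields are skipped.
    """
    counts: Counter = Counter()
    for doc in docs:
        if all(f in doc for f in fields):
            counts[tuple(doc[f] for f in fields)] += 1
    return counts.most_common(n)


# Hypothetical monitoring readings; only readings above 90% CPU are aggregated,
# as a query filter would do before the aggregation runs.
readings = [
    {"region": "us-east", "host": "h1", "cpu": 95},
    {"region": "us-east", "host": "h1", "cpu": 97},
    {"region": "us-east", "host": "h2", "cpu": 40},
    {"region": "eu-west", "host": "h3", "cpu": 92},
    {"region": "eu-west", "host": "h3", "cpu": 99},
    {"region": "eu-west", "host": "h3", "cpu": 91},
]
busy = [r for r in readings if r["cpu"] > 90]
print(multi_terms_top_n(busy, ["region", "host"], n=2))
# [(('eu-west', 'h3'), 3), (('us-east', 'h1'), 2)]
```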

Can we rephrase this to simplify?

While the `doc_count` field represents the number of individual documents aggregated in a bucket, the field by itself does not account for documents that store pre-aggregated data, such as `histogram`.

Also:

When a document includes the `_doc_count` field, all bucket aggregations increase the bucket `doc_count` by the value of `_doc_count`.