“fix#829-PreAgg-field_doc_count” #839
Signed-off-by: cwillum <cwmmoore@amazon.com>
@petardz Thanks for your input on documentation for this issue. Could you review this content for technical accuracy? Thanks.
_opensearch/bucket-agg.md
@@ -74,6 +74,98 @@ The `terms` aggregation requests each shard for its top 3 unique terms. The coor

This is especially true if `size` is set to a low number. Because the default size is 10, an error is unlikely to happen. If you don’t need high accuracy and want to increase the performance, you can reduce the size.

### Account for pre-aggregated data

While the `doc_count` field provides a representation of the number of individual documents aggregated in a bucket, the field by itself does not have a way to account for documents that store pre-aggregated data, such as `histogram`. To account for pre-aggregated data and accurately calculate the number of documents in a bucket, you can use the `_doc_count` field to add the number of documents in a single summary field. When a document includes the `_doc_count` field, all bucket aggregations recognize its value and increase the bucket `doc_count` proportionately. Keep these considerations in mind when using the `_doc_count` field:
Can we rephrase this to simplify?
While the `doc_count` field represents the number of individual documents aggregated in a bucket, the field by itself does not account for documents that store pre-aggregated data, such as `histogram`.
Also,
When a document includes the `_doc_count` field, all bucket aggregations increase the bucket `doc_count` by the value of `_doc_count`.
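To make the suggested wording concrete, here is a minimal Python sketch of the counting rule it describes. This is a toy model, not OpenSearch code: the function name `bucket_doc_counts`, the `status` field, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def bucket_doc_counts(docs, field):
    """Toy model of a terms-style bucket aggregation (not OpenSearch code)."""
    counts = Counter()
    for doc in docs:
        # A document that carries `_doc_count` adds that value to its bucket;
        # a document without the field adds 1.
        counts[doc[field]] += doc.get("_doc_count", 1)
    return dict(counts)


# Hypothetical documents: two of them hold pre-aggregated data.
docs = [
    {"status": "ok", "_doc_count": 10},
    {"status": "ok"},
    {"status": "error", "_doc_count": 3},
]

print(bucket_doc_counts(docs, "status"))  # {'ok': 11, 'error': 3}
```

The `ok` bucket ends up with a `doc_count` of 11 even though only two documents fall into it, which is the behavior the sentence above is trying to convey.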
@cwillum: Based on the conversation from this PR (opensearch-project/OpenSearch#3985), it seems like this code wasn't included in OpenSearch 1.0. Furthermore, the PR only contains a "backport 2.x" label, meaning that the change will be backported to the latest minor version of OpenSearch. We shouldn't need to backport on this one. Once the PR is approved and has gone through editorial review, add the "5- Done and waiting to merge" label to this PR.
Signed-off-by: cwillum <cwmmoore@amazon.com>
@cwillum I'm sorry, the link with the example I provided earlier is not compatible with OpenSearch. Here is one synthetic example:
Response:
Notice how `_doc_count` was used when calculating the `doc_count` of the buckets.
@cwillum: please use the following example:
Response:
Signed-off-by: cwillum <cwmmoore@amazon.com>
Signed-off-by: cwillum <cwmmoore@amazon.com>
LGTM
@cwillum Just one comment for you. Let me know if you have any questions. Thanks!
_opensearch/bucket-agg.md
* The field does not support nested arrays; only positive integers can be used.
* If a document does not contain the `_doc_count` field, aggregation uses the document to increase the count by 1.
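The considerations in this hunk can be read as a small validation rule. The following Python sketch shows one way to express them, under stated assumptions: `effective_doc_count` is a hypothetical helper, and raising `ValueError` is a choice made for this sketch only. The diff does not say how OpenSearch itself reports an invalid `_doc_count` value.

```python
def effective_doc_count(doc):
    """Return how much a document adds to its bucket (illustrative only)."""
    if "_doc_count" not in doc:
        # No `_doc_count` field: the document counts as one document.
        return 1
    value = doc["_doc_count"]
    # Only positive integers are accepted; arrays and other types are rejected.
    # `bool` is excluded explicitly because it is a subclass of `int` in Python.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        # Assumption: error handling here is hypothetical, not OpenSearch behavior.
        raise ValueError(f"_doc_count must be a positive integer, got {value!r}")
    return value


print(effective_doc_count({"user": "a"}))                   # 1
print(effective_doc_count({"user": "a", "_doc_count": 5}))  # 5
```

Passing an array such as `{"_doc_count": [1, 2]}` or a non-positive number such as `{"_doc_count": 0}` raises `ValueError` in this sketch.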
OpenSearch features that rely on an accurate document count illustrate the importance of using the `_doc_count` field. To get a better sense for how this field can support other search functionality, see [Index rollups](https://opensearch.org/docs/latest/im-plugin/index-rollups/index/), an OpenSearch feature for the Index Management plugin that stores documents with pre-aggregated data in rollup indexes.
Change "To get a better sense for" to "For information on" (or something similar). Change "Index Management" to "Index Management (IM)".
Signed-off-by: cwillum <cwmmoore@amazon.com>
Signed-off-by: cwillum cwmmoore@amazon.com
Fixes #829
Description
Add documentation to Bucket Aggregation describing the use of the `_doc_count` field for computing documents that store pre-aggregated data.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.