-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding multi_term aggregator support #2687
Conversation
❌ Gradle Check failure cfff6355d63d1e38c8ae5b8fe8fcc719022b46ce |
❌ Gradle Check failure 18bedad58ea629263acbbe56ca6c57fae0237d7d |
✅ Gradle Check success 5720bc2cc46a19cf0a60cd8f2848be1eca4b0760 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting this! It's looking good but it's HUGE!
I think we should slim down the code footprint a bit by deriving from common base classes and avoiding copy/paste.
server/src/internalClusterTest/java/org/opensearch/search/aggregations/bucket/MultiTermsIT.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/opensearch/search/aggregations/support/MultiTermsValuesSourceConfig.java
Outdated
Show resolved
Hide resolved
❌ Gradle Check failure dd72ad4347e3e7cbc2ca736edfb18ba7f7616c5c |
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
dd72ad4
to
a35ef76
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this took a while! It was a biggie. I like that we extended the base test case and MultiValuesSourceFieldConfig. I think it's cleaner. This LGTM!
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/ParsedTerms.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Outdated
Show resolved
Hide resolved
.../main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregationFactory.java
Outdated
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/opensearch/search/aggregations/bucket/MultiTermsIT.java
Show resolved
Hide resolved
Signed-off-by: Peng Huo <penghuo@gmail.com>
@penghuo LGTM, have a question, may be @nknize could also chime in: since this feature has significant performance impact (at the moment), should we guard it behind the setting (enable / disable) so the users would make the conscious decision of using it? Also, how large were the dataset which the feature were benchmarked against? (I don't see it anywhere). |
Need to run
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @reta question re: performance let's add javadocs in MultiTermsAggregationBuilder
clearly describing how to use the agg and the known performance issues.
.../main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregationBuilder.java
Outdated
Show resolved
Hide resolved
Since this is a transient computation and nothing is persisted (e.g., not subject to on disk breaking changes) I'm not as concerned about a feature flag. If we plan to make a lot of changes to the public facing API then a feature flag would be more appropriate. |
Signed-off-by: Peng Huo <penghuo@gmail.com>
Add Test Env section. We use index logs-201998 (esrally http_logs track) |
Signed-off-by: Peng Huo <penghuo@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Thank you @penghuo! LGTM
I'd like to give @reta one more go at a review then we can merge to main and backport to 2.1. |
Adds a new multi_term aggregation. The current implementation focuses on adding new type aggregates. Performance (latency) is suboptimal in this iteration, mainly because of brute force encoding/decoding a list of values into bucket keys. A performance improvement change will be made as a follow on. Signed-off-by: Peng Huo <penghuo@gmail.com> (cherry picked from commit 03fbca3)
Adds a new multi_term aggregation. The current implementation focuses on adding new type aggregates. Performance (latency) is suboptimal in this iteration, mainly because of brute force encoding/decoding a list of values into bucket keys. A performance improvement change will be made as a follow on. Signed-off-by: Peng Huo <penghuo@gmail.com> (cherry picked from commit 03fbca3) Co-authored-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo penghuo@gmail.com
Description
Adding multi_terms aggregator support.
To Reviewers
Limitation
The current implementation focuses on adding new type aggregates. Performance (latency) is not good. This solution is slow, mainly because of simply encoding/decoding a list of values into bucket keys. A performance improvement PR will be released at a later date.
Difference between terms and multi_terms aggregation
false/true
instead of"false"/"true"
Demo
Correctness Test
UT
add UT for each new added class.
IT
Performance Test
Test Environment
OpenSearch multi_terms vs script
Goal
queries
Test Result
Conclusion
Latency summary
multi_terms vs painless
OpenSearch multi_terms vs terms
Goal
Queries
Test Result
Conclusion
Latency summary
terms vs multi_terms
Issues Resolved
#1629
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.