(feat) Add aggregations framework to enable numeric analytics on search results #2244
Enable powerful analytics and data exploration capabilities that go beyond
simple faceting. Users can now compute metrics (sum, avg, min, max, count,
sumsquares, stats) across search results and group them by field values or
ranges with nested sub-aggregations for multi-dimensional analysis.
This addresses the need for:
- Computing statistics across filtered result sets (e.g., "average price of
products matching 'laptop'")
- Multi-level grouping and metrics (e.g., "total sales per region per category")
- Complex analytics queries without requiring separate aggregation passes
Key features:
- Metric aggregations: sum, avg, min, max, count, sumsquares, stats
- Bucket aggregations: terms (group by values), range (group by ranges)
- Nested sub-aggregations for multi-dimensional analytics
- Computed efficiently during query execution using the visitor pattern
- Fully backward compatible - Facets API unchanged
Example - average price per brand:
    byBrand := bleve.NewTermsAggregation("brand", 10)
    byBrand.AddSubAggregation("avg_price", bleve.NewAggregationRequest("avg", "price"))
    searchRequest.Aggregations = bleve.AggregationsRequest{"by_brand": byBrand}
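To make the result of the example above concrete, here is a minimal sketch in plain Go of what a terms bucket with an average sub-aggregation computes. It does not use bleve; the `product`, `bucket`, and `avgPriceByBrand` names are hypothetical and only illustrate the grouping, and the ordering rule (largest buckets first, ties broken by term) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// product is a hypothetical document with the two fields used in the example.
type product struct {
	Brand string
	Price float64
}

// bucket holds the running state for one term: the hit count plus the
// running sum needed for the "avg_price" sub-aggregation.
type bucket struct {
	Term     string
	Count    int
	priceSum float64
}

// AvgPrice returns the average price of the documents in the bucket.
func (b bucket) AvgPrice() float64 {
	if b.Count == 0 {
		return 0
	}
	return b.priceSum / float64(b.Count)
}

// avgPriceByBrand groups hits by brand and keeps at most `size` buckets,
// largest first (ties broken by term; this ordering is an assumption).
func avgPriceByBrand(hits []product, size int) []bucket {
	byTerm := map[string]*bucket{}
	for _, h := range hits {
		b, ok := byTerm[h.Brand]
		if !ok {
			b = &bucket{Term: h.Brand}
			byTerm[h.Brand] = b
		}
		b.Count++
		b.priceSum += h.Price
	}
	out := make([]bucket, 0, len(byTerm))
	for _, b := range byTerm {
		out = append(out, *b)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Term < out[j].Term
	})
	if len(out) > size {
		out = out[:size]
	}
	return out
}

func main() {
	hits := []product{{"acme", 100}, {"acme", 300}, {"zenith", 50}}
	for _, b := range avgPriceByBrand(hits, 10) {
		fmt.Printf("%s: count=%d avg_price=%.2f\n", b.Term, b.Count, b.AvgPrice())
	}
}
```

The sub-aggregation state lives inside each bucket, which is why nesting further levels only requires each bucket to own its own set of child accumulators.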
Enable search-as-you-type style aggregations where bucket terms dynamically
match user input. Users can now aggregate by field values that match what's
being typed in a search box, making autosuggestions cleaner and more focused
(e.g., as the user types "ste", show matching authors, titles, categories all
filtered to terms starting with "ste").
This addresses the need for:
- Dynamic faceted autosuggestions that update as users type
- Filtering high-cardinality fields to relevant matches only
- Consistent filtering API between facets and aggregations (ports existing
facet filtering feature)
Performance benefits:
- Zero-allocation filtering - only matching terms are converted from []byte to string
- Filters apply before bucket creation and sub-aggregation processing
- Fast prefix checks with bytes.HasPrefix before regex evaluation
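The filtering order described above can be sketched in standalone Go as follows. This is an illustration rather than the code in this PR: `termFilter`, `newTermFilter`, and `countTerms` are hypothetical names, and only the standard-library calls (`bytes.HasPrefix`, `regexp.Compile`, `(*regexp.Regexp).Match`) are real.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// termFilter is a hypothetical filter: a literal prefix plus an optional
// pre-compiled pattern. It is not bleve's actual type.
type termFilter struct {
	prefix  []byte
	pattern *regexp.Regexp // nil means "no pattern"
}

// newTermFilter compiles the pattern once, up front, so an invalid pattern
// is reported at construction time rather than per document.
func newTermFilter(prefix, pattern string) (*termFilter, error) {
	f := &termFilter{prefix: []byte(prefix)}
	if pattern != "" {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, err
		}
		f.pattern = re
	}
	return f, nil
}

// match works on the raw term bytes, so rejected terms never become strings.
func (f *termFilter) match(term []byte) bool {
	if len(f.prefix) > 0 && !bytes.HasPrefix(term, f.prefix) {
		return false // cheap prefix check runs before the regex
	}
	return f.pattern == nil || f.pattern.Match(term)
}

// countTerms creates buckets only for terms that pass the filter.
func countTerms(terms [][]byte, f *termFilter) map[string]int {
	counts := map[string]int{}
	for _, t := range terms {
		if f.match(t) {
			counts[string(t)]++ // the only []byte -> string conversion
		}
	}
	return counts
}

func main() {
	f, err := newTermFilter("ste", "")
	if err != nil {
		panic(err)
	}
	terms := [][]byte{
		[]byte("stephen king"), []byte("steinbeck"),
		[]byte("tolkien"), []byte("steinbeck"),
	}
	fmt.Println(countTerms(terms, f))
}
```

Because the filter runs before a bucket is looked up or created, non-matching terms in a high-cardinality field cost one byte comparison and nothing else.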
Key changes:
- Add TermPrefix and TermPattern fields to AggregationRequest
- Pre-compile regex patterns in NewTermsAggregation (now returns error)
- Add NewTermsAggregationWithFilter helper
Example - autocomplete aggregation:
    agg, _ := bleve.NewTermsAggregationWithFilter("brand", 10, userInput, "")
I promise this is my last big change @abhinavdangeti
Pull Request Overview
This PR introduces a comprehensive aggregations framework to enable numeric analytics and data exploration on search results. The implementation adds metric aggregations (sum, avg, min, max, count, sumsquares, stats) and bucket aggregations (terms, range) with support for nested sub-aggregations. Additionally, it ports the prefix and regex filtering feature from PR #2242 to enable dynamic term filtering in aggregations.
Key changes include:
- New aggregation API with AggregationRequest and AggregationsRequest types that integrate seamlessly with the existing SearchRequest
- Visitor pattern-based implementation that computes aggregations during query execution with zero additional I/O overhead
- Support for multi-level nested aggregations enabling complex analytical queries
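As a rough illustration of the visitor-based design summarized above, the following standalone Go sketch feeds every field value of each hit to a set of accumulators in a single pass. The `fieldVisitor` interface and the `doc` type are hypothetical simplifications, not the interfaces added by this PR; the `StartDoc`/`EndDoc` hooks and the `sawValue` guard mirror names mentioned in the commit messages, but their exact semantics here are assumed.

```go
package main

import "fmt"

// fieldVisitor is a hypothetical, simplified visitor interface.
type fieldVisitor interface {
	StartDoc()
	Visit(field string, value float64)
	EndDoc()
}

// sumAgg sums one numeric field across all visited documents.
type sumAgg struct {
	field string
	total float64
}

func (s *sumAgg) StartDoc() {}
func (s *sumAgg) Visit(field string, v float64) {
	if field == s.field {
		s.total += v
	}
}
func (s *sumAgg) EndDoc() {}

// countAgg counts documents that carry the field. The sawValue guard keeps
// a multi-valued document from being counted more than once.
type countAgg struct {
	field    string
	sawValue bool
	count    int
}

func (c *countAgg) StartDoc() { c.sawValue = false }
func (c *countAgg) Visit(field string, _ float64) {
	if field == c.field {
		c.sawValue = true
	}
}
func (c *countAgg) EndDoc() {
	if c.sawValue {
		c.count++
	}
}

// doc is a hypothetical hit: field name -> numeric values.
type doc map[string][]float64

// collect walks each hit exactly once and notifies every visitor, so adding
// another aggregation does not add another pass over the results.
func collect(hits []doc, visitors ...fieldVisitor) {
	for _, d := range hits {
		for _, v := range visitors {
			v.StartDoc()
		}
		for field, values := range d {
			for _, val := range values {
				for _, v := range visitors {
					v.Visit(field, val)
				}
			}
		}
		for _, v := range visitors {
			v.EndDoc()
		}
	}
}

func main() {
	hits := []doc{{"price": {10, 20}}, {"price": {5}}, {"rating": {4}}}
	sum := &sumAgg{field: "price"}
	cnt := &countAgg{field: "price"}
	collect(hits, sum, cnt)
	fmt.Println(sum.total, cnt.count)
}
```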
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| search_no_knn.go | Adds Aggregations field to SearchRequest for non-KNN searches |
| search_knn.go | Adds Aggregations field to SearchRequest for KNN-enabled searches |
| search.go | Defines AggregationRequest, AggregationsRequest types and helper constructors; adds Aggregations field to SearchResult |
| search/collector/topn.go | Integrates aggregations into the collector with SetAggregationsBuilder, field deduplication, and visitor callbacks |
| search/aggregations_builder.go | Implements the core AggregationsBuilder that manages multiple aggregation builders and coordinates field visits |
| search/aggregations_builder_test.go | Unit tests for AggregationResults.Merge functionality covering various aggregation types |
| search/aggregation/numeric_aggregation.go | Implements metric aggregations (sum, avg, min, max, count, sumsquares, stats) with proper numeric decoding |
| search/aggregation/numeric_aggregation_test.go | Comprehensive unit tests for all metric aggregations including edge cases |
| search/aggregation/bucket_aggregation.go | Implements bucket aggregations (terms, range) with sub-aggregation support and term filtering |
| search/aggregation/optimized_numeric_aggregation.go | Provides infrastructure for segment-level optimization (currently placeholder with bugs in implementation) |
| index/scorch/segment_aggregation_stats.go | Implements segment-level statistics caching for future optimizations |
| index_impl.go | Adds buildAggregation function to convert AggregationRequest to AggregationBuilder and wires aggregations into search execution |
| aggregation_test.go | Integration tests for metric aggregations using real index |
| bucket_aggregation_test.go | Integration tests for bucket aggregations with sub-aggregations |
| docs/aggregations.md | Comprehensive documentation covering architecture, API, examples, and performance characteristics |
bad5c51 to 8a44955
Fixes bug in nested bucket aggregations where metric values were duplicated due to duplicate field registration in SubAggregationFields(). Also fixes StartDoc/EndDoc lifecycle for bucket sub-aggregations and min/max comparison logic in optimized aggregations. Adds Clone() method to AggregationBuilder interface for proper deep copying of nested aggregation hierarchies. Adopts setter pattern for aggregation filters (SetPrefixFilter, SetRegexFilter).
8a44955 to 5723569
Pull Request Overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.
Thanks @ajroetker, allow us to review your work here.
- Fix double-counting in bucket aggregations with sawValue guard
- Remove unused count fields from Sum and SumSquares aggregations
- Move StatsResult to search package for cleaner stats merging
- Add field deduplication and validation for term filters
Also properly adds support for merging averages.
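The reason averages need dedicated merge support is that an average of averages is wrong whenever the partial results cover different numbers of documents. The sketch below is a hypothetical illustration in plain Go, not the `StatsResult` type from this PR: partial results carry count, sum, sum of squares, min, and max, and the average is derived only after merging. The population-variance helper is an assumption about how `sumsquares` might be used.

```go
package main

import (
	"fmt"
	"math"
)

// partialStats is a hypothetical per-shard result. It stores mergeable
// inputs (count, sum, sum of squares) rather than the derived average.
type partialStats struct {
	Count      int64
	Sum        float64
	SumSquares float64
	Min, Max   float64
}

// fromValues builds partial stats for one shard's values.
func fromValues(vals ...float64) partialStats {
	p := partialStats{Min: math.Inf(1), Max: math.Inf(-1)}
	for _, v := range vals {
		p.Count++
		p.Sum += v
		p.SumSquares += v * v
		p.Min = math.Min(p.Min, v)
		p.Max = math.Max(p.Max, v)
	}
	return p
}

// merge combines two partial results; every field is associative, so
// shards can be merged in any order.
func merge(a, b partialStats) partialStats {
	return partialStats{
		Count:      a.Count + b.Count,
		Sum:        a.Sum + b.Sum,
		SumSquares: a.SumSquares + b.SumSquares,
		Min:        math.Min(a.Min, b.Min),
		Max:        math.Max(a.Max, b.Max),
	}
}

// Avg is derived from the merged sum and count.
func (p partialStats) Avg() float64 {
	if p.Count == 0 {
		return 0
	}
	return p.Sum / float64(p.Count)
}

// Variance returns the population variance, E[x^2] - E[x]^2.
func (p partialStats) Variance() float64 {
	if p.Count == 0 {
		return 0
	}
	mean := p.Avg()
	return p.SumSquares/float64(p.Count) - mean*mean
}

func main() {
	a := fromValues(10, 20, 30) // shard average 20
	b := fromValues(100)        // shard average 100
	m := merge(a, b)
	naive := (a.Avg() + b.Avg()) / 2
	fmt.Printf("merged avg=%.1f naive avg-of-avgs=%.1f\n", m.Avg(), naive)
}
```

In this example the correctly merged average is 40, while averaging the two shard averages would give 60.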
@abhinavdangeti I've also got implementations for histograms, date histograms, geohash buckets, geo distance buckets, and cardinality (via HyperLogLog++ sketches). Let me know whether it would be more helpful to include them here or leave them to a later PR for consideration.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
    ranges := []*bleve.numericRange{
        {Name: "low", Min: nil, Max: &mid},
        {Name: "medium", Min: &mid, Max: &max},
        {Name: "high", Min: &max, Max: nil},
    }

    agg := bleve.NewRangeAggregation("price", ranges)
Copilot AI, Nov 21, 2025
The type bleve.numericRange is not exported (it's lowercase), so code outside the bleve package cannot name it the way this example does. Since this is user-facing documentation, consider providing a clearer example that doesn't reference the unexported type directly, or note that users should use the helper functions provided.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Addresses #2243 and ports #2242 to this new aggregations style as well.