[Star Tree][Search][RFC] Parse aggregation request to resolve via star tree data structure #14871

@sandeshkr419

Description

With support for star-tree composite indices, we would like to resolve certain aggregation and search paths via the star tree itself. I am considering two approaches:

  1. Introduce a new search request type that explicitly targets the star tree in the request construct. This means users have to learn a new capability or query type. The biggest pitfall of this approach is that the query must also be formed in a particular way for it to execute via the star tree.
  2. Internally identify query shapes that can be resolved using the star tree, and then internally parse the request into a StarTreeQuery & StarTreeAggregator. This way, users do not have to worry about which queries can or cannot be resolved using the star tree, and the default search path is still followed when a query cannot be resolved via the star tree. No user intervention is needed once the star tree has been created during index creation. Discussion on disabling search via the star tree continues in [Search] [Star Tree] Option/Param to Disable search via star-tree #14872
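As a rough illustration of approach 2, the eligibility decision could look like the Java sketch below. Everything in it is hypothetical: `ParsedRequest`, `TermsAgg`, `MetricAgg`, `StarTreeConfig`, `StarTreePlan` and `StarTreeRequestResolver` are simplified stand-ins, not OpenSearch classes, and the rule set (no hits requested, no query beyond the implicit match_all, a terms aggregation on a star-tree dimension whose metric sub-aggregations use configured star-tree metrics) is an assumption made for illustration. An empty `Optional` means the request falls back to the default search path.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of a parsed search request (not the OpenSearch API).
record MetricAgg(String name, String type, String field) {}
record TermsAgg(String name, String field, List<MetricAgg> subAggs) {}
record ParsedRequest(int size, boolean matchAllQuery, TermsAgg terms) {}

// Hypothetical description of what a star tree was configured with at index creation.
record StarTreeConfig(Set<String> dimensions, Set<String> metricFields, Set<String> metricTypes) {}

// Hypothetical plan handed to the star-tree execution path.
record StarTreePlan(String groupByDimension, List<MetricAgg> metrics) {}

final class StarTreeRequestResolver {
    private StarTreeRequestResolver() {}

    /** Returns a star-tree plan, or empty to signal "use the default search path". */
    static Optional<StarTreePlan> resolve(ParsedRequest req, StarTreeConfig cfg) {
        // Assumption: the star tree holds pre-aggregated values, not documents,
        // so requests asking for hits, or carrying a real query, are rejected here.
        if (req.size() != 0 || !req.matchAllQuery() || req.terms() == null) {
            return Optional.empty();
        }
        TermsAgg terms = req.terms();
        // The group-by field must be one of the star-tree dimensions.
        if (!cfg.dimensions().contains(terms.field())) {
            return Optional.empty();
        }
        // Every metric must be on a star-tree metric field with a supported metric type.
        for (MetricAgg m : terms.subAggs()) {
            if (!cfg.metricFields().contains(m.field()) || !cfg.metricTypes().contains(m.type())) {
                return Optional.empty();
            }
        }
        return Optional.of(new StarTreePlan(terms.field(), List.copyOf(terms.subAggs())));
    }
}
```

This sketch only accepts match_all; extending it to exact-match filters on dimensions would widen the set of resolvable query shapes without changing the fallback contract.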

I support approach 2, as it does not burden search users with reframing their queries for the star tree. Also, since this feature is still in development, the full search capabilities of the star tree will be built incrementally.

For approach 2, we want to keep the search request and search response intact. A good first request/response for building the framework for star-tree request execution is an aggregation request with a group-by/nested aggregation.

In the default search path, the query and the aggregation are executed independently. In the star-tree code path, the query and aggregation are tightly coupled, which requires deciding on the correct star-tree query and aggregation pair during request parsing itself.
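To make the coupling concrete, here is a minimal sketch under simplifying assumptions: a flat list of pre-aggregated rows (`PreAggregatedRow`, hypothetical) stands in for the star tree's records, the query is reduced to exact-match filters on dimensions, and the aggregation is a terms group-by with a sum. The filter and the group-by are applied in the same pass over the rows, rather than the query producing matching document IDs that an independent aggregator then consumes. A real star tree would also use its star nodes to skip dimensions that are neither filtered nor grouped on; that traversal is not modelled here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for star-tree records: each row is already aggregated
// over all documents that share these dimension values.
record PreAggregatedRow(Map<String, Long> dims, long docCount, double sumSize) {}

// One bucket of a terms aggregation with a sum sub-aggregation.
record GroupBucket(long key, long docCount, double sum) {}

final class CoupledStarTreeExecution {
    private CoupledStarTreeExecution() {}

    /**
     * Applies the query (exact-match dimension filters) and the terms + sum
     * aggregation together, in a single pass over pre-aggregated rows.
     */
    static Map<Long, GroupBucket> execute(List<PreAggregatedRow> rows,
                                          Map<String, Long> filters,
                                          String groupByDim) {
        Map<Long, GroupBucket> buckets = new TreeMap<>();
        for (PreAggregatedRow row : rows) {
            // Query side: every filter must match this row's dimension value.
            boolean matches = filters.entrySet().stream()
                .allMatch(f -> f.getValue().equals(row.dims().get(f.getKey())));
            Long key = row.dims().get(groupByDim);
            if (!matches || key == null) {
                continue;
            }
            // Aggregation side: fold the row's pre-aggregated count and sum into its bucket.
            buckets.merge(key, new GroupBucket(key, row.docCount(), row.sumSize()),
                (a, b) -> new GroupBucket(key, a.docCount() + b.docCount(), a.sum() + b.sum()));
        }
        return buckets;
    }
}
```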

I have been thinking of something similar to a PoC I did here: creating the query/aggregation pair during request parsing itself.

Sample Search Aggregation Request:

{
    "size": 0,
    "aggs": {
        "group_by_clientip": {
            "terms": {
                "field": "status"
            },
            "aggs": {
                "max_status": {
                    "sum": {
                        "field": "size"
                    }
                }
            }
        }
    }
}
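A hypothetical sketch of detecting this request's shape during parsing, assuming the body has already been decoded into nested `Map`s (as a generic JSON parser would produce). `RequestShapeExtractor`, `TermsShape` and `MetricSpec` are illustrative names, not OpenSearch APIs, and the rules (size 0, no `query` clause, exactly one top-level terms aggregation whose sub-aggregations are single-field metrics) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical description of one metric sub-aggregation, e.g. {"sum": {"field": "size"}}.
record MetricSpec(String name, String type, String field) {}

// Hypothetical extracted shape: one terms aggregation plus its metric sub-aggregations.
record TermsShape(String aggName, String groupByField, List<MetricSpec> metrics) {}

final class RequestShapeExtractor {
    private RequestShapeExtractor() {}

    /** Returns the terms + metrics shape, or empty if the body does not match it. */
    static Optional<TermsShape> extract(Map<String, Object> body) {
        // Assumed rule: no hits requested and no query clause (implicit match_all).
        if (!(body.get("size") instanceof Number n) || n.intValue() != 0 || body.containsKey("query")) {
            return Optional.empty();
        }
        // Exactly one top-level aggregation, and it must be a terms aggregation on a field.
        if (!(body.get("aggs") instanceof Map<?, ?> aggs) || aggs.size() != 1) {
            return Optional.empty();
        }
        Map.Entry<?, ?> top = aggs.entrySet().iterator().next();
        if (!(top.getValue() instanceof Map<?, ?> topBody)
                || !(topBody.get("terms") instanceof Map<?, ?> terms)
                || !(terms.get("field") instanceof String groupBy)) {
            return Optional.empty();
        }
        // Every sub-aggregation must be a single metric on a single field.
        List<MetricSpec> metrics = new ArrayList<>();
        if (topBody.get("aggs") instanceof Map<?, ?> subs) {
            for (Map.Entry<?, ?> sub : subs.entrySet()) {
                if (!(sub.getValue() instanceof Map<?, ?> metricBody) || metricBody.size() != 1) {
                    return Optional.empty();
                }
                Map.Entry<?, ?> metric = metricBody.entrySet().iterator().next();
                if (!(metric.getValue() instanceof Map<?, ?> params)
                        || !(params.get("field") instanceof String field)) {
                    return Optional.empty();
                }
                metrics.add(new MetricSpec(String.valueOf(sub.getKey()), String.valueOf(metric.getKey()), field));
            }
        }
        return Optional.of(new TermsShape(String.valueOf(top.getKey()), groupBy, List.copyOf(metrics)));
    }
}
```

For the request above, this yields a group-by on `status` with a single `sum` metric on `size`, named `max_status`.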

Sample expected response (this one was produced by the non-star-tree path):

{
    "took": 1971,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1018,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "group_by_clientip": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 484,
            "buckets": [
                {
                    "key": 209,
                    "doc_count": 63,
                    "max_status": {
                        "value": 39844.0
                    }
                },
                {
                    "key": 217,
                    "doc_count": 58,
                    "max_status": {
                        "value": 35754.0
                    }
                },
                {
                    "key": 201,
                    "doc_count": 53,
                    "max_status": {
                        "value": 32154.0
                    }
                },
                {
                    "key": 220,
                    "doc_count": 53,
                    "max_status": {
                        "value": 28525.0
                    }
                },
                {
                    "key": 208,
                    "doc_count": 52,
                    "max_status": {
                        "value": 33543.0
                    }
                },
                {
                    "key": 210,
                    "doc_count": 52,
                    "max_status": {
                        "value": 29480.0
                    }
                },
                {
                    "key": 213,
                    "doc_count": 52,
                    "max_status": {
                        "value": 28841.0
                    }
                },
                {
                    "key": 202,
                    "doc_count": 51,
                    "max_status": {
                        "value": 31026.0
                    }
                },
                {
                    "key": 216,
                    "doc_count": 51,
                    "max_status": {
                        "value": 31242.0
                    }
                },
                {
                    "key": 204,
                    "doc_count": 49,
                    "max_status": {
                        "value": 30802.0
                    }
                }
            ]
        }
    }
}
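Whichever path computes the per-key totals, the final `terms` response has to look the same. The buckets above are ordered by `doc_count` descending with ties broken by ascending key (201 before 220, both at 53), and `sum_other_doc_count` counts the documents in buckets that were not returned: the ten buckets hold 534 documents, and 534 + 484 = 1018, the total hit count. A minimal Java sketch of that final step follows, with hypothetical `TermsBucket`, `TermsResult` and `TermsReducer` types, assuming a single shard so that `shard_size` and `doc_count_error_upper_bound` can be ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-key totals, e.g. as produced by a star-tree traversal.
record TermsBucket(long key, long docCount, double sum) {}

// Hypothetical final result: returned buckets plus the doc count of everything else.
record TermsResult(List<TermsBucket> buckets, long sumOtherDocCount) {}

final class TermsReducer {
    private TermsReducer() {}

    /** Keeps the top `size` buckets by doc count (ties by ascending key) and sums the rest. */
    static TermsResult topBuckets(List<TermsBucket> all, int size) {
        List<TermsBucket> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong(TermsBucket::docCount).reversed()
                .thenComparingLong(TermsBucket::key));
        List<TermsBucket> top = List.copyOf(sorted.subList(0, Math.min(size, sorted.size())));
        // Documents in buckets that did not make the cut go into sum_other_doc_count.
        long other = 0;
        for (TermsBucket b : sorted.subList(top.size(), sorted.size())) {
            other += b.docCount();
        }
        return new TermsResult(top, other);
    }
}
```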

Metadata

Labels: Roadmap:Search, Search, v2.18.0

Project status: ✅ Done
