Description
Elasticsearch version (bin/elasticsearch --version): 7.5.2
Plugins installed: []
JVM version (java -version): 7.5.2
OS version (uname -a if on a Unix-like system): docker
Description of the problem including expected versus actual behavior:
A bucket aggregation's size setting should never trigger a too_many_buckets_exception when size is less than or equal to search.max_buckets. With a simple terms aggregation (no nesting), I'd expect it to return at most the number of buckets specified by the size property. I understand that, for accuracy, individual shards may return more candidate buckets than that, possibly exceeding the 10k limit, but the final result returned to me should still be <= 10k as defined by size.
TL;DR: I don't care what queries happen behind the scenes to get me my 10k buckets. All I care about is that I get my 10k buckets, which is a valid size because it's <= search.max_buckets
:-)
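The error message below points at the search.max_buckets cluster setting, and raising it presumably sidesteps the error, but that shouldn't be necessary when size itself is already within the limit. A sketch of that workaround (20000 is an arbitrary example value, assuming transient cluster settings updates are allowed):

PUT /_cluster/settings
{
  "transient": {
    "search.max_buckets": 20000
  }
}

I shouldn't have to touch a cluster-wide limit just to get a result whose size already respects that limit.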
Steps to reproduce:
Assuming I have more than 10k unique document ids..
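For example, the index could be seeded with distinct ids via the bulk API (a rough sketch; the document bodies are illustrative, this would be repeated for more than 10,000 distinct id values, and it assumes id is mapped as a keyword field as implied by the request's meta below):

POST /events/_bulk
{ "index": {} }
{ "id": "id-00001" }
{ "index": {} }
{ "id": "id-00002" }

With enough distinct ids indexed, the aggregation below reproduces the error: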
POST /events/_search
{
  "aggs": {
    "terms_id": {
      "meta": {
        "@field_type": "keyword"
      },
      "terms": {
        "field": "id",
        "size": 10000
      }
    }
  }
}
This should return 10k unique buckets, with an id in each bucket.
What happens instead is:
{
  "error": {
    "root_cause": [
      {
        "type": "too_many_buckets_exception",
        "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
        "max_buckets": 10000
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "events",
        "node": "8v-46gnFQWGp2EsalBzwYw",
        "reason": {
          "type": "too_many_buckets_exception",
          "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
          "max_buckets": 10000
        }
      }
    ]
  },
  "status": 503
}
Reasoning:
I'd love to learn more about why this happens; a detailed response on this design choice would be greatly appreciated. I know I'm not the only one running into it, as it was also discussed here: https://discuss.elastic.co/t/large-aggregate-too-many-buckets-exception/189091/15
If I'm understanding this issue correctly, wouldn't the following scenario also throw this error? Say I have two shards: shard 1 contains 10k+ unique ids and shard 2 contains 10k+ different unique ids. Querying both would produce 20k candidate buckets that need to be merged down to the requested bucket size of 10k, yet creating even one bucket over the max behind the scenes would throw this error.
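One thing I noticed while reading the terms aggregation docs: each shard defaults to collecting shard_size = (size * 1.5 + 10) candidate buckets, which for size=10000 is 15010 and is already above the 10000 limit on its own. If that per-shard collection is what the bucket counter measures, it might explain the shard-level failure above. Capping shard_size explicitly would be one way to test that theory (a sketch only; it trades some accuracy and may still trip the counter):

POST /events/_search
{
  "aggs": {
    "terms_id": {
      "terms": {
        "field": "id",
        "size": 10000,
        "shard_size": 10000
      }
    }
  }
}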