Add support for "missing" to all bucket aggregations #5324

Closed
@roytmana

NEED: In many (if not the majority of) cases, when you present users with business analytics, they want to see numbers for the complete data set. No matter how you aggregate, the result should cover the same data with the same number of documents. The inability to handle "missing" values excludes those documents from the analysis, making the analyzed data set incomplete and the grand totals dependent on which field(s) the aggregation is done on. It is impossible to explain to users why the lower-level totals do not add up to the upper-level ones!

WORKAROUND: Currently, field-based bucket aggregations (terms, range, etc.) have no way to aggregate missing values. The only way is to use a missing aggregation at the same level and on the same field as the terms aggregation itself. That is easy enough for one-level aggregations, but with 2-3 levels the number of "missing" aggregations (and the complete lower-level aggregations that must be repeated inside them) mushrooms very quickly, to the point that the query is huge, convoluted, and not debuggable. It may affect performance as well. Also, the fetched data needs to be heavily post-processed to extract the multi-level aggregation buckets from under the various "missing" elements and put them inline with the regular aggregation values. Below, please see a simple query that does a 2-level aggregation with just one sum metric.

PROPOSAL: I would suggest that any aggregation operating on a field should have a missing option. If the missing option is specified, the aggregation should accumulate documents with missing values under that key and honor any nested aggregations within it. It should never assume a default key such as 0 or _missing, since that may clash with actual keys. If the option is not specified, the aggregation should skip missing values as it does now.

This approach is entirely compatible with the existing logic and gives developers complete control over whether to aggregate missing values and under what key. When it is not needed (and not specified), there will be no performance overhead. When it is needed, it should work faster, as we would not need to run the missing aggregation and the aggregations under it separately (the same goes for an "other" aggregation).
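To illustrate, here is a rough sketch of how the 2-level query could look if the terms aggregation accepted such an option. This is only what I have in mind, not an existing API: the name and placement of the "missing" parameter are my assumption, and the key values "(no division)" and "(no fy)" are arbitrary placeholders chosen by the caller.

```json
{
  "total": {
    "sum": { "field": "money.totals.obligationTotal" }
  },
  "group": {
    "terms": {
      "field": "division",
      "order": { "_term": "asc" },
      "size": 100,
      "missing": "(no division)"
    },
    "aggs": {
      "total": {
        "sum": { "field": "money.totals.obligationTotal" }
      },
      "group": {
        "terms": {
          "field": "fy",
          "order": { "_term": "asc" },
          "missing": "(no fy)"
        },
        "aggs": {
          "total": {
            "sum": { "field": "money.totals.obligationTotal" }
          }
        }
      }
    }
  }
}
```

With something like this, documents without a division or fy would come back as ordinary buckets under the chosen keys, so the separate "missing" branches and the repeated nested aggregations would no longer be needed.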

To be honest, I would love to see the same handling for "other": documents that have not been included in the aggregation due to the aggregation size constraints. Again, the rationale is the same: the ability to slice the complete data set regardless of the aggregation structure. It is just as needed as "missing" and just as troublesome to calculate, but I could understand if you did not add it, as it may not be compatible with your algorithms. But PLEASE, PLEASE add "missing" handling at least.

{
      "total": {
        "sum": {
          "field": "money.totals.obligationTotal"
        }
      },
      "missing": {
        "missing": {
          "field": "division"
        },
        "aggs": {
          "total": {
            "sum": {
              "field": "money.totals.obligationTotal"
            }
          },
          "missing": {
            "missing": {
              "field": "fy"
            }
          },
          "group": {
            "terms": {
              "field": "fy",
              "order": { "_term": "asc" }
            },
            "aggs": {
              "total": {
                "sum": {
                  "field": "money.totals.obligationTotal"
                }
              }
            }
          }
        }
      },
      "group": {
        "terms": {
          "field": "division",
          "order": { "_term": "asc" },
          "size": 100
        },
        "aggs": {
          "total": {
            "sum": {
              "field": "money.totals.obligationTotal"
            }
          },
          "missing": {
            "missing": {
              "field": "fy"
            },
            "aggs": {
              "total": {
                "sum": {
                  "field": "money.totals.obligationTotal"
                }
              }
            }
          },
          "group": {
            "terms": {
              "field": "fy",
              "order": { "_term": "asc" }
            },
            "aggs": {
              "total": {
                "sum": {
                  "field": "money.totals.obligationTotal"
                }
              }
            }
          }
        }
      }
    }

cc @uboness, @jpountz
