
Conflation Aggregator #11460

Closed as not planned
@nknize


Overview

GeoDistance search covers the most basic spatial search use case: "Find all points of interest within R units of a known location X". This works well when the starting location is already known (e.g., cell phone GPS, a travelling location of interest), but it doesn't work for conflation use cases such as "Find all CVS pharmacies that are within 1 mile of a Walgreens pharmacy" or "Find all Home Depots that are within 2 blocks of a Lowes" (after all, competition is essential to a free enterprise economy). The current search toolbox (query, filter, aggregations, reducer, percolator) doesn't lend itself well to these types of queries without writing some fairly complex scripting.
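
To make the gap concrete, here is a minimal sketch of the client-side workaround this implies today: fetch the primary documents first, then issue one `geo_distance` query per primary hit. The helper name `per_primary_queries`, the `location` field, and the sample coordinates are all made up for illustration; the request bodies use the `bool`/`filter` form of the query DSL, which may differ between Elasticsearch versions.

```python
def per_primary_queries(primary_hits, secondary_name, distance, geo_field="location"):
    """Build one search body per primary hit (hypothetical helper).

    Each body looks for documents named `secondary_name` within `distance`
    of that primary hit's location, so N primaries cost N extra searches.
    """
    bodies = []
    for hit in primary_hits:
        origin = hit["_source"][geo_field]
        bodies.append({
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"name": secondary_name}},
                        {"geo_distance": {"distance": distance, geo_field: origin}},
                    ]
                }
            }
        })
    return bodies


# Made-up sample hits; the coordinates are illustrative only.
home_depots = [
    {"_id": "1034", "_source": {"name": "Home Depot", "location": {"lat": 30.27, "lon": -97.74}}},
    {"_id": "3432", "_source": {"name": "Home Depot", "location": {"lat": 32.78, "lon": -96.80}}},
]
bodies = per_primary_queries(home_depots, "Lowes", "10mi")
print(len(bodies))  # one follow-up search per primary hit
```

The number of round trips grows with the number of primary documents, which is the cost a single server-side aggregation would avoid.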

This feature will add what we'll temporarily call a ConflationAggregator, whose purpose is to support the use case above. A primary filter or aggregation defines the primary list of buckets (e.g., Home Depots). A list of secondary entries, each a filter, aggregation, or query, defines the operation or post-filter to apply to the documents in the primary's result set.

Conflation Aggregation Structure

Below is an initial cut at the conflation grammar. There can be one or more secondary filters, which allows multi-conflation queries such as "Find all Home Depots within 10 miles of a Lowes or Ace Hardware". The design is intended to be flexible enough that the aggregation is not limited to geo-only queries.

"aggs" : {
  "<aggregation_name>" : {
    "conflation" : {
      "primary" : {
        "<filter> | <parent aggregation>"
      },
      "secondary" : [
        {
          "<secondary_name>" : {
            "<filter> | <child aggregation>"
          }
        }
      ]
    }
  }
}

Example

Query

This is a rough first idea of how the conflation aggregator could be used for a complex geodistance query such as "Find all Home Depots that are within 10 miles of a Lowes".

{
  "query": {"match_all": {}},
  "aggs": {
    "HomeDepots" : {
      "conflation" : {
        "primary" : {
          "filter" : {
            "bool" : {
              "must" : [
                {"term" : {"name" : "Home Depot"}},
                {"term" : {"businessType": "Home Improvement"}}
              ]
            }
          }
        },
        "secondary": [
          {
            "Lowes": {
              "filter": {
                "and" : {
                  "filters" : [
                    {"term" : {"name" : "Lowes"}},
                    {
                      "geo_distance": {
                        "field": "location",
                        "origin_field": "HomeDepots.location",
                        "ranges": [
                          {"from" : 0, "to" : 10}
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}

Result

"aggregations": {
  "HomeDepots": {
     "buckets": [
        {
           "key" : "Home Depot",
           "id" : 1034,
           "doc_count": 2
        },
        {
           "key" : "Home Depot",
           "id" : 3432,
           "doc_count": 2
        },
        {
           "key" : "Home Depot",
           "id" : 5644,
           "doc_count": 1
        },
        {
           "key" : "Home Depot",
           "id" : 8999,
           "doc_count": 1
        },
        {
           "key" : "Home Depot",
           "id" : 10232,
           "doc_count": 2
        }
     ]
  }
}
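
For discussion, the intended semantics can be sketched as a brute-force, in-memory computation. This is only an illustration, not a proposed implementation. It assumes that `doc_count` is the number of secondary documents within range of each primary document and that primaries with no match are omitted (the proposal does not spell either out). It also matches on `name` only, and the function names, `id` values, and coordinates are made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two {"lat", "lon"} points."""
    lat1, lon1 = radians(a["lat"]), radians(a["lon"])
    lat2, lon2 = radians(b["lat"]), radians(b["lon"])
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def conflate(docs, primary_name, secondary_name, max_miles):
    """Return one bucket per primary doc that has a secondary doc in range."""
    primaries = [d for d in docs if d["name"] == primary_name]
    secondaries = [d for d in docs if d["name"] == secondary_name]
    buckets = []
    for p in primaries:
        count = sum(
            1 for s in secondaries
            if haversine_miles(p["location"], s["location"]) <= max_miles
        )
        if count > 0:  # assumption: primaries without a match are dropped
            buckets.append({"key": p["name"], "id": p["id"], "doc_count": count})
    return buckets


# Made-up sample documents.
docs = [
    {"id": 1, "name": "Home Depot", "location": {"lat": 30.00, "lon": -97.00}},
    {"id": 2, "name": "Home Depot", "location": {"lat": 35.00, "lon": -90.00}},
    {"id": 3, "name": "Lowes", "location": {"lat": 30.05, "lon": -97.00}},  # ~3.5 mi from id 1
    {"id": 4, "name": "Lowes", "location": {"lat": 30.00, "lon": -97.10}},  # ~6 mi from id 1
    {"id": 5, "name": "Lowes", "location": {"lat": 31.00, "lon": -97.00}},  # ~69 mi from id 1
]
buckets = conflate(docs, "Home Depot", "Lowes", 10)
print(buckets)
```

With this sample data only the first Home Depot has Lowes stores within 10 miles, so a single bucket with `doc_count` 2 comes back. The nested loop is quadratic in the number of documents, so a real aggregator would need a spatial index rather than this pairwise comparison.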

This issue is open for discussion around use-cases (non-geo), design, naming, etc.
