Description
Overview
GeoDistance search covers the most basic spatial search use case: "Find all points of interest within R units of known location X". This works well when the starting location is already known (e.g., cell phone GPS, a travelling location of interest) but not for conflation use cases such as "Find all CVS pharmacies within 1 mile of a Walgreens pharmacy" or "Find all Home Depots within 2 blocks of a Lowes". (After all, competition is essential to a free enterprise economy.) The current search toolbox (query, filter, aggregations, reducer, percolator) doesn't lend itself well to these types of queries without writing fairly complex scripting.
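To make the limitation concrete, here is a minimal sketch of the client-side workaround this proposal aims to replace: one query for the primary documents, then one follow-up query per primary hit. It only builds request bodies as plain dictionaries and calls no client library. The field names `name` and `location` follow the example later in this issue; the helper names, the sample hits, and the `bool`/`filter` request shape are assumptions made for illustration.

```python
def primary_query(name):
    """Step 1: request body that fetches the primary documents (e.g. every Home Depot)."""
    return {"query": {"bool": {"filter": [{"term": {"name": name}}]}}}


def secondary_query(name, origin, distance):
    """Step 2: request body for one primary hit, centred on that hit's location."""
    return {
        "size": 0,  # only the hit count is of interest here
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name": name}},
                    {"geo_distance": {"distance": distance, "location": origin}},
                ]
            }
        },
    }


# Hypothetical primary hits, standing in for the response to primary_query("Home Depot").
primary_hits = [
    {"id": 1034, "location": {"lat": 40.71, "lon": -74.00}},
    {"id": 3432, "location": {"lat": 34.05, "lon": -118.24}},
]

# One follow-up request per primary hit: N primaries cost N + 1 round trips.
follow_ups = [secondary_query("Lowes", hit["location"], "10mi") for hit in primary_hits]
```

The number of round trips grows with the size of the primary result set, which is the cost a single aggregation would avoid.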
This feature will add what we'll temporarily call a ConflationAggregator, whose purpose is to support the use case above. A primary aggregator defines the primary list of buckets (e.g., Home Depots). A list of secondary <filter, aggregator, query> entries defines the operations or post-filters to perform against the documents in the primary's result set.
Conflation Aggregation Structure
Below is an initial cut at the conflation grammar. There can be one or more secondary filters to support multi-conflation queries, such as: "Find all Home Depots within 10 miles of a Lowes or Ace Hardware". The design is intended to be flexible enough to avoid limiting this aggregation to geo-only queries.
"aggs" : {
  "<aggregation_name>" : {
    "conflation" : {
      "primary" : {
        "<filter> | <parent aggregation>"
      },
      "secondary" : [
        {
          "<secondary_name>" : {
            "<filter> | <child aggregation>"
          }
        }
      ]
    }
  }
}
Example
Query
This is a rough initial idea of how to use the conflation aggregator for a complex geodistance query like: "Find all Home Depots that are within 10 miles of a Lowes".
{
  "query": {"match_all": {}},
  "aggs": {
    "HomeDepots": {
      "conflation": {
        "primary": {
          "filter": {
            "bool": {
              "must": [
                {"term": {"name": "Home Depot"}},
                {"term": {"businessType": "Home Improvement"}}
              ]
            }
          }
        },
        "secondary": [
          {
            "Lowes": {
              "filter": {
                "and": {
                  "filters": [
                    {"term": {"name": "Lowes"}},
                    {
                      "geo_distance": {
                        "field": "location",
                        "origin_field": "HomeDepots.location",
                        "ranges": [
                          {"from": 0, "to": 10}
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}
Result
"aggregations": {
  "HomeDepots": {
    "buckets": [
      {
        "key": "Home Depot",
        "id": 1034,
        "doc_count": 2
      },
      {
        "key": "Home Depot",
        "id": 3432,
        "doc_count": 2
      },
      {
        "key": "Home Depot",
        "id": 5644,
        "doc_count": 1
      },
      {
        "key": "Home Depot",
        "id": 8999,
        "doc_count": 1
      },
      {
        "key": "Home Depot",
        "id": 10232,
        "doc_count": 2
      }
    ]
  }
}
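One possible reading of the query and result above can be sketched as a brute-force reference in Python. This is not a proposed implementation. It assumes that each bucket corresponds to one primary document and that `doc_count` is the number of secondary documents within range of it; the issue does not spell this out. The function names, the sample documents, and the use of the haversine formula are illustrative assumptions, and the primary filter is simplified to a name match.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def conflate(docs, primary_name, secondary_name, max_miles):
    """Return one bucket per primary doc that has at least one secondary doc in range."""
    primaries = [d for d in docs if d["name"] == primary_name]
    secondaries = [d for d in docs if d["name"] == secondary_name]
    buckets = []
    for p in primaries:
        # Count secondary docs whose distance to this primary doc is within the limit.
        count = sum(
            1 for s in secondaries
            if haversine_miles(p["location"], s["location"]) <= max_miles
        )
        if count:
            buckets.append({"key": p["name"], "id": p["id"], "doc_count": count})
    return buckets


# Hypothetical documents; locations are (lat, lon) in degrees.
docs = [
    {"id": 1, "name": "Home Depot", "location": (40.0, -75.0)},
    {"id": 2, "name": "Lowes", "location": (40.05, -75.0)},      # about 3.5 miles north of id 1
    {"id": 3, "name": "Lowes", "location": (41.0, -75.0)},       # about 69 miles north of id 1
    {"id": 4, "name": "Home Depot", "location": (45.0, -75.0)},  # no Lowes within 10 miles
]

buckets = conflate(docs, "Home Depot", "Lowes", 10)
# buckets == [{"key": "Home Depot", "id": 1, "doc_count": 1}]
```

The nested loop is quadratic in the number of documents, so a real aggregator would presumably rely on a spatial index rather than pairwise comparison.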
This issue is open for discussion around use-cases (non-geo), design, naming, etc.