Description
Relevancy rewriters and rankers mechanism
The purpose of this mechanism is to provide a concise, standard way of defining search relevancy on both the query
rewrite side and the results ranking side.
This proposal is the collaboration of the
The ability to chain multiple search relevancy rewriters, and possibly result rerankers, would allow the following:
- Combine different aspects of relevancy rewriting into a single chain
- Create a common standard for search-relevancy-related plugin components
- Easily compare query results under different ranking solutions
- Simplify integrating such plugins into the search-relevancy dashboard using a dedicated API
Chain Components
Chain operators
Each chain element is an operator that transforms the query content and sends it upstream to the next operator; we
call these operators Transformers.
A transformer is expected to have no side effects other than the query transformation itself.
Chain payload
The chain's payload is the query itself. Each transformer is expected to transform the query so that the next
transformer can process it.
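The transformer contract above can be sketched in Java, the language of OpenSearch plugins. This is a minimal illustration and not the proposed API. `Query`, `Transformer`, and `LowercaseTransformer` are hypothetical names, and the query DSL is reduced to a single text field for brevity.

```java
import java.util.Locale;

/** Hypothetical stand-in for the query DSL payload carried along the chain. */
record Query(String text) {}

/**
 * A chain operator ("Transformer"): receives the query and returns a query
 * that the next transformer can process. It must have no other side effects.
 */
@FunctionalInterface
interface Transformer {
    Query transform(Query query);
}

/** Hypothetical example transformer that normalizes the query text. */
final class LowercaseTransformer implements Transformer {
    @Override
    public Query transform(Query query) {
        // Build a new Query instead of mutating the input: no side effects.
        return new Query(query.text().trim().toLowerCase(Locale.ROOT));
    }
}
```

For example, applying `LowercaseTransformer` to a query with the text "  Rambo " yields a new query with the text "rambo", which the next transformer receives.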
Chain termination step
The chain ends with a terminal step, which no longer emits the query to upstream components of the chain.
This termination step is most likely the actual execution of the query against the underlying search engine.
Chain footsteps
While a chain executes, it leaves a trail: each transformer that operates on the query records its own specific trail information.
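The following is a minimal sketch of how footsteps could be recorded. It assumes that the chain itself, rather than each transformer, appends one trail entry per step, which keeps transformers side-effect free. `TracingChain` and `Footstep` are hypothetical names, and the query is modelled as a plain string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** One trail entry: which transformer ran, and the query before and after it. */
record Footstep(String transformer, String before, String after) {}

/** Hypothetical chain that leaves a footstep for every transformer it runs. */
final class TracingChain {
    private final List<String> names = new ArrayList<>();
    private final List<UnaryOperator<String>> steps = new ArrayList<>();
    private final List<Footstep> trail = new ArrayList<>();

    /** Appends a named transformer; steps run in the order they are added. */
    TracingChain add(String name, UnaryOperator<String> step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Runs every transformer in order and records a footstep for each one. */
    String run(String query) {
        String current = query;
        for (int i = 0; i < steps.size(); i++) {
            String next = steps.get(i).apply(current);
            // The chain, not the transformer, records the footstep,
            // so transformers remain free of side effects.
            trail.add(new Footstep(names.get(i), current, next));
            current = next;
        }
        return current;
    }

    /** Unmodifiable copy of the trail left so far. */
    List<Footstep> trail() {
        return List.copyOf(trail);
    }
}
```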
Chain execution
The chain order is defined as part of the query extension (ext). If no such definition is found under the query
extension, the chain falls back to the rewriter definitions in the index mapping of the queried index (under the
mapping's _meta metadata).
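The fallback rule can be sketched as follows. The sketch assumes the ext section and the mapping's _meta section have already been parsed into maps. `ChainOrderResolver` is a hypothetical helper; the `rewriters`/`rankers` keys come from the DSL described in this document.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical resolver for the chain order of one section ("rewriters" or "rankers"). */
final class ChainOrderResolver {
    private ChainOrderResolver() {}

    /**
     * The query "ext" definition wins; the index mapping "_meta" definition is
     * only used when the query does not define the section. Order is kept as declared.
     */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> resolve(String section,
                                             Map<String, Object> queryExt,
                                             Map<String, Object> mappingMeta) {
        Object fromQuery = queryExt.get(section);
        if (fromQuery != null) {
            return (List<Map<String, Object>>) fromQuery;
        }
        Object fromMapping = mappingMeta.get(section);
        return fromMapping != null ? (List<Map<String, Object>>) fromMapping : List.of();
    }
}
```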
Rewriter Transformations
The chain mechanism is a composition of query interceptors. The purpose of these interceptors is to chain the
individual query rewriter plugins to one another sequentially.
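Below is a sketch of the sequential composition. It assumes each rewriter is modelled as a `UnaryOperator<String>` over the query text, which is a simplification. `RewriterChain.compose` is a hypothetical name. The point it illustrates is that the output of one rewriter becomes the input of the next, in list order.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical composition of rewriter interceptors into one sequential pipeline. */
final class RewriterChain {
    private RewriterChain() {}

    /** Composes rewriters left to right; an empty list yields the identity. */
    static UnaryOperator<String> compose(List<UnaryOperator<String>> rewriters) {
        UnaryOperator<String> chain = UnaryOperator.identity();
        for (UnaryOperator<String> rewriter : rewriters) {
            UnaryOperator<String> previous = chain;
            // Wrap the chain built so far: run it first, then this rewriter.
            chain = query -> rewriter.apply(previous.apply(query));
        }
        return chain;
    }
}
```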
Rankers Transformations
The chain terminates once a termination step is called; this termination step is the ranker operator.
The ranker operator takes the query as input, executes the actual query against the underlying search engine, and
ranks the results according to its own internal logic.
Paging is currently not supported in the termination step, so the ranked results cannot be paged.
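The following sketch shows a ranker acting as the terminal step. It rests on three assumptions:
- the underlying engine is modelled as a function from query to hits
- the ranker's internal logic is an arbitrary scoring function
- matching the no-paging limitation, the whole re-sorted list is returned

`Hit` and `Ranker` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/** A search hit from the underlying engine (simplified). */
record Hit(String id, double score) {}

/**
 * Hypothetical terminal step: executes the query, re-scores every hit with its
 * own logic, and returns the complete re-sorted list (no paging).
 */
final class Ranker {
    private final Function<String, List<Hit>> searchEngine;
    private final ToDoubleFunction<Hit> rerankScore;

    Ranker(Function<String, List<Hit>> searchEngine, ToDoubleFunction<Hit> rerankScore) {
        this.searchEngine = searchEngine;
        this.rerankScore = rerankScore;
    }

    List<Hit> execute(String query) {
        List<Hit> reranked = new ArrayList<>();
        for (Hit hit : searchEngine.apply(query)) {
            reranked.add(new Hit(hit.id(), rerankScore.applyAsDouble(hit)));
        }
        // Highest new score first; the full list is returned since paging is unsupported.
        reranked.sort(Comparator.comparingDouble(Hit::score).reversed());
        return reranked;
    }
}
```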
Configuration
Each transformation/operator may use the following levels of configuration:
- Plugin level configuration
- Index level configuration
- Query level configuration
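The document does not state how these three levels interact. Assuming the more specific level wins (query over index over plugin), an effective configuration could be computed as in the sketch below. `EffectiveConfig.merge` is a hypothetical helper, and the precedence order is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical effective-configuration lookup, ASSUMING the more specific
 * level overrides the less specific one: query > index > plugin.
 */
final class EffectiveConfig {
    private EffectiveConfig() {}

    static Map<String, Object> merge(Map<String, Object> pluginLevel,
                                     Map<String, Object> indexLevel,
                                     Map<String, Object> queryLevel) {
        Map<String, Object> effective = new HashMap<>(pluginLevel); // static plugin settings
        effective.putAll(indexLevel);                               // index mapping "_meta" properties
        effective.putAll(queryLevel);                               // search request "ext" properties
        return effective;
    }
}
```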
Plugin level configuration
This level of configuration is supported by the OpenSearch Plugin API and may be used for the component's static
configuration.
It can be implemented using the BaseRestHandler endpoint extension mechanism.
For example, querqy uses such an endpoint for its rewrite rules definition:
PUT /_plugins/_querqy/rewriter/common_rules
{
  "class": "querqy.opensearch.rewriter.SimpleCommonRulesRewriterFactory",
  "config": {
    "rules": "request =>\nSYNONYM: GET"
  }
}
Index level configuration
This level of configuration is supported by the index mapping _meta DSL, which is an existing part of the
mapping DSL.
Example usage of the index mapping configuration:
New chain mapping DSL
For backward compatibility, we will use the index mapping _meta field to store the configuration information
related to both rewriters and rankers.
The chain parts reside under two generic keys:
- rankers - the list of ranker plugin configurations
- rewriters - the list of rewriter plugin configurations
Metadata under my_index/_mapping
{
  "_meta": {
    "rankers": [
      {
        "name": "kendra",
        "properties": {
          "title_fields": [
            "title"
          ],
          "body_fields": [
            "published",
            "description"
          ]
        }
      }
    ]
  }
}
The order of the rankers/rewriters is explicit, and the chain dispatches them accordingly (unless a different chain
directive appears in the query's ext section).
Query level configuration
This level of configuration is supported by the query extension DSL, which will have a new chain DSL structure.
Similar to the _meta section of the mapping DSL, the ext section will contain the rankers and rewriters lists.
Extension under _search
{
  "query": {
  },
  "ext": {
    "rewriters": [
      {
        "name": "querqy",
        "properties": {
          "querqy": {
            "matching_query": {
              "must_match": {
                "query": "rambo"
              },
              "multi_match": {
                "query": "rambo",
                "fields": [
                  "field1",
                  "field2"
                ]
              }
            },
            "query_fields": [
              "title^3.0",
              "brand^2.1",
              "shortSummary"
            ]
          }
        }
      }
    ],
    "rankers": [
      {
        "name": "kendra",
        "properties": {
          "title_fields": [
            "title"
          ],
          "body_fields": [
            "published",
            "description"
          ]
        }
      }
    ]
  }
}
The order of the rankers/rewriters is explicit here as well, and the chain dispatches them accordingly.
This is a flow chart visualization of the chain steps:
############     ############     ############     ############
# _Search  #     # querqy   #     # kendra   #     # Results  #
# -query   # --> # -rewrite # --> # -execute # --> # - 1      #
# ...      #     #  query   #     #  search  #     # - 2      #
#          #     #          #     # -rank    #     # - 3      #
#          #     #          #     #  results #     # - 4      #
############     ############     ############     ############
                                       /\
                                       ||
                                       \/
                                 ##############
                                 # opensearch #
                                 # -run-query #
                                 ##############
Chain Context
Search Relevancy Context Information
For the rewriter and ranker chain to track, and be informed of, all the modifications each step performs, an
execution context is needed.
This context has the following fields, which any future plugin that needs to perform rewrites or ranking can use:
- context - information about the current execution parameters
  - params - an input to every ranker and rewriter, which each may use for its own needs
    - query - the original query, carried forward down the chain
- execution - execution-related content generated throughout the pipeline
  - id - an auto-generated unique id identifying the chain instance
  - rewriters - the list of rewriter plugin query configurations
  - rankers - the list of ranker plugin query configurations
  - exclude - rewriters/rankers to remove even though they appear in the default index configuration
The execution section may contain additional internal fields related to the execution flow itself; these are
subject to future change.
This context will be attached to the query DSL under the ext section.
POST my_index/_search
{
  "query": {
    "match_all": {}
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "match_all": {}
        }
      }
    },
    "execution": {
      "id": "ABC123",
      "rewriters": [
        {
          "name": "querqy",
          "properties": {
            "querqy": {
              "matching_query": {
                "must_match": {
                  "query": "rambo"
                },
                "multi_match": {
                  "query": "rambo",
                  "fields": [
                    "field1",
                    "field2"
                  ]
                }
              },
              "query_fields": [
                "title^3.0",
                "brand^2.1",
                "shortSummary"
              ]
            }
          }
        }
      ],
      "rankers": [
        {
          "name": "kendra",
          "properties": {
            "title_fields": [
              "title"
            ],
            "body_fields": [
              "published",
              "description"
            ]
          }
        }
      ]
    }
  }
}
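The context structure above could be modelled with Java records, as a sketch that follows the layout of the example request (context and execution as siblings under ext). The record names are hypothetical. Generating the `id` with `UUID` is an assumption; the document only says the id is auto-generated.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** One rewriter/ranker entry: the plugin name and its free-form properties. */
record StepConfig(String name, Map<String, Object> properties) {}

/** "context": parameters (including the original query) that every step may read. */
record Context(Map<String, Object> params) {}

/** "execution": content that is generated and consumed while the chain runs. */
record Execution(String id, List<StepConfig> rewriters, List<StepConfig> rankers, List<String> exclude) {

    /** Creates an execution section with an auto-generated id (UUID is an assumption). */
    static Execution create(List<StepConfig> rewriters, List<StepConfig> rankers, List<String> exclude) {
        return new Execution(UUID.randomUUID().toString(),
                List.copyOf(rewriters), List.copyOf(rankers), List.copyOf(exclude));
    }
}

/** The search-relevancy part of the request's "ext" section. */
record RelevancyExt(Context context, Execution execution) {}
```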
Activating Query Rewriters / Rerankers
During the lifetime of the index, once a query runs against the index, the following steps occur:
- verify that search relevancy is activated for the index
- create the search-relevancy context information (or use an existing one if it was already created)
- create a chain flow control component, which drives the chain of rewriters and rerankers
- for each rewrite step in the rewriters list:
  - dispatch execution to the plugin
  - the plugin receives the params section as parameters
  - the plugin changes the query
  - the plugin may add additional information about its execution step under ext->context->rewriters->$name$->info
  - the plugin returns execution to the chain flow control
- for each semantic-ranker step in the rankers list:
  - dispatch execution to the plugin
  - the plugin receives the params section as parameters
  - the plugin performs the ranking logic
  - the plugin returns the newly ranked results to the caller
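The activation steps above can be sketched as a flow-control loop. The sketch relies on these assumptions:
- `RewriterPlugin`, `RankerPlugin`, and `ChainFlowControl` are hypothetical names
- the query is a plain string
- only the first ranker runs, as the terminal step
- an index without search relevancy activated, or a chain without rankers, falls back to a regular search

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical rewriter plugin: returns the rewritten query. */
interface RewriterPlugin {
    String rewrite(String query, Map<String, Object> params);
}

/** Hypothetical ranker plugin: executes the query and returns ranked document ids. */
interface RankerPlugin {
    List<String> rank(String query, Map<String, Object> params);
}

/** Hypothetical chain flow control driving the rewriters, then the terminal ranker. */
final class ChainFlowControl {
    private final boolean relevancyActivated;                 // read from the index configuration
    private final Function<String, List<String>> plainSearch; // regular, non-chained search

    ChainFlowControl(boolean relevancyActivated, Function<String, List<String>> plainSearch) {
        this.relevancyActivated = relevancyActivated;
        this.plainSearch = plainSearch;
    }

    List<String> execute(String query, Map<String, Object> params,
                         List<RewriterPlugin> rewriters, List<RankerPlugin> rankers) {
        // Verify that search relevancy is activated for the index.
        if (!relevancyActivated) {
            return plainSearch.apply(query);
        }
        // Dispatch each rewriter in order; each one receives the params section.
        String current = query;
        for (RewriterPlugin rewriter : rewriters) {
            current = rewriter.rewrite(current, params);
        }
        // ASSUMPTION: the first ranker is the terminal step.
        // Without rankers, the rewritten query runs as a regular search.
        if (rankers.isEmpty()) {
            return plainSearch.apply(current);
        }
        return rankers.get(0).rank(current, params);
    }
}
```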
If a rewriter/ranker does not appear in the query's ext section but does appear in the relevant index mapping
section, its configuration details from the index mapping section are copied into the query's ext section.
To prevent a rewriter/ranker from being activated on a query when the index mapping makes it part of the chain,
add its name to the exclude list under the execution section.
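A sketch of the copy-and-exclude rule for one list (rewriters or rankers) follows. Its assumptions:
- entries are matched by `name`
- entries copied from the index mapping are appended after the query-level ones, in mapping order
- the exclude list only filters entries copied from the index mapping

`StepConfig` and `ChainMerger` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One rewriter/ranker entry: the plugin name and its free-form properties. */
record StepConfig(String name, Map<String, Object> properties) {}

/** Hypothetical merge of index-mapping defaults into the query's ext section. */
final class ChainMerger {
    private ChainMerger() {}

    static List<StepConfig> effectiveSteps(List<StepConfig> fromQueryExt,
                                           List<StepConfig> fromIndexMeta,
                                           Set<String> exclude) {
        List<StepConfig> result = new ArrayList<>(fromQueryExt);
        Set<String> present = new HashSet<>();
        for (StepConfig step : fromQueryExt) {
            present.add(step.name());
        }
        for (StepConfig step : fromIndexMeta) {
            // Copy index defaults that the query neither mentions nor excludes.
            if (!present.contains(step.name()) && !exclude.contains(step.name())) {
                result.add(step);
                present.add(step.name());
            }
        }
        return result;
    }
}
```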
Example
Configuration Stage
Step 0: Create the plugin configuration settings
PUT /_plugins/_querqy/rewriter
{
  "common_rules": [
    {
      "class": "querqy.opensearch.rewriter.SimpleCommonRulesRewriterFactory",
      "config": {
        "rules": "request =>\nSYNONYM: GET"
      }
    }
  ]
}
PUT /_plugins/_kendra
{
  "config": {
    "endpoint": [
      "127.0.0.1",
      "0.0.0.0"
    ]
  }
}
Step 1: Create mapping for index my_index
PUT my_index/_mapping
{
  "_meta": {
    "rankers": [
      {
        "name": "kendra",
        "properties": {
          "title_fields": [
            "title"
          ],
          "body_fields": [
            "published",
            "description"
          ]
        }
      }
    ]
  }
}
Query Stage
Step 2: Original request from the user: “rambo”
Step 2.1: Structured query sent to OpenSearch by the customer’s application
POST my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "topic": "hobby"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "dateField": {
              "gte": "now-12d",
              "lte": "now-10d"
            }
          }
        }
      ]
    }
  }
}
The chain flow control intercepts the index search request and dispatches it to each query rewriter in turn. The dispatched request carries the chain context in its ext section:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "topic": "hobby"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "dateField": {
              "gte": "now-12d",
              "lte": "now-10d"
            }
          }
        }
      ]
    }
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "topic": "hobby"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "dateField": {
                    "gte": "now-12d",
                    "lte": "now-10d"
                  }
                }
              }
            ]
          }
        }
      },
      // this section is generated for the chain if not given by user
      "execution": {
        "id": "A1b2c",
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ],
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "query": {
                "querqy": {
                  "matching_query": {
                    "query": "notebook"
                  },
                  "query_fields": [
                    "title^3.0",
                    "brand^2.1",
                    "shortSummary"
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}
Step 3: The first rewriter (Querqy) is dispatched and generates the new query (query rewrite)
{
  "query": {
    //todo - put here the query after being re-written by querqy
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "topic": "hobby"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "dateField": {
                    "gte": "now-12d",
                    "lte": "now-10d"
                  }
                }
              }
            ]
          }
        }
      },
      "execution": {
        "id": "A1b2c",
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ],
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "query": {
                "querqy": {
                  "matching_query": {
                    "query": "notebook"
                  },
                  "query_fields": [
                    "title^3.0",
                    "brand^2.1",
                    "shortSummary"
                  ]
                }
              },
              "info" : { } // additional info that querqy may add after query rewrite
            }
          }
        ]
      }
    }
  }
}
Step 3.1: The chain flow control has no additional rewriters to dispatch, so it dispatches to the rankers. The first ranker in the chain reviews the context params and takes the information it needs.
Once it completes its action, the results are ranked according to its internal logic.
{
  "query": {
    //todo - put here the query after being re-written by querqy
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "topic": "hobby"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "dateField": {
                    "gte": "now-12d",
                    "lte": "now-10d"
                  }
                }
              }
            ]
          }
        }
      },
      "execution": {
        "id": "A1b2c",
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ],
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "query": {
                "querqy": {
                  "matching_query": {
                    "query": "notebook"
                  },
                  "query_fields": [
                    "title^3.0",
                    "brand^2.1",
                    "shortSummary"
                  ]
                }
              },
              "info" : { }
            }
          }
        ]
      }
    }
  }
}
Response Stage
Step 4: Reranking is performed after the rewrite chain completes, and the results are returned to the original calling service
Ranker search results JSON:
{
  "took" : 0,
  "timed_out" : false,
  "ext": { // this ext section is suggested to be added here as part of the results.
    "context": {
      "params": {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "topic": "hobby"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "dateField": {
                    "gte": "now-12d",
                    "lte": "now-10d"
                  }
                }
              }
            ]
          }
        }
      },
      "execution": {
        "id": "A1b2c",
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ],
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "query": {
                "querqy": {
                  "matching_query": {
                    "query": "notebook"
                  },
                  "query_fields": [
                    "title^3.0",
                    "brand^2.1",
                    "shortSummary"
                  ]
                }
              },
              "info" : { }
            }
          }
        ]
      }
    }
  },
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.8773359,
    "hits" : [
      {
        "_index" : "employees",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.8773359,
        "_source" : {
          "id" : 4,
          "name" : "Alan Thomas",
          "email" : "athomas2@example.com",
          "gender" : "male",
          "ip_address" : "200.47.210.95",
          "date_of_birth" : "11/12/1985",
          "company" : "Yamaha",
          "position" : "Resources Manager",
          "experience" : 12,
          "country" : "China",
          "phrase" : "Emulation of roots heuristic coherent systems",
          "salary" : 300000
        }
      }
    ]
  }
}
The current response DSL doesn't contain such an ext section; this RFC suggests adding one to the results.