[PROPOSAL] Search Semantic Chaining Mechanisms 

## Relevancy rewriters and rankers mechanism

The purpose of this mechanism is to provide a concise, standard way of defining search relevancy on both the
query rewrite side and the results ranking side.

This proposal is a collaboration of:

- @kevinawskendra
- @ps48
- @anirudha
- @mahita
- @pankhuri


The capability of chaining multiple search relevancy rewriters, and possibly results rerankers, would allow the following:

* Combine different aspects of relevancy rewriting into a single chain
* Create a common standard for search-relevancy-related plugin components
* Easily compare query results under different ranking solutions
* Simplify integrating such plugins into the search-relevancy dashboard using a dedicated API

------------
### Chain Components

**Chain operators**
Each chain element is an operator which transforms the query content and sends it on to the next operator in the chain - we will
call them Transformers.

The expectation from a transformer is to have no additional side-effects apart from the query transformation.

**Chain payload**
The chain's payload is the query itself. Each transformer is expected to transform the query in a way that the
next transformer can process.

**Chain termination step**
The chain ends with a terminal step, which no longer emits the query to further components of the chain.
This termination step is likely an actual execution of the query against the underlying search engine.

**Chain footsteps**
While a chain executes, each transformer that runs leaves a trail in the form of transformer-specific trail info.
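
The components described above can be illustrated with a minimal sketch. It is not the proposed implementation: the names `Transformer`, `Chain`, `add_boost` and the shape of the trail entries are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Query = Dict[str, Any]


@dataclass
class Transformer:
    """A chain operator: a named query transformation with no other side effects."""
    name: str
    transform: Callable[[Query], Query]


@dataclass
class Chain:
    transformers: List[Transformer]
    terminal: Callable[[Query], Any]  # terminal step, e.g. running the search
    trail: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, query: Query) -> Any:
        for t in self.transformers:
            query = t.transform(query)  # the output feeds the next transformer
            self.trail.append({"step": t.name, "query": query})  # chain footsteps
        return self.terminal(query)  # nothing is emitted past this point


def add_boost(query: Query) -> Query:
    # Hypothetical transformer: returns a new dict instead of mutating its input.
    return {**query, "boost": 2.0}


chain = Chain(
    transformers=[Transformer("boost", add_boost)],
    terminal=lambda q: {"executed": q},  # stand-in for the real query execution
)
result = chain.run({"match": {"title": "rambo"}})
```

After the run, `chain.trail` holds one entry per transformer that operated, which is one possible reading of the "trail" described above.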

**Chain execution**
The chain order is defined as part of the query extension. If no such definition is found under the query
extension, the fallback is the rewriter definition in the queried index's mapping (under the mapping's metadata).
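
A minimal sketch of this lookup order follows. The function name `resolve_chain` is hypothetical, and applying the fallback separately per plugin kind is an assumption of this sketch, not something the proposal states.

```python
from typing import Any, Dict, List


def resolve_chain(search_body: Dict[str, Any],
                  index_mapping: Dict[str, Any],
                  kind: str) -> List[Dict[str, Any]]:
    """Return the ordered plugin list for `kind` ("rewriters" or "rankers")."""
    from_query = search_body.get("ext", {}).get(kind)
    if from_query is not None:
        return from_query  # the query extension wins when it defines the chain
    # Fallback: the definition stored under the index mapping's metadata.
    return index_mapping.get("_meta", {}).get(kind, [])


mapping = {"_meta": {"rankers": [{"name": "kendra", "properties": {}}]}}
body = {
    "query": {"match_all": {}},
    "ext": {"rewriters": [{"name": "querqy", "properties": {}}]},
}

rewriters = resolve_chain(body, mapping, "rewriters")  # taken from the query
rankers = resolve_chain(body, mapping, "rankers")      # taken from the mapping
```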

#### Rewriter Transformations

The chain mechanism is a composition of query interceptors. The purpose of these query interceptors is to chain the
individual query rewriter plugins one to the other in a sequential manner.

#### Rankers Transformations

The chain mechanism is terminated once a termination step is called. This termination step is the ranker operator.
The ranker operator takes the query as input, performs the actual query against the database, and ranks the results
according to its own internal reasoning.

_**We currently don't support paging in the chaining termination step and therefore this step does not allow paging of
the results.**_
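
As a rough sketch of such a terminal step, a ranker can be modeled as a function that runs the query and then reorders the hits. Everything here is hypothetical: `make_ranker`, `fake_search` and the scoring rule are illustration only and say nothing about how a real ranker such as kendra scores results.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]
Query = Dict[str, Any]


def make_ranker(search: Callable[[Query], List[Hit]],
                score: Callable[[Hit], float]) -> Callable[[Query], List[Hit]]:
    """Build a terminal step: execute the query, then reorder hits by `score`."""
    def rank(query: Query) -> List[Hit]:
        hits = search(query)  # the actual query execution
        return sorted(hits, key=score, reverse=True)  # ranker's own reasoning
    return rank


def fake_search(query: Query) -> List[Hit]:
    # Stand-in for the search engine: ignores the query, returns fixed hits.
    return [
        {"_id": "1", "title": "First Blood"},
        {"_id": "2", "title": "Rambo III"},
    ]


ranker = make_ranker(fake_search, score=lambda h: float("Rambo" in h["title"]))
ranked = ranker({"match": {"title": "rambo"}})
```

The ranker returns the full reordered list in one call, which matches the no-paging limitation noted above.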

#### Configuration

Each transformation/operator may use the following levels of configuration:

* Plugin level configuration
* Index level configuration
* Query level configuration

#### Plugin level configuration

This level of configuration is supported by the Plugin API of OpenSearch and may be used for static
configuration of the component.
Implementation of this capability can make use of the BaseRestHandler endpoint extension mechanism.

For example, **_querqy_** uses such an endpoint for its rewrite rules definition:

PUT  /_plugins/_querqy/rewriter/common_rules
```json
{
  "class": "querqy.opensearch.rewriter.SimpleCommonRulesRewriterFactory",
  "config": {
      "rules" : "request =>\nSYNONYM: GET"
  }
}
```

#### Index level configuration

This level of configuration is supported by using the index mapping meta DSL, which is an existing part of the
mapping DSL.
Example usage of the index mapping configuration:

**_New chain mapping DSL_**
For backwards compatibility we will use the index mapping **_meta** field to preserve the configuration information
related to both the rewriters and the rankers.

The chain parts will reside under the generic concepts:

- **rankers** - ranker list of plugins configuration
- **rewriters** - rewriter list of plugins configuration

Metadata under my_index/_mapping

```json
{
  "_meta": {
    "rankers": [
      {
        "name": "kendra",
        "properties": {
          "title_fields": [
            "title"
          ],
          "body_fields": [
            "published",
            "description"
          ]
        }
      }
    ]
  }
}
```

The order of the rankers/rewriters is explicit and the chain will dispatch accordingly (unless another directive appears
under the query chain-directive).

#### Query level configuration

This level of configuration is supported by using the query extension DSL. This section will have a new chain DSL
structure. In a similar manner to the _"_meta"_ section of the mapping DSL, the _"ext"_ will contain the rankers &
rewriters list.

_**Extension under _search**_

```json
{
  "query": {
  },
  "ext": {
    "rewriters": [
      {
        "name": "querqy",
        "properties": {
          "querqy": {
            "matching_query": {
              "must_match": {
                "query": "rambo"
              },
              "multi_match": {
                "query": "rambo",
                "fields": [
                  "field1",
                  "field2"
                ]
              }
            },
            "query_fields": [
              "title^3.0",
              "brand^2.1",
              "shortSummary"
            ]
          }
        }
      }
    ],
    "rankers": [
      {
        "name": "kendra",
        "properties": {
          "title_fields": [
            "title"
          ],
          "body_fields": [
            "published",
            "description"
          ]
        }
      }
    ]
  }
}
```

The order of the rankers/rewriters is explicit and the chain will dispatch accordingly (unless another directive appears
under the query chain-directive).

This is a flow chart visualization of the chain steps:

```text
############                 ############             #############           #############
# _Search  #                 #  querqy  #             #  kendra   #           #  Results  #
#   -query #                 #  -rewrite#             #  -execute #           #    -   1  #
#      ... #   --------->    #     query#  ---------> #    search # --------->#    -   2  #   
#          #                 #          #             #  -rank    #           #    -   3  #
############                 #          #             #   results #           #    -   4  #
                             ############             #############           #############
                                                           /\
                                                           ||
                                                           || 
                                                           || 
                                                           || 
                                                           \/ 
                                                      ###############
                                                      # opensearch  #  
                                                      #  -run-query #   
                                                      ###############
                                                      
```

### Chain Context

**Search Relevancy Context Information**
For the rewriter and ranker chain to track and be informed of all the modifications each step
performs, an execution context is needed.

This context will have the following fields, which can be applied to any future plugin that needs to perform rewrites or
ranking:

- **context** (information about the current execution parameters)
    - **params** (input to each and every ranker and rewriter, which it may use for its own needs)
        - **query** - the original query that is to be carried forward down the chain

    - **execution** (execution related content that is generated throughout the pipeline)
        - **id** - **auto-generated** unique id describing the chain instance
        - **rewriters** - rewriter list of plugins query configuration
        - **rankers** - ranker list of plugins query configuration
        - **exclude** - remove rewriters/rankers that appear in the default index configuration

The **execution** section may have additional internal fields which are related to the execution flow itself and are
subject to future changes.
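
The field list above could be modeled roughly as follows. This is a sketch only: the class names `Context` and `Execution` and the use of a UUID for the auto-generated id are assumptions, not part of the proposal.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Execution:
    # Assumption: any unique string works as the auto-generated chain id.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    rewriters: List[Dict[str, Any]] = field(default_factory=list)
    rankers: List[Dict[str, Any]] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)


@dataclass
class Context:
    params: Dict[str, Any]  # holds the original query under "query"
    execution: Execution = field(default_factory=Execution)


ctx = Context(params={"query": {"match_all": {}}})
ext = {"context": asdict(ctx)}  # plain dicts, ready to serialize as JSON
```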

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This **context** will be attached to the query DSL under the **ext** section.

POST my_index/_search

```json
{
  "query": {
    "match_all": {}
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "match_all": {}
        }
      },
      "execution": {
        "id": "ABC123",
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "querqy": {
                "matching_query": {
                  "must_match": {
                    "query": "rambo"
                  },
                  "multi_match": {
                    "query": "rambo",
                    "fields": [
                      "field1",
                      "field2"
                    ]
                  }
                },
                "query_fields": [
                  "title^3.0",
                  "brand^2.1",
                  "shortSummary"
                ]
              }
            }
          }
        ],
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ]
      }
    }
  }
}

```

**Activating Query rewriter / rerankers**

During the lifetime of the index, whenever a query runs against the index, the following steps occur:

1) verify whether search-relevancy is activated for the index
    1) create a chain flow control component which will drive the chain of rewriters & rerankers
    2) create the search-relevancy context information (or use the existing one if it was already created)

2) for each rewrite step in the rewriters list:
    1) dispatch execution to the plugin
    2) plugin receives the **params** section as parameters
    3) plugin changes the query
    4) plugin may add additional information on its execution step under _ext->context->rewriters->$name$->info_
    5) plugin returns execution to the chain flow control

3) for each semantic-ranker step in the rankers list:
    1) dispatch execution to the plugin
    2) plugin receives the **params** section as parameters
    3) plugin performs the ranking logic
    4) plugin returns the newly ranked results to the caller
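
The dispatch steps above can be sketched as a small flow-control loop. This is an illustration under assumptions: `run_chain`, the plugin call signature `(params, body)`, the registry dictionaries, and returning the last ranker's results are all hypothetical choices, and the optional `info` step is left out.

```python
from typing import Any, Callable, Dict, List

Body = Dict[str, Any]
Rewriter = Callable[[Dict[str, Any], Body], Body]     # (params, body) -> new body
Ranker = Callable[[Dict[str, Any], Body], List[Any]]  # (params, body) -> results


def run_chain(body: Body,
              rewriters: Dict[str, Rewriter],
              rankers: Dict[str, Ranker]) -> List[Any]:
    context = body["ext"]["context"]
    params, execution = context["params"], context["execution"]
    for step in execution.get("rewriters", []):   # rewrite steps, in list order
        body = rewriters[step["name"]](params, body)
    results: List[Any] = []
    for step in execution.get("rankers", []):     # ranking steps, in list order
        results = rankers[step["name"]](params, body)
    return results


def upper_rewriter(params: Dict[str, Any], body: Body) -> Body:
    # Hypothetical rewriter: upper-cases the original term, keeps "ext" intact.
    term = params["query"]["match"]["title"]
    return {**body, "query": {"match": {"title": term.upper()}}}


def echo_ranker(params: Dict[str, Any], body: Body) -> List[Any]:
    # Hypothetical ranker: "results" are just the rewritten term.
    return [body["query"]["match"]["title"]]


request = {
    "query": {"match": {"title": "rambo"}},
    "ext": {"context": {
        "params": {"query": {"match": {"title": "rambo"}}},
        "execution": {"rewriters": [{"name": "upper"}],
                      "rankers": [{"name": "echo"}]},
    }},
}
results = run_chain(request, {"upper": upper_rewriter}, {"echo": echo_ranker})
```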

If a rewriter/ranker doesn't appear in the query **ext** section but does appear in the relevant index
**mapping** section,
the configuration details from the index mapping section will be copied into the relevant query **ext** section.

To prevent a rewriter/ranker from being activated on a query when the index mapping indicates it is part of
the chain,
add its name to the exclude list under the execution section.
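
A minimal sketch of this copy-and-exclude rule is shown below. The function name `effective_plugins`, the placement of copied entries after the query's own entries, and the plugin name `other_ranker` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def effective_plugins(query_list: List[Dict[str, Any]],
                      index_list: List[Dict[str, Any]],
                      exclude: List[str]) -> List[Dict[str, Any]]:
    """Query entries are kept as-is; index-mapping entries are copied in unless
    the query already names them or they are on the exclude list."""
    present = {plugin["name"] for plugin in query_list}
    merged = list(query_list)
    for plugin in index_list:
        if plugin["name"] not in present and plugin["name"] not in exclude:
            merged.append(plugin)  # assumption: copied entries go last
    return merged


index_rankers = [
    {"name": "kendra", "properties": {"title_fields": ["title"]}},
    {"name": "other_ranker", "properties": {}},  # hypothetical plugin name
]

copied = effective_plugins([], index_rankers, exclude=[])
without_kendra = effective_plugins([], index_rankers, exclude=["kendra"])
```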

### **Example**

**Configuration Stage**
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Step 0: Create plugins configuration settings

PUT /_plugins/_querqy/rewriter

```json
{
  "common_rules": [
    {
      "class": "querqy.opensearch.rewriter.SimpleCommonRulesRewriterFactory",
      "config": {
        "rules": "request =>\nSYNONYM: GET"
      }
    }
  ]
}
```

PUT /_plugins/_kendra

```json
{
  "config": {
    "endpoint": [
      "127.0.0.1",
      "0.0.0.0"
    ]
  }
}
```

Step 1: Create mapping for index my_index

PUT my_index/_mapping

```json
{
  "_meta": {
    "rankers": [
      {
        "name": "kendra",
        "properties": {
          "title_fields": [
            "title"
          ],
          "body_fields": [
            "published",
            "description"
          ]
        }
      }
    ]
  }
}
```

**Query Stage**
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Step 2: original request from user : **“rambo”**

Step 2.1: Structured query from application coming to OpenSearch (this is done by the customer’s application)

POST my_index/_search

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "topic": "hobby"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "dateField": {
              "gte": "now-12d",
              "lte": "now-10d"
            }
          }
        }
      ]
    }
  }
}

```

The chain flow control intercepts the index search request and dispatches the request to each query rewriter:

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "topic": "hobby"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "dateField": {
              "gte": "now-12d",
              "lte": "now-10d"
            }
          }
        }
      ]
    }
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "topic": "hobby"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "dateField": {
                    "gte": "now-12d",
                    "lte": "now-10d"
                  }
                }
              }
            ]
          }
        }
      },
      // this section is generated for the chain if not given by user 
      "execution": { 
        "id": "A1b2c", 
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ],
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "query": {
                "querqy": {
                  "matching_query": {
                    "query": "notebook"
                  },
                  "query_fields": [
                    "title^3.0",
                    "brand^2.1",
                    "shortSummary"
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}
```
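
How the flow control might enrich an incoming request with this context can be sketched as follows. This is an assumption-laden illustration: `attach_context` is a hypothetical helper, the id is an arbitrary UUID, and the plugin lists are simply copied from the index mapping metadata.

```python
import copy
import uuid
from typing import Any, Dict


def attach_context(body: Dict[str, Any], index_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request with ext.context filled in when missing."""
    out = copy.deepcopy(body)
    ext = out.setdefault("ext", {})
    if "context" in ext:
        return out  # an existing context is used as-is
    ext["context"] = {
        # The original query is copied so later rewrites cannot alter it.
        "params": {"query": copy.deepcopy(out["query"])},
        "execution": {
            "id": uuid.uuid4().hex,
            "rankers": copy.deepcopy(index_meta.get("rankers", [])),
            "rewriters": copy.deepcopy(index_meta.get("rewriters", [])),
        },
    }
    return out


request = {"query": {"match": {"topic": "hobby"}}}
meta = {"rankers": [{"name": "kendra", "properties": {"title_fields": ["title"]}}]}
enriched = attach_context(request, meta)
```

Keeping a separate copy of the query under `params` means the top-level `query` can be rewritten by later steps while the original stays available to every plugin.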

Step 3: First rewriter (Querqy) is dispatched and generates the new query (query rewrite)

```json
{
  "query": {
    //todo - put here the query after being re-written by querqy    
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "topic": "hobby"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "dateField": {
                    "gte": "now-12d",
                    "lte": "now-10d"
                  }
                }
              }
            ]
          }
        }
      },
      "execution": {
        "id": "A1b2c",
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ],
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "query": {
                "querqy": {
                  "matching_query": {
                    "query": "notebook"
                  },
                  "query_fields": [
                    "title^3.0",
                    "brand^2.1",
                    "shortSummary"
                  ]
                }
              },
              "info" : { } // additional info that querqy may add after query rewrite
            }
          }
        ]
      }
    }
  }
}
```

Step 3.1: The chain flow control has no additional rewrites to dispatch, so it dispatches to the rankers. The first ranker in the chain reviews the context params and takes the necessary information.

Once it completes its action, it has the results ranked according to its internal reasoning.

```json
{
  "query": {
    //todo - put here the query after being re-written by querqy    
  },
  "ext": {
    "context": {
      "params": {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "topic": "hobby"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "dateField": {
                    "gte": "now-12d",
                    "lte": "now-10d"
                  }
                }
              }
            ]
          }
        }
      },
      "execution": {
        "id": "A1b2c",
        "rankers": [
          {
            "name": "kendra",
            "properties": {
              "title_fields": [
                "title"
              ],
              "body_fields": [
                "published",
                "description"
              ]
            }
          }
        ],
        "rewriters": [
          {
            "name": "querqy",
            "properties": {
              "query": {
                "querqy": {
                  "matching_query": {
                    "query": "notebook"
                  },
                  "query_fields": [
                    "title^3.0",
                    "brand^2.1",
                    "shortSummary"
                  ]
                }
              },
              "info" : { } 
            }
          }
        ]
      }
    }
  }
}
```

**Response Stage**
* * *
Step 4: Reranking work after the rewrite chain is completed - returning the results to the original calling service

Ranker search results JSON:
```json
{
  "took" : 0,
  "timed_out" : false,
   "ext": {  // this ext section is suggested to be added here as part of the results.
     "context": {
       "params": {
         "query": {
           "bool": {
             "must": [
               {
                 "match": {
                   "topic": "hobby"
                 }
               }
             ],
             "filter": [
               {
                 "range": {
                   "dateField": {
                     "gte": "now-12d",
                     "lte": "now-10d"
                   }
                 }
               }
             ]
           }
         }
       },
       "execution": {
         "id": "A1b2c",
         "rankers": [
           {
             "name": "kendra",
             "properties": {
               "title_fields": [
                 "title"
               ],
               "body_fields": [
                 "published",
                 "description"
               ]
             }
           }
         ],
         "rewriters": [
           {
             "name": "querqy",
             "properties": {
               "query": {
                 "querqy": {
                   "matching_query": {
                     "query": "notebook"
                   },
                   "query_fields": [
                     "title^3.0",
                     "brand^2.1",
                     "shortSummary"
                   ]
                 }
               },
               "info" : { }
             }
           }
         ]
       }
     }
   },
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.8773359,
    "hits" : [
      {
        "_index" : "employees",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.8773359,
        "_source" : {
          "id" : 4,
          "name" : "Alan Thomas",
          "email" : "athomas2@example.com",
          "gender" : "male",
          "ip_address" : "200.47.210.95",
          "date_of_birth" : "11/12/1985",
          "company" : "Yamaha",
          "position" : "Resources Manager",
          "experience" : 12,
          "country" : "China",
          "phrase" : "Emulation of roots heuristic coherent systems",
          "salary" : 300000
        }
      }
    ]
  }
}
```


**The response DSL doesn't currently contain such an _ext_ part - this RFC suggests adding such a section to the results.**
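
One way to picture the suggested response shape is a helper that copies the request's `ext` section onto the search response. The name `with_ext` is hypothetical and this is only a sketch of the idea, not an existing OpenSearch API.

```python
import copy
from typing import Any, Dict


def with_ext(response: Dict[str, Any], request_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response with the request's ext section attached, if any."""
    out = dict(response)  # shallow copy; the original response is not modified
    ext = request_body.get("ext")
    if ext is not None:
        out["ext"] = copy.deepcopy(ext)
    return out


response = {"took": 0, "timed_out": False, "hits": {"hits": []}}
request = {"query": {"match_all": {}},
           "ext": {"context": {"execution": {"id": "A1b2c"}}}}
enriched_response = with_ext(response, request)
```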



