
Commit f43dcfa

Authored by brianf-aws, kolchfa-aws, and natebower
Adds documentation about byField rerank processor (#8593)
* Adds documentation about byField rerank processor
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* Polishes example and fixes spelling mistakes
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Doc review
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Polish example to work with curl request
  If you use Postman or Dev Tools, it won't work because there are quotes in the index, so this had to be changed. It also had to be made clear where the search pipeline is applied when running a search.
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* added book-index endpoint to rerank-processor.md
  Signed-off-by: Brian Flores <iflorbri@amazon.com>

---------

Signed-off-by: Brian Flores <iflorbri@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
1 parent a500ef7 commit f43dcfa


4 files changed, with 438 additions and 127 deletions


_search-plugins/search-pipelines/rerank-processor.md

Lines changed: 97 additions & 21 deletions
````diff
@@ -11,33 +11,49 @@ grand_parent: Search pipelines
 Introduced 2.12
 {: .label .label-purple }
 
-The `rerank` search request processor intercepts search results and passes them to a cross-encoder model to be reranked. The model reranks the results, taking into account the scoring context. Then the processor orders documents in the search results based on their new scores.
+The `rerank` search response processor intercepts and reranks search results. The processor orders documents in the search results based on their new scores.
+
+OpenSearch supports the following rerank types.
+
+Type | Description | Earliest available version
+:--- | :--- | :---
+[`ml_opensearch`](#the-ml_opensearch-rerank-type) | Applies an OpenSearch-provided cross-encoder model. | 2.12
+[`by_field`](#the-by_field-rerank-type) | Applies reranking based on a user-provided field. | 2.18
 
 ## Request body fields
 
 The following table lists all available request fields.
 
-Field | Data type | Description
-:--- | :--- | :---
-`<reranker_type>` | Object | The reranker type provides the rerank processor with static information needed across all reranking calls. Required.
-`context` | Object | Provides the rerank processor with information necessary for generating reranking context at query time.
-`tag` | String | The processor's identifier. Optional.
-`description` | String | A description of the processor. Optional.
-`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+Field | Data type | Required/Optional | Description
+:--- | :--- | :--- | :---
+`<rerank_type>` | Object | Required | The rerank type for document reranking. Valid values are `ml_opensearch` and `by_field`.
+`context` | Object | Required for the `ml_opensearch` rerank type. Optional and does not affect the results for the `by_field` rerank type. | Provides the `rerank` processor with information necessary for reranking at query time.
+`tag` | String | Optional | The processor's identifier.
+`description` | String | Optional | A description of the processor.
+`ignore_failure` | Boolean | Optional | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Default is `false`.
+
+<!-- vale off -->
+## The ml_opensearch rerank type
+<!-- vale on -->
+Introduced 2.12
+{: .label .label-purple }
 
-### The `ml_opensearch` reranker type
+To rerank results using a cross-encoder model, specify the `ml_opensearch` rerank type.
 
-The `ml_opensearch` reranker type is designed to work with the cross-encoder model provided by OpenSearch. For this reranker type, specify the following fields.
+### Prerequisite
+
+Before using the `ml_opensearch` rerank type, you must configure a cross-encoder model. For information about using an OpenSearch-provided model, see [Cross-encoder models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#cross-encoder-models). For information about using a custom model, see [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/).
+
+The `ml_opensearch` rerank type supports the following fields. All fields are required.
 
 Field | Data type | Description
 :--- | :--- | :---
-`ml_opensearch` | Object | Provides the rerank processor with model information. Required.
-`ml_opensearch.model_id` | String | The model ID for the cross-encoder model. Required. For more information, see [Using ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
-`context.document_fields` | Array | An array of document fields that specifies the fields from which to retrieve context for the cross-encoder model. Required.
+`ml_opensearch.model_id` | String | The model ID of the cross-encoder model for reranking. For more information, see [Using ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
+`context.document_fields` | Array | An array of document fields that specifies the fields from which to retrieve context for the cross-encoder model.
 
-## Example
+### Example
 
-The following example demonstrates using a search pipeline with a `rerank` processor.
+The following example demonstrates using a search pipeline with a `rerank` processor implemented using the `ml_opensearch` rerank type. For a complete example, see [Reranking using a cross-encoder model]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-cross-encoder/).
 
 ### Creating a search pipeline
 
````
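The request body fields above can be sketched as a small Python builder for one entry of a pipeline's `response_processors` array. This is a hypothetical helper, not part of any OpenSearch client: the name `build_rerank_processor`, the placeholder model ID, and the `passage_text` field are assumptions. It enforces only the rules stated in the tables (a supported rerank type, plus `model_id` and `context.document_fields` for `ml_opensearch`) and assumes that `context`, `tag`, `description`, and `ignore_failure` sit inside the `rerank` object next to the rerank type.

```python
import json


def build_rerank_processor(rerank_type, options, context=None, tag=None,
                           description=None, ignore_failure=False):
    """Return one entry for a pipeline's "response_processors" array.

    Hypothetical helper: it checks only the rules listed in the field tables.
    """
    if rerank_type not in ("ml_opensearch", "by_field"):
        raise ValueError(f"unsupported rerank type: {rerank_type!r}")
    if rerank_type == "ml_opensearch":
        if not options.get("model_id"):
            raise ValueError("ml_opensearch requires model_id")
        if not (context or {}).get("document_fields"):
            raise ValueError("ml_opensearch requires context.document_fields")

    processor = {rerank_type: dict(options)}
    # Optional processor-level fields are added only when they are set.
    if context is not None:
        processor["context"] = context
    if tag is not None:
        processor["tag"] = tag
    if description is not None:
        processor["description"] = description
    if ignore_failure:
        processor["ignore_failure"] = True
    return {"rerank": processor}


pipeline_body = {
    "response_processors": [
        build_rerank_processor(
            "ml_opensearch",
            {"model_id": "<cross-encoder-model-id>"},  # placeholder, not a real ID
            context={"document_fields": ["passage_text"]},  # hypothetical field name
        )
    ]
}
print(json.dumps(pipeline_body, indent=2))
```

Failing early on a missing `model_id` or `document_fields` keeps a malformed body from reaching the cluster; the server remains the authority on validation.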
````diff
@@ -108,11 +124,71 @@ POST /_search?search_pipeline=rerank_pipeline
 ```
 {% include copy-curl.html %}
 
-The `query_context` object contains the following fields.
+The `query_context` object contains the following fields. You must provide either `query_text` or `query_text_path` but cannot provide both simultaneously.
+
+Field name | Required/Optional | Description
+:--- | :--- | :---
+`query_text` | Exactly one of `query_text` or `query_text_path` is required. | The natural language text of the question that you want to use to rerank the search results.
+`query_text_path` | Exactly one of `query_text` or `query_text_path` is required. | The full JSON path to the text of the question that you want to use to rerank the search results. The maximum number of characters allowed in the path is `1000`.
````
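The either/or rule for `query_context` is easy to break when requests are generated programmatically. The following Python sketch checks it, along with the 1,000-character path limit, before a request is sent. `validate_query_context` is a hypothetical helper, and placing the object under `ext.rerank.query_context` in the search body is an assumption taken from OpenSearch's reranking examples; the `passage_text` field is also hypothetical.

```python
MAX_QUERY_TEXT_PATH_LENGTH = 1000  # character limit stated for query_text_path


def validate_query_context(query_context):
    """Raise ValueError unless exactly one of the two query fields is present."""
    has_text = "query_text" in query_context
    has_path = "query_text_path" in query_context
    if has_text == has_path:  # both present, or both missing
        raise ValueError("provide exactly one of query_text or query_text_path")
    if has_path and len(query_context["query_text_path"]) > MAX_QUERY_TEXT_PATH_LENGTH:
        raise ValueError("query_text_path is longer than 1000 characters")
    return query_context


# Assumed placement of query_context in a search request body.
search_body = {
    "query": {"match": {"passage_text": "What is reranking?"}},  # hypothetical field
    "ext": {
        "rerank": {
            "query_context": validate_query_context(
                {"query_text": "What is reranking?"}
            )
        }
    },
}
```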
````diff
+
+
+<!-- vale off -->
+## The by_field rerank type
+<!-- vale on -->
+Introduced 2.18
+{: .label .label-purple }
+
+To rerank results by a document field, specify the `by_field` rerank type.
+
+The `by_field` object supports the following fields.
+
+Field | Data type | Required/Optional | Description
+:--- | :--- | :--- | :---
+`target_field` | String | Required | Specifies the field name or a dot path to the field containing the score to use for reranking.
+`remove_target_field` | Boolean | Optional | If `true`, the response does not include the `target_field` used to perform reranking. Default is `false`.
+`keep_previous_score` | Boolean | Optional | If `true`, the response includes a `previous_score` field, which contains the score calculated before reranking and can be useful when debugging. Default is `false`.
````
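To make the three `by_field` options concrete, here is a local Python approximation of what the table describes: read a numeric score from the dot path in each hit's `_source`, make it the new `_score`, optionally record the old score as `previous_score` (placed inside `_source`, as in the example response on the "Reranking by a field" page), optionally drop the target field, and sort in descending order. This is not OpenSearch's implementation. The helper names are hypothetical, and the sketch does not model how the real processor treats missing or non-numeric fields, ties, or the empty parent object that `remove_target_field` leaves behind; here a missing path simply raises `KeyError`.

```python
import copy


def get_by_dot_path(source, path):
    """Follow a dot path such as "reviews.stars" through nested dicts."""
    value = source
    for key in path.split("."):
        value = value[key]  # KeyError if a segment is missing (sketch behavior)
    return value


def delete_by_dot_path(source, path):
    """Delete the leaf key of a dot path, leaving parent objects in place."""
    *parents, leaf = path.split(".")
    node = source
    for key in parents:
        node = node[key]
    del node[leaf]


def rerank_by_field(hits, target_field, remove_target_field=False,
                    keep_previous_score=False):
    """Local approximation of the by_field rerank type (not OpenSearch code)."""
    reranked = []
    for hit in copy.deepcopy(hits):  # leave the caller's hits untouched
        source = hit["_source"]
        new_score = float(get_by_dot_path(source, target_field))
        if keep_previous_score:
            source["previous_score"] = hit["_score"]
        if remove_target_field:
            delete_by_dot_path(source, target_field)
        hit["_score"] = new_score
        reranked.append(hit)
    reranked.sort(key=lambda h: h["_score"], reverse=True)
    return reranked


hits = [
    {"_id": "1", "_score": 1.0, "_source": {"reviews": {"stars": 4.2}}},
    {"_id": "4", "_score": 1.0, "_source": {"reviews": {"stars": 4.8}}},
]
for hit in rerank_by_field(hits, "reviews.stars", keep_previous_score=True):
    print(hit["_id"], hit["_score"], hit["_source"]["previous_score"])
# prints "4 4.8 1.0", then "1 4.2 1.0"
```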
````diff
+
+### Example
+
+The following example demonstrates using a search pipeline with a `rerank` processor implemented using the `by_field` rerank type. For a complete example, see [Reranking by a document field]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-by-field/).
+
+### Creating a search pipeline
+
+The following request creates a search pipeline with a `by_field` rerank type response processor that ranks the documents by the `reviews.stars` field and specifies to return the original document score:
+
+```json
+PUT /_search/pipeline/rerank_byfield_pipeline
+{
+  "response_processors": [
+    {
+      "rerank": {
+        "by_field": {
+          "target_field": "reviews.stars",
+          "keep_previous_score" : true
+        }
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+### Using the search pipeline
+
+To apply the search pipeline to a query, provide the search pipeline name in the query parameter:
+
+```json
+POST /book-index/_search?search_pipeline=rerank_byfield_pipeline
+{
+  "query": {
+    "match_all": {}
+  }
+}
+```
+{% include copy-curl.html %}
````
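The `search_pipeline` query parameter can also be set from code. The following Python sketch uses only the standard library to build (but not send) the same request. The cluster address `http://localhost:9200` and the absence of authentication are assumptions, and `search_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security


def search_request(index, body, pipeline=None):
    """Build (but do not send) a search request, optionally naming a pipeline."""
    url = f"{BASE_URL}/{urllib.parse.quote(index)}/_search"
    if pipeline:
        url += "?" + urllib.parse.urlencode({"search_pipeline": pipeline})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request("book-index", {"query": {"match_all": {}}},
                     pipeline="rerank_byfield_pipeline")
print(req.get_method(), req.full_url)
# To send it against a running cluster: urllib.request.urlopen(req)
```

Leaving `pipeline` unset produces a plain `_search` request, which relies on an index default pipeline if one is configured.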
````diff
 
-Field name | Description
-:--- | :---
-`query_text` | The natural language text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required.
-`query_text_path` | The full JSON path to the text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required. The maximum number of characters in the path is `1000`.
+## Next steps
 
-For more information about setting up reranking, see [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/).
+- Learn more about [reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/).
+- See a complete example of [reranking using a cross-encoder model]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-cross-encoder/).
+- See a complete example of [reranking by a document field]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-by-field/).
````
Lines changed: 208 additions & 0 deletions
````diff
@@ -0,0 +1,208 @@
+---
+layout: default
+title: Reranking by a field
+parent: Reranking search results
+grand_parent: Search relevance
+has_children: false
+nav_order: 20
+---
+
+# Reranking search results by a field
+Introduced 2.18
+{: .label .label-purple }
+
+You can use a `by_field` rerank type to rerank search results by a document field. Reranking search results by a field is useful if a model has already run and produced a numerical score for your documents or if a previous search response processor was applied and you want to rerank documents differently based on an aggregated field.
+
+To implement reranking, you need to configure a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results and applies the [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) to them. The `rerank` processor evaluates the search results and sorts them based on the new scores obtained from a document field.
+
+## Running a search with reranking
+
+To run a search with reranking, follow these steps:
+
+1. [Configure a search pipeline](#step-1-configure-a-search-pipeline).
+1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
+1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
+1. [Search using reranking](#step-4-search-using-reranking).
+
+## Step 1: Configure a search pipeline
+
+Configure a search pipeline with a [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) and specify the `by_field` rerank type. The pipeline sorts by the `reviews.stars` field (specified by a complete dot path to the field) and returns the original query scores for all documents along with their new scores:
+
+```json
+PUT /_search/pipeline/rerank_byfield_pipeline
+{
+  "response_processors": [
+    {
+      "rerank": {
+        "by_field": {
+          "target_field": "reviews.stars",
+          "keep_previous_score" : true
+        }
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+For more information about the request fields, see [Request fields]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#request-body-fields).
+
+## Step 2: Create an index for ingestion
+
+In order to use the `rerank` processor defined in your pipeline, create an OpenSearch index and add the pipeline created in the previous step as the default pipeline:
+
+```json
+PUT /book-index
+{
+  "settings": {
+    "index.search.default_pipeline" : "rerank_byfield_pipeline"
+  },
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text"
+      },
+      "author": {
+        "type": "text"
+      },
+      "genre": {
+        "type": "keyword"
+      },
+      "reviews": {
+        "properties": {
+          "stars": {
+            "type": "float"
+          }
+        }
+      },
+      "description": {
+        "type": "text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
````
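Because `target_field` must be a complete dot path, a misspelled path such as `reviews.starts` does not match any mapped field. The following Python sketch checks a path against a local copy of mapping properties like those created in Step 2 before you create the pipeline. `field_type_at` is a hypothetical helper; the set of numeric types reflects OpenSearch's numeric field types, and multi-fields (`fields`) are not handled.

```python
# Numeric field types in OpenSearch mappings.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "unsigned_long",
                 "float", "double", "half_float", "scaled_float"}


def field_type_at(properties, dot_path):
    """Return the mapped type at dot_path, or None if the path is not mapped."""
    node = {"properties": properties}
    for key in dot_path.split("."):
        node = node.get("properties", {}).get(key)
        if node is None:
            return None
    return node.get("type")  # object fields such as "reviews" have no "type"


# Properties copied from the Step 2 mapping.
book_properties = {
    "title": {"type": "text"},
    "author": {"type": "text"},
    "genre": {"type": "keyword"},
    "reviews": {"properties": {"stars": {"type": "float"}}},
    "description": {"type": "text"},
}

for path in ("reviews.stars", "reviews.starts", "reviews"):
    field_type = field_type_at(book_properties, path)
    print(path, field_type, field_type in NUMERIC_TYPES)
```

Only `reviews.stars` resolves to a numeric leaf; the object field `reviews` and the misspelled path both return `None`.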
````diff
+
+## Step 3: Ingest documents into the index
+
+To ingest documents into the index created in the previous step, send the following bulk request:
+
+```json
+POST /_bulk
+{ "index": { "_index": "book-index", "_id": "1" } }
+{ "title": "The Lost City", "author": "Jane Doe", "genre": "Adventure Fiction", "reviews": { "stars": 4.2 }, "description": "An exhilarating journey through a hidden civilization in the Amazon rainforest." }
+{ "index": { "_index": "book-index", "_id": "2" } }
+{ "title": "Whispers of the Past", "author": "John Smith", "genre": "Historical Mystery", "reviews": { "stars": 4.7 }, "description": "A gripping tale set in Victorian England, unraveling a century-old mystery." }
+{ "index": { "_index": "book-index", "_id": "3" } }
+{ "title": "Starlit Dreams", "author": "Emily Clark", "genre": "Science Fiction", "reviews": { "stars": 4.5 }, "description": "In a future where dreams can be shared, one girl discovers her imaginations power." }
+{ "index": { "_index": "book-index", "_id": "4" } }
+{ "title": "The Enchanted Garden", "author": "Alice Green", "genre": "Fantasy", "reviews": { "stars": 4.8 }, "description": "A magical garden holds the key to a young girls destiny and friendship." }
+
+```
+{% include copy-curl.html %}
````
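The bulk body above is newline-delimited JSON: for each document, an action line followed by the document itself, with a newline at the very end of the body. The following Python sketch serializes documents that way. `bulk_body` is a hypothetical helper, and the two sample documents are trimmed copies of documents 1 and 4 from the request above.

```python
import json


def bulk_body(index, docs):
    """Serialize (doc_id, document) pairs as an NDJSON body for POST /_bulk."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))  # escapes double quotes and control characters
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


books = [
    ("1", {"title": "The Lost City", "author": "Jane Doe",
           "genre": "Adventure Fiction", "reviews": {"stars": 4.2}}),
    ("4", {"title": "The Enchanted Garden", "author": "Alice Green",
           "genre": "Fantasy", "reviews": {"stars": 4.8}}),
]
print(bulk_body("book-index", books), end="")
```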
````diff
+
+## Step 4: Search using reranking
+
+As an example, run a `match_all` query on your index:
+
+```json
+POST /book-index/_search
+{
+  "query": {
+    "match_all": {}
+  }
+}
+```
+{% include copy-curl.html %}
+
+The response contains documents sorted in descending order based on the `reviews.stars` field. Each document contains the original query score in the `previous_score` field:
+
+```json
+{
+  "took": 33,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 4,
+      "relation": "eq"
+    },
+    "max_score": 4.8,
+    "hits": [
+      {
+        "_index": "book-index",
+        "_id": "4",
+        "_score": 4.8,
+        "_source": {
+          "reviews": {
+            "stars": 4.8
+          },
+          "author": "Alice Green",
+          "genre": "Fantasy",
+          "description": "A magical garden holds the key to a young girls destiny and friendship.",
+          "previous_score": 1,
+          "title": "The Enchanted Garden"
+        }
+      },
+      {
+        "_index": "book-index",
+        "_id": "2",
+        "_score": 4.7,
+        "_source": {
+          "reviews": {
+            "stars": 4.7
+          },
+          "author": "John Smith",
+          "genre": "Historical Mystery",
+          "description": "A gripping tale set in Victorian England, unraveling a century-old mystery.",
+          "previous_score": 1,
+          "title": "Whispers of the Past"
+        }
+      },
+      {
+        "_index": "book-index",
+        "_id": "3",
+        "_score": 4.5,
+        "_source": {
+          "reviews": {
+            "stars": 4.5
+          },
+          "author": "Emily Clark",
+          "genre": "Science Fiction",
+          "description": "In a future where dreams can be shared, one girl discovers her imaginations power.",
+          "previous_score": 1,
+          "title": "Starlit Dreams"
+        }
+      },
+      {
+        "_index": "book-index",
+        "_id": "1",
+        "_score": 4.2,
+        "_source": {
+          "reviews": {
+            "stars": 4.2
+          },
+          "author": "Jane Doe",
+          "genre": "Adventure Fiction",
+          "description": "An exhilarating journey through a hidden civilization in the Amazon rainforest.",
+          "previous_score": 1,
+          "title": "The Lost City"
+        }
+      }
+    ]
+  },
+  "profile": {
+    "shards": []
+  }
+}
+```
````
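To check a response like this one programmatically, the following Python sketch confirms that hits are in descending `_score` order and that each `_score` matches the value at the target dot path. `check_by_field_order` and `value_at` are hypothetical helpers, and the sample is a trimmed copy of the response above. The comparison uses `math.isclose` because `reviews.stars` is mapped as `float`, so a score derived from it may not equal the decimal literal exactly.

```python
import math


def value_at(source, dot_path):
    """Follow a dot path such as "reviews.stars" through nested dicts."""
    for key in dot_path.split("."):
        source = source[key]
    return source


def check_by_field_order(response, target_field):
    """Return hit IDs after checking by_field ordering (hypothetical helper)."""
    hits = response["hits"]["hits"]
    scores = [hit["_score"] for hit in hits]
    if scores != sorted(scores, reverse=True):
        raise AssertionError("hits are not in descending _score order")
    for hit in hits:
        expected = value_at(hit["_source"], target_field)
        if not math.isclose(hit["_score"], expected, rel_tol=1e-6):
            raise AssertionError(f"hit {hit['_id']}: _score does not match {target_field}")
    return [hit["_id"] for hit in hits]


# Trimmed copy of the example response (only the fields the check reads).
sample = {"hits": {"hits": [
    {"_id": "4", "_score": 4.8, "_source": {"reviews": {"stars": 4.8}}},
    {"_id": "2", "_score": 4.7, "_source": {"reviews": {"stars": 4.7}}},
    {"_id": "3", "_score": 4.5, "_source": {"reviews": {"stars": 4.5}}},
    {"_id": "1", "_score": 4.2, "_source": {"reviews": {"stars": 4.2}}},
]}}
print(check_by_field_order(sample, "reviews.stars"))  # prints ['4', '2', '3', '1']
```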
````diff
+
+## Next steps
+
+- Learn more about the [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/).
````
