This repository was archived by the owner on Aug 16, 2022. It is now read-only.

Commit f7c687f

Merge pull request #475 from jmazanec15/hamming-distance
Add separate hamming distance example
2 parents 9f99cfc + 0047f09 commit f7c687f

1 file changed: +105 -3 lines changed

docs/knn/knn-score-script.md

Lines changed: 105 additions & 3 deletions
@@ -10,7 +10,7 @@ has_math: true
 # Exact k-NN with Scoring Script
 The k-NN plugin implements the Elasticsearch score script plugin that you can use to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, you can apply a filter on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions. Because this approach executes a brute force search, it does not scale as well as the [Approximate approach](../approximate-knn). In some cases, it may be better to think about refactoring your workflow or index structure to use the Approximate approach instead of this approach.

-## Getting started with the score script
+## Getting started with the score script for vectors

 Similar to approximate nearest neighbor search, in order to use the score script on a body of vectors, you must first create an index with one or more `knn_vector` fields. If you intend to just use the script score approach (and not the approximate approach) `index.knn` can be set to `false` and `index.knn.space_type` does not need to be set. The space type can be chosen during search. See the [spaces section](#spaces) to see what spaces the k-NN score script supports. Here is an example that creates an index with two `knn_vector` fields:

@@ -32,8 +32,6 @@ PUT my-knn-index-1
 }
 ```

-*Note* -- For binary spaces, such as the Hamming bit space, `type` needs to be either `binary` or `long`. The binary data then needs to be encoded either as a base64 string or as a long (if the data is 64 bits or less).
-
 If you *only* want to use the score script, you can omit `"index.knn": true`. The benefit of this approach is faster indexing speed and lower memory usage, but you lose the ability to perform standard k-NN queries on the index.
 {: .tip}

@@ -172,6 +170,110 @@ GET my-knn-index-2/_search
 }
 ```

+## Getting started with the score script for binary data
+The k-NN score script also allows you to run k-NN search on your binary data with the Hamming distance space.
+In order to use Hamming distance, the field of interest must have either a `binary` or `long` field type. If you're using `binary` type, the data must be a base64-encoded string.
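
As a minimal sketch of the encoding step, the Python snippet below turns raw bytes into the base64 string that a `binary` field expects, and back again. It uses only the standard library; the helper names `to_base64_field` and `from_base64_field` are hypothetical and are not part of the plugin.

```python
import base64


def to_base64_field(raw: bytes) -> str:
    """Encode raw bytes as the base64 string to store in a `binary` field."""
    return base64.b64encode(raw).decode("ascii")


def from_base64_field(encoded: str) -> bytes:
    """Decode a base64 field value back into raw bytes."""
    return base64.b64decode(encoded)


if __name__ == "__main__":
    encoded = to_base64_field(b"Hello World!")
    print(encoded)
    print(from_base64_field(encoded))
```
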
+
+This example shows how to use the Hamming distance space with a `binary` field type:
+
+```json
+PUT my-index
+{
+  "mappings": {
+    "properties": {
+      "my_binary": {
+        "type": "binary",
+        "doc_values": true
+      },
+      "color": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+```
+
+Then add some documents:
+
+```json
+POST _bulk
+{ "index": { "_index": "my-index", "_id": "1" } }
+{ "my_binary": "SGVsbG8gV29ybGQh", "color" : "RED" }
+{ "index": { "_index": "my-index", "_id": "2" } }
+{ "my_binary": "ay1OTiBjdXN0b20gc2NvcmluZyE=", "color" : "RED" }
+{ "index": { "_index": "my-index", "_id": "3" } }
+{ "my_binary": "V2VsY29tZSB0byBrLU5O", "color" : "RED" }
+{ "index": { "_index": "my-index", "_id": "4" } }
+{ "my_binary": "SSBob3BlIHRoaXMgaXMgaGVscGZ1bA==", "color" : "BLUE" }
+{ "index": { "_index": "my-index", "_id": "5" } }
+{ "my_binary": "QSBjb3VwbGUgbW9yZSBkb2NzLi4u", "color" : "BLUE" }
+{ "index": { "_index": "my-index", "_id": "6" } }
+{ "my_binary": "TGFzdCBvbmUh", "color" : "BLUE" }
+
+```
+
+Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors:
+
+```json
+GET my-index/_search
+{
+  "size": 2,
+  "query": {
+    "script_score": {
+      "query": {
+        "bool": {
+          "filter": {
+            "term": {
+              "color": "BLUE"
+            }
+          }
+        }
+      },
+      "script": {
+        "lang": "knn",
+        "source": "knn_score",
+        "params": {
+          "field": "my_binary",
+          "query_value": "U29tZXRoaW5nIEltIGxvb2tpbmcgZm9y",
+          "space_type": "hammingbit"
+        }
+      }
+    }
+  }
+}
+```
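
Conceptually, the query above narrows the candidates with the `color` filter and then compares every remaining document against `query_value`. The Python sketch below mimics that flow on the client side to make the steps concrete. It is not the plugin's implementation: the function names are hypothetical, and it assumes the decoded bytes are compared as big-endian unsigned integers, so a shorter value behaves as if it were left-padded with zero bits.

```python
import base64


def hamming_distance_b64(a: str, b: str) -> int:
    """Count differing bits between two base64-encoded values.

    Assumption: values are compared as big-endian unsigned integers.
    """
    x = int.from_bytes(base64.b64decode(a), "big")
    y = int.from_bytes(base64.b64decode(b), "big")
    return bin(x ^ y).count("1")


def filtered_nearest(docs, color, query_value, size):
    """Pre-filter by color, then return the `size` closest (id, distance) pairs."""
    candidates = [d for d in docs if d["color"] == color]
    scored = [
        (d["_id"], hamming_distance_b64(d["my_binary"], query_value))
        for d in candidates
    ]
    # Smaller Hamming distance means a closer neighbor.
    return sorted(scored, key=lambda pair: pair[1])[:size]


if __name__ == "__main__":
    docs = [
        {"_id": "1", "my_binary": "SGVsbG8gV29ybGQh", "color": "RED"},
        {"_id": "4", "my_binary": "SSBob3BlIHRoaXMgaXMgaGVscGZ1bA==", "color": "BLUE"},
        {"_id": "5", "my_binary": "QSBjb3VwbGUgbW9yZSBkb2NzLi4u", "color": "BLUE"},
        {"_id": "6", "my_binary": "TGFzdCBvbmUh", "color": "BLUE"},
    ]
    query = "U29tZXRoaW5nIEltIGxvb2tpbmcgZm9y"
    for doc_id, distance in filtered_nearest(docs, "BLUE", query, size=2):
        print(doc_id, distance)
```

Because the filter runs first, documents with `color` set to `RED` are never compared against the query value, which is what keeps the brute-force step small.
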
+
+Similarly, you can encode your data with the `long` field and run a search:
+
+```json
+GET my-long-index/_search
+{
+  "size": 2,
+  "query": {
+    "script_score": {
+      "query": {
+        "bool": {
+          "filter": {
+            "term": {
+              "color": "BLUE"
+            }
+          }
+        }
+      },
+      "script": {
+        "lang": "knn",
+        "source": "knn_score",
+        "params": {
+          "field": "my_long",
+          "query_value": 23,
+          "space_type": "hammingbit"
+        }
+      }
+    }
+  }
+}
+```
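
For `long` values, the Hamming distance reduces to an XOR followed by a bit count. The Python sketch below shows that arithmetic, assuming the values are signed 64-bit integers as in a `long` field. The masking step and the function name are illustrative choices and are not taken from the plugin.

```python
MASK_64 = (1 << 64) - 1  # keep only the low 64 bits, as a two's-complement long would


def hamming_distance_long(a: int, b: int) -> int:
    """Count differing bits between two signed 64-bit integers."""
    return bin((a ^ b) & MASK_64).count("1")


if __name__ == "__main__":
    # 23 is 0b10111 and 7 is 0b00111, so they differ in exactly one bit.
    print(hamming_distance_long(23, 7))   # 1
    # -1 has all 64 bits set in two's complement.
    print(hamming_distance_long(-1, 0))   # 64
```
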
+
 ## Spaces

 A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how Elasticsearch scores results, where a greater score equates to a better result. We include the conversions to Elasticsearch scores in the table below:
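
To illustrate the inversion, here is a minimal Python sketch that turns a distance into a higher-is-better score. It assumes a conversion of the form 1 / (1 + distance), chosen purely for illustration; the conversion that actually applies to each space is the one listed in the table, and the function name is hypothetical.

```python
def distance_to_score(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1].

    Assumption: uses 1 / (1 + distance) purely for illustration, so a
    smaller distance always yields a larger score.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    for d in (0, 1, 3):
        print(d, distance_to_score(d))
```
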
