Added k-nn user guide and samples. (#449)
* Added k-nn user guide and samples. Signed-off-by: dblock <dblock@amazon.com>
* Added async samples. Signed-off-by: dblock <dblock@amazon.com>
* Renamed Lucene Filters with Efficient Filters. Signed-off-by: dblock <dblock@amazon.com>
* Fixing TOC from Lucene filters to Efficient filters Signed-off-by: Vacha Shah <vachshah@amazon.com>

---------

Signed-off-by: dblock <dblock@amazon.com>
Signed-off-by: Vacha Shah <vachshah@amazon.com>
Co-authored-by: Vacha Shah <vachshah@amazon.com>
Showing 12 changed files with 1,265 additions and 10 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
- [Asynchronous I/O](#asynchronous-io) | ||
  - [Setup](#setup) | ||
  - [Async Loop](#async-loop) | ||
  - [Connect to OpenSearch](#connect-to-opensearch) | ||
  - [Create an Index](#create-an-index) | ||
  - [Index Documents](#index-documents) | ||
  - [Refresh the Index](#refresh-the-index) | ||
  - [Search](#search) | ||
  - [Delete Documents](#delete-documents) | ||
  - [Delete the Index](#delete-the-index) | ||
|
||
# Asynchronous I/O | ||
|
||
This client supports asynchronous I/O, which can increase throughput when many requests are in flight at once because the client does not block while waiting for each response. See [hello-async.py](../samples/hello/hello-async.py) or [knn-async-basics.py](../samples/knn/knn-async-basics.py) for a working asynchronous sample. | ||
|
||
## Setup | ||
|
||
To add the async client to your project, install it using [pip](https://pip.pypa.io/): | ||
|
||
```bash | ||
pip install "opensearch-py[async]" | ||
``` | ||
|
||
In general, we recommend using a package manager, such as [poetry](https://python-poetry.org/docs/), for your projects. This is the package manager used for [samples](../samples). The following example includes `opensearch-py[async]` in `pyproject.toml`. | ||
|
||
```toml | ||
[tool.poetry.dependencies] | ||
opensearch-py = { path = "../", extras=["async"] } | ||
``` | ||
|
||
## Async Loop | ||
|
||
```python | ||
import asyncio | ||
|
||
from opensearchpy import AsyncOpenSearch | ||
|
||
async def main(): | ||
    client = AsyncOpenSearch(...) | ||
    try: | ||
        # your code here | ||
        pass | ||
    finally: | ||
        # close() is a coroutine on the async client, so it must be awaited | ||
        await client.close() | ||
|
||
if __name__ == "__main__": | ||
    loop = asyncio.new_event_loop() | ||
    asyncio.set_event_loop(loop) | ||
    loop.run_until_complete(main()) | ||
    loop.close() | ||
``` | ||
|
||
## Connect to OpenSearch | ||
|
||
```python | ||
host = 'localhost' | ||
port = 9200 | ||
auth = ('admin', 'admin') # For testing only. Don't store credentials in code. | ||
|
||
client = AsyncOpenSearch( | ||
    hosts = [{'host': host, 'port': port}], | ||
    http_auth = auth, | ||
    use_ssl = True, | ||
    verify_certs = False, | ||
    ssl_show_warn = False | ||
) | ||
|
||
info = await client.info() | ||
print(f"Welcome to {info['version']['distribution']} {info['version']['number']}!") | ||
``` | ||
|
||
## Create an Index | ||
|
||
```python | ||
index_name = 'test-index' | ||
|
||
index_body = { | ||
    'settings': { | ||
        'index': { | ||
            'number_of_shards': 4 | ||
        } | ||
    } | ||
} | ||
|
||
if not await client.indices.exists(index=index_name): | ||
    await client.indices.create( | ||
        index_name, | ||
        body=index_body | ||
    ) | ||
``` | ||
|
||
## Index Documents | ||
|
||
```python | ||
await asyncio.gather(*[ | ||
    client.index( | ||
        index = index_name, | ||
        body = { | ||
            'title': f"Moneyball {i}", | ||
            'director': 'Bennett Miller', | ||
            'year': '2011' | ||
        }, | ||
        id = i | ||
    ) for i in range(10) | ||
]) | ||
``` | ||
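
The `asyncio.gather` call above starts all ten index requests before waiting on any of them. As a minimal sketch of that pattern that runs without a cluster, the following uses a hypothetical `fake_index` coroutine (not part of opensearch-py) that sleeps to stand in for a network round trip.

```python
import asyncio
import time


async def fake_index(doc_id: int, delay: float = 0.05) -> dict:
    """Hypothetical stand-in for client.index(); sleeps instead of calling OpenSearch."""
    await asyncio.sleep(delay)
    return {"_id": doc_id, "result": "created"}


async def index_all(count: int) -> list:
    # gather() schedules every coroutine at once and returns results in argument order
    return await asyncio.gather(*[fake_index(i) for i in range(count)])


start = time.perf_counter()
responses = asyncio.run(index_all(10))
elapsed = time.perf_counter() - start

print([r["_id"] for r in responses])
print(f"10 simulated requests took {elapsed:.2f}s")
```

Because the simulated waits overlap, the total time should be close to a single delay rather than ten of them. `gather` returns results in the order the coroutines were passed in, regardless of which one finishes first.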
|
||
## Refresh the Index | ||
|
||
```python | ||
await client.indices.refresh(index=index_name) | ||
``` | ||
|
||
## Search | ||
|
||
```python | ||
q = 'miller' | ||
|
||
query = { | ||
    'size': 5, | ||
    'query': { | ||
        'multi_match': { | ||
            'query': q, | ||
            'fields': ['title^2', 'director'] | ||
        } | ||
    } | ||
} | ||
|
||
results = await client.search( | ||
    body = query, | ||
    index = index_name | ||
) | ||
|
||
for hit in results["hits"]["hits"]: | ||
    print(hit) | ||
``` | ||
|
||
## Delete Documents | ||
|
||
```python | ||
await asyncio.gather(*[ | ||
    client.delete( | ||
        index = index_name, | ||
        id = i | ||
    ) for i in range(10) | ||
]) | ||
``` | ||
|
||
## Delete the Index | ||
|
||
```python | ||
await client.indices.delete( | ||
    index = index_name | ||
) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
- [k-NN Plugin](#k-nn-plugin) | ||
  - [Basic Approximate k-NN](#basic-approximate-k-nn) | ||
    - [Create an Index](#create-an-index) | ||
    - [Index Vectors](#index-vectors) | ||
    - [Search for Nearest Neighbors](#search-for-nearest-neighbors) | ||
  - [Approximate k-NN with a Boolean Filter](#approximate-k-nn-with-a-boolean-filter) | ||
  - [Approximate k-NN with an Efficient Filter](#approximate-k-nn-with-an-efficient-filter) | ||
|
||
# k-NN Plugin | ||
|
||
Short for k-nearest neighbors, the k-NN plugin lets you search an index of vectors for the k vectors nearest to a query point. See the [documentation](https://opensearch.org/docs/latest/search-plugins/knn/index/) for more information. | ||
|
||
## Basic Approximate k-NN | ||
|
||
In the following example we create a 5-dimensional k-NN index with random data. You can find a synchronous version of this working sample in [samples/knn/knn-basics.py](../../samples/knn/knn-basics.py) and an asynchronous one in [samples/knn/knn-async-basics.py](../../samples/knn/knn-async-basics.py). | ||
|
||
```bash | ||
$ poetry run knn/knn-basics.py | ||
|
||
Searching for [0.61, 0.05, 0.16, 0.75, 0.49] ... | ||
{'_index': 'my-index', '_id': '3', '_score': 0.9252405, '_source': {'values': [0.64, 0.3, 0.27, 0.68, 0.51]}} | ||
{'_index': 'my-index', '_id': '4', '_score': 0.802375, '_source': {'values': [0.49, 0.39, 0.21, 0.42, 0.42]}} | ||
{'_index': 'my-index', '_id': '8', '_score': 0.7826564, '_source': {'values': [0.33, 0.33, 0.42, 0.97, 0.56]}} | ||
``` | ||
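
The `_score` values above are consistent with the k-NN plugin's L2 space type, where the score is `1 / (1 + d²)` and `d` is the Euclidean distance between the query and the stored vector. The following sketch recomputes the scores from the sample output in plain Python. It assumes the index uses the L2 space type, and `l2_score` is a hypothetical helper, not part of opensearch-py.

```python
def l2_score(query, vector):
    """Score as 1 / (1 + squared Euclidean distance); assumes the L2 space type."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1 / (1 + squared_distance)


# query vector and hits copied from the sample output above
query = [0.61, 0.05, 0.16, 0.75, 0.49]
hits = {
    "3": [0.64, 0.3, 0.27, 0.68, 0.51],
    "4": [0.49, 0.39, 0.21, 0.42, 0.42],
    "8": [0.33, 0.33, 0.42, 0.97, 0.56],
}

scores = {doc_id: l2_score(query, vector) for doc_id, vector in hits.items()}
for doc_id, score in scores.items():
    print(doc_id, score)
```

A closer vector has a smaller distance and therefore a score nearer to 1, which is why document `3` ranks first.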
|
||
### Create an Index | ||
|
||
```python | ||
dimensions = 5 | ||
client.indices.create(index_name, | ||
    body={ | ||
        "settings":{ | ||
            "index.knn": True | ||
        }, | ||
        "mappings":{ | ||
            "properties": { | ||
                "values": { | ||
                    "type": "knn_vector", | ||
                    "dimension": dimensions | ||
                }, | ||
            } | ||
        } | ||
    } | ||
) | ||
``` | ||
|
||
### Index Vectors | ||
|
||
Create 10 random vectors and insert them using the bulk API. | ||
|
||
```python | ||
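import random | ||
|
||
from opensearchpy import helpers | ||
|
||
vectors = [] | ||
for i in range(10): | ||
    vec = [] | ||
    for j in range(dimensions): | ||
        vec.append(round(random.uniform(0, 1), 2)) | ||
|
||
    vectors.append({ | ||
        "_index": index_name, | ||
        "_id": i, | ||
        "values": vec, | ||
    }) | ||
|
||
# send all ten documents in a single bulk request | ||
helpers.bulk(client, vectors) | ||
|
||
# make the new documents visible to search | ||
client.indices.refresh(index=index_name) | ||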
``` | ||
|
||
### Search for Nearest Neighbors | ||
|
||
Create a random vector of the same size and search for its nearest neighbors. | ||
|
||
```python | ||
vec = [] | ||
for j in range(dimensions): | ||
    vec.append(round(random.uniform(0, 1), 2)) | ||
|
||
search_query = { | ||
    "query": { | ||
        "knn": { | ||
            "values": { | ||
                "vector": vec, | ||
                "k": 3 | ||
            } | ||
        } | ||
    } | ||
} | ||
|
||
results = client.search(index=index_name, body=search_query) | ||
for hit in results["hits"]["hits"]: | ||
    print(hit) | ||
``` | ||
|
||
## Approximate k-NN with a Boolean Filter | ||
|
||
In [the knn-boolean-filter.py sample](../../samples/knn/knn-boolean-filter.py) we create a 5-dimensional k-NN index with random data and a `metadata` field that contains a book genre (e.g. `fiction`). The search query is a k-NN search filtered by genre. The filter clause is outside the k-NN query clause and is applied after the k-NN search, so the query can return fewer than `k` results when some of the nearest neighbors do not match the filter. | ||
|
||
```bash | ||
$ poetry run knn/knn-boolean-filter.py | ||
|
||
Searching for [0.08, 0.42, 0.04, 0.76, 0.41] with the 'romance' genre ... | ||
|
||
{'_index': 'my-index', '_id': '445', '_score': 0.95886475, '_source': {'values': [0.2, 0.54, 0.08, 0.87, 0.43], 'metadata': {'genre': 'romance'}}} | ||
{'_index': 'my-index', '_id': '2816', '_score': 0.95256233, '_source': {'values': [0.22, 0.36, 0.01, 0.75, 0.57], 'metadata': {'genre': 'romance'}}} | ||
``` | ||
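
A query of this shape might look like the following sketch. The field names `values` and `metadata.genre` follow the description and output above. The helper name `boolean_filter_query` and the default `k` are assumptions for illustration, so see the linked sample for the exact query it sends.

```python
def boolean_filter_query(vector, genre, k=5):
    """Build a k-NN query whose genre filter sits outside the knn clause."""
    return {
        "query": {
            "bool": {
                # the filter is a sibling of the knn clause, so it runs on the k-NN results
                "filter": {"bool": {"must": [{"term": {"metadata.genre": genre}}]}},
                "must": {"knn": {"values": {"vector": vector, "k": k}}},
            }
        }
    }


search_query = boolean_filter_query([0.08, 0.42, 0.04, 0.76, 0.41], "romance")
print(search_query)
```

The resulting dictionary can be passed as `body` to `client.search`, in the same way as the basic query earlier in this guide.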
|
||
## Approximate k-NN with an Efficient Filter | ||
|
||
In [the knn-efficient-filter.py sample](../../samples/knn/knn-efficient-filter.py) we implement the example in [the k-NN documentation](https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/). The sample creates an index that uses the Lucene engine with HNSW as the method in the mapping, indexes hotel location, rating, and parking data, and then searches for the top three hotels nearest the coordinates `[5, 4]` that are rated between 8 and 10, inclusive, and provide parking. | ||
|
||
```bash | ||
$ poetry run knn/knn-efficient-filter.py | ||
|
||
{'_index': 'hotels-index', '_id': '3', '_score': 0.72992706, '_source': {'location': [4.9, 3.4], 'parking': 'true', 'rating': 9}} | ||
{'_index': 'hotels-index', '_id': '6', '_score': 0.3012048, '_source': {'location': [6.4, 3.4], 'parking': 'true', 'rating': 9}} | ||
{'_index': 'hotels-index', '_id': '5', '_score': 0.24154587, '_source': {'location': [3.3, 4.5], 'parking': 'true', 'rating': 8}} | ||
``` |
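
In contrast to the boolean filter, an efficient filter is placed inside the `knn` clause, so it is evaluated as part of the k-NN search rather than afterwards. A sketch of the query described above follows. The field names `location`, `rating`, and `parking` are taken from the sample output, while the helper name `efficient_filter_query` is hypothetical.

```python
def efficient_filter_query(vector, k=3):
    """Build a k-NN query with the filter nested inside the knn clause."""
    return {
        "size": k,
        "query": {
            "knn": {
                "location": {
                    "vector": vector,
                    "k": k,
                    # rating between 8 and 10 inclusive, and parking available
                    "filter": {
                        "bool": {
                            "must": [
                                {"range": {"rating": {"gte": 8, "lte": 10}}},
                                {"term": {"parking": "true"}},
                            ]
                        }
                    },
                }
            }
        },
    }


search_query = efficient_filter_query([5, 4])
print(search_query)
```

Every hit in the output above satisfies both filter conditions, and three results are returned even though the filter excludes some hotels.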