LangStream supports using OpenSearch as a vector database.
Learn more about performing vector search with OpenSearch in the official documentation.
Only OpenSearch 2.x is officially supported.
Create a vector-database resource in your configuration.yaml file.
A single resource is bound to a single index.
resources:
  - type: "vector-database"
    name: "OpenSearch"
    configuration:
      service: "opensearch"
      username: "${secrets.opensearch.username}"
      password: "${secrets.opensearch.password}"
      host: "${secrets.opensearch.host}"
      port: "${secrets.opensearch.port}"
      index-name: "my-index-000"
For the AWS OpenSearch service, set region instead of port:
resources:
  - type: "vector-database"
    name: "OpenSearch"
    configuration:
      service: "opensearch"
      username: "${secrets.opensearch.username}"
      password: "${secrets.opensearch.password}"
      host: "${secrets.opensearch.host}"
      region: "${secrets.opensearch.region}"
      index-name: "my-index-000"
username is the AWS Access Key.
password is the AWS Secret Key.
host is the endpoint provided by AWS. For example, for AWS OpenSearch Serverless it looks like this: xxxx..aoss.amazonaws.com
region is the AWS region. It must match the region used in the endpoint.
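The ${secrets.opensearch.*} references in the resource definitions resolve against the application's secrets file. The following is a minimal sketch, assuming the usual LangStream secrets layout (an id plus a data map); the environment variable names are hypothetical and only illustrate how the values could be supplied.

```yaml
# secrets.yaml (sketch; environment variable names are hypothetical)
secrets:
  - id: opensearch
    data:
      username: "${OPENSEARCH_USERNAME}"   # AWS Access Key when targeting AWS
      password: "${OPENSEARCH_PASSWORD}"   # AWS Secret Key when targeting AWS
      host: "${OPENSEARCH_HOST}"
      port: "${OPENSEARCH_PORT}"           # used by the non-AWS configuration
      region: "${OPENSEARCH_REGION}"       # used by the AWS configuration
```

Only the keys that a resource actually references need to be present.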
To bind the application to the OpenSearch index creation at startup, you must use the opensearch-index asset type.
You can configure settings and mappings as you prefer. Other configuration fields are not supported.
This is an example mixing normal fields with vector fields. The knn plugin is required in the target OpenSearch instance.
- name: "os-index"
  asset-type: "opensearch-index"
  creation-mode: create-if-not-exists
  config:
    datasource: "OpenSearch"
    settings: |
      {
        "index": {
          "knn": true,
          "knn.algo_param.ef_search": 100
        }
      }
    mappings: |
      {
        "properties": {
          "content": {
            "type": "text"
          },
          "embeddings": {
            "type": "knn_vector",
            "dimension": 1536
          }
        }
      }
Refer to the settings documentation for the settings field.
Refer to the mappings documentation for the mappings field.
Use the query-vector-db agent with the following parameters to perform searches on the index created above:
- name: "lookup-related-documents"
  type: "query-vector-db"
  configuration:
    datasource: "OpenSearch"
    query: |
      {
        "size": 1,
        "query": {
          "knn": {
            "embeddings": {
              "vector": ?,
              "k": 1
            }
          }
        }
      }
    fields:
      - "value.question_embeddings"
    output-field: "value.related_documents"
You can use the '?' symbol as a placeholder for the fields.
The query is the body sent to OpenSearch. Refer to the documentation to learn which parameters are supported.
Note that the query will be executed on the configured index. Multi-index queries are not supported, but you can declare multiple datasources and query different indexes in the same application.
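Because each resource is bound to a single index, querying several indexes means declaring one resource per index. The sketch below assumes two hypothetical resource names and index names; everything else mirrors the resource definition shown earlier.

```yaml
# configuration.yaml (sketch; resource names and index names are hypothetical)
resources:
  - type: "vector-database"
    name: "OpenSearchDocs"
    configuration:
      service: "opensearch"
      username: "${secrets.opensearch.username}"
      password: "${secrets.opensearch.password}"
      host: "${secrets.opensearch.host}"
      port: "${secrets.opensearch.port}"
      index-name: "docs-index"
  - type: "vector-database"
    name: "OpenSearchFaq"
    configuration:
      service: "opensearch"
      username: "${secrets.opensearch.username}"
      password: "${secrets.opensearch.password}"
      host: "${secrets.opensearch.host}"
      port: "${secrets.opensearch.port}"
      index-name: "faq-index"
```

Each query-vector-db agent then selects its index by setting datasource to the matching resource name.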
The output-field will contain the query result.
The result is an array in which each element has the following fields:
id: the document ID
document: the document source
score: the document score
index: the index name
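To make the shape concrete, the content of the output field can be pictured as follows. All values are made up for illustration; only the four field names come from the description above.

```yaml
# Illustrative shape of value.related_documents (values are hypothetical)
- id: "doc-1"
  document:            # the stored document source
    content: "some indexed text"
  score: 0.87
  index: "my-index-000"
```

A nested source field is then addressed by path, for example document.content on a single element.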
For example, if you want to keep only one relevant field from the first result, use the compute agent after the search:
- name: "lookup-related-documents"
  type: "query-vector-db"
  configuration:
    datasource: "OpenSearch"
    query: |
      {
        "size": 1,
        "query": {
          "match_all": {}
        }
      }
    output-field: "value.related_documents"
    only-first: true
- name: "Format response"
  type: compute
  configuration:
    fields:
      - name: "value"
        type: STRING
        expression: "value.related_documents.document.content"
Use the vector-db-sink agent to index data, with the following parameters:
- name: "Write to OpenSearch"
  type: "vector-db-sink"
  input: chunks-topic
  configuration:
    datasource: "OpenSearch"
    bulk-parameters:
      timeout: 2m
    fields:
      - name: "id"
        expression: "fn:concat(value.filename, value.chunk_id)"
      - name: "embeddings"
        expression: "fn:toListOfFloat(value.embeddings_vector)"
      - name: "text"
        expression: "value.text"
All indexing is performed using the Bulk operation.
You can customize the bulk parameters with the bulk-parameters property.
The request is flushed according to the flush-interval and batch-size parameters.
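As a rough sketch of how these tuning properties might be combined, the example below assumes that flush-interval and batch-size sit directly under configuration, next to bulk-parameters, and that flush-interval is expressed in milliseconds. Both points are assumptions and the values are arbitrary, so check the API Reference before relying on them.

```yaml
# Sketch only: property placement, units, and values are assumptions
- name: "Write to OpenSearch"
  type: "vector-db-sink"
  input: chunks-topic
  configuration:
    datasource: "OpenSearch"
    batch-size: 50        # assumed: flush once this many records are buffered
    flush-interval: 500   # assumed: flush at least this often, in milliseconds
    bulk-parameters:
      timeout: 2m
    fields:
      - name: "id"
        expression: "fn:concat(value.filename, value.chunk_id)"
```

A larger batch generally means fewer Bulk requests at the cost of higher indexing latency for each record.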
Check out the full configuration properties in the API Reference page.