208 changes: 198 additions & 10 deletions docs/ai/conceptual/vector-databases.md
@@ -1,13 +1,14 @@
---
title: "Using Vector Databases to Extend LLM Capabilities"
description: "Learn how vector databases extend LLM capabilities by storing and processing embeddings in .NET."
title: "Vector databases for .NET AI apps"
description: "Learn how vector databases extend LLM capabilities by storing and processing embeddings in .NET, and how to use Microsoft.Extensions.VectorData to build semantic search features."
ms.topic: concept-article
ms.date: 05/29/2025
ms.date: 02/24/2026
ai-usage: ai-assisted
---

# Vector databases for .NET + AI
# Vector databases for .NET AI apps

Vector databases are designed to store and manage vector [embeddings](embeddings.md). Embeddings are numeric representations of non-numeric data that preserve semantic meaning. Words, documents, images, audio, and other types of data can all be vectorized. You can use embeddings to help an AI model understand the meaning of inputs so that it can perform comparisons and transformations, such as summarizing text, finding contextually related data, or creating images from text descriptions.
Vector databases store and manage vector [*embeddings*](embeddings.md). Embeddings are numeric representations of data that preserve semantic meaning. Words, documents, images, audio, and other types of data can all be vectorized. You can use embeddings to help an AI model understand the meaning of inputs so that it can perform comparisons and transformations, such as summarizing text, finding contextually related data, or creating images from text descriptions.

For example, you can use a vector database to:

@@ -19,28 +20,215 @@ For example, you can use a vector database to:

## Understand vector search

Vector databases provide vector search capabilities to find similar items based on their data characteristics rather than by exact matches on a property field. Vector search works by analyzing the vector representations of your data that you created using an AI embedding model such the [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models). The search process measures the distance between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are found to be most similar semantically.
Vector databases provide vector search capabilities to find similar items based on their data characteristics rather than by exact matches on a property field. Vector search works by analyzing the vector representations of your data that you created using an AI embedding model such as the [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models). The search process measures the distance between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are most similar semantically.
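
As a rough illustration of the distance measurement described above, the following sketch scores stored vectors against a query vector with cosine similarity and returns the closest matches. It assumes a brute-force scan over an in-memory dictionary; `VectorSearchSketch`, `CosineSimilarity`, and `Search` are hypothetical names used only for this example, and real vector databases typically use approximate indexes rather than scanning every record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of any library.
public static class VectorSearchSketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Values near 1 mean the vectors point the same way.
    // Note: a zero-length (all zeros) vector produces NaN here; real code should guard against it.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Brute-force search: score every stored vector against the query and keep the best matches.
    public static List<(string Id, double Score)> Search(
        IReadOnlyDictionary<string, float[]> store, float[] query, int top)
    {
        return store
            .Select(kv => (Id: kv.Key, Score: CosineSimilarity(kv.Value, query)))
            .OrderByDescending(r => r.Score)
            .Take(top)
            .ToList();
    }
}
```

Calling `VectorSearchSketch.Search(store, queryVector, top: 3)` returns the three stored items whose vectors are closest in direction to the query vector.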

Some services such as [Azure Cosmos DB for MongoDB vCore](/azure/cosmos-db/mongodb/vcore/vector-search) provide native vector search capabilities for your data. Other databases can be enhanced with vector search by indexing the stored data using a service such as Azure AI Search, which can scan and index your data to provide vector search capabilities.

## Vector search workflows with .NET and OpenAI

Vector databases and their search features are especially useful in [RAG pattern](rag.md) workflows with Azure OpenAI. This pattern allows you to augment or enhance your AI model with additional semantically rich knowledge of your data. A common AI workflow using vector databases might include the following steps:
Vector databases and their search features are especially useful in [RAG pattern](rag.md) workflows with Azure OpenAI. This pattern allows you to augment or enhance your AI model with additional semantically rich knowledge of your data. A common AI workflow using vector databases includes the following steps:

1. Create embeddings for your data using an OpenAI embedding model.
1. Store and index the embeddings in a vector database or search service.
1. Convert user prompts from your application to embeddings.
1. Run a vector search across your data, comparing the user prompt embedding to the embeddings your database.
1. Use a language model such as GPT-35 or GPT-4 to assemble a user friendly completion from the vector search results.
1. Run a vector search across your data, comparing the user prompt embedding to the embeddings in your database.
1. Use a language model such as GPT-4 to assemble a user-friendly completion from the vector search results.
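
The steps above can be sketched end to end in a few lines. This is a simplified illustration, not a production pattern: `ToyEmbed` is a hypothetical stand-in that counts letters instead of calling an embedding model, the "database" is an in-memory list, and the final step only builds the prompt text that a real app would send to a language model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names throughout; this only mirrors the shape of the workflow.
public static class RagFlowSketch
{
    // Stand-in for an embedding model: a 26-slot letter count. Not a real embedding.
    public static float[] ToyEmbed(string text)
    {
        var vector = new float[26];
        foreach (char c in text.ToLowerInvariant())
        {
            if (c >= 'a' && c <= 'z') vector[c - 'a']++;
        }
        return vector;
    }

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Treat empty text (a zero vector) as "not similar" instead of dividing by zero.
        return normA == 0 || normB == 0 ? 0 : dot / Math.Sqrt(normA * normB);
    }

    public static string BuildPrompt(IEnumerable<string> documents, string question, int top)
    {
        // Steps 1-2: create embeddings for the data and store them (here, an in-memory list).
        var index = documents.Select(d => (Text: d, Vector: ToyEmbed(d))).ToList();

        // Step 3: convert the user prompt to an embedding.
        float[] queryVector = ToyEmbed(question);

        // Step 4: vector search, keeping the closest documents.
        var context = index
            .OrderByDescending(entry => Cosine(entry.Vector, queryVector))
            .Take(top)
            .Select(entry => entry.Text);

        // Step 5: a real app would send this prompt to a chat model to get the completion.
        return "Answer using only this context:\n" + string.Join("\n", context)
            + "\n\nQuestion: " + question;
    }
}
```

In a real app, the embedding and completion steps would call an AI model, and the index would live in a vector database or search service.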

Visit the [Implement Azure OpenAI with RAG using vector search in a .NET app](../tutorials/tutorial-ai-vector-search.md) tutorial for a hands-on example of this flow.

Other benefits of the RAG pattern include:

- Generate contextually relevant and accurate responses to user prompts from AI models.
- Overcome LLM tokens limits - the heavy lifting is done through the database vector search.
- Overcome LLM token limits: the heavy lifting is done through the database vector search.
- Reduce the costs from frequent fine-tuning on updated data.

## The Microsoft.Extensions.VectorData library

The [📦 Microsoft.Extensions.VectorData.Abstractions](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions) package provides a unified layer of abstractions for interacting with vector stores in .NET. These abstractions let you write code against a single API and swap out the underlying vector store with minimal changes to your application.

The library provides the following key capabilities:

- **Unified data model**: Define your data model once using .NET attributes and use it across any supported vector store.
- **CRUD operations**: Create, read, update, and delete records in a vector store.
- **Vector and text search**: Query records by semantic similarity using vector search, or by keyword using text search.
- **Collection management**: Create, list, and delete collections (tables or indices) in a vector store.

### Define a data model

To store records in a vector store, define a .NET class and annotate its properties with the following attributes from the <xref:Microsoft.Extensions.VectorData> namespace:

- <xref:Microsoft.Extensions.VectorData.VectorStoreKeyAttribute>: Marks the property that uniquely identifies each record (the primary key).
- <xref:Microsoft.Extensions.VectorData.VectorStoreDataAttribute>: Marks properties that contain regular data (strings, numbers, and so on) to store and optionally filter on.
- <xref:Microsoft.Extensions.VectorData.VectorStoreVectorAttribute>: Marks properties that store embedding vectors. You specify the number of dimensions and the distance function to use for similarity comparisons.

The following example defines a data model for a hotel:

```csharp
using Microsoft.Extensions.VectorData;

public class Hotel
{
    [VectorStoreKey]
    public int HotelId { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string? HotelName { get; set; }

    [VectorStoreData(IsFullTextIndexed = true)]
    public string? Description { get; set; }

    [VectorStoreVector(Dimensions: 1536, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string[]? Tags { get; set; }
}
```

The `Dimensions` parameter must match the output size of the embedding model you use. For example, `text-embedding-3-small` produces 1536-dimensional vectors, while `text-embedding-3-large` produces 3072-dimensional vectors.
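
One way to catch a mismatch early is to check the vector length before you store a record. The following is a minimal sketch; `EmbeddingGuard` and `EnsureDimensions` are hypothetical names for this example, and the expected size is whatever you declared on the vector property.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Microsoft.Extensions.VectorData.
public static class EmbeddingGuard
{
    // Throws if the embedding length differs from the dimension count declared on the data model.
    public static void EnsureDimensions(ReadOnlyMemory<float> embedding, int expected)
    {
        if (embedding.Length != expected)
        {
            throw new ArgumentException(
                $"Expected {expected} dimensions but got {embedding.Length}.",
                nameof(embedding));
        }
    }
}
```

For example, `EmbeddingGuard.EnsureDimensions(embedding, 1536)` passes for a 1536-element vector and throws for a 3072-element one.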

### Automatic embedding generation

Instead of generating embeddings yourself before each upsert, you can declare your vector property as a `string` type and configure an `IEmbeddingGenerator` on the vector store. The store then generates the embedding automatically when you upsert a record, using the string value you provide:

```csharp
public class FinanceInfo
{
    [VectorStoreKey]
    public int Key { get; set; }

    [VectorStoreData]
    public string Text { get; set; } = "";

    // Declare as string to enable automatic embedding generation on upsert.
    [VectorStoreVector(1536)]
    public string EmbeddingSource { get; set; } = "";
}
```

You can configure the `IEmbeddingGenerator` at the vector store or collection level, or on individual vector properties. With auto-embedding, you can also pass a `string` directly to `SearchAsync` instead of a precomputed vector—the store generates the search embedding for you. For more information, see [Use built-in embedding generation](../how-to/use-vector-stores.md#use-built-in-embedding-generation).

### Key abstractions

The `Microsoft.Extensions.VectorData.Abstractions` library exposes the following main abstract classes and interfaces:

- <xref:Microsoft.Extensions.VectorData.VectorStore>: The top-level class for a vector database. Use it to retrieve and manage collections.
- <xref:Microsoft.Extensions.VectorData.VectorStoreCollection`2>: Represents a named collection of records within a vector store. Use it to perform CRUD and search operations. Also inherits from `IVectorSearchable<TRecord>`.

technical nit:

Suggested change
- <xref:Microsoft.Extensions.VectorData.VectorStoreCollection`2>: Represents a named collection of records within a vector store. Use it to perform CRUD and search operations. Also inherits from `IVectorSearchable<TRecord>`.
- <xref:Microsoft.Extensions.VectorData.VectorStoreCollection`2>: Represents a named collection of records within a vector store. Use it to perform CRUD and search operations. It implements `IVectorSearchable<TRecord>`.

- `IKeywordHybridSearchable<TRecord>`: Implemented by collections that support hybrid search combining vector similarity with keyword matching.

The following example shows how to get a collection from a vector store and upsert (insert or update) records:

```csharp
// Get or create a collection named "hotels".
VectorStoreCollection<int, Hotel> collection =
    vectorStore.GetCollection<int, Hotel>("hotels");

// Ensure the collection exists in the database.
await collection.EnsureCollectionExistsAsync();

// Upsert a record.
await collection.UpsertAsync(new Hotel
{
    HotelId = 1,
    HotelName = "Seaside Retreat",
    Description = "A peaceful hotel on the coast with stunning ocean views.",
    DescriptionEmbedding = await embeddingGenerator.GenerateVectorAsync(
        "A peaceful hotel on the coast with stunning ocean views."),
    Tags = ["beach", "ocean", "relaxation"]
});
```

### Perform vector search

Use the `SearchAsync` method to search for semantically similar records. Pass in an embedding vector for your query and specify the number of results to return:

```csharp
// Generate an embedding for the search query.
ReadOnlyMemory<float> queryEmbedding =
    await embeddingGenerator.GenerateVectorAsync("beachfront hotel");

// Search for the top 3 most similar hotels.
IAsyncEnumerable<VectorSearchResult<Hotel>> results =
    collection.SearchAsync(queryEmbedding, top: 3);

await foreach (VectorSearchResult<Hotel> result in results)
{
    Console.WriteLine($"Hotel: {result.Record.HotelName}");
    Console.WriteLine($"Score: {result.Score}");
}
```

### Filter search results

Use the <xref:Microsoft.Extensions.VectorData.VectorSearchOptions`1> class to filter vector search results by field values. Only properties marked with `IsIndexed = true` in `[VectorStoreData]` can be used in filters:

```csharp
var searchOptions = new VectorSearchOptions<Hotel>
{
    Filter = hotel => hotel.HotelName == "Seaside Retreat"
};

IAsyncEnumerable<VectorSearchResult<Hotel>> results =
    collection.SearchAsync(queryEmbedding, top: 3, searchOptions);
```

Filters are expressed as LINQ expressions and compiled into the query syntax of the underlying database. The supported operations vary by connector.
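
To see why a lambda can be turned into a database query, it helps to look at the expression tree behind it. The sketch below is only an illustration of the idea, not how any connector is implemented: `HotelRow`, `FilterTranslationSketch`, and the output syntax are all made up for this example, and it handles just one shape of filter (`property == constant`).

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical record type for this example only.
public class HotelRow
{
    public string? HotelName { get; set; }
}

// Hypothetical translator; real connectors support many more operators and value kinds.
public static class FilterTranslationSketch
{
    // Turns `record => record.Property == "literal"` into the made-up text "Property = 'literal'".
    public static string Translate<T>(Expression<Func<T, bool>> filter)
    {
        // Because the parameter is an Expression<...>, the lambda arrives as data, not compiled code.
        if (filter.Body is BinaryExpression { NodeType: ExpressionType.Equal } equality
            && equality.Left is MemberExpression member
            && equality.Right is ConstantExpression constant)
        {
            return $"{member.Member.Name} = '{constant.Value}'";
        }

        // A captured local variable, for example, is not a ConstantExpression and lands here.
        throw new NotSupportedException("This sketch only handles 'property == constant'.");
    }
}
```

With these definitions, `FilterTranslationSketch.Translate<HotelRow>(h => h.HotelName == "Seaside Retreat")` returns `HotelName = 'Seaside Retreat'`. Each connector does a similar walk over the tree and rejects the shapes it can't express, which is why the supported operations differ between databases.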

### Hybrid search

Some connectors support *hybrid search*, which combines vector similarity with full-text keyword matching. To use hybrid search, check whether your collection implements `IKeywordHybridSearchable<TRecord>` and use the `HybridSearchAsync` method:

```csharp
if (collection is IKeywordHybridSearchable<Hotel> hybridSearch)
{
    var results = hybridSearch.HybridSearchAsync(
        queryEmbedding,
        keywords: ["ocean", "beach"],
        top: 3);

    await foreach (var result in results)
    {
        Console.WriteLine($"Hotel: {result.Record.HotelName}, Score: {result.Score}");
    }
}
```

For hybrid search to work, the data model must have at least one vector property and one string property with `IsFullTextIndexed = true`.

## Available vector store connectors

The `Microsoft.Extensions.VectorData.Abstractions` package defines the abstractions, and separate connector packages implement them for specific vector databases. Choose the connector that matches your vector database:

| Connector | NuGet package |
|---|---|
| In-memory (for testing/development) | [Microsoft.SemanticKernel.Connectors.InMemory](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.InMemory) |
| Azure AI Search | [Microsoft.SemanticKernel.Connectors.AzureAISearch](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.AzureAISearch) |
| Azure Cosmos DB (NoSQL) | [Microsoft.SemanticKernel.Connectors.CosmosNoSQL](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.CosmosNoSQL) |
| Azure Cosmos DB (MongoDB) | [Microsoft.SemanticKernel.Connectors.CosmosMongoDB](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.CosmosMongoDB) |
| Azure SQL / SQL Server | [Microsoft.SemanticKernel.Connectors.SqlServer](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.SqlServer) |
| Couchbase | [CouchbaseVectorStore.SemanticKernel](https://www.nuget.org/packages/CouchbaseVectorStore.SemanticKernel) |
| Elasticsearch | [Elastic.SemanticKernel.Connectors.Elasticsearch](https://www.nuget.org/packages/Elastic.SemanticKernel.Connectors.Elasticsearch) |
| MongoDB | [Microsoft.SemanticKernel.Connectors.MongoDB](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.MongoDB) |
| Oracle | [OracleVectorStore.SemanticKernel](https://www.nuget.org/packages/OracleVectorStore.SemanticKernel) |
| Pinecone | [Microsoft.SemanticKernel.Connectors.Pinecone](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.Pinecone) |
| PostgreSQL (pgvector) | [Microsoft.SemanticKernel.Connectors.PgVector](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.PgVector) |
| Qdrant | [Microsoft.SemanticKernel.Connectors.Qdrant](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.Qdrant) |
| Redis | [Microsoft.SemanticKernel.Connectors.Redis](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.Redis) |
| SQLite | [Microsoft.SemanticKernel.Connectors.SqliteVec](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.SqliteVec) |
| Weaviate | [Microsoft.SemanticKernel.Connectors.Weaviate](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.Weaviate) |

All connectors implement the same `VectorStore` and `VectorStoreCollection<TKey, TRecord>` abstract classes, so you can switch between them without changing your application logic.

> [!TIP]
> Use the in-memory connector (`Microsoft.SemanticKernel.Connectors.InMemory`) during development and testing. It doesn't require any external service or configuration, and you can swap it out for a production connector later.

> [!IMPORTANT]
> Not all connectors are maintained by the Microsoft Semantic Kernel project. When evaluating a connector, review its quality, licensing, support, and compatibility to ensure it meets your requirements.

## Related content

- [Build a .NET AI vector search app](../quickstarts/build-vector-search-app.md)
- [Implement Azure OpenAI with RAG using vector search in a .NET app](../tutorials/tutorial-ai-vector-search.md)
- [Use vector stores in .NET AI apps](../how-to/use-vector-stores.md)
- [Data ingestion](data-ingestion.md)
- [Embeddings in .NET](embeddings.md)
@@ -0,0 +1,56 @@
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;

// <AutoEmbeddingDataModel>
// When the vector property type is string (not ReadOnlyMemory<float>),
// the vector store automatically generates embeddings using the configured IEmbeddingGenerator.
public class FinanceInfo
This looks like it could / should be a record?

Suggested change
public class FinanceInfo
public record class FinanceInfo

{
    [VectorStoreKey]
    public int Key { get; set; }

    [VectorStoreData]
    public string Text { get; set; } = "";

    // The string value placed here before upsert is automatically converted to a vector.
    [VectorStoreVector(1536)]
    public string EmbeddingSource { get; set; } = "";
Comment on lines +14 to +18
In both cases, required might be better than the empty string initialization:

Suggested change
public string Text { get; set; } = "";
// The string value placed here before upsert is automatically converted to a vector.
[VectorStoreVector(1536)]
public string EmbeddingSource { get; set; } = "";
public required string Text { get; set; } = "";
// The string value placed here before upsert is automatically converted to a vector.
[VectorStoreVector(1536)]
public required string EmbeddingSource { get; set; } = "";

}
// </AutoEmbeddingDataModel>

public static class AutoEmbeddingExample
{
    public static async Task RunAsync(IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator)
    {
        // <AutoEmbeddingVectorStore>
        // Configure the embedding generator at the vector store level.
        // All collections in this store will use it unless overridden.
        var vectorStore = new InMemoryVectorStore(new InMemoryVectorStoreOptions
        {
            EmbeddingGenerator = embeddingGenerator
        });

        var collection = vectorStore.GetCollection<int, FinanceInfo>("finance");
        await collection.EnsureCollectionExistsAsync();

        // Embeddings are generated automatically on upsert.
        var records = new[]
        {
            new FinanceInfo { Key = 1, Text = "2024 Budget", EmbeddingSource = "The budget for 2024 is $100,000" },
            new FinanceInfo { Key = 2, Text = "2023 Budget", EmbeddingSource = "The budget for 2023 is $80,000" }
        };

        await collection.UpsertAsync(records[0]);
        await collection.UpsertAsync(records[1]);

        // Embeddings for search are also generated automatically.
        var results = collection.SearchAsync("What is my 2024 budget?", top: 1);

        await foreach (var result in results)
        {
            Console.WriteLine($"Found: Key={result.Record.Key}, Text={result.Record.Text}");
        }
        // </AutoEmbeddingVectorStore>
    }
}
@@ -0,0 +1,21 @@
// <DataModel>
using Microsoft.Extensions.VectorData;

public class Hotel
same comment here on record:

Suggested change
public class Hotel
public record class Hotel

{
    [VectorStoreKey]
    public int HotelId { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string? HotelName { get; set; }

    [VectorStoreData(IsFullTextIndexed = true)]
    public string? Description { get; set; }

    [VectorStoreVector(Dimensions: 1536, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string[]? Tags { get; set; }
}
// </DataModel>