Add distanceType support to ChromaVectorStore.Builder for distance function configuration #5163

@TTzvet

Description

Expected Behavior

The ChromaVectorStore.Builder should support configuring the distance function, consistent with other Spring AI vector stores:

@Bean
public ChromaVectorStore vectorStore(EmbeddingModel embeddingModel, ChromaApi chromaApi) {
    return ChromaVectorStore.builder(chromaApi, embeddingModel)
        .collectionName("my-collection")
        .distanceType(ChromaDistanceType.COSINE)
        .initializeSchema(true)
        .build();
}

Spring AI vector stores that already expose distance type configuration include:

  • PgVectorStore.Builder.distanceType()
  • MariaDBVectorStore.Builder.distanceType()

Internally, Spring AI would translate this to ChromaDB's hnsw:space collection metadata.

Additionally, a generic collectionMetadata(Map<String, Object>) method could be useful for advanced ChromaDB-specific HNSW settings (e.g., hnsw:M, hnsw:construction_ef).
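To make the proposed translation concrete, here is a minimal sketch of how a distance type could be mapped onto the hnsw:space metadata key and merged with extra HNSW settings. ChromaDistanceType does not exist in Spring AI 1.0.2; the enum, its INNER_PRODUCT constant name, and the CollectionMetadataSketch helper are hypothetical names used only for illustration. The string values "l2", "ip" and "cosine" are the ones ChromaDB accepts for hnsw:space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical enum mirroring the proposal; not part of Spring AI 1.0.2.
enum ChromaDistanceType {
    L2("l2"),
    INNER_PRODUCT("ip"),
    COSINE("cosine");

    private final String hnswSpace;

    ChromaDistanceType(String hnswSpace) {
        this.hnswSpace = hnswSpace;
    }

    // Value to store under the "hnsw:space" collection metadata key.
    String hnswSpace() {
        return hnswSpace;
    }
}

// Hypothetical helper showing what the builder could do internally.
final class CollectionMetadataSketch {

    static final String HNSW_SPACE_KEY = "hnsw:space";

    // Copies the user-supplied metadata, then applies the distance type.
    // An explicit distance type overrides any "hnsw:space" entry in extra.
    static Map<String, Object> toCollectionMetadata(ChromaDistanceType distanceType,
                                                    Map<String, Object> extra) {
        Map<String, Object> metadata = new LinkedHashMap<>(extra);
        if (distanceType != null) {
            metadata.put(HNSW_SPACE_KEY, distanceType.hnswSpace());
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata =
                toCollectionMetadata(ChromaDistanceType.COSINE, Map.of("hnsw:M", 32));
        System.out.println(metadata);
    }
}
```

Letting distanceType() win over a conflicting collectionMetadata() entry is one possible design choice; rejecting the conflict with an exception would be equally defensible.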

Current Behavior

The ChromaVectorStore.Builder does not expose any way to configure collection metadata. The builder only supports:

  • collectionName()
  • tenantName()
  • databaseName()
  • initializeSchema()
  • filterExpressionConverter()
  • initializeImmediately()

ChromaDB defaults to L2 (Euclidean distance), but cosine similarity is generally preferred for text/semantic search. ChromaDB supports it through the hnsw:space collection metadata key, but ChromaVectorStore.Builder offers no way to pass that metadata through.
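A small illustration of why the choice matters: the two metrics can rank the same candidates differently when vectors differ in magnitude. The sketch below uses plain Java and assumes the squared form of L2 and cosine distance defined as 1 minus cosine similarity; the class name and the sample vectors are made up for the example.

```java
// Hypothetical standalone example; not tied to Spring AI or ChromaDB code.
final class DistanceComparison {

    // Squared Euclidean distance: sum of squared component differences.
    static double squaredL2(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Cosine distance: 1 - (a . b) / (|a| * |b|). Ignores vector magnitude.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {1, 0};
        double[] sameDirection = {3, 0}; // same direction as query, larger magnitude
        double[] orthogonal = {0, 1};    // unrelated direction, same magnitude

        // L2 ranks the orthogonal vector closer (2.0 vs 4.0).
        System.out.println(squaredL2(query, sameDirection)); // 4.0
        System.out.println(squaredL2(query, orthogonal));    // 2.0

        // Cosine ranks the same-direction vector closer (0.0 vs 1.0).
        System.out.println(cosineDistance(query, sameDirection)); // 0.0
        System.out.println(cosineDistance(query, orthogonal));    // 1.0
    }
}
```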

Context

I'm building a RAG application for semantic search over text documents. Cosine similarity is the recommended distance metric for text embeddings, but I cannot configure this through the ChromaVectorStore.Builder.

Workarounds considered:

  1. Pre-create the collection in ChromaDB with the correct metadata before the application starts, then set initializeSchema(false). This adds operational complexity and breaks the self-contained setup.
  2. Use ChromaApi directly to create the collection with metadata before building the VectorStore. This bypasses the builder pattern and duplicates configuration.

Both workarounds are cumbersome compared to a simple builder method.
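For reference, the first workaround could look roughly like the sketch below, which builds the HTTP request that pre-creates a collection with hnsw:space set. It uses only the JDK HTTP client and assumes ChromaDB's v2 REST path (/api/v2/tenants/{tenant}/databases/{database}/collections) and a JSON body with name and metadata fields; the class and method names are hypothetical, and the naive string concatenation assumes names that need no JSON or URL escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for pre-creating a collection outside of Spring AI.
final class PreCreateCollectionSketch {

    // Builds the JSON payload; assumes the inputs contain no characters that need escaping.
    static String requestBody(String collectionName, String hnswSpace) {
        return "{\"name\":\"" + collectionName + "\","
                + "\"metadata\":{\"hnsw:space\":\"" + hnswSpace + "\"}}";
    }

    // Builds (but does not send) the POST request against the assumed v2 endpoint.
    static HttpRequest createCollectionRequest(String baseUrl, String tenant, String database,
                                               String collectionName, String hnswSpace) {
        URI uri = URI.create(baseUrl + "/api/v2/tenants/" + tenant
                + "/databases/" + database + "/collections");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(collectionName, hnswSpace)))
                .build();
    }
}
```

The request would then be sent once at deployment time, for example with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), before starting the application with initializeSchema(false). This is exactly the extra step a distanceType() builder method would make unnecessary.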

Environment:

  • Spring AI Version: 1.0.2
  • ChromaDB Version: 1.0.0
