feat(vector-stores): Add support for Chroma VectorStore (#139)
davidmigloz committed Aug 27, 2023
1 parent 5fdcbc5 commit 098783b
Showing 13 changed files with 396 additions and 11 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -61,8 +61,9 @@ provided by a separate package.
| Package | Version | Description | Models | Data conn. | Chains | Agents & Tools |
|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|--------|------------|--------|----------------|
| [langchain](https://pub.dev/packages/langchain) | [![langchain](https://img.shields.io/pub/v/langchain.svg)](https://pub.dev/packages/langchain) | Core LangChain API |||||
| [langchain_openai](https://pub.dev/packages/langchain_openai) | [![langchain_openai](https://img.shields.io/pub/v/langchain_openai.svg)](https://pub.dev/packages/langchain_openai) | OpenAI integration (GPT-3, GPT-4, Functions, etc.) | || | |
| [langchain_chroma](https://pub.dev/packages/langchain_chroma) | [![langchain_chroma](https://img.shields.io/pub/v/langchain_chroma.svg)](https://pub.dev/packages/langchain_chroma) | Chroma DB integration | || | |
| [langchain_google](https://pub.dev/packages/langchain_google) | [![langchain_google](https://img.shields.io/pub/v/langchain_google.svg)](https://pub.dev/packages/langchain_google) | Google integration (VertexAI, PaLM, Matching Engine, etc.) ||| | |
| [langchain_openai](https://pub.dev/packages/langchain_openai) | [![langchain_openai](https://img.shields.io/pub/v/langchain_openai.svg)](https://pub.dev/packages/langchain_openai) | OpenAI integration (GPT-3, GPT-4, Functions, etc.) |||||

The following packages are maintained (and used internally) by LangChain.dart,
although they can also be used independently:
3 changes: 2 additions & 1 deletion packages/langchain/README.md
@@ -61,8 +61,9 @@ provided by a separate package.
| Package | Version | Description | Models | Data conn. | Chains | Agents & Tools |
|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|--------|------------|--------|----------------|
| [langchain](https://pub.dev/packages/langchain) | [![langchain](https://img.shields.io/pub/v/langchain.svg)](https://pub.dev/packages/langchain) | Core LangChain API |||||
| [langchain_openai](https://pub.dev/packages/langchain_openai) | [![langchain_openai](https://img.shields.io/pub/v/langchain_openai.svg)](https://pub.dev/packages/langchain_openai) | OpenAI integration (GPT-3, GPT-4, Functions, etc.) | || | |
| [langchain_chroma](https://pub.dev/packages/langchain_chroma) | [![langchain_chroma](https://img.shields.io/pub/v/langchain_chroma.svg)](https://pub.dev/packages/langchain_chroma) | Chroma DB integration | || | |
| [langchain_google](https://pub.dev/packages/langchain_google) | [![langchain_google](https://img.shields.io/pub/v/langchain_google.svg)](https://pub.dev/packages/langchain_google) | Google integration (VertexAI, PaLM, Matching Engine, etc.) ||| | |
| [langchain_openai](https://pub.dev/packages/langchain_openai) | [![langchain_openai](https://img.shields.io/pub/v/langchain_openai.svg)](https://pub.dev/packages/langchain_openai) | OpenAI integration (GPT-3, GPT-4, Functions, etc.) |||||

The following packages are maintained (and used internally) by LangChain.dart,
although they can also be used independently:
@@ -76,7 +76,7 @@ abstract class VectorStore {
/// - [ids] is a list of ids to delete.
///
/// Returns true if the delete was successful.
Future<bool> delete({required final List<String> ids});
Future<void> delete({required final List<String> ids});

/// Returns docs most similar to query using specified search type.
///
@@ -141,11 +141,10 @@ class MemoryVectorStore extends VectorStore {
}

@override
Future<bool> delete({required final List<String> ids}) async {
Future<void> delete({required final List<String> ids}) async {
memoryVectors.removeWhere(
(final vector) => ids.contains(vector.document.id),
);
return true;
}

@override
14 changes: 13 additions & 1 deletion packages/langchain_chroma/README.md
@@ -1,7 +1,19 @@
# 🦜️🔗 LangChain.dart
# 🦜️🔗 LangChain.dart / Chroma

[![tests](https://img.shields.io/github/actions/workflow/status/davidmigloz/langchain_dart/test.yaml?logo=github&label=tests)](https://github.com/davidmigloz/langchain_dart/actions/workflows/test.yaml)
[![docs](https://img.shields.io/github/actions/workflow/status/davidmigloz/langchain_dart/pages%2Fpages-build-deployment?logo=github&label=docs)](https://github.com/davidmigloz/langchain_dart/actions/workflows/pages/pages-build-deployment)
[![langchain_chroma](https://img.shields.io/pub/v/langchain_chroma.svg)](https://pub.dev/packages/langchain_chroma)
[![](https://dcbadge.vercel.app/api/server/x4qbhqecVR?style=flat)](https://discord.gg/x4qbhqecVR)
[![MIT](https://img.shields.io/badge/license-MIT-purple.svg)](https://github.com/davidmigloz/langchain_dart/blob/main/LICENSE)

Chroma module for [LangChain.dart](https://github.com/davidmigloz/langchain_dart).

## Features

- Vector stores:
  * `Chroma` vector store that uses the [Chroma](https://www.trychroma.com)
    open-source embedding database.

## License

LangChain.dart is licensed under the
4 changes: 3 additions & 1 deletion packages/langchain_chroma/lib/langchain_chroma.dart
@@ -1,2 +1,4 @@
/// Chroma module for LangChain.dart.
/// LangChain.dart integration module for Chroma open-source embedding database.
library;

export 'src/vector_stores/vector_stores.dart';
187 changes: 187 additions & 0 deletions packages/langchain_chroma/lib/src/vector_stores/chroma.dart
@@ -0,0 +1,187 @@
import 'package:chromadb/chromadb.dart';
import 'package:http/http.dart' as http;
import 'package:langchain/langchain.dart';
import 'package:uuid/uuid.dart';

import 'models/models.dart';

/// {@template chroma}
/// Vector store for Chroma open-source embedding database.
///
/// Chroma documentation:
/// https://docs.trychroma.com
///
/// This vector store requires Chroma to be running in client/server mode.
///
/// The server can run on your local computer via Docker or be deployed
/// to any cloud provider.
///
/// To run Chroma in client/server mode, start the Docker container:
/// ```
/// docker-compose up -d --build
/// ```
///
/// By default, the Chroma client will connect to a server running on
/// `http://localhost:8000`. To connect to a different server, pass the
/// `host` parameter to the constructor.
///
/// ### Collections
///
/// Chroma lets you manage collections of embeddings, using the collection
/// primitive.
///
/// You can configure the collection to use in the [collectionName] parameter.
///
/// You can also configure the metadata to associate with the collection in the
/// [collectionMetadata] parameter.
///
/// ### Changing the distance function
///
/// You can change the distance function of the embedding space by setting the
/// value of `hnsw:space` in [collectionMetadata]. Valid options are "l2",
/// "ip", or "cosine". The default is "l2".
///
/// ### Filtering
///
/// Chroma supports filtering queries by metadata and document contents.
/// The `where` filter is used to filter by metadata, and the `whereDocument`
/// filter is used to filter by document contents.
///
/// For example:
/// ```dart
/// final vectorStore = Chroma(...);
/// final res = await vectorStore.similaritySearch(
///   query: 'What should I feed my cat?',
///   config: ChromaSimilaritySearch(
///     k: 5,
///     scoreThreshold: 0.8,
///     where: {'class': 'cat'},
///   ),
/// );
/// ```
///
/// Chroma supports a wide range of operators for filtering. Check out the
/// filtering section of the Chroma docs for more info:
/// https://docs.trychroma.com/usage-guide?lang=js#using-where-filters
/// {@endtemplate}
class Chroma extends VectorStore {
  /// {@macro chroma}
  Chroma({
    this.collectionName = 'langchain',
    this.collectionMetadata,
    required super.embeddings,
    final String? host,
    final http.Client? client,
  }) : _client = ChromaClient(
          host: host ?? 'http://localhost:8000',
          client: client,
        );

  /// Name of the collection to use.
  final String collectionName;

  /// Metadata to associate with the collection.
  final Map<String, dynamic>? collectionMetadata;

  /// The Chroma client.
  final ChromaClient _client;

  /// A UUID generator.
  final Uuid _uuid = const Uuid();

  /// The collection to use.
  Collection? _collection;

  @override
  Future<List<String>> addVectors({
    required final List<List<double>> vectors,
    required final List<Document> documents,
  }) async {
    assert(vectors.length == documents.length);

    final collection = await _getCollection();

    final List<String> ids = [];
    final List<Map<String, dynamic>> metadatas = [];
    final List<String> docs = [];

    for (var i = 0; i < documents.length; i++) {
      final doc = documents[i];
      final id = doc.id ?? _uuid.v4();
      ids.add(id);
      metadatas.add(doc.metadata);
      docs.add(doc.pageContent);
    }

    await collection.upsert(
      ids: ids,
      embeddings: vectors,
      metadatas: metadatas,
      documents: docs,
    );
    return ids;
  }

  @override
  Future<void> delete({
    required final List<String> ids,
  }) async {
    final collection = await _getCollection();
    await collection.delete(ids: ids);
  }

  @override
  Future<List<(Document, double)>> similaritySearchByVectorWithScores({
    required final List<double> embedding,
    final VectorStoreSimilaritySearch config =
        const VectorStoreSimilaritySearch(),
  }) async {
    final collection = await _getCollection();
    final result = await collection.query(
      queryEmbeddings: [embedding],
      nResults: config.k,
      where: config.filter,
      whereDocument:
          config is ChromaSimilaritySearch ? config.whereDocument : null,
      include: const [
        Include.documents,
        Include.metadatas,
        Include.distances,
      ],
    );
    final ids = result.ids.first;
    final metadatas = result.metadatas?.first;
    final docs = result.documents?.first;
    final distances = result.distances?.first;

    final List<(Document, double)> results = [];
    for (var i = 0; i < ids.length; i++) {
      final distance = distances?[i] ?? 0.0;
      if (config.scoreThreshold != null && distance < config.scoreThreshold!) {
        continue;
      }

      final doc = Document(
        id: ids[i],
        metadata: metadatas?[i] ?? {},
        pageContent: docs?[i] ?? '',
      );
      results.add((doc, distance));
    }
    return results;
  }

  Future<Collection> _getCollection() async {
    if (_collection != null) {
      return _collection!;
    }

    final collection = await _client.getOrCreateCollection(
      name: collectionName,
      metadata: collectionMetadata,
    );

    _collection = collection;
    return collection;
  }
}
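The `Chroma` doc comment above lists "l2", "ip" and "cosine" as valid `hnsw:space` values. The sketch below shows the formulas these names are assumed to refer to, based on Chroma's documentation (squared L2, one minus the inner product, one minus cosine similarity). The helper names are illustrative only and are not part of `chromadb` or `langchain_chroma`.

```dart
import 'dart:math' as math;

/// Dot product of two equal-length vectors.
double dot(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// Squared Euclidean distance (assumed meaning of "l2").
double squaredL2(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

/// One minus the inner product (assumed meaning of "ip").
double innerProductDistance(List<double> a, List<double> b) =>
    1.0 - dot(a, b);

/// One minus the cosine similarity (assumed meaning of "cosine").
double cosineDistance(List<double> a, List<double> b) =>
    1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)));

void main() {
  final a = [1.0, 0.0];
  final b = [0.0, 1.0];
  print(squaredL2(a, b)); // 2.0
  print(innerProductDistance(a, b)); // 1.0
  print(cosineDistance(a, b)); // 1.0
  print(cosineDistance(a, a)); // 0.0
}
```

All three are distances, so under these assumptions a smaller value means the vectors are closer. That is worth keeping in mind when choosing a `scoreThreshold`.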
33 changes: 33 additions & 0 deletions packages/langchain_chroma/lib/src/vector_stores/models/models.dart
@@ -0,0 +1,33 @@
import 'package:langchain/langchain.dart';

/// {@template chroma_similarity_search}
/// Chroma similarity search config.
///
/// Chroma supports filtering queries by metadata and document contents.
/// The [where] filter is used to filter by metadata, and the [whereDocument]
/// filter is used to filter by document contents.
///
/// Check out the filtering section of the Chroma docs for more info:
/// https://docs.trychroma.com/usage-guide?lang=js#using-where-filters
///
/// Example:
/// ```dart
/// ChromaSimilaritySearch(
///   k: 5,
///   where: {'style': 'style1'},
///   scoreThreshold: 0.8,
/// ),
/// ```
/// {@endtemplate}
class ChromaSimilaritySearch extends VectorStoreSimilaritySearch {
  /// {@macro chroma_similarity_search}
  const ChromaSimilaritySearch({
    super.k = 4,
    final Map<String, dynamic>? where,
    this.whereDocument,
    super.scoreThreshold,
  }) : super(filter: where);

  /// Optional query condition to filter results based on document content.
  final Map<String, dynamic>? whereDocument;
}
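The `where` and `whereDocument` parameters above take plain maps. As a rough sketch of what richer filters might look like, assuming the operator names described in the Chroma filtering docs (`$and`, `$eq`, `$gte`, `$contains`): the `age` metadata field and the two builder functions are hypothetical. Raw strings (`r'$eq'`) keep Dart from treating `$` as string interpolation.

```dart
import 'dart:convert';

/// Metadata filter: `class` equals 'cat' AND `age` (hypothetical field) >= 2.
Map<String, dynamic> buildWhere() => {
      r'$and': [
        {
          'class': {r'$eq': 'cat'},
        },
        {
          'age': {r'$gte': 2},
        },
      ],
    };

/// Document-content filter: the document text contains the word 'feed'.
Map<String, dynamic> buildWhereDocument() => {r'$contains': 'feed'};

void main() {
  // Print the JSON form of each filter map.
  print(jsonEncode(buildWhere()));
  print(jsonEncode(buildWhereDocument()));
}
```

Maps like these would be passed as the `where:` and `whereDocument:` arguments of `ChromaSimilaritySearch`.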
@@ -0,0 +1,2 @@
export 'chroma.dart';
export 'models/models.dart';
7 changes: 5 additions & 2 deletions packages/langchain_chroma/pubspec.yaml
@@ -1,18 +1,21 @@
name: langchain_chroma
description: Chroma module for LangChain.dart.
description: LangChain.dart integration module for Chroma open-source embedding database.
version: 0.0.1-dev.1
repository: https://github.com/davidmigloz/langchain_dart/tree/main/packages/langchain_chroma
issue_tracker: https://github.com/davidmigloz/langchain_dart/issues
homepage: https://github.com/davidmigloz/langchain_dart
documentation: https://langchaindart.com
publish_to: none # Remove when the package is ready to be published

environment:
  sdk: ">=3.0.0 <4.0.0"

dependencies:
  chromadb: ^0.0.1-dev.1
  http: ^1.1.0
  langchain: ^0.0.9
  meta: ^1.9.1
  uuid: ^3.0.7

dev_dependencies:
  test: ^1.24.3
  langchain_openai: ^0.0.9
6 changes: 5 additions & 1 deletion packages/langchain_chroma/pubspec_overrides.yaml
@@ -1,4 +1,8 @@
# melos_managed_dependency_overrides: langchain
# melos_managed_dependency_overrides: langchain,chromadb,langchain_openai
dependency_overrides:
  chromadb:
    path: ../chromadb
  langchain:
    path: ../langchain
  langchain_openai:
    path: ../langchain_openai