[Bug]: What's between DefaultEmbeddingFunction() and SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")? #2748

Closed
h3clikejava opened this issue Aug 30, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@h3clikejava

What happened?

I saved some embeddings using the default embedding function, like this:
collection = client.get_or_create_collection(name=db_name)

Then I can fetch the data using DefaultEmbeddingFunction(), like this:
emb_fn = embedding_functions.DefaultEmbeddingFunction()
collection = client.get_or_create_collection(name=db_name, embedding_function=emb_fn) # This works

But I can't fetch the data using all-MiniLM-L6-v2, like this:
emb_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
collection = client.get_or_create_collection(name=db_name, embedding_function=emb_fn) # This does not work

What's the difference between DefaultEmbeddingFunction() and SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")?

Versions

ChromaDB v0.5.3, Python v3.10.11, macOS 15.0 Beta (24A5327a)

Relevant log output

No response

h3clikejava added the bug (Something isn't working) label on Aug 30, 2024
@tazarov
Contributor

tazarov commented Aug 30, 2024

@h3clikejava, thanks for raising this. Let's start with some background:

  • The default EF runs the model with ONNX Runtime and applies mean pooling, producing 384-dimensional embeddings.
  • The SentenceTransformer (ST) EF does everything out of the box and also produces 384-dimensional embeddings.
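The "mean pooling" step of the default EF can be sketched in NumPy. This is a minimal illustration, not Chroma's code: it assumes the transformer has already produced one vector per token (random stand-in data below), and the final L2 normalization is an assumption about how the pooled vectors are post-processed. `mean_pool` and `l2_normalize` are hypothetical helper names.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) transformer output.
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding.
    Returns an array of shape (batch, dim).
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Clip the token count so an all-padding row cannot divide by zero.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (assumed post-processing step)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


# Stand-in data: 1 sequence, 3 token positions (the last is padding), dim 384.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((1, 3, 384)).astype(np.float32)
mask = np.array([[1, 1, 0]])
pooled = l2_normalize(mean_pool(tokens, mask))
print(pooled.shape)  # (1, 384)
```

Because pooling collapses the sequence axis, the output width is the model's hidden size (384 for all-MiniLM-L6-v2) regardless of input length, which is why both EFs end up with vectors of the same dimension.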

Under normal circumstances, you should have no trouble swapping between the two: Chroma will accept queries with 384-dimensional embeddings, even though the outputs of the two functions differ slightly (on the order of 1e-4 to 1e-5).
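One way to check this on your own data is to compare the two functions' vectors for the same text. The sketch below uses synthetic vectors (a random unit vector and a copy perturbed by noise of at most 1e-5) in place of real model output; `compare_embeddings` is a hypothetical helper, and its `ValueError` only illustrates that a dimension mismatch, rather than a tiny numeric difference, is the kind of difference expected to break a query.

```python
import numpy as np


def compare_embeddings(a, b) -> dict:
    """Summarise how far apart two embeddings of the same text are."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        # A different dimension, not a small numeric drift, is what matters.
        raise ValueError(f"dimension mismatch: {a.shape} vs {b.shape}")
    cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return {
        "dim": a.shape[0],
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "cosine_similarity": cosine,
    }


# Synthetic stand-ins for the two EFs' outputs (not real model output).
rng = np.random.default_rng(42)
default_ef_vec = rng.standard_normal(384)
default_ef_vec /= np.linalg.norm(default_ef_vec)
st_ef_vec = default_ef_vec + rng.uniform(-1e-5, 1e-5, size=384)

print(compare_embeddings(default_ef_vec, st_ef_vec))
```

With chromadb and sentence-transformers installed, the real vectors should be obtainable as `embedding_functions.DefaultEmbeddingFunction()(["some text"])[0]` and `embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")(["some text"])[0]`, assuming the 0.5.x call signature that takes a list of documents.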

That said, when you say it won't work, do you mean you get an error? Can you share the error?

@jeffchuber
Contributor

@h3clikejava, happy to re-open if you can help out! Closing for now, as @tazarov did a good job addressing it.

3 participants