[Feature Request]: change JinaEmbeddingFunction to support jina-embeddings-v3 #3095

Open
@bwanglzu

Description

Describe the problem

Hi, I was browsing the Chroma documentation and noticed that the default model for the Jina AI embedding function (API integration) is still jina-embeddings-v2. Can we update the default model to jina-embeddings-v3? The advantages:

  1. better performance
  2. multilingual (89 languages)
  3. MRL (Matryoshka Representation Learning) embeddings
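
For readers unfamiliar with MRL, here is a minimal sketch of what truncating a Matryoshka-style embedding looks like on the client side: keep the leading components and re-normalise. The helper name `truncate_embedding` and the toy vector are illustrative assumptions, not part of Chroma or the Jina API.

```python
import math


def truncate_embedding(vector, dimensions):
    """Keep the first `dimensions` components and L2-normalise the result.

    Hypothetical helper for illustration only.
    """
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # Nothing to normalise; return the zero vector unchanged.
        return head
    return [x / norm for x in head]


# Toy 4-dimensional vector (made up for the example).
full = [0.6, 0.8, 0.0, 0.0]
short = truncate_embedding(full, 2)
```

With an MRL-trained model the leading components are meant to carry most of the information, which is why a shorter prefix can stay useful for retrieval while taking less storage.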

This might require introducing new parameters:

  1. task: selects the LoRA adapter for a specific downstream task, for optimal performance.
  2. dimensions: truncates the embedding to reduce storage while maintaining good performance.
  3. late_chunking: computes token embeddings first, then chunks them and applies pooling, so each chunk embedding is more context-aware.
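
To make the three parameters concrete, below is a rough sketch of how a request body carrying them might be assembled. The endpoint URL, the helper name `build_embedding_request`, and the example task value `"retrieval.passage"` are assumptions for illustration; the exact accepted values should be checked against the Jina API documentation. No network call is made.

```python
import json

# Assumed endpoint; shown only for context, not called here.
JINA_EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"


def build_embedding_request(texts, model="jina-embeddings-v3",
                            task=None, dimensions=None, late_chunking=None):
    """Build a JSON-serialisable request body (hypothetical helper).

    Optional parameters are only included when set, so a caller that
    passes none of them produces a minimal body.
    """
    payload = {"model": model, "input": list(texts)}
    if task is not None:
        payload["task"] = task
    if dimensions is not None:
        payload["dimensions"] = int(dimensions)
    if late_chunking is not None:
        payload["late_chunking"] = bool(late_chunking)
    return payload


payload = build_embedding_request(
    ["Chroma is a vector database."],
    task="retrieval.passage",  # assumed task name
    dimensions=256,
    late_chunking=True,
)
print(json.dumps(payload, indent=2))
```

Leaving unset parameters out of the body means existing callers that pass none of them would not need to change.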

Note: I work for Jina AI.

Additional reading:

  1. release blog post: https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/
  2. paper: https://arxiv.org/abs/2409.10173
  3. late chunking: https://jina.ai/news/late-chunking-in-long-context-embedding-models/
  4. late chunking paper: https://arxiv.org/abs/2409.04701

Describe the proposed solution

  1. Replace jina-embeddings-v2 with jina-embeddings-v3.
  2. Introduce the additional parameters.
  3. Update the documentation.
  4. Add an integration test.
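
One possible shape for the change, sketched under heavy assumptions: the class `JinaV3EmbeddingSketch` is a hypothetical stand-in, not Chroma's actual `JinaEmbeddingFunction`, and the response layout (`data` entries with `index` and `embedding`) is assumed. The HTTP call is injected as a `transport` callable so the logic can be tested without network access.

```python
class JinaV3EmbeddingSketch:
    """Hypothetical embedding function; not Chroma's real implementation."""

    def __init__(self, transport, model_name="jina-embeddings-v3",
                 task=None, dimensions=None, late_chunking=None):
        # `transport` takes the request body (dict) and returns the parsed
        # JSON response (dict). In production it would wrap an HTTP POST.
        self._transport = transport
        self._model_name = model_name
        self._options = {
            "task": task,
            "dimensions": dimensions,
            "late_chunking": late_chunking,
        }

    def __call__(self, input):
        payload = {"model": self._model_name, "input": list(input)}
        # Only forward the optional parameters that were explicitly set.
        payload.update({k: v for k, v in self._options.items() if v is not None})
        response = self._transport(payload)
        # Assumed response shape: {"data": [{"index": i, "embedding": [...]}]}
        rows = sorted(response["data"], key=lambda row: row["index"])
        return [row["embedding"] for row in rows]


def fake_transport(payload):
    """Test double: returns rows in reverse order to exercise the sorting."""
    dims = payload.get("dimensions", 4)
    rows = [
        {"index": i, "embedding": [float(len(text))] * dims}
        for i, text in enumerate(payload["input"])
    ]
    return {"data": list(reversed(rows))}


ef = JinaV3EmbeddingSketch(fake_transport, dimensions=2)
embeddings = ef(["a", "bcd"])
```

An integration test against the live service could reuse the same class with a real transport and assert only on the number and length of the returned vectors.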

Alternatives considered

No response

Importance

nice to have

Additional Information

No response

Labels

enhancement
