[Feature Request]: change JinaEmbeddingFunction to support jina-embeddings-v3 #3095
Open
Description
Describe the problem
Hi, I was browsing the Chroma documentation and noticed that the default model for the JinaAI embedding function (API integration) is still `jina-embeddings-v2`. Can we update the default model to `jina-embeddings-v3`? The advantages:
- better performance
- multilingual (89 languages)
- MRL (Matryoshka Representation Learning) embeddings
This might require introducing new parameters:
- `task`: selects the LoRA adapter for a specific downstream task, for optimal performance.
- `dimensions`: truncates the embedding to reduce storage while maintaining good performance.
- `late_chunking`: computes token embeddings first, then chunks them and applies pooling, so each chunk embedding is more context-aware.
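To make the three parameters concrete, here is a rough sketch of how a request body could be assembled. The helper name `build_jina_v3_payload` is hypothetical, and the field names and the example task value `retrieval.query` are assumptions that should be checked against the Jina API reference. Optional fields are only sent when set, so existing callers would keep the server-side defaults.

```python
from typing import Any, Dict, List, Optional


def build_jina_v3_payload(
    texts: List[str],
    task: Optional[str] = None,
    dimensions: Optional[int] = None,
    late_chunking: Optional[bool] = None,
    model: str = "jina-embeddings-v3",
) -> Dict[str, Any]:
    """Build a JSON-serializable request body (hypothetical helper).

    Field names mirror the parameters proposed in this issue; they are
    assumptions, not a confirmed API contract.
    """
    if dimensions is not None and dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")

    payload: Dict[str, Any] = {"model": model, "input": list(texts)}
    # Only include optional fields that the caller explicitly set.
    if task is not None:
        payload["task"] = task
    if dimensions is not None:
        payload["dimensions"] = dimensions
    if late_chunking is not None:
        payload["late_chunking"] = late_chunking
    return payload


if __name__ == "__main__":
    print(build_jina_v3_payload(["hello world"], task="retrieval.query", dimensions=256))
```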
Note: I work for Jina AI.
Additional reading:
- release blog post: https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/
- paper: https://arxiv.org/abs/2409.10173
- late chunking: https://jina.ai/news/late-chunking-in-long-context-embedding-models/
- late chunking paper: https://arxiv.org/abs/2409.04701
Describe the proposed solution
- replace `jina-embeddings-v2` with `jina-embeddings-v3`.
- introduce additional parameters
- update documentation
- integration test
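The steps above could look roughly like the following. This is only a sketch, not Chroma's actual `JinaEmbeddingFunction`: the class name `JinaV3EmbeddingSketch`, the endpoint URL, the request fields, and the response shape (`data[i].embedding` with an `index` field) are all assumptions to verify against the Jina API reference. The HTTP call is injectable so a test can run without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# Assumed endpoint; verify against the Jina API reference.
JINA_EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"

PostFn = Callable[[str, Dict[str, str], Dict[str, Any]], Dict[str, Any]]


def _http_post(url: str, headers: Dict[str, str], payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body using only the standard library and decode the JSON reply."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class JinaV3EmbeddingSketch:
    """Hypothetical embedding function defaulting to jina-embeddings-v3."""

    def __init__(
        self,
        api_key: str,
        model_name: str = "jina-embeddings-v3",
        task: Optional[str] = None,
        dimensions: Optional[int] = None,
        late_chunking: Optional[bool] = None,
        post: PostFn = _http_post,
    ) -> None:
        self._headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
        self._model_name = model_name
        # Keep only the optional parameters that were explicitly set.
        optional = {"task": task, "dimensions": dimensions, "late_chunking": late_chunking}
        self._extra = {k: v for k, v in optional.items() if v is not None}
        self._post = post

    def __call__(self, input: List[str]) -> List[List[float]]:
        payload = {"model": self._model_name, "input": list(input), **self._extra}
        body = self._post(JINA_EMBEDDINGS_URL, self._headers, payload)
        # Assumed response shape: {"data": [{"index": int, "embedding": [...]}, ...]}
        rows = sorted(body["data"], key=lambda row: row["index"])
        return [row["embedding"] for row in rows]
```

An integration test could then pass a fake `post` callable that records the payload and returns a canned response, and a separate, opt-in test could hit the real API with a key from the environment.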
Alternatives considered
No response
Importance
nice to have
Additional Information
No response