model : jina-embeddings-v3 support #13693
base: master
Conversation
Apart from a few minor differences in the tokenizer test (a couple of token pairs diverge, unsure why), everything works.
@ngxson @slaren When you have the time I would appreciate some feedback on how best to tackle the task LoRAs of this model. I think the best user experience would probably be to keep them embedded and add extra metadata for their names so they can be easily chosen via an option. However, that increases the scope of this PR quite a bit, as new mechanisms would need to be added to load and apply the right LoRA tensors at runtime. This seems a little excessive for just one model, but maybe it can be useful for others as well, I don't know? The less intrusive route would be to extract each LoRA into its own separate GGUF (albeit a more complicated conversion process) and make the user responsible for applying the correct one (and using the correct prompt), but that seems like fairly bad UX. The PR as-is works great and produces embeddings identical to the original.
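As a rough illustration of the metadata idea, here is a conversion-side sketch in Python. The key name `adapter.lora.task_names` is an assumption, not an established llama.cpp convention; the task names themselves are the ones jina-embeddings-v3 documents.

```python
# Hypothetical sketch: record the embedded task-LoRA names as GGUF metadata
# during conversion so a runtime option could select among them.
import gguf

writer = gguf.GGUFWriter("jina-embeddings-v3.gguf", arch="jina-bert-v2")

# Task names as documented by jina-embeddings-v3; the metadata key is made up.
tasks = ["retrieval.query", "retrieval.passage", "separation",
         "classification", "text-matching"]
writer.add_array("adapter.lora.task_names", tasks)
```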
I was thinking about supporting built-in LoRAs lately, as this is required for phi-4-multimodal. We can extend the current LoRA API to support this case, but eventually end users need a way to select it (for example via llama-server). For multimodal, it can be done easily via libmtmd. Another approach could be to add an enum of pre-defined LoRA types, and user code can switch between them at runtime. This is based on an earlier suggestion from @ggerganov about having multiple models in the same GGUF. If I have time this weekend, I can push a draft PR on how this can be done.
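One way the pre-defined task enum could look; this is purely illustrative, not an existing llama.cpp API, and the task ordering is made up:

```python
from enum import IntEnum

# Purely illustrative: a pre-defined task enum that user code could switch
# at runtime to pick one of the embedded adapters.
class LoraTask(IntEnum):
    NONE = 0
    RETRIEVAL_QUERY = 1
    RETRIEVAL_PASSAGE = 2
    SEPARATION = 3
    CLASSIFICATION = 4
    TEXT_MATCHING = 5
```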
Why can't we use the existing LoRA mechanism that is already supported? Btw, did you resolve the tokenization differences?
No, it seems like a bug/difference in the UGM tokenizer...
The LoRA API on the server is quite low-level, and downstream apps would have to explicitly set the LoRA according to the use case, which may not be good UX overall, especially when the LoRA provides commonly known tasks like embeddings or reranking.
Another option could be to consider it as an extension of the embedding pooling selection.
Ok, I understand: the adapters are embedded inside the GGUF file together with the model and we don't have a mechanism to load them.
Even if the adapters were embedded, the user still has to use the correct prompt. So the UX seems to be the same regardless of how the LoRAs are stored?
The thinking was that the prompt could be prefixed depending on the task selection (easily stored as metadata).
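A minimal sketch of that idea, assuming the prefixes were stored as metadata next to the task names; the two retrieval prefixes below are the ones jina-embeddings-v3 ships, while the helper itself is hypothetical:

```python
# Minimal sketch: prefix the prompt based on the selected task.
# jina-embeddings-v3 defines instruction prefixes for its retrieval tasks;
# tasks without a prefix pass the text through unchanged.
TASK_PREFIXES = {
    "retrieval.query":   "Represent the query for retrieving evidence documents: ",
    "retrieval.passage": "Represent the document for retrieval: ",
}

def build_prompt(task: str, text: str) -> str:
    return TASK_PREFIXES.get(task, "") + text

print(build_prompt("retrieval.query", "What is GGUF?"))
```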
@ggerganov I can confirm that it's an issue with the UGM tokenizer; the same thing happens with nomic-embed-text-v2-moe, for example.
I looked deeper into the jina model. It is a bit confusing to me though: if the non-LoRA use case is not practical, maybe it's simpler to just merge the LoRA into the weights.
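Merging would follow the standard LoRA arithmetic; a small sketch, where the alpha/rank scaling is the usual convention rather than something confirmed for this model:

```python
import numpy as np

# Standard LoRA merge: W' = W + scale * (B @ A), with A of shape (rank, n_in),
# B of shape (n_out, rank), and scale = alpha / rank. Done once offline, the
# result is a plain dense weight and no runtime adapter support is needed.
def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               alpha: float, rank: int) -> np.ndarray:
    return W + (alpha / rank) * (B @ A)
```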
No, there are 5 LoRAs, but they are all stored in the same (stacked) tensors.
Not sure, just for reference I guess?
See here for how the correct task LoRA is loaded.
Ok I see, I hadn't looked at the tensor shapes. So if I understand correctly, it seems like the first 2 tasks...
No, there are 5 adapters; the tensors are shaped like this: [tasks (5), rank (4), N].
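To make that layout concrete, here is a small sketch of pulling one task's adapter out of such a stacked tensor; the N dimension and the task index mapping are made up for illustration:

```python
import numpy as np

# Stacked adapters as described: one tensor of shape [tasks (5), rank (4), N].
n_tasks, rank, n_embd = 5, 4, 1024  # n_embd is illustrative
lora_a_stacked = np.zeros((n_tasks, rank, n_embd), dtype=np.float32)

# Selecting one task's LoRA is a slice along the first dimension.
task_id = 0  # which of the 5 tasks; the index-to-task mapping is hypothetical
lora_a = lora_a_stacked[task_id]  # shape: (rank, n_embd)
```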
WIP support for jina-embeddings-v3
Work checklist
Fixes #12327
Fixes #9585