
Refine model file download for python backend #526

Open · wants to merge 4 commits into base: main
Conversation

kaixuanliu
Contributor

For models like jinaai/jina-embeddings-v2-base-code, the Rust side downloads 2 separate directories, models--jinaai--jina-embeddings-v2-base-code/ and models--jinaai--jina-bert-v2-qk-post-norm/, while the existing implementation only passes 1 Path to the Python backend. This PR refines the model file download so that the models below can run using the Python backend:
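For context, the second directory comes from the model's `auto_map` entries in config.json, which can point at code in another repository using the `org/repo--module.Class` syntax. A minimal sketch of detecting such cross-repo references (the config values here are illustrative, modeled on jinaai/jina-embeddings-v2-base-code; this is not the PR's actual code):

```python
# Illustrative sketch: find external repos referenced by a config's auto_map.
config = {
    "auto_map": {
        "AutoModel": "jinaai/jina-bert-v2-qk-post-norm--modeling_bert.JinaBertModel",
    }
}

def external_repos(cfg):
    repos = set()
    for target in cfg.get("auto_map", {}).values():
        # Cross-repo entries use the "org/repo--module.Class" form;
        # same-repo entries are just "module.Class" with no "--".
        if "--" in target:
            repos.add(target.split("--")[0])
    return sorted(repos)

print(external_repos(config))  # ['jinaai/jina-bert-v2-qk-post-norm']
```

Each external repo found this way would need its own snapshot directory in the cache, which is why a single Path is not enough for such models.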

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
@kaixuanliu
Contributor Author

@regisss @Narsil please help review

@Narsil
Collaborator

Narsil commented Mar 26, 2025

I'm sorry, but I do not understand: how does it download 2 models? I cannot reproduce this on my side.

Can you provide steps to reproduce? It seems the candle backend properly downloads only 1 model.

Propagating model_id is not desirable a priori; resolving the model id first and then using only a path usually makes debugging much simpler.
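The resolve-then-path flow described here can be sketched as follows. This is an illustrative sketch, not TEI's actual code; the cache-directory naming mirrors the huggingface_hub layout visible in the directory names above (in real code the resolution step would be a call to huggingface_hub.snapshot_download):

```python
from pathlib import Path

def cache_dir_name(repo_id: str) -> str:
    # huggingface_hub cache layout names repo directories "models--{org}--{name}"
    return "models--" + repo_id.replace("/", "--")

def resolve(model_id: str, cache_root: str = "/data") -> Path:
    # Illustrative only: compute the top-level cache directory the
    # download would land in, then hand just this path to the backend.
    return Path(cache_root) / cache_dir_name(model_id)

print(resolve("jinaai/jina-embeddings-v2-base-code"))
# /data/models--jinaai--jina-embeddings-v2-base-code
```

The debugging benefit is that the backend only ever sees a concrete filesystem path, never an unresolved model id.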

@kaixuanliu
Contributor Author

Steps to reproduce on CPU:

  1. Build the docker container: docker build --build-arg PLATFORM="cpu" -f Dockerfile-intel -t tei_cpu .
  2. Start the backend:
model='jinaai/jina-embeddings-v2-base-code'
volume=$PWD/data
docker run -p 8080:80 -v $volume:/data -e TRUST_REMOTE_CODE=True tei_cpu --model-id $model

@Narsil
Collaborator

Narsil commented Mar 26, 2025

I can now reproduce. The trust-remote-code behavior of this model is surprising to me; looking into what's going on.

@Narsil
Collaborator

Narsil commented Mar 27, 2025

Thanks for surfacing this. Discussing internally, we figured there are some security implications to this behavior, which we're most likely going to close, so this behavior will go away (and force every repo to own its own remote code, so that trust_remote_code cannot be abused as much).

Before we go through with this, we're trying to figure out the implications. Are you aware of other models with that behavior?

@alvarobartt
Member

alvarobartt commented Mar 27, 2025

Before we're going through with this, we're trying to figure out the implications, are you aware of other models with that behavior ?

I found that https://huggingface.co/mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B uses auto_map to reference code in another repository too.

The following filter may not be fully accurate, but it may help: https://huggingface.co/models?other=custom_code,text-embeddings-inference&sort=trending

@kaixuanliu
Contributor Author

I have not encountered other models with this unexpected behavior. But the three models listed below are in the TEI README, and our customers are asking for support for them:
Alibaba-NLP/gte-Qwen2-7B-instruct
Alibaba-NLP/gte-Qwen2-1.5B-instruct
jinaai/jina-embeddings-v2-base-code
