How can I use CLIP model to obtain the image and text embeddings? #260

zhujiajian98 opened this issue Apr 18, 2024 · 3 comments

@zhujiajian98

I want to fine-tune YOLO-World using the prompt training scheme. According to docs/prompt_yolo_world.md, I need to extract the image and text embeddings with the CLIP model and save them as .npy files. How can I use the CLIP model to obtain these image and text embeddings?

@wondervictor (Collaborator)

Hi @zhujiajian98, text embeddings can be extracted with generate_text_prompts.py. I'll provide the script for image embeddings within a day (before next Monday).
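
For reference while waiting on the script, below is a minimal sketch of CLIP text-embedding extraction saved to .npy. It assumes the HuggingFace transformers CLIP classes and the openai/clip-vit-base-patch32 checkpoint; generate_text_prompts.py in this repo is the authoritative version and may differ in model choice and output layout.

```python
# Minimal sketch: extract CLIP text embeddings and save them as a .npy file.
# Assumptions: HuggingFace transformers CLIP classes and the
# openai/clip-vit-base-patch32 checkpoint (not necessarily what the repo uses).
import numpy as np
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

model_name = "openai/clip-vit-base-patch32"  # assumed checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_name)
model = CLIPTextModelWithProjection.from_pretrained(model_name).eval()

texts = ["person", "bicycle", "car"]  # example category names

with torch.no_grad():
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    # text_embeds are the projected per-text embeddings; L2-normalize them,
    # as CLIP does before computing similarities
    embeds = model(**inputs).text_embeds
    embeds = embeds / embeds.norm(dim=-1, keepdim=True)

np.save("text_embeddings.npy", embeds.cpu().numpy())  # shape: (num_texts, 512)
```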

@liping-ren

Hello, I am interested in using images for fine-tuning. Could you provide the script that generates the image embeddings? Thank you very much.

@wondervictor (Collaborator)

Hi @liping-ren, the script has been added at tools/generate_image_prompts.py.
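
For reference, here is a minimal sketch of the image-side extraction under the same assumptions (HuggingFace CLIP classes, the openai/clip-vit-base-patch32 checkpoint, and a hypothetical example.jpg path); tools/generate_image_prompts.py remains the authoritative version.

```python
# Minimal sketch: extract a CLIP image embedding and save it as a .npy file.
# Assumptions: HuggingFace transformers CLIP classes, the
# openai/clip-vit-base-patch32 checkpoint, and a hypothetical image path.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

model_name = "openai/clip-vit-base-patch32"  # assumed checkpoint
processor = CLIPImageProcessor.from_pretrained(model_name)
model = CLIPVisionModelWithProjection.from_pretrained(model_name).eval()

image = Image.open("example.jpg").convert("RGB")  # hypothetical image path

with torch.no_grad():
    inputs = processor(images=image, return_tensors="pt")
    # image_embeds is the projected image embedding; L2-normalize it to match
    # the normalized text embeddings
    embeds = model(**inputs).image_embeds
    embeds = embeds / embeds.norm(dim=-1, keepdim=True)

np.save("image_embedding.npy", embeds.cpu().numpy())  # shape: (1, 512)
```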
