Add support for computing CLIP image and text embeddings separately (Closes #148) #227
cc @josephrocca
The documentation is not available anymore as the PR was closed or merged.
🥹 what did we (the web/js community) do to deserve you xenova 🧎
The little web app examples you've been putting together are great! Being able to tweet "this is running completely in the browser!" with a little video showing some magic AI thing (that most devs might assume you'd need a big GPU server for) is, I think, opening up people's eyes to the possibilities here. And your design skills are 🔥 (looking sadly at my clip-image-sorter web design lol)
(Aside: I've mentioned this before, but there's another subset of users [which includes myself] who really just want little code snippets for various tasks - i.e. where a full-blown application isn't really that useful, because I end up having to dig through code to "extract" the simple few lines of code that I wanted. If these examples could be linked in the "Supported Tasks" table, that would be perfect I think. I know you're getting around to this eventually - but I know it's sometimes useful to hear pain points from users a few times, so you know that users aren't just requesting some niche "nice-to-have" thing that they happened to ponder for a few moments.)
This PR adds support for computing CLIP text and vision embeddings separately. It uses a custom ONNX config (based on this) and requires models to be exported with the --split_modalities flag set in the conversion script. For example:
Usage:
Example: Compute text embeddings with CLIPTextModelWithProjection.
Example: Compute vision embeddings with CLIPVisionModelWithProjection.
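As a rough sketch of what separately computed embeddings make possible (this is not code from the PR), the snippet below ranks a few image embeddings against one text embedding by cosine similarity. The helper names cosineSimilarity and rankImages are hypothetical, and the short arrays are made-up placeholders standing in for the vectors the two models would return; it assumes each embedding has already been converted to a plain array of numbers.

```javascript
// Hypothetical helper: cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; ++i) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every image embedding against one text
// embedding and return the results sorted from most to least similar.
function rankImages(textEmbedding, imageEmbeddings) {
  return imageEmbeddings
    .map((embedding, index) => ({
      index,
      score: cosineSimilarity(textEmbedding, embedding),
    }))
    .sort((x, y) => y.score - x.score);
}

// Placeholder vectors (assumption): real CLIP embeddings are much longer.
const textEmbedding = [1, 0, 0];
const imageEmbeddings = [
  [0, 1, 0], // orthogonal to the text vector
  [2, 0, 0], // same direction as the text vector
  [1, 1, 0], // 45 degrees away from the text vector
];

const ranking = rankImages(textEmbedding, imageEmbeddings);
console.log(ranking.map((r) => r.index)); // [ 1, 2, 0 ]
```

Because the image embeddings do not depend on the text, they could be computed once and cached, with only the text embedding recomputed per query.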