
Adds support for stella_en_v5 embedding model (400M variant) #2608

Draft: iskng wants to merge 1 commit into main

Conversation

@iskng commented Nov 9, 2024

stella_en_400M_v5 is ranked #6 on the MTEB leaderboard as of 9 Nov 2024.

Model Card

This PR adds support for the model along with some examples.

License: the model is MIT-licensed.

The authors' example from the model card has been added and its output reproduced.
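
Usage looks roughly like this (a minimal sketch: the module path, `Config` constructor, and forward signature shown here are assumptions, not necessarily what the PR exposes):

```rust
use candle_core::{DType, Device, Tensor};
use candle_nn::VarBuilder;
// Hypothetical module path for this PR's model; the real path may differ.
use candle_transformers::models::stella_en_v5::{Config, EmbeddingModel};

fn main() -> anyhow::Result<()> {
    let device = Device::cuda_if_available(0)?;
    // Weights downloaded separately, e.g. from the stella_en_400M_v5 repo on the Hub.
    let vb = unsafe {
        VarBuilder::from_mmaped_safetensors(&["model.safetensors"], DType::F32, &device)?
    };
    // `Config::new_400m` and `EmbeddingModel::new` are illustrative names.
    let model = EmbeddingModel::new(&Config::new_400m(), vb)?;
    // Token ids would come from the model's tokenizer; dummy input shown here.
    let input_ids = Tensor::zeros((1, 8), DType::U32, &device)?;
    let mask = Tensor::ones((1, 8), DType::U8, &device)?;
    let embeddings = model.forward(&input_ids, &mask)?; // (batch, hidden)
    println!("{:?}", embeddings.shape());
    Ok(())
}
```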

@AnubhabB (Contributor)
@iskng let's try and figure out if we can have one single stella_en_v5 module instead of stella_en_v5 and stella_en_v5_400m. Allow me some time to go through this and discuss possible ways of merging this.

I guess that way, it'll be easier for end users and maintainers.
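
For concreteness, one shape such a merge could take is a single module keyed on a variant enum, so the two checkpoints share one code path and differ only in config. A sketch with illustrative names only, not the actual candle API:

```rust
// Sketch: one module serving both checkpoints via a variant enum.
#[derive(Debug, Clone, Copy)]
pub enum ModelVariant {
    Large1_5B, // stella_en_1.5B_v5
    Small400M, // stella_en_400M_v5
}

#[derive(Debug, Clone)]
pub struct Config {
    pub variant: ModelVariant,
    pub hidden_size: usize,
    pub num_hidden_layers: usize,
    pub num_attention_heads: usize,
}

impl Config {
    pub fn new(variant: ModelVariant) -> Self {
        match variant {
            // Values below are placeholders; the real ones come from
            // each checkpoint's config.json.
            ModelVariant::Large1_5B => Self {
                variant,
                hidden_size: 1536,
                num_hidden_layers: 28,
                num_attention_heads: 12,
            },
            ModelVariant::Small400M => Self {
                variant,
                hidden_size: 1024,
                num_hidden_layers: 24,
                num_attention_heads: 16,
            },
        }
    }
}
```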

It would be great if you could mark this as a draft PR for the time being until we sort this out.

Thanks

@LaurentMazare (Collaborator)

> @iskng let's try and figure out if we can have one single stella_en_v5 module instead of stella_en_v5 and stella_en_v5_400m. Allow me some time to go through this and discuss possible ways of merging this.

+1 to this. If it's easy to add support for the 400M model in the existing one, that would make it simpler to maintain over time (though there is already a lot of duplication among models, so if merging the two is a significant effort, I'm happy with the separate file).

@iskng marked this pull request as draft on November 10, 2024 at 18:40
@iskng (Author) commented Nov 12, 2024

I should have mentioned that I only really tested this for inference on Metal and CPU, so I'm not sure whether the CUDA implementation is right. I had to disable use_efficient_memory because trying to get xformers working on a Mac was rough.
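
For reference, the usual fallback when memory-efficient attention is unavailable is plain scaled dot-product attention. A sketch in candle terms (function and variable names here are illustrative, not taken from the PR):

```rust
use candle_core::{Result, Tensor, D};

/// Plain scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
/// q, k, v: (batch, heads, seq, head_dim); mask broadcastable to (batch, heads, seq, seq).
fn standard_attention(
    q: &Tensor,
    k: &Tensor,
    v: &Tensor,
    mask: Option<&Tensor>,
) -> Result<Tensor> {
    let head_dim = q.dim(D::Minus1)? as f64;
    let mut scores = (q.matmul(&k.transpose(D::Minus2, D::Minus1)?)? / head_dim.sqrt())?;
    if let Some(mask) = mask {
        // Additive mask: 0.0 to keep a position, -inf to drop it.
        scores = scores.broadcast_add(mask)?;
    }
    let weights = candle_nn::ops::softmax_last_dim(&scores)?;
    weights.matmul(v)
}
```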

I'm also curious whether it's just my implementation, but it's about 3x slower than sentence-transformers for the same model.
I'd love to learn how to make this faster; if you know of any resources, please point me to them. I'm just starting to dig around candle. Thanks
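
One thing worth ruling out when comparing against sentence-transformers is timing without device synchronization, since GPU work is queued asynchronously. A hedged micro-benchmark sketch, continuing from the usage sketch above (it assumes `device.synchronize()` is available on the backend in use):

```rust
use std::time::Instant;

// Warm up once so one-time kernel compilation/allocation isn't counted.
let _ = model.forward(&input_ids, &mask)?;
device.synchronize()?; // flush queued GPU work before starting the clock

let iters = 50u32;
let start = Instant::now();
for _ in 0..iters {
    let _ = model.forward(&input_ids, &mask)?;
}
device.synchronize()?; // make sure all iterations actually finished
println!("avg forward: {:?}", start.elapsed() / iters);
```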
