XLMRoberta support #8638

Closed

@Oliver-Y Oliver-Y commented Jul 23, 2024

Added support for the XLMRoberta model, tested on the Multilingual E5 embeddings model. The tokenizer.json of E5 appears to use a preprocessor, but since llama.cpp doesn't support SPM preprocessors yet, I put a simple workaround right before the SPM tokenizer call.

This is my first time contributing, so I would love feedback of any form!
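
The PR description does not say what the workaround does, so the following is only a sketch of the general pattern (normalize the text, then hand it to the SPM tokenizer), not the code from this PR. It assumes the preprocessing amounts to collapsing whitespace runs; `collapse_whitespace`, `tokenize_with_preprocessing`, and `toy_spm_tokenize` are hypothetical names, and the toy tokenizer merely stands in for a real SentencePiece call.

```python
from typing import Callable, List


def collapse_whitespace(text: str) -> str:
    """Hypothetical preprocessing step: trim the text and collapse
    every run of whitespace into a single space."""
    return " ".join(text.split())


def tokenize_with_preprocessing(
    text: str, spm_tokenize: Callable[[str], List[str]]
) -> List[str]:
    """Apply the preprocessing right before the tokenizer call."""
    return spm_tokenize(collapse_whitespace(text))


def toy_spm_tokenize(text: str) -> List[str]:
    """Stand-in for a real SPM tokenizer (assumption): splits on single
    spaces and prefixes each piece with the U+2581 word-start marker."""
    return ["\u2581" + piece for piece in text.split(" ")]


if __name__ == "__main__":
    raw = "query:  how   are you"
    print(toy_spm_tokenize(raw))                              # extra empty pieces
    print(tokenize_with_preprocessing(raw, toy_spm_tokenize)) # one piece per word
```

Without the preprocessing step, the toy tokenizer emits bare marker pieces for the repeated spaces; with it, each word maps to exactly one piece.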

@github-actions github-actions bot added the python python script changes label Jul 23, 2024
@Oliver-Y Oliver-Y closed this Jul 23, 2024
@Oliver-Y Oliver-Y reopened this Jul 23, 2024
@Oliver-Y
Author

There might be a bug with tokenization. I'm going to take a look first.

@Oliver-Y Oliver-Y closed this Jul 23, 2024
@Oliver-Y Oliver-Y reopened this Jul 23, 2024
@Oliver-Y
Author

This is redundant with #8658, so closing.

@Oliver-Y Oliver-Y closed this Jul 24, 2024