XLMRoberta support #8638

Oliver-Y · 2024-07-23T00:43:27Z

Added support for the XLMRoberta model, tested on the Multilingual E5 embeddings model. The E5 tokenizer.json appears to use a preprocessor, but since llama.cpp doesn't support SPM preprocessors yet, I put a simple workaround right before the SPM tokenizer call.
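The description does not spell out what the workaround does, so the following is only a rough sketch of the general pattern of running a small text-cleanup step right before a SentencePiece-style tokenizer call. It assumes, purely for illustration, that the missing preprocessing amounts to collapsing whitespace runs and trimming the ends. The names `preprocess_for_spm` and `tokenize_with_workaround` are hypothetical and do not come from llama.cpp or from this PR.

```python
import re
from typing import Callable, List

# Assumption for illustration: the preprocessing step only needs to
# normalize whitespace. The real E5 preprocessor may do more than this.
_WS_RUN = re.compile(r"\s+")


def preprocess_for_spm(text: str) -> str:
    """Hypothetical cleanup: collapse whitespace runs to one space and trim."""
    return _WS_RUN.sub(" ", text).strip()


def tokenize_with_workaround(
    text: str, spm_tokenize: Callable[[str], List[str]]
) -> List[str]:
    """Apply the cleanup step, then hand the result to the tokenizer.

    `spm_tokenize` stands in for whatever SPM tokenizer call is used; it is
    passed in so the sketch does not depend on any particular library.
    """
    return spm_tokenize(preprocess_for_spm(text))


if __name__ == "__main__":
    # str.split is used here only as a stand-in tokenizer.
    print(tokenize_with_workaround("  query:   how  are you ", str.split))
```

Passing the tokenizer in as a callable keeps the cleanup step separate from the tokenizer itself, so it could be dropped once proper preprocessor support exists.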

This is my first time contributing, so I would love feedback of any kind!

I have read the contributing guidelines
Self-reported review complexity: Low-Medium
- Low
- Medium
- High

Oliver-Y · 2024-07-23T17:38:11Z

There might be a bug with tokenization. I'm going to take a look first.

Oliver-Y · 2024-07-24T07:37:47Z

Redundant with #8658, so closing.

XLMRoberta support

60e3674

github-actions bot added the python (python script changes) label on Jul 23, 2024

Oliver-Y closed this Jul 23, 2024

Oliver-Y reopened this Jul 23, 2024

Oliver-Y closed this Jul 23, 2024

Oliver-Y reopened this Jul 23, 2024

Oliver-Y closed this Jul 24, 2024
