
Conversation

@qinxuye (Contributor) commented Nov 2, 2025

Before this PR, embedding was sequential, even though users could create multiple embeddings at the same time.

After this PR, a model class can inherit BatchMixin, which provides a batch version of the method (utilizing the xoscar batch API). Incoming requests are put into a queue, and a background coroutine collects as many queued items as possible and calls the API in a single batched call.

This is an initial version of auto-batching. In fact, this could be applied to all models, not only auto-regressive models (basically LLMs).
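The queue-plus-background-coroutine pattern described above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual BatchMixin implementation: the class and method names here are hypothetical, and the real version dispatches through the xoscar batch API rather than a stand-in function.

```python
import asyncio


class AutoBatchingModel:
    """Hypothetical sketch of the auto-batching idea: callers enqueue
    requests, and a background coroutine drains as many queued items
    as possible into one batched call."""

    def __init__(self, max_batch_size=32):
        self._queue = asyncio.Queue()
        self._max_batch_size = max_batch_size
        self._worker = None

    async def _batch_worker(self):
        while True:
            # Block for the first request, then greedily drain the rest.
            batch = [await self._queue.get()]
            while len(batch) < self._max_batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except asyncio.QueueEmpty:
                    break
            texts = [text for text, _ in batch]
            # One model call serves N concurrent requests.
            embeddings = await self._embed_batch(texts)
            for (_, fut), emb in zip(batch, embeddings):
                fut.set_result(emb)

    async def _embed_batch(self, texts):
        # Stand-in for a real batched model call (e.g. via the xoscar batch API).
        return [len(t) for t in texts]

    async def embed(self, text):
        # Each caller enqueues its request and awaits a per-request future.
        if self._worker is None:
            self._worker = asyncio.create_task(self._batch_worker())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((text, fut))
        return await fut


async def main():
    model = AutoBatchingModel()
    # Three concurrent requests are coalesced into a single batched call.
    return await asyncio.gather(*(model.embed(t) for t in ["a", "bb", "ccc"]))


results = asyncio.run(main())
print(results)  # [1, 2, 3]
```

The key design point is that `embed` returns a future immediately after enqueueing, so concurrent callers never serialize on the model; the worker decides batch boundaries by draining whatever is waiting when it wakes up.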

Fixes #4123

@XprobeBot XprobeBot added this to the v1.x milestone Nov 2, 2025
@qinxuye (Contributor, Author) commented Nov 2, 2025

@llyycchhee please help review this PR and see if there is anything that can be improved.

Benchmarks welcome.



Development

Successfully merging this pull request may close these issues.

Embedding model concurrency performance issue

2 participants