server : fill usage info in embeddings and rerank responses #10852

Merged 2 commits on Dec 17, 2024

Conversation

krystiancha
Contributor

These two commits add token usage info to the responses of the embedding and rerank endpoints.
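As a rough illustration of what "usage info" could look like in such a response, here is a minimal Python sketch. It assumes the response carries an OpenAI-style `usage` object with `prompt_tokens` and `total_tokens`; those field names, the helpers `build_usage` and `has_usage`, and the sample response are assumptions made for illustration and are not taken from the PR diff.

```python
from typing import Any, Dict, List


def build_usage(token_counts: List[int]) -> Dict[str, int]:
    """Sum per-input prompt token counts into an OpenAI-style usage object.

    Hypothetical helper: the real server code is C++ and may differ.
    """
    prompt_tokens = sum(token_counts)
    # Assumption: embedding and rerank requests generate no completion
    # tokens, so total_tokens equals prompt_tokens in this sketch.
    return {"prompt_tokens": prompt_tokens, "total_tokens": prompt_tokens}


def has_usage(response: Dict[str, Any]) -> bool:
    """Return True if the response has a usage object with integer counts."""
    usage = response.get("usage")
    if not isinstance(usage, dict):
        return False
    return all(
        isinstance(usage.get(key), int) and usage[key] >= 0
        for key in ("prompt_tokens", "total_tokens")
    )


# Hypothetical embeddings response for a single 8-token input.
example_response = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2]}],
    "model": "hypothetical-model",
    "usage": build_usage([8]),
}
```

A test along these lines could call the endpoint and then pass the parsed JSON body to a check like `has_usage`.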

Collaborator

@ngxson ngxson left a comment

I'd appreciate it if you could add a test case for this. See test_embedding.py.

@github-actions github-actions bot added the python python script changes label Dec 17, 2024
@krystiancha
Contributor Author

Tests added

examples/server/server.cpp (outdated review thread, resolved)
@ggerganov ggerganov merged commit 05c3a44 into ggerganov:master Dec 17, 2024
1 check passed
@krystiancha krystiancha deleted the fill-usage branch December 17, 2024 21:02
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
…v#10852)

* server : fill usage info in embeddings response

* server : fill usage info in reranking response
Labels
examples python python script changes server
3 participants