Server: fix server hangs on empty prompt #5733


Merged · 1 commit · Feb 26, 2024

Conversation

@ngxson (Collaborator) commented Feb 26, 2024

This is a proposal to fix #5724 and #5246

Running a slot with no tokens to evaluate is buggy, since some parts of the code implicitly expect n_tokens to be greater than 0.

This PR fixes the hang when using:

  • /v1/embeddings with "input": ""
  • /embedding with "content": ""
  • /completion with "prompt": ""
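As a rough illustration of the failure mode (a hypothetical Python sketch, not the actual llama.cpp C++ server code), the bug class is code that implicitly assumes a non-empty token list; the fix adds an explicit early return for the empty case:

```python
def process_slot(tokens):
    """Hypothetical sketch of a slot-processing step.

    Several code paths implicitly assumed n_tokens > 0; with an empty
    prompt the slot never produced a result, hanging the server.
    """
    n_tokens = len(tokens)
    if n_tokens == 0:
        # Guard in the spirit of this fix: short-circuit with an
        # empty result instead of falling through into code that
        # indexes the last token.
        return []
    # ... evaluate the batch; output comes from the last position ...
    return [tokens[n_tokens - 1]]
```

With the guard, an empty prompt yields an empty result immediately instead of leaving the slot stuck.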

@ngxson ngxson requested review from ggerganov and phymbert February 26, 2024 14:38
@phymbert (Collaborator) left a comment

Can we hardcode a default embeddings answer, as OpenAI does? Do you think it deserves a small test scenario?

@ngxson (Collaborator, Author) commented Feb 26, 2024

I don't think we should hard-code the result, since it's a vector rather than a fixed text value (so we may see different floating-point inaccuracies on different hardware).

But what we can do for the test is:

  • Test whether embedding works with empty input { "input": "" }, only to check that it outputs a vector at all. We don't care what's inside the vector.
  • Get embeddings for a one-space input { "input": " " } and a two-space input { "input": "  " }, then compare the euclidean distance of the two vectors. The check can be hard-coded as vector1 != vector2 and distance(vector1, vector2) < THRESHOLD, where THRESHOLD is a hard-coded constant. THRESHOLD can be a bit larger than it needs to be, to compensate for floating-point inaccuracy across hardware.
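The distance check suggested above could be sketched like this (the function names and the THRESHOLD value are hypothetical, not taken from the actual test suite):

```python
import math

def euclidean_distance(v1, v2):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# Hypothetical threshold: deliberately larger than strictly needed,
# to absorb floating-point differences across hardware.
THRESHOLD = 0.5

def embeddings_close(v1, v2, threshold=THRESHOLD):
    """True when the vectors differ but lie within the threshold."""
    return v1 != v2 and euclidean_distance(v1, v2) < threshold
```

The test would fetch the two embeddings from the server, then assert embeddings_close(vector1, vector2).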

@ngxson ngxson merged commit b11a93d into ggml-org:master Feb 26, 2024
@ibehnam (Contributor) commented Feb 26, 2024

@ngxson @phymbert @ggerganov

Can we also fix the issue where an incorrect grammar crashes the server? Last I checked, there was a boolean check at the top of the server code that validated the input args. I think we could move the grammar validation outside of that and return None or an error if an incorrect grammar is passed.
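One possible shape for that suggestion (a hypothetical sketch; handle_request and parse_grammar are illustrative names, not the server's actual API) is to validate the grammar up front and convert a parse failure into an error response:

```python
def handle_request(params, parse_grammar):
    """Hypothetical handler sketch: validate the grammar before doing
    any work, returning an error payload instead of letting a
    malformed grammar crash the server."""
    grammar = params.get("grammar")
    if grammar is not None:
        try:
            parse_grammar(grammar)  # raises ValueError on bad grammar
        except ValueError as err:
            return {"error": f"invalid grammar: {err}"}
    return {"ok": True}
```

The key point is that grammar parsing happens inside a guarded path, so a bad grammar degrades to an error response rather than a crash.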

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Successfully merging this pull request may close these issues.

Server gets stuck after invalid request
3 participants