[Bug]: Concurrent requests messing up GREEDY responses #5607
Comments
This might be a dup of what is being investigated here: #5404
This seems to be related to the fact that some requests have … See #5639
Also worth mentioning that I see the same behaviour if I remove the …

```python
sampling_params = []
sampling_params.append(
    {
        "prompt": prompt,
        "temperature": 0.0,
        "max_tokens": 5,
        "repetition_penalty": 2,
    }
)
sampling_params.append(
    {
        "prompt": prompt,
        "seed": 99,
        "max_tokens": 4,
    }
)
sampling_params = sampling_params * 10
```

sometimes produces:
However, if I have both requests be greedy, the issue goes away. I think that is because the offending code only gets executed if …
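For anyone skimming: with `temperature=0.0`, decoding reduces to an argmax over the logits, so a greedy request's output cannot legitimately depend on what other requests in the batch are doing. A toy sketch (plain Python, not vLLM code) of that expectation:

```python
import math
import random

def pick_token(logits, temperature, rng):
    """Greedy when temperature == 0.0, otherwise temperature sampling."""
    if temperature == 0.0:
        # Greedy decoding: always the highest-logit token, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [0.1, 2.5, -1.0, 0.7]
rng = random.Random(99)

# Greedy picks never vary, no matter how many times we decode...
greedy_picks = {pick_token(logits, 0.0, rng) for _ in range(100)}
print(greedy_picks)  # {1}

# ...while sampled picks can vary, which is why sampling state leaking
# into a greedy request shows up as nondeterministic greedy output.
sampled_picks = {pick_token(logits, 1.0, rng) for _ in range(100)}
```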
Thanks @prashantgupta24 @tdoublep for finding and investigating this!
What is the final conclusion of this issue?
Your current environment
🐛 Describe the bug
Issue
While sending concurrent requests, greedy responses become inconsistent and are affected by the other concurrent sampling request. In the example below, if only the greedy request is sent, the output is 100% `Once upon a time, there was an old man`. But as soon as another concurrent sampling request is sent along with the greedy request, the greedy response sometimes changes to `Once upon a time, there was a young woman`, which should not be the case. (Note: this is not consistent behavior, but happens about 2/5 times.)
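The reproduction described above can be sketched roughly as follows. This is a hypothetical sketch, not the original sample client: `send_request` is a stub standing in for an HTTP POST to the server, and the payload shapes mirror the sampling params shown in the comments.

```python
from concurrent.futures import ThreadPoolExecutor

def send_request(params):
    # Stub for an HTTP POST to the server (e.g. a vLLM OpenAI-compatible
    # /v1/completions endpoint); hard-coded here so the sketch is runnable.
    # A correct server returns the same text for every greedy request.
    if params.get("temperature") == 0.0:
        return "Once upon a time, there was an old man"
    return "<sampled text>"

greedy = {"prompt": "Once upon a time", "temperature": 0.0, "max_tokens": 10}
sampled = {"prompt": "Once upon a time", "seed": 99, "max_tokens": 10}
batch = [greedy, sampled] * 10

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(send_request, batch))

greedy_outputs = {r for r, p in zip(results, batch)
                  if p.get("temperature") == 0.0}
# Against an affected server this set sometimes contains two different
# strings; with the stub (and a correct server) it has exactly one.
print(len(greedy_outputs))  # 1
```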
Server
Sample client
Output