[Bugfix][Frontend] Cleanup "fix chat logprobs" #5026
Conversation
Force-pushed from f01bba0 to 36fc320.
Force-pushed from 36fc320 to 08e41d7.
Hopefully my PR still makes sense after this (#5031).
Force-pushed from 3bac2c4 to cbed5ec.
@simon-mo This PR should be the final one to resolve the current chat logprobs issue.
Thanks for the PR! Apologies in advance if my comments are off.
```python
assert completion.choices[0].finish_reason == "length"

choice = completion.choices[0]
assert len(choice.text) >= 5
```
nit: since we are doing a stricter check on `completion.usage` (L:174), I wonder if we can have a stricter equality check on `len(choice.text)` for consistency, here and in other places?
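For context, here is a minimal sketch of the kind of exact usage check being referred to; the `check_usage` helper and the idea of passing in expected counts are illustrative assumptions, not code lifted from the test file.

```python
from openai.types import CompletionUsage


def check_usage(usage: CompletionUsage, prompt_tokens: int, completion_tokens: int) -> None:
    # Exact-equality checks on the usage block; the expected counts are
    # supplied by the caller and are illustrative only.
    assert usage.prompt_tokens == prompt_tokens
    assert usage.completion_tokens == completion_tokens
    assert usage.total_tokens == prompt_tokens + completion_tokens
```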
Sure!
It seems that we can't check the length of the string using strict equality as each token may correspond to multiple characters. I'll keep the range check then.
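A small sketch of why strict equality does not work; the GPT-2 tokenizer here is just a stand-in for whatever model the server runs (an assumption for illustration, not from this PR).

```python
# Requires the `transformers` package; "gpt2" is an arbitrary example tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

for text in ["a b c d e", "Hello, how are you today?", "internationalization"]:
    num_tokens = len(tokenizer.encode(text))
    print(f"{text!r}: {num_tokens} tokens, {len(text)} characters")

# The character count is not a fixed multiple of the token count, so with
# max_tokens=5 and finish_reason == "length" we can only assert a lower bound
# such as len(choice.text) >= 5, not an exact length.
```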
Thanks for trying out my suggestion. LGTM from my understanding of the issue!
To confirm: for the OpenAI API, if you specify `logprobs` without `top_logprobs`, what happens? We are throwing an error here; if the OpenAI API does not throw an error, we should follow their behavior.
It returns an empty list of logprobs per token, as if `top_logprobs=0` had been specified.
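For anyone wanting to reproduce this, a rough sketch against the OpenAI SDK; the model name and prompt are placeholders, and the expected empty `top_logprobs` lists reflect the behavior described above rather than something re-verified here.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",          # placeholder model
    messages=[{"role": "user", "content": "Say hello."}],
    logprobs=True,                  # note: top_logprobs is NOT set
    max_tokens=5,
)

for token_logprob in completion.choices[0].logprobs.content:
    # Each sampled token still has its own logprob, but the per-token
    # top_logprobs list comes back empty, i.e. as if top_logprobs=0.
    print(token_logprob.token, token_logprob.logprob, token_logprob.top_logprobs)
```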
Sorry, I am reading this:

```python
if request.logprobs and request.top_logprobs is not None:
    assert top_logprobs is not None, (
        "top_logprobs must be provided when logprobs "
        "is requested")
```

and it seems we require users to set `top_logprobs` whenever `logprobs` is requested?
The naming and error message are confusing. I have pushed a commit to clear up the confusion.
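For illustration only, here is one way the relationship between the two fields could be expressed in line with the OpenAI behavior discussed above (`top_logprobs` defaulting to 0 and merely requiring `logprobs` to be enabled). This is a simplified, hypothetical sketch, not the commit that was actually pushed.

```python
from typing import Optional

from pydantic import BaseModel, model_validator


class ChatCompletionRequest(BaseModel):
    # Hypothetical, trimmed-down request model for illustration only.
    logprobs: Optional[bool] = False
    top_logprobs: Optional[int] = 0  # 0 -> return empty per-token top_logprobs

    @model_validator(mode="after")
    def check_logprobs(self):
        if self.top_logprobs and not self.logprobs:
            raise ValueError(
                "top_logprobs can only be used when logprobs is enabled")
        return self
```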
Force-pushed from eb9dc12 to 7de6cf7.
Originally, I had some fixes to #5135 in this PR, but those have been superseded by #5319. To avoid confusion, I reverted this PR to the state before merging #5135, and then merged the fixes from #5319. The current state of this PR is effectively merging the current `main` into this branch.
* upstream/main: (126 commits)
  [Bugfix][Frontend] Cleanup "fix chat logprobs" (vllm-project#5026)
  [Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (vllm-project#5312)
  [Misc] Various simplifications and typing fixes (vllm-project#5368)
  [ci] Fix Buildkite agent path (vllm-project#5392)
  [Doc] Add documentation for FP8 W8A8 (vllm-project#5388)
  Bump version to v0.5.0 (vllm-project#5384)
  [Docs] Alphabetically sort sponsors (vllm-project#5386)
  [Docs] Add Docs on Limitations of VLM Support (vllm-project#5383)
  [ci] Mount buildkite agent on Docker container to upload benchmark results (vllm-project#5330)
  [ci] Use small_cpu_queue for doc build (vllm-project#5331)
  [Bugfix] Fix LLaVA-NeXT (vllm-project#5380)
  [Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest (vllm-project#5319)
  [Model] Initial support for LLaVA-NeXT (vllm-project#4199)
  [Misc] Improve error message when LoRA parsing fails (vllm-project#5194)
  [misc][typo] fix typo (vllm-project#5372)
  [Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server (vllm-project#5374)
  [Misc] Update to comply with the new `compressed-tensors` config (vllm-project#5350)
  [Bugfix] Fix KeyError: 1 When Using LoRA adapters (vllm-project#5164)
  [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (vllm-project#5047)
  [mis][ci/test] fix flaky test in test_sharded_state_loader.py (vllm-project#5361)
  ...
Follow-up to #5029 and #5031:

- Fixed the check of `finish_reason` being too strict (`SequenceStatus.get_finished_reason` can return `'abort'`, which is not in the OpenAI spec); see the sketch after this list.
- Cleaned up `test_single_completion` and `test_single_chat_session`, since that is already covered by the newly-added tests.
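A small sketch of the mismatch behind the first bullet; the set of allowed values below is a best-effort reading of the OpenAI spec and may not be exhaustive.

```python
from typing import Literal, Optional, get_args

# Finish reasons permitted by the OpenAI Chat Completions spec (best-effort list).
OpenAIFinishReason = Literal[
    "stop", "length", "tool_calls", "content_filter", "function_call"]


def is_openai_finish_reason(reason: Optional[str]) -> bool:
    return reason is None or reason in get_args(OpenAIFinishReason)


print(is_openai_finish_reason("length"))  # True
print(is_openai_finish_reason("abort"))   # False: a check that accepts only
                                          # spec values would reject vLLM's
                                          # aborted-request reason.
```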
Future work

As mentioned in #5031, the selected logprob should always be in `top_logprobs` for the Completions API (this is not necessary for the Chat Completions API, though). However, this is not yet implemented in vLLM core.
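To make the future-work item concrete, here is a rough illustration of the legacy Completions logprobs layout and the invariant being asked for; the field names follow the OpenAI Completions API, but the values are made up.

```python
# Field names follow the legacy OpenAI Completions API; the numbers are invented.
logprobs = {
    "tokens": ["Hello", " world"],
    "token_logprobs": [-0.12, -0.53],
    "top_logprobs": [
        {"Hello": -0.12, "Hi": -2.31},       # sampled token present
        {" world": -0.53, " there": -1.87},  # sampled token present
    ],
    "text_offset": [0, 5],
}

# The requested invariant: every sampled token also appears among its own
# top_logprobs entries (only needed for the Completions API).
assert all(
    token in candidates
    for token, candidates in zip(logprobs["tokens"], logprobs["top_logprobs"])
)
```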
Side note

I've noticed that the Pydantic models for the Embeddings API (#3734) do not inherit from the stricter `OpenAIBaseModel`, and their return values contain unnecessary information. The Embeddings API tests also do not conform to OpenAI's spec. Maybe we can address this in a separate PR.
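For reference, a minimal sketch of what sharing a stricter base model could look like; the configuration and the trimmed-down embedding model are assumptions for illustration, not vLLM's actual `OpenAIBaseModel` or embedding schema.

```python
from typing import List

from pydantic import BaseModel, ConfigDict


class OpenAIBaseModel(BaseModel):
    # Assumed strictness for illustration; the real base class in vLLM may
    # configure this differently.
    model_config = ConfigDict(extra="forbid")


class EmbeddingResponseData(OpenAIBaseModel):
    # Hypothetical, trimmed-down embedding item exposing only the fields
    # that the OpenAI Embeddings API documents.
    index: int
    object: str = "embedding"
    embedding: List[float]
```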