[Bugfix][Frontend] Cleanup "fix chat logprobs" #5026

Merged
merged 21 commits into vllm-project:main from openai-logprobs on Jun 11, 2024

Conversation

@DarkLight1337 (Member) commented May 24, 2024

Follow-up to #5029 and #5031:

  • Fix finish_reason being too strict (SequenceStatus.get_finished_reason can return 'abort', which is not in the OpenAI spec).
  • Check strict equality, instead of only an upper bound, on the number of logprobs returned by the Chat Completions API (see the sketch after this list).
  • Remove the now-redundant logprobs checks in test_single_completion and test_single_chat_session, since they are already covered by the newly-added tests.
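
To make the second point concrete, here is a minimal sketch of the kind of check involved, written against a locally running vLLM OpenAI-compatible server. The model name, URL, and exact assertions are illustrative assumptions, not the PR's actual test code:

    # Illustrative sketch only; assumes a vLLM OpenAI-compatible server is
    # already running at base_url and serving the given model.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    completion = client.chat.completions.create(
        model="facebook/opt-125m",
        messages=[{"role": "user", "content": "Say hello"}],
        max_tokens=5,
        logprobs=True,
        top_logprobs=3,
    )

    choice = completion.choices[0]
    # For this request only "stop" or "length" should appear; "abort" from
    # SequenceStatus.get_finished_reason is not part of the OpenAI spec.
    assert choice.finish_reason in ("stop", "length")
    # Exactly the requested number of alternatives, not merely "at most".
    for token_logprob in choice.logprobs.content:
        assert len(token_logprob.top_logprobs) == 3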

Future work

As mentioned in #5031, the selected logprob should always be in top_logprobs for Completions API (this is not necessary for Chat Completions API though). However, this is not yet implemented in vLLM core.
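
For reference, the invariant described above could be expressed roughly like this for the legacy Completions API. This is a hypothetical check, not code from this PR; the client setup mirrors the earlier sketch:

    # Hypothetical illustration of the desired invariant: each sampled token
    # should appear among its own top_logprobs entries. Not yet guaranteed by
    # vLLM core at the time of this PR.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    completion = client.completions.create(
        model="facebook/opt-125m",
        prompt="Hello, my name is",
        max_tokens=5,
        logprobs=3,
    )

    logprobs = completion.choices[0].logprobs
    for token, top in zip(logprobs.tokens, logprobs.top_logprobs):
        assert token in top  # the chosen token is one of the returned candidates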

Side note

I've noticed that the Pydantic models for the Embeddings API (#3734) are not inheriting from the stricter OpenAIBaseModel, and their return values have unnecessary information. The Embeddings API tests are also not conforming to OpenAI's spec. Maybe we can address this in a separate PR.

@Etelis (Contributor) commented May 24, 2024

Hopefully my PR still makes sense after this (#5031)

@DarkLight1337 (Member, Author) commented May 24, 2024

> Hopefully my PR still makes sense after this (#5031)

At the moment, this PR only addresses the second issue in #5031. Let's see whether the newly added tests can catch the other two issues as well.

Edit: The tests managed to additionally catch the first issue. I have applied the fix to this PR accordingly.

DarkLight1337 changed the title from "[Bugfix][Frontend] Fix format of returned logprobs for OpenAI Chat Completions API" to "[Bugfix][Frontend] Fix some issues left behind by #5029 and #5031" on May 30, 2024
DarkLight1337 changed the title from "[Bugfix][Frontend] Fix some issues left behind by #5029 and #5031" to "[Bugfix][Frontend] Cleanup "fix chat logprobs"" on May 30, 2024
@DarkLight1337 (Member, Author) commented May 30, 2024

@simon-mo This PR should be the final one to resolve the current chat logprobs issue.

DarkLight1337 requested a review from simon-mo on May 31, 2024 05:32
@sroy745 (Collaborator) left a comment

Thanks for the PR! Apologies in advance if my comments are off.

    assert completion.choices[0].finish_reason == "length"

    choice = completion.choices[0]
    assert len(choice.text) >= 5

Collaborator:

nit: since we are doing a stricter check on completion.usage (L:174), I wonder if we could use a strict equality check on len(choice.text) as well, for consistency here and in other places?

Member Author:

Sure!

Member Author:

It seems that we can't check the length of the string using strict equality as each token may correspond to multiple characters. I'll keep the range check then.
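
For illustration, tokens and characters diverge quickly; the tokenizer choice below is arbitrary and only for demonstration:

    # Demonstration sketch: a handful of tokens generally decodes to far more
    # characters than the token count, so an exact character-count assertion
    # is not meaningful.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
    token_ids = tokenizer("The quick brown fox jumps",
                          add_special_tokens=False)["input_ids"][:5]
    text = tokenizer.decode(token_ids, skip_special_tokens=True)
    print(len(token_ids), len(text))  # e.g. 5 tokens but many more characters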

@sroy745 (Collaborator) left a comment

Thanks for trying out my suggestion. LGTM from my understanding of the issue.

@simon-mo (Collaborator) commented Jun 3, 2024

To confirm: for the OpenAI API, if you specify logprobs without top_logprobs, what happens? We are throwing an error here; if the OpenAI API does not throw an error, we should follow their behavior.

@DarkLight1337 (Member, Author) commented Jun 4, 2024

> To confirm: for the OpenAI API, if you specify logprobs without top_logprobs, what happens? We are throwing an error here; if the OpenAI API does not throw an error, we should follow their behavior.

It returns an empty list of logprobs, as if top_logprobs=0. This is consistent with the current implementation in vLLM (which was updated by #5029 to allow top_logprobs=None).
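
For example, a request along these lines exercises that behaviour (a sketch against an OpenAI-compatible endpoint; the model name and URL are placeholders):

    # Sketch: logprobs requested without top_logprobs. Per the behaviour
    # described above, each token's top_logprobs list comes back empty,
    # as if top_logprobs=0 had been passed.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    completion = client.chat.completions.create(
        model="facebook/opt-125m",
        messages=[{"role": "user", "content": "Say hello"}],
        max_tokens=5,
        logprobs=True,  # top_logprobs deliberately omitted
    )

    for token_logprob in completion.choices[0].logprobs.content:
        assert token_logprob.top_logprobs == []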

@simon-mo (Collaborator) commented Jun 4, 2024

Sorry, I am reading this:

            if request.logprobs and request.top_logprobs is not None:
                assert top_logprobs is not None, (
                    "top_logprobs must be provided when logprobs "
                    "is requested")

and it seems we require users to set top_logprobs to a number when logprobs is specified, and #5029 did not default the number to 0. I am missing how this is consistent with OAI behavior: when logprobs is specified and top_logprobs is missing, the request should still succeed.

@DarkLight1337 (Member, Author)

> Sorry, I am reading this:
>
>             if request.logprobs and request.top_logprobs is not None:
>                 assert top_logprobs is not None, (
>                     "top_logprobs must be provided when logprobs "
>                     "is requested")
>
> and it seems we require users to set top_logprobs to a number when logprobs is specified, and #5029 did not default the number to 0. I am missing how this is consistent with OAI behavior: when logprobs is specified and top_logprobs is missing, the request should still succeed.

The naming and error message are confusing. I have pushed a commit to clear up the confusion.
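
The actual commit is not shown here, but the intended semantics, matching the OpenAI behaviour discussed above, might be sketched roughly as follows. This is purely illustrative and the function name is made up:

    # Purely illustrative sketch, not the actual commit: top_logprobs may only
    # be set when logprobs is enabled, and omitting it while logprobs is
    # enabled is treated as requesting 0 alternatives per token.
    from typing import Optional

    def resolve_chat_logprobs(logprobs: Optional[bool],
                              top_logprobs: Optional[int]) -> int:
        if top_logprobs is not None and not logprobs:
            raise ValueError("top_logprobs requires logprobs to be enabled")
        if not logprobs:
            return 0
        return top_logprobs if top_logprobs is not None else 0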

@DarkLight1337 (Member, Author)

Originally, I had some fixes to #5135 in this PR, but those have been superseded by #5319. To avoid confusion, I reverted this PR to the state before merging #5135, and then merged the fixes from #5319.

The current state of this PR is effectively the result of merging the current main branch into 908cac4, so no further errors should arise. @simon-mo feel free to merge this if my previous comment resolves your question.

zhuohan123 merged commit 640052b into vllm-project:main on Jun 11, 2024
101 of 103 checks passed
DarkLight1337 deleted the openai-logprobs branch June 11, 2024 05:41
tjohnson31415 added a commit to tjohnson31415/vllm that referenced this pull request Jun 11, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024