[UX] sampling with vllm #1624

Merged: 2 commits merged into deepjavalibrary:master from the vllm branch on Mar 13, 2024

Conversation

sindhuvahinis (Contributor)

Description

This PR:

  • Makes greedy decoding the default for vLLM (see the sketch after this list). [Before this PR, random sampling was the default for vLLM.]
  • Introduces num_beams, which enables beam search. [Even with beam search, only one sequence is returned; it is more than one only when n > 1. n is equivalent to num_return_sequences in Hugging Face.]
  • Makes the default value of max_new_tokens 30. [Before this PR, it was 16.]
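
A minimal sketch of the parameter translation these changes describe, assuming Hugging Face style parameters arrive as a plain dict. The function mirrors translate_vllm_params in the diff below, but the num_beams mapping (best_of / use_beam_search) is an assumption for illustration, not the actual handler code:

```python
def translate_vllm_params(parameters: dict) -> dict:
    """Illustrative sketch: map HF-style generation params to vLLM-style names."""
    # Default generation length: 30 new tokens unless the caller overrides it.
    parameters["max_tokens"] = parameters.pop("max_new_tokens", 30)
    # Greedy by default: vLLM treats temperature == 0 as greedy decoding, so an
    # absent or False do_sample pins temperature to 0.
    if not parameters.pop("do_sample", False):
        parameters["temperature"] = 0
    # num_beams enables beam search; n (num_return_sequences in HF terms)
    # controls how many sequences are returned.
    if "num_beams" in parameters:
        parameters["best_of"] = parameters.pop("num_beams")  # assumed mapping
        parameters["use_beam_search"] = True  # flag name assumed, not confirmed by this PR
    if "seed" in parameters:
        parameters["seed"] = int(parameters["seed"])
    return parameters
```

With these defaults, an empty parameters dict comes back as {"max_tokens": 30, "temperature": 0}, i.e. greedy decoding of at most 30 new tokens.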

@sindhuvahinis requested review from zachgk, frankfliu and a team as code owners on March 13, 2024 00:09
Review thread on serving/docs/lmi/lmi_input_output_schema.md (outdated, resolved)
@@ -86,15 +86,19 @@ def translate_vllm_params(self, parameters: dict) -> dict:

        :return: The same parameters dict, but with VLLM style parameter names.
        """
        parameters.pop('do_sample', None)
        parameters["max_tokens"] = parameters.pop("max_new_tokens", 30)
Contributor:
why 30? is this the same default for other backends as well?

Contributor Author (sindhuvahinis):
Yes, this is the same default for other backends in our handlers.
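
For illustration, a hypothetical request that follows the LMI input schema referenced above (field names as documented in lmi_input_output_schema.md) and omits max_new_tokens would now be capped at 30 generated tokens:

```python
# Hypothetical payload; "inputs" / "parameters" field names follow the LMI input schema.
request = {
    "inputs": "What is Deep Learning?",
    "parameters": {},  # no max_new_tokens -> capped at 30 tokens; no do_sample -> greedy
}
```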

if "seed" in parameters.keys():
parameters["seed"] = int(parameters["seed"])
if "max_new_tokens" in parameters.keys():
parameters["max_tokens"] = parameters.pop("max_new_tokens")
if not parameters.pop('do_sample', False):
Contributor:
is do_sample also an alias for sampling with some default temperature in other backends? if so, are the default values for temperature when do_sample is set in parity across backends?

Contributor:
+1, we probably want to define the do_sample behavior

Contributor Author (sindhuvahinis):
Yes, do_sample means sampling here. But the default temperature for greedy in other backends is 1.0.

And vLLM seems to do sampling unless temperature is set to 0; I think their logic is that setting temp=0 disables the randomness, so they choose greedy when temp is 0.

vLLM chooses sampling as its default method for picking the next token, while all other frameworks default to greedy. To unify our handlers' behavior, this PR brings the sampling method into parity. But yes, the default temperature value would not be the same as in other backends.
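
A minimal sketch of the behavior being described, assuming vllm is installed; SamplingParams is vLLM's sampling configuration class:

```python
from vllm import SamplingParams

# temperature=0 tells vLLM to use greedy decoding (argmax at each step);
# any temperature above 0 enables random sampling.
greedy = SamplingParams(temperature=0, max_tokens=30)
sampled = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=30)
```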

Contributor:
I see. It can be in another PR, or it might not be feasible at all, but if the default sampling behavior differs across backends when do_sample=True, that could cause confusion when users switch between backends. I wonder if it's possible (or worth it) to unify the behavior when do_sample=True across all backends, meaning temperature and the other sampling parameters get the same values.

Contributor Author (sindhuvahinis):
Agreed. I will write an internal Quip doc for it.

@sindhuvahinis force-pushed the vllm branch 3 times, most recently from 0b3ae63 to 5ae924f, on March 13, 2024 01:50
@sindhuvahinis merged commit 650a5fe into deepjavalibrary:master on Mar 13, 2024
8 checks passed
@sindhuvahinis deleted the vllm branch on April 4, 2024 17:06