[UX] sampling with vllm #1624

sindhuvahinis · 2024-03-13T00:09:35Z

Description

This PR

Makes greedy decoding the default for vllm. [Before this PR, random sampling was the default for vllm.]
Introduces num_beams, which enables beam search. [Even with beam search, only 1 sequence is generated. More than 1 is generated only if n > 1; n is equivalent to num_return_sequences in huggingface.]
Sets the default value of max_new_tokens to 30. [Before this PR, it was 16.]
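
A minimal sketch of the three changes listed above, written against plain dicts rather than the actual handler. The function name `translate_params_sketch` is hypothetical, and the mapping of num_beams onto `best_of` / `use_beam_search` is an assumption about vLLM's parameter names at the time, not something this description states. Setting temperature to 0 to request greedy decoding follows the discussion in the review thread.

```python
def translate_params_sketch(parameters: dict) -> dict:
    """Hypothetical helper: map HF-style parameter names to vLLM-style ones."""
    params = dict(parameters)  # copy so the caller's dict is left untouched

    # max_new_tokens -> max_tokens, defaulting to 30 as described in this PR.
    params["max_tokens"] = params.pop("max_new_tokens", 30)

    # Greedy unless do_sample is truthy. A temperature of 0 asks vLLM for
    # greedy decoding; note this overrides any temperature the caller passed.
    if not params.pop("do_sample", False):
        params["temperature"] = 0

    # Assumption: beam search is requested through best_of / use_beam_search.
    if "num_beams" in params:
        params["best_of"] = params.pop("num_beams")
        params["use_beam_search"] = True

    return params
```

With an empty request this yields greedy decoding with max_tokens of 30; passing do_sample=True leaves the caller's temperature alone.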

serving/docs/lmi/lmi_input_output_schema.md

siddvenk · 2024-03-13T00:13:58Z

engines/python/setup/djl_python/rolling_batch/vllm_rolling_batch.py

@@ -86,15 +86,19 @@ def translate_vllm_params(self, parameters: dict) -> dict:

        :return: The same parameters dict, but with VLLM style parameter names.
        """
-        parameters.pop('do_sample', None)
+        parameters["max_tokens"] = parameters.pop("max_new_tokens", 30)


why 30? is this the same default for other backends as well?

Yes, this is the same default for other backends in our handlers.

siddvenk · 2024-03-13T00:15:50Z

engines/python/setup/djl_python/rolling_batch/vllm_rolling_batch.py

        if "seed" in parameters.keys():
            parameters["seed"] = int(parameters["seed"])
-        if "max_new_tokens" in parameters.keys():
-            parameters["max_tokens"] = parameters.pop("max_new_tokens")
+        if not parameters.pop('do_sample', False):


is do_sample also an alias for sampling with some default temperature in other backends? if so, are the default values for temperature when do_sample is set in parity across backends?

+1, we probably want to define the do_sample behavior

Yes, do_sample means sampling here. But the default temperature for greedy in other backends is 1.0.

And vllm seems to do sampling only when temperature is less than 1.0. I guess they follow the logic that setting temp=0 disables the randomness, so they choose greedy when temp is set to 0.

vLLM uses sampling as its default method for choosing the next token, but all other frameworks have greedy as their default. So, to unify our handlers' behavior, we are trying to bring the sampling method to parity here. But yes, the default temp value would not be the same as in other backends.

I see. It can be in another PR, or it might not be feasible at all, but if the default sampling behavior is different across backends when do_sample=True, that could cause confusion when users switch between backends. I wonder if it's possible (or worth it) to try to unify the behavior when do_sample=True across all backends (meaning the same temperature and other params set to the same values).

Agreed, will write an internal quip doc for it.

vllm implements greedy sampling differently internally, but it just uses a temperature of 0 as the API instead of adding a separate parameter for it.

https://github.com/vllm-project/vllm/blob/fb96c1e98c05ffa35dd48416f68e88edb2f9eb34/vllm/sampling_params.py#L235-L236

https://github.com/vllm-project/vllm/blob/fb96c1e98c05ffa35dd48416f68e88edb2f9eb34/vllm/model_executor/layers/sampler.py#L396-L448
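
To illustrate the point above, here is a rough sketch of how a temperature of 0 can select a separate greedy path instead of scaling logits. It is not vLLM's code: `pick_next_token` and the `SAMPLING_EPS` threshold value are assumptions made for the example, and it works on a plain list of logits.

```python
import math
import random

# Assumed threshold: temperatures below this are treated as zero (greedy).
SAMPLING_EPS = 1e-5


def pick_next_token(logits, temperature, rng=None):
    """Return the index of the next token chosen from a list of logits."""
    if temperature < SAMPLING_EPS:
        # Greedy path: take the arg max, no division by temperature needed.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Sampling path: temperature-scaled softmax, then a random draw.
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Branching on the threshold avoids dividing by zero and makes the greedy result deterministic, which is why no separate greedy flag is needed at the API level.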

sindhuvahinis requested review from zachgk, frankfliu and a team as code owners March 13, 2024 00:09

siddvenk reviewed Mar 13, 2024

sindhuvahinis force-pushed the vllm branch 3 times, most recently from 0b3ae63 to 5ae924f Compare March 13, 2024 01:50

siddvenk approved these changes Mar 13, 2024

sindhuvahinis added 2 commits March 13, 2024 14:31

[UX] sampling with vllm

5a71880

review changes

380b41a

sindhuvahinis force-pushed the vllm branch from e7e4bee to 380b41a Compare March 13, 2024 21:32

sindhuvahinis merged commit 650a5fe into deepjavalibrary:master Mar 13, 2024
8 checks passed

sindhuvahinis deleted the vllm branch April 4, 2024 17:06

sindhuvahinis commented Mar 13, 2024

siddvenk Mar 13, 2024

sindhuvahinis Mar 13, 2024

siddvenk Mar 13, 2024

lanking520 Mar 13, 2024

sindhuvahinis Mar 13, 2024

siddvenk Mar 13, 2024

sindhuvahinis Mar 13, 2024

davidthomas426 Mar 16, 2024