Commit

[Frontend] Make beam search emulator temperature modifiable (vllm-project#8928)

Co-authored-by: Eduard Balzin <nfunctor@yahoo.fr>
Signed-off-by: Amit Garg <mitgarg17495@gmail.com>
2 people authored and garg-amit committed Oct 28, 2024
1 parent b74af93 commit 1510539
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion vllm/entrypoints/llm.py
@@ -396,6 +396,7 @@ def beam_search(
         beam_width: int,
         max_tokens: int,
         ignore_eos: bool = False,
+        temperature: float = 0.0,
     ) -> List[BeamSearchOutput]:
         """
         Generate sequences using beam search.
@@ -405,6 +406,7 @@ def beam_search(
                 of token IDs.
             beam_width: The number of beams to keep at each step.
             max_tokens: The max number of tokens to generate for each prompt.
+            temperature: The temperature to use for generation.

         TODO: how does beam search work together with length penalty, frequency
         penalty, and stopping criteria, etc.?
@@ -416,7 +418,7 @@ def beam_search(
         # at https://github.com/huggingface/transformers/blob/e15687fffe5c9d20598a19aeab721ae0a7580f8a/src/transformers/generation/beam_search.py#L534 # noqa
         beam_search_params = SamplingParams(logprobs=2 * beam_width,
                                             max_tokens=1,
-                                            temperature=0.0)
+                                            temperature=temperature)
         instances: List[BeamSearchInstance] = []

         for prompt in prompts:
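
Note: the diff above forwards `temperature` into the `SamplingParams` used to request `2 * beam_width` candidate log-probabilities at each step. The following is a minimal, standalone sketch of why that value could matter for beam scoring. It does not use vLLM. The helper names `scaled_logprobs` and `top_candidates` are hypothetical, and the sketch assumes that candidate log-probabilities come from a softmax over logits divided by the temperature. It covers only `temperature > 0`; how the engine treats the default `0.0` is not shown in this diff.

```python
import math
from typing import List, Tuple


def scaled_logprobs(logits: List[float], temperature: float) -> List[float]:
    """Return log-softmax of logits / temperature (hypothetical helper).

    Assumption: temperature is strictly positive. The default of 0.0 in the
    diff is handled by the engine and is deliberately not modeled here.
    """
    if temperature <= 0.0:
        raise ValueError("this sketch only models temperature > 0")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_z for s in scaled]


def top_candidates(logits: List[float], beam_width: int,
                   temperature: float) -> List[Tuple[int, float]]:
    """Pick the 2 * beam_width best (token_id, logprob) pairs for one step."""
    logprobs = scaled_logprobs(logits, temperature)
    ranked = sorted(enumerate(logprobs), key=lambda pair: pair[1], reverse=True)
    return ranked[:2 * beam_width]


if __name__ == "__main__":
    step_logits = [2.0, 1.0, 0.0]  # made-up logits for a 3-token vocabulary
    for t in (1.0, 2.0):
        print(t, top_candidates(step_logits, beam_width=1, temperature=t))
```

Within a single step the ranking of tokens is the same at any positive temperature, but the gaps between log-probabilities shrink as the temperature grows. Because beam search sums log-probabilities across steps and compares beams against each other, that change in gaps can alter which beams survive.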
