[V0][Sampler] Use raw logits for greedy argmax #13312
Conversation
To hopefully avoid some of the reported precision-related nondeterminism. Also delete a vestigial intermediate method. Signed-off-by: Nick Hill <nhill@redhat.com>
Is this fix not necessary for the v1 sampler?
@patrickvonplaten I don't think so, we were already using logits rather than logprobs for this in V1: see vllm/v1/sample/sampler.py, lines 77–78 at 9f37422.
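For reference, the V1 greedy path referenced above boils down to an argmax over the raw logits. A minimal sketch of the idea (assumed shapes, not the verbatim vLLM code):

```python
import torch

def greedy_sample(logits: torch.Tensor) -> torch.Tensor:
    # logits: [num_seqs, vocab_size]; pick the highest-logit token id
    # per sequence, with no intermediate softmax/log-softmax step.
    return logits.argmax(dim=-1).view(-1)
```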
@njhill, may I ask when this PR will be merged? Any target release? Thanks in advance!
@tonyaw there are test failures that need investigating: https://buildkite.com/vllm/ci/builds/15883#0195b060-d27c-4bd5-b435-c495d3709d24. Any help would be welcome!
@njhill Hi, may I ask why argmax(logits) is more stable than argmax(logprobs)?
@gx16377 @tonyaw logprob is the log-softmax of the logits; softmax is a nonlinear projection of the entire floating-point range into the range [0, 1]. So it essentially reduces precision, and many token values will end up tied that weren't beforehand, including values tied for first place, and argmax may select from these arbitrarily (e.g. could vary by batch size).
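To make the precision loss concrete, here is a small hypothetical demonstration (float32 and a 32k vocabulary are assumed; this is not vLLM code). When the logits sit much closer to zero than the roughly -10.37 logprobs they map to, a gap that float32 can represent between logits falls below the float32 spacing of the logprobs, and log_softmax collapses the values into a tie:

```python
import torch

vocab_size = 32_000
logits = torch.zeros(vocab_size, dtype=torch.float32)
logits[100] = 1e-7  # strictly the largest logit, representable in float32

logprobs = torch.log_softmax(logits, dim=-1)

# Every logprob is near log(1/32000) ~= -10.37, where float32 spacing is
# ~9.5e-7. The 1e-7 gap is below that, so token 100 ties with the rest.
print(logits.argmax())    # tensor(100): unambiguous on the raw logits
print(logprobs.argmax())  # typically tensor(0): a tied index, not token 100
```

Argmaxing the raw logits sidesteps this: ties can still happen, but only when the logits themselves are equal.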
Thank you |
Thanks! |
@tonyaw I mean argmax. It may not be deterministic when there are tied max values. And yes, batch size matters when multiple requests are processed together.
@njhill Thanks! |
@njhill The torch documentation says argmax will pick the lowest index. At least for the case of multiple max logits, this commit will make sampling deterministic instead of picking one of the max values as mentioned here, correct?
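For what it's worth, the documented first-index behavior is easy to check with a toy tensor (a hypothetical example, not from this PR):

```python
import torch

x = torch.tensor([1.0, 3.0, 3.0, 2.0])
print(torch.argmax(x))  # tensor(1): the first occurrence of the tied max
```

The residual nondeterminism described above is upstream of the tie-break rule: which values end up tied can itself vary with batch size and kernel reduction order, so the lowest tied index is not stable across runs either. Using the raw logits reduces how often ties form in the first place.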