Conversation

@zacharyblank

This PR makes it possible to define custom logits processors that alter the probability of token generation with user-defined code. It also allows the vLLM OpenAI server to accept requests with logit_bias.

The BiasLogitsProcessor is included specifically for the OpenAI server to handle requests with logit_bias.
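
For illustration, a minimal sketch of what such a bias processor could look like. The class name BiasLogitsProcessor comes from this PR, but the body below is a hypothetical reconstruction of OpenAI logit_bias semantics (token id -> additive bias), not necessarily the PR's actual code:

from typing import Dict, List
import torch

class BiasLogitsProcessor:
    """Adds a fixed bias to the logits of selected tokens."""

    def __init__(self, logit_bias: Dict[int, float]):
        # Token id -> additive bias, clamped to [-100, 100] in the OpenAI API.
        self.logit_bias = logit_bias

    def __call__(self, logits: torch.Tensor,
                 output_tokens: List[int]) -> torch.Tensor:
        # output_tokens (the ids generated so far) is unused here, but the
        # call site below passes it, so the signature accepts it.
        for token_id, bias in self.logit_bias.items():
            logits[token_id] += bias
        return logits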


if logits_processors is not None:
    for logits_processor in logits_processors:
        logits = logits_processor(logits, output_tokens)
Contributor

In this call, you pass output_tokens to logits_processor(). However, the LogitsProcessor interface has no output_tokens parameter:

def __call__(self, logits: torch.Tensor) -> torch.Tensor:

How does it work?
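
One way the call site and the interface could line up (a sketch of a possible fix, not the PR's actual resolution) is to widen __call__ to accept the generated tokens:

from typing import List
import torch

class LogitsProcessor:
    def __call__(self, logits: torch.Tensor,
                 output_tokens: List[int]) -> torch.Tensor:
        # Subclasses adjust and return the logits; output_tokens carries
        # the ids generated so far, matching the call site above.
        raise NotImplementedError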

@simon-mo
Collaborator

simon-mo commented Nov 30, 2023

Thank you for your PR! Now that #1469 has been merged with the logits_processors API, can you help rebase and use that instead? The logit_bias support in the OpenAI API will still be very useful for vLLM users.
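
As a sketch of what that rebase might look like, assuming the per-request logits_processors hook in SamplingParams introduced by #1469, which passes the generated token ids and the logits tensor to each processor (make_logit_bias_processor is a hypothetical helper, and the model name and token id are illustrative):

from typing import Dict, List
import torch
from vllm import LLM, SamplingParams

def make_logit_bias_processor(logit_bias: Dict[int, float]):
    """Build a per-request processor applying OpenAI-style logit_bias."""
    def processor(token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
        for token_id, bias in logit_bias.items():
            logits[token_id] += bias
        return logits
    return processor

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(
    logits_processors=[make_logit_bias_processor({50256: -100.0})],
)
outputs = llm.generate("Hello, my name is", params)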

@Qubitium
Contributor

Qubitium commented Feb 5, 2024

@zacharyblank Any update? This would be great for advanced users. vllm/outlines is nice, but nothing beats a pure code-based state machine when it comes to performance.
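
A "code-based state machine" in this sense is just a logits processor that masks disallowed tokens based on explicit state; a hypothetical minimal example (the token ids and allowed-set table are invented for illustration):

from typing import Dict, List
import torch

# step index -> token ids the machine allows at that step (illustrative values)
ALLOWED: Dict[int, List[int]] = {0: [314, 921], 1: [716, 389]}

def fsm_processor(token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
    """Mask every token except those the state machine allows at this step."""
    step = len(token_ids)
    allowed = ALLOWED.get(step)
    if allowed is None:
        return logits  # no constraint at this step
    masked = torch.full_like(logits, float("-inf"))
    masked[allowed] = logits[allowed]
    return masked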

@esmeetu
Member

esmeetu commented Mar 25, 2024

Thanks for your contribution. Closing this PR, as this feature is now supported via #3027.

@esmeetu closed this Mar 25, 2024
rickyyx added a commit to rickyyx/vllm that referenced this pull request Oct 7, 2024