Don't re-use logits processors in SequenceGeneratorAdapter, copy them #1160

Merged 1 commit on Sep 23, 2024

@lapp0 lapp0 commented Sep 17, 2024

Fixes #1109

Problem

Logits processors are stateful: an instance can't be reused across generation runs and must be copied for each one. SequenceGeneratorAdapter wasn't respecting this requirement. As a result, a generator created with generator = generate.choice(...) could only be used once.

During inference, logits processors are called with a sequence of input_ids + output_ids. Since structured generation only applies to output_ids, we record the length of input_ids on the first call and treat every token past that length as output_ids. However, this approach fails when the same processor instance is later called with a different input_ids sequence, because the recorded length is stale.
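To make the failure mode concrete, here is a minimal sketch. PrefixTrackingProcessor and its output_ids method are hypothetical stand-ins invented for illustration, not the Outlines classes; the sketch only mimics the "record the prompt length on the first call" behaviour described above.

```python
class PrefixTrackingProcessor:
    """Hypothetical stand-in for a stateful logits processor (not the Outlines API)."""

    def __init__(self):
        # Length of input_ids, recorded on the first call only.
        self.prompt_len = None

    def output_ids(self, token_ids):
        # The first call sees only the prompt, so its length marks where output starts.
        if self.prompt_len is None:
            self.prompt_len = len(token_ids)
        return token_ids[self.prompt_len:]


processor = PrefixTrackingProcessor()

# First run: a 3-token prompt, then one generated token.
first_prompt = [11, 12, 13]
first_step = processor.output_ids(first_prompt)
second_step = processor.output_ids(first_prompt + [7])

# Second run reuses the same instance with a 5-token prompt.
# The recorded length (3) is stale, so two prompt tokens are
# misread as generated output before anything has been generated.
second_prompt = [21, 22, 23, 24, 25]
stale = processor.output_ids(second_prompt)
```

In the second run the processor treats the tail of the prompt as generated tokens, which is the kind of mismatch that leaves a reused generator unable to produce valid results.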

Solution

Copy the logits processors on each SequenceGeneratorAdapter.__call__(...), so that every generation run starts with a fresh processor that can correctly determine the start of output_ids.
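The idea can be sketched as follows. AdapterSketch and PrefixTrackingProcessor are hypothetical names used only for illustration, and the use of a shallow copy.copy is an assumption that holds for this toy state (a single integer); the actual adapter in the PR may copy differently.

```python
import copy


class PrefixTrackingProcessor:
    """Hypothetical stateful processor: records the prompt length on first call."""

    def __init__(self):
        self.prompt_len = None

    def output_ids(self, token_ids):
        if self.prompt_len is None:
            self.prompt_len = len(token_ids)
        return token_ids[self.prompt_len:]


class AdapterSketch:
    """Hypothetical adapter that copies its processor on every call."""

    def __init__(self, logits_processor):
        # Pristine template; never called directly.
        self.logits_processor = logits_processor

    def __call__(self, prompt_ids, generated_ids):
        # Fresh copy per run, so state from earlier runs cannot leak in.
        processor = copy.copy(self.logits_processor)
        processor.output_ids(prompt_ids)  # first call records the prompt length
        return processor.output_ids(prompt_ids + generated_ids)


adapter = AdapterSketch(PrefixTrackingProcessor())
run_one = adapter([11, 12, 13], [7])
run_two = adapter([21, 22, 23, 24, 25], [9])
```

Because the template held by the adapter is never called, its recorded length stays unset and each run sees only its own prompt.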

Further Work

Logits processors require more documentation, especially since vLLM contributors have discussed replacing the Outlines logits processors implementation with import outlines.processors.

@lapp0 lapp0 marked this pull request as ready for review September 17, 2024 22:08
@rlouf rlouf merged commit 77c6d67 into dottxt-ai:main Sep 23, 2024
7 checks passed

Successfully merging this pull request may close these issues.

llama_cpp - Multiple calls to 'choice' generator do not return results.
2 participants