Don't re-use logits processors in SequenceGeneratorAdapter, copy them #1160
Fixes #1109
Problem
Logits processors can't be reused across multiple generation runs and must be copied. `SequenceGeneratorAdapter` wasn't respecting this requirement. The result is that in `generator = generate.choice(...)`, the `generator` can only be used once.

During inference, logits processors are called with a sequence of `input_ids + output_ids`. Since structured generation only applies to `output_ids`, we track the length of `input_ids` on the first call and treat subsequent tokens as `output_ids`. However, this approach fails when the `input_ids` sequence changes on a later run, because the length recorded on the first call no longer marks where `output_ids` begins.

Solution
`copy` the logits processors for each `SequenceGeneratorAdapter.__call__(...)`, ensuring they can correctly determine the start of `output_ids`.

Further Work
Logits processors require more documentation, especially since vLLM contributors have discussed replacing the Outlines logits processors implementation with `import outlines.processors`.
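
For reviewers, the failure mode described under Problem and the copy-per-call fix can be reproduced in isolation. The following is a minimal sketch, not the Outlines implementation: `ToyLogitsProcessor` and `ToyAdapter` are hypothetical stand-ins, and the use of `copy.deepcopy` is an assumption (the description only says the processors are copied).

```python
import copy
from typing import List, Optional


class ToyLogitsProcessor:
    """Hypothetical stand-in for a stateful logits processor."""

    def __init__(self) -> None:
        # Length of input_ids, recorded on the first call only.
        self.prompt_len: Optional[int] = None

    def output_ids(self, token_ids: List[int]) -> List[int]:
        """Return the tokens this processor believes were generated."""
        if self.prompt_len is None:
            self.prompt_len = len(token_ids)
        return token_ids[self.prompt_len:]


class ToyAdapter:
    """Hypothetical stand-in for SequenceGeneratorAdapter."""

    def __init__(self, processor: ToyLogitsProcessor, copy_per_call: bool) -> None:
        self.processor = processor
        self.copy_per_call = copy_per_call

    def __call__(self, prompt_ids: List[int], generated: List[int]) -> List[int]:
        # Assumption: a deep copy gives each run fresh state.
        if self.copy_per_call:
            processor = copy.deepcopy(self.processor)
        else:
            processor = self.processor
        # First step of a run: the processor sees only the prompt.
        processor.output_ids(prompt_ids)
        # Later step: the processor sees prompt plus generated tokens.
        return processor.output_ids(prompt_ids + generated)


reused = ToyAdapter(ToyLogitsProcessor(), copy_per_call=False)
copied = ToyAdapter(ToyLogitsProcessor(), copy_per_call=True)

# The first run behaves the same either way.
first_reused = reused([1, 2, 3], [9])
first_copied = copied([1, 2, 3], [9])

# A second run with a longer prompt: the reused processor still holds
# the old prompt length and mistakes prompt tokens for output tokens.
second_reused = reused([1, 2, 3, 4, 5], [9])
second_copied = copied([1, 2, 3, 4, 5], [9])
```

Here `second_reused` includes the prompt tokens `4` and `5`, while `second_copied` contains only the generated token. This is why a generator built once could only be used once before the change.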