### Description

### Prerequisites

- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

### Feature Idea
If we want to efficiently bias against (or outright 'ban') specific words that are made up of multiple tokens, as well as short phrases, it could be very beneficial to check the logit list to see whether the other top predictions imply the 'full word' or 'full phrase'. Because the model predicts one token at a time, making the decision to pick or skip a token based on context clues (e.g. a short synonym instead of the first piece of a larger banned word) would mean there is no overhead from 'rewinding' or reprocessing context.
A related draft PR exists that is dedicated to implementing a 'rewind' feature for a sequence repetition penalty option. This could be very beneficial for longer phrases that can't be accurately 'predicted' ahead of time:
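As a minimal sketch of the idea, the sampler could let a banned multi-token sequence progress normally and then suppress only its final token, so the full word or phrase can never complete and no rewinding is needed. The function below is purely illustrative (it assumes logits as a plain list indexed by token ID and banned sequences as lists of token IDs); it is not llama.cpp's actual sampling API:

```python
def apply_sequence_ban(logits, history, banned_sequences, bias=-1e9):
    """Suppress the final token of any banned token sequence whose
    preceding tokens have just been generated, so the full sequence
    never completes. `history` is the list of token IDs generated so far."""
    for seq in banned_sequences:
        n = len(seq) - 1  # length of the prefix that must already match
        if n == 0 or (len(history) >= n and history[-n:] == seq[:n]):
            logits[seq[-1]] += bias
    return logits
```

One caveat of this hard-ban variant: the model may still commit to the prefix tokens before the block kicks in, which is exactly why a softer, graduated bias on the earlier tokens (discussed below) is also worth considering.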
#2593
But I don't see any PR that attempts to tackle the issue in a way that doesn't incur some performance overhead from having to regenerate tokens.
I have drafted out this conditional-biasing concept visually in the hope that anyone working on a similar feature might be willing to help with the idea.
In addition, you could theoretically implement this in such a way that, when biasing against a continued phrase or sentence, the bias grows with each consecutive matching word. For example, suppose you want to prevent this sentence from being referenced in any way:
"The quick brown fox jumps over the lazy dog."
Individually, these are still perfectly ordinary tokens; the bias would only be introduced once the tokens start appearing in the banned order.
"The" by itself shouldn't be impacted, for obvious reasons; but a small bias against 'quick' could be introduced if the word preceding it was 'The'. For 'brown', you could bias the probability more aggressively, and so on.
For every token that breaks out of the 'banned sequence', you could ease off the biasing until it returns to zero.
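The graduated scheme above could be sketched roughly as follows. The penalty on the next token of a banned sequence scales with how many consecutive tokens of that sequence have already been generated, and it naturally drops back to zero once the match is broken. The linear ramp and `base_bias` value are illustrative choices, not a proposed spec:

```python
def graduated_bias(logits, history, banned_seq, base_bias=-1.0):
    """Penalize the next token of `banned_seq` in proportion to how long
    a suffix of `history` matches a prefix of the banned sequence."""
    matched = 0
    # Find the longest suffix of history that is a proper prefix of the ban.
    for k in range(min(len(banned_seq) - 1, len(history)), 0, -1):
        if history[-k:] == banned_seq[:k]:
            matched = k
            break
    if matched > 0:
        # Deeper into the banned sequence -> stronger bias on its next token.
        logits[banned_seq[matched]] += base_bias * matched
    return logits
```

With the example sentence tokenized as `[The, quick, brown, ...]`, a history ending in `The` gives 'quick' a small penalty (`1 * base_bias`), a history ending in `The quick` penalizes 'brown' twice as hard, and any token that breaks the match resets the penalty to zero on the next step.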
Doing this by hand would be tedious; maybe an automatic calculation that identifies the rarest portions of the 'banned phrases' and weights them proportionally (relative to the sampling temperature) would be a better move for a 'phrase ban list'?
In addition, the sequence wouldn't necessarily have to be matched exactly to trigger the 'ban': more generic sub-phrases like 'jumps over the' could be penalized proportionally less than others, while 'quick brown fox' might carry a stronger negative bias, for example.
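One way to automate that weighting would be to derive per-token penalty weights from token rarity, so generic words like 'the' are penalized far less than distinctive ones like 'fox'. This is only an illustrative sketch: `token_counts` is an assumed corpus frequency table, and llama.cpp has no such built-in statistic.

```python
import math

def rarity_weights(phrase_tokens, token_counts, total):
    """Return one bias weight per token in the phrase, proportional to
    -log(frequency) and normalized so the rarest token gets weight 1.0.
    `token_counts` maps token -> corpus count; `total` is the corpus size."""
    raw = [-math.log(max(token_counts.get(t, 1), 1) / total)
           for t in phrase_tokens]
    peak = max(raw)
    return [r / peak for r in raw]
```

For instance, with 'the' appearing 1000 times and 'fox' 5 times in a 2000-token corpus, 'fox' gets the full weight of 1.0 while 'the' gets only a small fraction of it, which matches the intuition that 'jumps over the' deserves a lighter penalty than 'quick brown fox'.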