### Description

### Prerequisites

- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

### Feature Idea
If we want to efficiently bias against (or outright 'ban') specific words that are made up of multiple tokens, as well as short phrases, it could be very beneficial to check the logit list to see whether the other top predictions imply the 'full word' or 'full phrase'. Because the model predicts one token at a time, making the decision to pick or skip a token based on context clues (e.g. a short synonym instead of the first piece of a larger banned word) would mean there is no overhead from 'rewinding' or reprocessing context.
A related draft PR exists that is dedicated to implementing a 'rewind' feature for a sequence repetition penalty option. This could be very beneficial for longer phrases that can't be accurately 'predicted' ahead of time:
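As a minimal sketch of the idea, the sampler could let a banned multi-token sequence progress normally and then suppress only its final token, so the full word or phrase can never complete and no rewinding is needed. The function below is purely illustrative (it assumes logits as a plain list indexed by token ID and banned sequences as lists of token IDs); it is not llama.cpp's actual sampling API:

```python
def apply_sequence_ban(logits, history, banned_sequences, bias=-1e9):
    """Suppress the final token of any banned token sequence whose
    preceding tokens have just been generated, so the full sequence
    never completes. `history` is the list of token IDs generated so far."""
    for seq in banned_sequences:
        n = len(seq) - 1  # length of the prefix that must already match
        if n == 0 or (len(history) >= n and history[-n:] == seq[:n]):
            logits[seq[-1]] += bias
    return logits
```

One caveat of this hard-ban variant: the model may still commit to the prefix tokens before the block kicks in, which is exactly why a softer, graduated bias on the earlier tokens (discussed below) is also worth considering.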
#2593
But I don't see any PR that attempts to tackle the issue in a way that doesn't incur some performance overhead from having to regenerate tokens.
I have drafted out this conditional-biasing concept visually in the hope that anyone working on a similar feature might be willing to help with the idea.
In addition, you could theoretically implement this in such a way that, when biasing against a continued phrase or sentence, the bias grows with each consecutive matching word. For example, suppose you want to prevent this sentence from being referenced in any way:
"The quick brown fox jumps over the lazy dog."
Individually, these are still perfectly ordinary tokens; the bias would only be introduced once the tokens start appearing in the banned order.
"The" by itself shouldn't be impacted, for obvious reasons; but a small bias against 'quick' could be introduced if the word preceding it was 'The'. For 'brown', you could bias the probability more aggressively, and so on.
For every token that breaks out of the 'banned sequence', you could ease off the biasing until it returns to zero.
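The graduated scheme above could be sketched roughly as follows. The penalty on the next token of a banned sequence scales with how many consecutive tokens of that sequence have already been generated, and it naturally drops back to zero once the match is broken. The linear ramp and `base_bias` value are illustrative choices, not a proposed spec:

```python
def graduated_bias(logits, history, banned_seq, base_bias=-1.0):
    """Penalize the next token of `banned_seq` in proportion to how long
    a suffix of `history` matches a prefix of the banned sequence."""
    matched = 0
    # Find the longest suffix of history that is a proper prefix of the ban.
    for k in range(min(len(banned_seq) - 1, len(history)), 0, -1):
        if history[-k:] == banned_seq[:k]:
            matched = k
            break
    if matched > 0:
        # Deeper into the banned sequence -> stronger bias on its next token.
        logits[banned_seq[matched]] += base_bias * matched
    return logits
```

With the example sentence tokenized as `[The, quick, brown, ...]`, a history ending in `The` gives 'quick' a small penalty (`1 * base_bias`), a history ending in `The quick` penalizes 'brown' twice as hard, and any token that breaks the match resets the penalty to zero on the next step.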
Doing this by hand would be tedious; maybe an automatic calculation that identifies the rarest portions of the 'banned phrases' and weights them proportionally (relative to the sampling temperature) would be a better move for a 'phrase ban list'?
In addition, the sequence wouldn't necessarily have to be matched exactly to trigger the 'ban': more generic sub-phrases like 'jumps over the' could be penalized proportionally less than others, while 'quick brown fox' might carry a stronger negative bias, for example.
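One way to automate that weighting would be to derive per-token penalty weights from token rarity, so generic words like 'the' are penalized far less than distinctive ones like 'fox'. This is only an illustrative sketch: `token_counts` is an assumed corpus frequency table, and llama.cpp has no such built-in statistic.

```python
import math

def rarity_weights(phrase_tokens, token_counts, total):
    """Return one bias weight per token in the phrase, proportional to
    -log(frequency) and normalized so the rarest token gets weight 1.0.
    `token_counts` maps token -> corpus count; `total` is the corpus size."""
    raw = [-math.log(max(token_counts.get(t, 1), 1) / total)
           for t in phrase_tokens]
    peak = max(raw)
    return [r / peak for r in raw]
```

For instance, with 'the' appearing 1000 times and 'fox' 5 times in a 2000-token corpus, 'fox' gets the full weight of 1.0 while 'the' gets only a small fraction of it, which matches the intuition that 'jumps over the' deserves a lighter penalty than 'quick brown fox'.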