Description
Feature Description
The project "Brain Hacking Chip" demonstrates a sophisticated, albeit conceptually simple, method of manipulating LLM inference that substantially increases model obedience. It could practically double a prompter's ability to guide an LLM toward desirable behaviors, because it lets the prompter directly discourage undesirable behaviors without implying that those behaviors are even possible.
It is my understanding that this kind of feature is currently very difficult to implement in LLaMA-CPP.
Motivation
The "Brain Hacking Chip" project supports negative prompts, which its creator has shown to produce immediate gains in model obedience. I think this is significant because negative prompting is relatively intuitive and accessible, especially for non-technical prompters.
Negative prompts are especially useful for discouraging undesirable LLM behaviors because they circumvent the "Don't think of a pink elephant" problem: explicitly mentioning what the LLM shouldn't do necessarily puts that idea into its context, which pollutes the LLM's inference with the implication that the undesired behavior is a possibility in the first place.
It is akin to the difference between telling a child, "Eat the vegetables on your plate, but don't take the candy from the jar next to your plate," and simply telling the child, "Eat the vegetables on your plate," while erasing the jar from existence.
If one's ability to control an LLM's behavior could be measured as a single number, I'd say this could double it.
Possible Implementation
I don't understand the details beyond the general idea of vector manipulation; I assume they are explained in the project's repository.
However, as someone who has spent a lot of time trying to guide LLM behavior through prompting, I recognize this as an extremely powerful way to improve the consistency and usefulness of LLMs for end users, and I think the community would benefit greatly if these kinds of experiments were easier to implement in LLaMA-CPP.
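For context on what "vector manipulation" with a negative prompt can look like, here is a minimal sketch assuming a classifier-free-guidance (CFG) style combination. This is a well-known, swapped-in technique used here for illustration, not necessarily what the Brain Hacking Chip actually implements: the model is assumed to be run once with the normal prompt and once with the negative prompt, and the resulting vectors (hidden states or logits) are extrapolated away from the negative one. The function name `apply_negative_guidance` and the `scale` parameter are hypothetical.

```python
from typing import Sequence


def apply_negative_guidance(
    positive: Sequence[float],
    negative: Sequence[float],
    scale: float,
) -> list[float]:
    """Hypothetical CFG-style combination of two vectors (illustration only).

    positive: vector (hidden state or logits) from the normal prompt.
    negative: vector from the run that includes the negative prompt.
    scale:    guidance strength; 1.0 returns `positive` unchanged, and
              values above 1.0 push the result further away from `negative`.
    """
    if len(positive) != len(negative):
        raise ValueError("positive and negative vectors must have the same length")
    # Classifier-free guidance: guided = negative + scale * (positive - negative)
    return [n + scale * (p - n) for p, n in zip(positive, negative)]


if __name__ == "__main__":
    # Toy two-dimensional example: dimension 0 is "wanted", dimension 1 is "unwanted".
    print(apply_negative_guidance([1.0, 0.0], [0.0, 1.0], 1.5))  # [1.5, -0.5]
```

With `scale` above 1.0, any direction in which the negative run differs from the normal run is amplified in the opposite sense, which is how the unwanted behavior can be discouraged without ever appearing in the main prompt.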