
Request: Allow for adjustments at the layer-level, for a practically two-fold increase in LLM handling ability by prompters #4843

Closed
@9a9o

Description


Feature Description

The "Brain Hacking Chip" project demonstrates a conceptually simple but powerful method of manipulating LLM inference to increase obedience. It has the potential to practically double a prompter's ability to guide an LLM toward desirable behaviors, because it lets a prompter directly discourage undesirable behaviors without implying that those behaviors are even possibilities.

It is my understanding that this kind of feature is currently very difficult to implement in llama.cpp.

Motivation

The "Brain Hacking Chip" project allows for negative prompts, which the creator has demonstrated to yield immediate gains in model obedience. I think this is significant, because negative prompting is relatively intuitive and accessible, especially for non-technical prompters.

Negative prompts are especially useful for discouraging undesirable LLM behaviors, because they circumvent the "don't think of a pink elephant" problem: explicitly mentioning the thing the LLM shouldn't do necessarily puts that idea into mind, and thus pollutes the LLM's inference with the implication that the undesired idea is a possibility in the first place.

It is akin to the difference between telling a child, "Eat the vegetables on your plate, but don't take the candy inside the jar next to your plate," and telling a child, "Eat the vegetables on your plate" and erasing the jar from existence.

If one's ability to command an LLM's behavior could be measured with a scalar, I'd say this could double it.

Possible Implementation

I don't understand the details beyond the general idea of vector manipulation; I assume those details are elaborated in the repo.
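For concreteness, one common way to realize negative prompting is guidance-style vector steering: run the model on both the normal prompt and the negative prompt, then push the hidden states (or final logits) away from the negative prompt's direction. The sketch below is a hedged illustration of that general idea, not a description of Brain Hacking Chip's actual hook points, which may differ (it reportedly operates per-layer):

```python
# Minimal sketch of guidance-style negative steering, assuming we can
# access activations (or logits) for both a positive and a negative prompt.
# The function name and weight value are illustrative, not from the repo.
import numpy as np

def steer(h_pos: np.ndarray, h_neg: np.ndarray, weight: float = 0.3) -> np.ndarray:
    """Push the positive-prompt state away from the negative-prompt state.

    h_pos:  hidden state (or logits) from the normal prompt
    h_neg:  hidden state (or logits) from the negative prompt
    weight: steering strength; 0 disables the adjustment
    """
    return h_pos + weight * (h_pos - h_neg)

# Toy demonstration with logits over a 4-token vocabulary.
pos = np.array([2.0, 1.0, 0.5, 0.1])  # normal prompt favors token 0
neg = np.array([0.1, 0.5, 1.0, 2.0])  # undesired behavior favors token 3
out = steer(pos, neg, weight=0.5)
# Token 3's score drops while token 0's rises: the undesired
# direction is suppressed without ever sampling from the negative prompt.
```

llama.cpp's existing classifier-free-guidance sampling applies a similar combination at the logit level; the feature requested here would extend that kind of adjustment to individual layers.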

But, as someone who has spent a lot of time trying to guide LLM behavior through prompting, I recognize this as an extremely powerful way to improve the consistency and usefulness of LLMs for end users, and I think the community could greatly benefit from making these kinds of experiments easier to implement in llama.cpp.
