DIRECTER: Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection

Overview

DIRECTER is an inference-time activation steering method that improves how Large Language Models (LLMs) follow complex instructions while mitigating the risk of "oversteering."

While activation steering techniques can effectively force models to adhere to constraints, they often suffer from a trade-off: excessive emphasis on the instruction can degrade the overall coherence and quality of the generated text. DIRECTER solves this by dynamically modulating steering strength at every decoding step.

Key Mechanism

DIRECTER couples KV cache steering with a plausibility-guided decoding loop. At each step, the method:

1. Steers: Tentatively amplifies the "Key" vectors in the KV cache associated with the instruction.
2. Checks Plausibility: Compares the steered output distribution against the raw model's distribution.
3. Modulates: If the steered output is deemed implausible (it deviates too far from the model's natural distribution), DIRECTER progressively reduces the steering strength by removing layers from the intervention set.
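Until the official code is available, the steer / check / modulate loop can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the released implementation: the plausibility rule (the steered top token must keep at least `alpha` times the raw model's maximum probability), the choice to drop the lowest-ranked layer first, the fallback to the raw distribution once no layers remain, and the hypothetical names `decode_step`, `toy_steer`, `OFFSETS`, and `RANKED`. Real KV cache steering is replaced by simple per-layer logit offsets.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def is_plausible(steered_probs, raw_probs, alpha=0.1):
    # Assumed criterion: the token preferred under steering must not be
    # far less likely than the raw model's own best token.
    return raw_probs[argmax(steered_probs)] >= alpha * max(raw_probs)

def decode_step(raw_logits, steer_fn, ranked_layers, alpha=0.1):
    """One decoding step. ranked_layers is ordered most influential first.

    steer_fn(active_layers) is a hypothetical callable that returns the
    logits obtained when only active_layers are steered.
    Returns (chosen_token, layers_actually_used).
    """
    raw_probs = softmax(raw_logits)
    active = list(ranked_layers)
    while active:
        steered_probs = softmax(steer_fn(active))      # Steer
        if is_plausible(steered_probs, raw_probs, alpha):  # Check
            return argmax(steered_probs), active
        active.pop()  # Modulate: drop the lowest-ranked layer (assumption)
    return argmax(raw_probs), []  # no steering survived; use the raw model

# Toy stand-in for KV cache steering: each layer adds a fixed logit offset.
RAW_LOGITS = [2.0, 1.5, -3.0]
OFFSETS = {
    "L10": [0.0, 1.0, 0.0],
    "L5": [0.0, 0.0, 3.0],
    "L20": [0.0, 0.0, 3.5],
}
RANKED = ["L10", "L5", "L20"]

def toy_steer(active_layers):
    logits = list(RAW_LOGITS)
    for layer in active_layers:
        logits = [a + b for a, b in zip(logits, OFFSETS[layer])]
    return logits

token, used = decode_step(RAW_LOGITS, toy_steer, RANKED)
print(token, used)
```

In this toy setup, steering all three layers pushes the model toward token 2, which the raw model considers very unlikely, so the step is rejected. After the lowest-ranked layer is removed, the steered choice becomes token 1, which passes the check.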

This process is guided by a lightweight, one-time Sensitivity Analysis that ranks layers based on their influence, ensuring that the most effective layers are prioritized.
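One way such a ranking could be computed is sketched below. It is an assumption for illustration only: influence is measured here as the KL divergence between the distribution obtained by steering a single layer and the raw distribution, on one calibration input. The README does not state the actual metric, and `rank_layers`, `steer_one_layer`, and the toy offsets are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for strictly positive distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def rank_layers(raw_logits, steer_one_layer, layers):
    """Return layers sorted from most to least influential.

    steer_one_layer(layer) is a hypothetical callable that returns the
    logits obtained when only that layer is steered.
    """
    raw_probs = softmax(raw_logits)
    scores = {
        layer: kl_divergence(softmax(steer_one_layer(layer)), raw_probs)
        for layer in layers
    }
    return sorted(layers, key=lambda layer: scores[layer], reverse=True)

# Toy stand-in: each layer shifts the logit of token 1 by a different amount.
RAW = [2.0, 1.5, -3.0]
SHIFT = {"L5": 0.2, "L10": 1.0, "L20": 0.5}

def toy_steer_one(layer):
    return [RAW[0], RAW[1] + SHIFT[layer], RAW[2]]

ranking = rank_layers(RAW, toy_steer_one, ["L5", "L10", "L20"])
print(ranking)
```

Because the analysis runs once, ahead of decoding, its cost does not grow with the number of generated tokens; the decoding loop only consults the resulting order.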

Code Release

The official implementation will be released soon. We are preparing the codebase for public release; please watch this repository for updates.
