Pivotal Token Search
Updated May 7, 2025 - Python
Steering GPT2-EMGSD to be less biased, and generating stereotyped text with vanilla GPT2, without fine-tuning or prompt engineering
An exploration of an alternative approach to extracting steering vectors. Instead of using the classical contrastive method, we investigate whether comparing activations between a base model and its fine-tuned deceptive version reveals a more meaningful latent direction.
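The base-versus-fine-tuned comparison described above can be illustrated with a small sketch. This is not the repository's code: the function names (`mean_activation`, `steering_vector`, `apply_steering`) are hypothetical, and the activations are toy lists standing in for hidden states that would, in practice, be recorded at one chosen layer of both models on the same prompts. It assumes a simple difference-of-means direction, which is one common choice and may differ from what the project actually does.

```python
from statistics import fmean


def mean_activation(activations):
    """Average a list of per-prompt activation vectors into one vector."""
    return [fmean(dim) for dim in zip(*activations)]


def steering_vector(base_acts, tuned_acts, normalize=True):
    """Direction from the base model's mean activation to the fine-tuned one's.

    Both arguments are lists of equal-length vectors, assumed to come from
    the same layer and the same prompts in each model.
    """
    base_mean = mean_activation(base_acts)
    tuned_mean = mean_activation(tuned_acts)
    diff = [t - b for t, b in zip(tuned_mean, base_mean)]
    if normalize:
        norm = sum(d * d for d in diff) ** 0.5
        if norm > 0:
            diff = [d / norm for d in diff]
    return diff


def apply_steering(hidden, direction, alpha):
    """Shift one hidden-state vector along the direction by strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (assumed values, 3 dimensions, 2 prompts per model).
base_acts = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
tuned_acts = [[3.0, 1.0, 6.0], [5.0, 1.0, 4.0]]

direction = steering_vector(base_acts, tuned_acts)

# A negative alpha pushes away from the fine-tuned behaviour.
steered = apply_steering([1.0, 1.0, 1.0], direction, alpha=-2.0)
```

The sign and size of `alpha` control whether a hidden state is pushed toward or away from the fine-tuned model's behaviour. Normalizing the direction keeps `alpha` comparable across layers, at the cost of discarding the raw magnitude of the shift.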