Training only steering vectors can match the performance of fully fine-tuned models trained with a GRPO-style method, while being substantially more efficient in both memory use and training time.
| Metric | Full-Tune | Steering |
|---|---|---|
| Number of Parameters | 14.7 B | 245 K |
| Optimizer Memory | 13.8 GB | 240 KB |
| Per-step Time | 9.94 s | 0.11 s |
| Overall Time | 52 m | 34 s |
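As a quick sanity check, the ratios implied by the table (computed here from its rounded values) work out to roughly four orders of magnitude in size and two in speed:

```python
# Approximate ratios from the rounded values in the table above.
full_params, steer_params = 14.7e9, 245e3
full_mem_kb, steer_mem_kb = 13.8 * 1024**2, 240  # 13.8 GB in KB vs. 240 KB
full_step, steer_step = 9.94, 0.11               # seconds per step
full_total, steer_total = 52 * 60, 34            # total seconds

print(f"parameters: {full_params / steer_params:,.0f}x fewer")   # ~60,000x
print(f"optimizer:  {full_mem_kb / steer_mem_kb:,.0f}x smaller") # ~60,000x
print(f"per step:   {full_step / steer_step:.0f}x faster")       # ~90x
print(f"overall:    {full_total / steer_total:.0f}x faster")     # ~92x
```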
The resulting vectors are interpretable; see our paper, “Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors”, for details.
We explain below how to train, evaluate, visualize, and run the auxiliary experiments in this repo. All commands are intended to be run from the repository root.
Model path: place your base model under

```
/from_s3/models/<MODEL_NAME>
```

- Example:

  ```
  /from_s3/models/Qwen2.5-Math-7B
  ```
Set the `CONFIG_PATH` environment variable. Configs live under `configs/train/rl/` and are organized by model and dataset. File names correspond to the setup type (e.g., `steering.yml`).

- Example: Qwen2.5-Math-7B + DeepScaleR, steering vectors only

  ```
  configs/train/rl/qwen2.5-math-7b/deepscaler/steering.yml
  ```
Single node:

```
bash /workspace/bin/train/rl/run_master.sh
```

Multi-node (distributed):

- Set `IS_DIST=true` on every node (master and workers).
- Start the master with:

  ```
  bash /workspace/bin/train/rl/run_master.sh
  ```

- On each worker node, run:

  ```
  bash /workspace/bin/train/rl/run_worker.sh
  ```
Outputs: trained checkpoints are written to `train_output/`.
Note (steering setups): The training scripts auto-detect `steering` setups and patch the `transformers` and `vllm` model code to enable a bias term on the MLP `down_proj` linear layer. This is done via:

```
bin/helpers/modify_bias.sh
```
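Conceptually, the patch amounts to something like the following illustrative PyTorch sketch (toy dimensions; this is not the repo's actual patching code):

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer MLP down-projection, which normally has no bias.
down_proj = nn.Linear(64, 16, bias=False)

# Enable a trainable bias: this per-layer bias vector is the steering vector.
down_proj.bias = nn.Parameter(torch.zeros(down_proj.out_features))

# Train only the bias; the original weights stay frozen.
down_proj.weight.requires_grad_(False)
trainable = [p for p in down_proj.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))  # 16: just the bias entries
```

Because only the bias vectors are optimized, the trainable-parameter count scales with hidden size times number of layers, which is what makes the 245 K figure in the table possible.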
- Place the model to evaluate under:

  ```
  /from_s3/model/
  ```

- Pick a config from:

  ```
  configs/eval/
  ```

- Run vanilla evaluation:

  ```
  # Set whether the model uses steering vectors
  #   IS_STEERING=true  -> steering-vector models
  #   IS_STEERING=false -> LoRA or fully-tuned models
  IS_STEERING=true CONFIG_PATH=... bash bin/eval/vanilla/run_inner.sh
  ```
Outputs: results are written to `results/`.
Located in `bin/eval/`:

- `add_place` — Appendix C in “Small Vectors, Big Effects: …”
- `exchange_svs` — Section 9 in “Small Vectors, Big Effects: …”
- `magnitude` — unpublished
- `pair_single` — Appendix D in “Small Vectors, Big Effects: …”
- `patch_head` — Sections 6 & 7 in “Small Vectors, Big Effects: …”
- `patch_head_path` — unpublished
Put evaluation results in `results/`, then run the desired script from `bin/visualize/`:

- `accuracies_table` — Table 1 in “Steering LLM Reasoning …”
- `exchange_svs` — Table 1 in “Small Vectors, Big Effects: …”
- `layers` — Figures 1, 9, 10, 11 in “Small Vectors, Big Effects: …”
- `magnitude` — unpublished
- `pair_layers` — Figure 13 in “Small Vectors, Big Effects: …”
- `patch_head` — Figures 5, 20, 21 in “Small Vectors, Big Effects: …”
- `seed_alignment` — unpublished
`bin/metrics/` contains one-shot experiment scripts that evaluate and visualize in a single Python program:

- `add_place_steering` — unpublished
- `last_layer_steering` — Figures 3 & 19 in “Small Vectors, Big Effects: …”
- `logit_lens` — unpublished
- `pre_last_layer_steering` — Figure 6 in “Small Vectors, Big Effects: …”
- `self_explain` — unpublished
- `match_effect` — Figures 2, 4, 14, 15, 17, 18 in “Small Vectors, Big Effects: …”
- `lora1_plots` — Appendix S
To export a 2D matrix stacking single-layer steering vectors from a model trained with steering vectors:

```
# Model location for extraction
#   /from_s3/model/
bash bin/helpers/extract_steering_vectors.sh
# -> Saves outputs to: results/
```

Optionally merge single-layer vectors:

```
bash bin/helpers/merge_vectors_from_layers.sh
```

Environment variables:

- `IS_DIST` — set to `true` on all nodes for distributed training.
- `IS_STEERING` — set to `true` when evaluating models trained with steering vectors; set to `false` for LoRA or fully-tuned models.
- `CONFIG_PATH` — the path to the config file for training or evaluation.
The code expects models, eval results, and extracted steering vectors to follow a fixed directory structure. The layout is constructed in `steering_reasoning/train/rl/config.py` when `output_dir` is set. After running evaluation or extraction, keep the same path stem as the trained model when uploading to your storage node. For example,
- a trained model is saved to
  `.../trained_models/Qwen2.5-Math-7B/deepscaler/steering/seed-0/checkpoint-159/`;
- eval results to
  `.../eval/Qwen2.5-Math-7B/deepscaler/steering/seed-0/checkpoint-159/temp_1.0_top_p_1.0/eval_seed-0/`;
- and extracted steering vectors to
  `.../steering_vectors/Qwen2.5-Math-7B/deepscaler/steering/seed-0/checkpoint-159`.
If you don’t preserve this layout, some visualization and auxiliary scripts may fail.
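The layout above can be sketched with a small path helper (the function name and storage root below are hypothetical; only the directory components mirror the structure described):

```python
from pathlib import Path

def artifact_dir(root: str, kind: str, model: str, dataset: str,
                 setup: str, seed: int, step: int) -> Path:
    """Build a path following the fixed layout; `kind` is one of
    'trained_models', 'eval', or 'steering_vectors'."""
    return (Path(root) / kind / model / dataset / setup
            / f"seed-{seed}" / f"checkpoint-{step}")

print(artifact_dir("/storage", "trained_models",
                   "Qwen2.5-Math-7B", "deepscaler", "steering", 0, 159))
# /storage/trained_models/Qwen2.5-Math-7B/deepscaler/steering/seed-0/checkpoint-159
```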
The initial implementation of the online training code was developed by Alexey Malahov and Almaz Dautov.
```bibtex
@article{sinii2025steering,
  title={Steering LLM Reasoning Through Bias-Only Adaptation},
  author={Sinii, Viacheslav and Gorbatovski, Alexey and Cherepanov, Artem and Shaposhnikov, Boris and Balagansky, Nikita and Gavrilov, Daniil},
  journal={arXiv preprint arXiv:2505.18706},
  year={2025}
}

@article{sinii2025small,
  title={Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors},
  author={Sinii, Viacheslav and Balagansky, Nikita and Aksenov, Yaroslav and Kurochkin, Vadim and Laptev, Daniil and Gerasimov, Gleb and Gorbatovski, Alexey and Shaposhnikov, Boris and Gavrilov, Daniil},
  journal={arXiv preprint arXiv:2509.06608},
  year={2025}
}
```
