
TransformerLens Dependency, Hidden States, and Hooks #24

@tretomaszewski

Description


There has been some chatter about this library's heavy reliance on TransformerLens. (See #10 (comment))

I agree that it would be best to move away from TransformerLens (TL). One suggestion has been to use the technique from https://github.com/Sumandora/remove-refusals-with-transformers, which relies on the hidden_states output exposed by the Hugging Face transformers library.

Using hidden_states might be a good quick fix. It is supported across most model architectures and presumably works for these purposes.
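
Roughly what that looks like (a minimal sketch, not taken from the linked repo; "gpt2" is just a placeholder model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM should behave similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states is a tuple of length n_layers + 1: entry i is the
# residual stream entering block i (entry 0 = embeddings). Note that, at
# least for GPT-2/Llama-style implementations, the final entry is taken
# *after* the final layer norm.
print(len(outputs.hidden_states), outputs.hidden_states[0].shape)
```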

However, there are some quirks worth being aware of.
(Personally, it feels slightly "hacky" and limiting for future work, but I may be suffering from "purism".) It also removes the ability to interact with the resid_mid between the attention and MLP sublayers in each block, which TL exposes directly (see the snippet below).
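
For reference, that mid-block residual is a first-class hook point in TL (model name is just an example):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # example model
_, cache = model.run_with_cache("Hello world")

# Residual stream between the attention and MLP sublayers of block 0,
# i.e. the "blocks.0.hook_resid_mid" activation.
resid_mid = cache["resid_mid", 0]
print(resid_mid.shape)
```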

There seem to be slight discrepancies between the resid_pre/resid_post cache tensors in TL and the hidden_states in HF. The difference is tiny, but it appears to grow over layers. And hidden_states[-1] (the last_hidden_state) is completely different from TL's last layer's resid_post. I have not been able to determine why.
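
Two hedged guesses that might account for part of this: TL's default from_pretrained applies weight processing (LayerNorm folding, centering of writing weights) that slightly perturbs residual-stream values, and HF's final hidden_states entry is taken after the final layer norm rather than being the raw last-block resid_post. A rough comparison sketch (model name and the no-processing loader are assumptions on my part):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

text = "Hello world"

# Tokenize once with the HF tokenizer so both models see identical tokens
# (TL's to_tokens prepends a BOS token by default, which would skew things).
hf_tok = AutoTokenizer.from_pretrained("gpt2")
enc = hf_tok(text, return_tensors="pt")

# TransformerLens side. from_pretrained_no_processing skips TL's weight
# folding / centering, which otherwise alters residual-stream values.
tl_model = HookedTransformer.from_pretrained_no_processing("gpt2")
_, cache = tl_model.run_with_cache(enc.input_ids)

# Hugging Face side.
hf_model = AutoModelForCausalLM.from_pretrained("gpt2")
with torch.no_grad():
    hf_out = hf_model(**enc, output_hidden_states=True)

# hidden_states[i] should line up with resid_pre of block i; the final
# hidden_states entry is post-ln_f, so it is *not* directly comparable to
# the last resid_post without applying the final norm.
for i in range(tl_model.cfg.n_layers):
    diff = (cache["resid_pre", i] - hf_out.hidden_states[i]).abs().max().item()
    print(f"layer {i}: max |resid_pre - hidden_states[{i}]| = {diff:.2e}")
```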

If this proves problematic, another option would be to roll our own hooks into the residual stream. I had started on this prior to finding this library. It would be the most direct approach (and would give us access to resid_mid as well), but would probably require re-implementing the per-model configuration that TransformerLens already provides.
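
A minimal sketch of that with plain PyTorch forward hooks, assuming a GPT-2-style block layout where ln_2 (the pre-MLP layer norm) receives the post-attention residual; the module paths are architecture-specific, which is exactly the per-model configuration burden mentioned above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # example model
tok = AutoTokenizer.from_pretrained("gpt2")

acts = {}  # {(layer, "resid_pre" | "resid_mid" | "resid_post"): tensor}

def save_input(name):
    def pre_hook(module, args):
        # forward pre-hook: args[0] is the tensor flowing *into* the module
        acts[name] = args[0].detach()
    return pre_hook

def save_output(name):
    def hook(module, args, output):
        # block outputs may be tuples; the residual stream is the first element
        out = output[0] if isinstance(output, tuple) else output
        acts[name] = out.detach()
    return hook

handles = []
for i, block in enumerate(model.transformer.h):  # GPT-2-specific module path
    handles.append(block.register_forward_pre_hook(save_input((i, "resid_pre"))))
    # ln_2's input is the residual stream between attention and MLP
    handles.append(block.ln_2.register_forward_pre_hook(save_input((i, "resid_mid"))))
    handles.append(block.register_forward_hook(save_output((i, "resid_post"))))

with torch.no_grad():
    model(**tok("Hello world", return_tensors="pt"))

for h in handles:
    h.remove()

print(acts[(0, "resid_pre")].shape, acts[(0, "resid_mid")].shape)
```

For intervention (as opposed to just caching), the same hooks can return a modified tensor instead of only recording it.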

I hereby open this up for discussion!
