Description
In https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5 you mention a new methodology, but what changed that made it so much more effective? I've been trying to reproduce it for a while (originally with Llama 3 and now with 3.1, both 8B and 70B). With Llama 3.1 70B I have to edit layers 10 through 40, and the effect weakens as I narrow the range further.
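For reference, by "edit" I mean the usual directional ablation applied with forward hooks; here's a simplified sketch of what I'm running (assuming `refusal_dir` is a unit-norm residual-stream direction I've computed separately, on the same device and dtype as the model):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct", torch_dtype=torch.bfloat16
)

# refusal_dir: unit-norm direction computed elsewhere from harmful/harmless activations
def make_ablation_hook(direction: torch.Tensor):
    """Project the refusal direction out of the layer's residual-stream output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        proj = (hidden @ direction).unsqueeze(-1) * direction  # component along direction
        hidden = hidden - proj
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# On 70B I currently need the whole 10-40 range to get a decent effect
for idx in range(10, 41):
    model.model.layers[idx].register_forward_hook(make_ablation_hook(refusal_dir))
```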
The only way I've been able to get a decent effect from just a single layer is by multiplying the direction by about 1.5 after normalization. You mentioned somewhere that you did something that sounds similar. On Llama 3.1 8B I can get a good result by scaling the direction by 1.5 and applying it just to layer 11. But that only worked for me when hooking activations; I wasn't able to figure out how to bake it into the weight matrices (just scaling the direction when orthogonalizing didn't work). I haven't tried it with the 70B.
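Concretely, the single-layer version that works for me is the hook below; the second half is my attempt to bake the same scaled edit into the weights by orthogonalizing the matrices that write into the residual stream at that layer. This is a sketch of what I tried, not a claim about your method; `model`, `refusal_dir`, and the 1.5 factor are as in the snippet above, and `orthogonalize_` is just a helper name I made up:

```python
import torch

SCALE = 1.5  # the extra factor I apply after normalizing the direction

def make_scaled_ablation_hook(direction: torch.Tensor, scale: float):
    """Subtract `scale` times the projection onto `direction` from the residual stream."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        proj = (hidden @ direction).unsqueeze(-1) * direction
        hidden = hidden - scale * proj
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# This works on Llama 3.1 8B when applied to layer 11 only
model.model.layers[11].register_forward_hook(
    make_scaled_ablation_hook(refusal_dir, SCALE)
)

# This is the part that doesn't work for me: removing (scale x) the direction
# from the weights that write into the residual stream at that layer.
def orthogonalize_(weight: torch.Tensor, direction: torch.Tensor, scale: float):
    """In-place: W <- W - scale * d (d^T W), removing the layer output's component along d."""
    direction = direction.to(weight.dtype)
    weight -= scale * torch.outer(direction, direction @ weight)

layer = model.model.layers[11]
orthogonalize_(layer.self_attn.o_proj.weight.data, refusal_dir, SCALE)
orthogonalize_(layer.mlp.down_proj.weight.data, refusal_dir, SCALE)
```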
Was I accidentally on the right track with scaling the direction, or was there something else? Nothing else I've tried (layer selection, sampling different tokens, varying and mixing training sets) has worked with fewer than about 7 layers on the 8B or about 30 layers on the 70B.