Conversation

@thiswillbeyourgithub
Contributor

Hi!

I'm the dev who made a research fork of repeng to explore some questions about what's possible with repeng and repeng-like techniques. I first mentioned my ideas in these messages.

The project is still ongoing, but as I kept adding more and more little enhancements, it felt like a good idea to try to upstream some of them.

Notably, I added a utils.py file with a make_dataset function, joblib caching for the activations, support for chat templates (with autocorrection), L1/L2 norm selection, documentation, more control over which layers to modify, and NumPy v2 compatibility.

Note that I couldn't run the test file because of import issues with flash attention.

I also did not add any modules to the requirements yet.

I mainly wanted your feedback on this.

Also thanks a lot for making repeng, it's really nice and really allowed me to get my hands dirty!

Commits:

  • add file settings.py to store settings
  • add a utils.py file that contains a make_dataset function
  • move DatasetEntry to utils.py
  • feat: give more controls to how the user can select the layers to modify
  • feat: add joblib memory for caching model activations
  • fix: numpy v2 compatibility
  • feat: add arg norm_type when training that allows setting l1 or l2 (or other) norm
  • feat: add function to accept chat template as inputs + autocorrect
  • fix: forgot an import
  • minor: add tqdm for getting activations and applying the new directions
  • perf: potentially faster one liner to tokenize
  • feat: use flag LOW_MEMORY to reduce the amount of memory needed when storing the arrays before computing directions
  • fix: imports
  • update default model in example from mistral 7B v0.1 to v0.3
  • docs: add more details on how to load the model, including quantization, avoiding OOM, etc
  • docs: use chat messages in example
  • docs: in example, show how to login for models that require it
  • docs: add missing declaration of tokenizer
  • fix: in example, move the tokens directly to the correct device
  • minor: changed default in the example
  • docs: add link to github issue about OOM for gguf files
  • docs: add more examples of models
  • fix: autocorrect chat templates

allows setting verbosity, low_memory flags, etc.

Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>

The layer_ids arg can now be either a list of ints (the indices of the layers), "all" for all layers, "middle" for the middle half, "only_middle" for the single layer at the middle, or a range in the form "0.3-0.7" (= layers from 30% to 70% of the depth).

Should close vgel#51.

'Autocorrect' means that we check that each message is indeed present in the output after applying the chat template. This was needed because there are many cases where models (including official releases from GAFAM!) have sketchy implementations that silently drop system prompts, etc.
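The autocorrect check described in that commit could be sketched like this (a minimal, hypothetical standalone version with made-up names; repeng's actual implementation may differ):

```python
# Sketch of the "autocorrect" idea: after applying a tokenizer's chat
# template, verify that every message's content actually appears in the
# rendered output. Hypothetical helper, not repeng's real API.
def check_template_output(rendered: str, messages: list[dict]) -> list[dict]:
    """Return the messages whose content was dropped by the template."""
    dropped = []
    for msg in messages:
        if msg["content"] not in rendered:
            dropped.append(msg)
    return dropped

messages = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hi!"},
]
# A template that silently drops the system prompt:
rendered = "<s>[INST] Hi! [/INST]"
assert check_template_output(rendered, messages) == [messages[0]]
```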
@vgel
Owner

vgel commented Dec 14, 2024

This is really cool! My first thought is there's probably too much going on here for one PR, but let me look it over more closely and figure out how we can split this up. I've also been considering adding a utils file for a similar reason as you--I might make a PR for that soon-ish and tag you in.

@vgel
Owner

vgel commented Dec 14, 2024

I'm also currently migrating the library from poetry to uv, so no rush on adding dependencies!

@thiswillbeyourgithub
Contributor Author

thiswillbeyourgithub commented Dec 14, 2024

This is really cool! My first thought is there's probably too much going on here for one PR, but let me look it over more closely and figure out how we can split this up. I've also been considering adding a utils file for a similar reason as you--I might make a PR for that soon-ish and tag you in.

Glad you like it! I take this opportunity to thank you once again for making repeng, it's been super helpful to get me started on many ideas I had.

I'm fine with splitting this up, but I do care about my contribution being credited, as I'm a self-taught medical student (soon a psychiatry resident!) and I want my interests and contributions to appear on my GitHub for visibility :)

In any case, I keep breaking and unbreaking my fork as I'm continuously experimenting (since yesterday I've been playing around with filtering samples that are retrievable after a 2-cluster k-means on the UMAP projection, and I feel good about this one!). Apart from the read_representation method, I think the code is mostly stable. I'm sometimes unsure about my chat templating thingy, but that might be because of errors in the original templates of the models' repos too, so I'd appreciate a second look on this if you have the bandwidth.

Also, I couldn't run the tests on my machine; I think it would really help to know whether my modifications are stable and have no side effects.

edit: actually, I was able to run the tests. But as I made extensive changes, I'll keep them in my fork for now.

@vgel
Owner

vgel commented Jan 8, 2025

definitely, will absolutely make sure you're credited on anything that gets merged :-)

@vgel
Owner

vgel commented Jan 8, 2025

here, a good initial thing to split off would be the ability to pass a string for layer_ids in ControlModel.__init__. i think we should have three options:

  • a list / range object, like we currently have
  • "all"
  • "middle", which should keep the middle 2/3 (instead of the middle half as it does now)

then in the docstring, we should suggest people use "middle" as the default. i don't think we should support None, because people should be explicit about what layers are being controlled. and i think "only_middle" and the range stuff are marginal enough that we don't need to support them by default--people can always calculate it themselves if they need it, or we can add it later.
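The three options above could be sketched like this (a hypothetical helper for illustration, not repeng's actual API; "middle" keeps the middle two-thirds, as proposed):

```python
def resolve_layer_ids(spec, num_layers: int) -> list[int]:
    """Resolve a layer spec into concrete layer indices.

    spec can be a list/range of ints, "all", or "middle"
    ("middle" keeps the middle two-thirds of the layers).
    """
    if spec == "all":
        return list(range(num_layers))
    if spec == "middle":
        # drop roughly one-sixth of the layers from each end
        sixth = num_layers // 6
        return list(range(sixth, num_layers - sixth))
    return list(spec)  # already a list or range of indices

assert resolve_layer_ids("all", 6) == [0, 1, 2, 3, 4, 5]
assert resolve_layer_ids("middle", 30) == list(range(5, 25))
assert resolve_layer_ids(range(2, 4), 6) == [2, 3]
```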

if you could split that off into a new PR, i can help make sure the tests are passing etc so we can get it merged, if that sounds good?

@thiswillbeyourgithub
Contributor Author

Sounds good to me, thanks! I'll do that when I have some time in the next week

@thiswillbeyourgithub
Contributor Author

thiswillbeyourgithub commented Jul 28, 2025

Hi, nearly 8 months later I just wanted to tell you that I have not forgotten about all this. I refined the idea in my head and have upped my Python skills, so the PR project is still on.

I also wanted to share this new idea I had:

  • I intend to make a clean PR to repeng to improve it
  • ask the model to estimate its IQ. As IQ is defined by Gaussian statistics, it would be nice to see that value move depending on which layer was repeng'ed on a dumb<->smart vector. This would be a way to identify which depths most affect a model's behavior.
    • Other idea: on a young<->old vector, we could ask it to estimate its age.
    • At least those two ideas would allow automatically and objectively measuring the impact of repeng on model behavior.
  • then map out the parameter space of how well the repeng vector works (i.e. visually show how much the vector at depth X moves the IQ/age estimation if we give the vector a strength of 1).
  • then do so for each of the other techniques (PCA, UMAP, with or without scaling, etc.).
  • then do so for a few other models for comparison
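The evaluation loop above could be sketched like this (everything here is a hypothetical placeholder, not repeng's API; dummy stand-ins are included so the sketch runs without a model):

```python
import contextlib

def sweep_depths(depths, strengths, steer, estimate_iq):
    """Map (depth, strength) -> the model's self-reported IQ estimate."""
    results = {}
    for depth in depths:
        for strength in strengths:
            with steer(depth, strength):  # apply the control vector
                results[(depth, strength)] = estimate_iq()
    return results

# Dummy stand-ins so the sketch runs without a model:
state = {}

@contextlib.contextmanager
def fake_steer(depth, strength):
    state["bias"] = depth * strength
    yield
    state.pop("bias")

def fake_estimate_iq():
    return 100 + state["bias"]

grid = sweep_depths([0.3, 0.7], [1.0], fake_steer, fake_estimate_iq)
assert grid[(0.3, 1.0)] == 100.3
assert grid[(0.7, 1.0)] == 100.7
```

The resulting grid is what you'd plot to "map out the parameter space" per depth and strength.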

Let me know what you think of this!

@thiswillbeyourgithub
Contributor Author

Hello everyone, I created PR #65, which hopefully implements most of the features I needed. It mainly brings 3 features:

  1. The Hugging Face model's chat template is respected when possible, making it easier to try a different model without having to dig into the intricacies of its template.
  2. It is now possible to supply the examples in the form of familiar chat messages (list of dicts, with role and content).
  3. It is now possible to specify layer_zones instead of layer_ids. For example [[0.1, 0.5]] means we control the layers whose relative depth is between 0.1 (included) and 0.5 (not included).
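The layer_zones semantics in point 3 could map to indices like this (an illustrative sketch with a made-up helper name, not the PR's actual implementation):

```python
def zones_to_layer_ids(zones, num_layers):
    """Convert relative-depth zones like [[0.1, 0.5]] into layer indices.

    A layer i is selected when start <= i / num_layers < end,
    i.e. the zone start is included and the end excluded.
    """
    ids = []
    for start, end in zones:
        for i in range(num_layers):
            depth = i / num_layers
            if start <= depth < end:
                ids.append(i)
    return sorted(set(ids))

# For a 10-layer model, [[0.1, 0.5]] selects layers 1 through 4:
assert zones_to_layer_ids([[0.1, 0.5]], 10) == [1, 2, 3, 4]
```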

Hence I'm closing this.
