feat: chat messages, model templates, layer zones #65
base: main
Conversation
Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
I found an edge case. In the original code of `control.py`, to get the layer IDs you do something like this:

```python
layer_ids = range(-1, -model.config.num_hidden_layers, -1)
nlayers = len(layer_ids)
layer_ids = [i if i >= 0 else nlayers + i for i in layer_ids]
```

I don't understand why you would do this instead of just … The edge case arises when using `layer_zones` that include …
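To make the index arithmetic above concrete, here is a small self-contained sketch (the function name and the 32-layer count are illustrative, not repeng's API). One thing worth noting: `range(-1, -n, -1)` has length `n - 1`, so normalising against `len(layer_ids)` gives different indices than normalising against the model's true layer count, which may be related to the edge case:

```python
def normalize_layer_ids(layer_ids, num_layers):
    """Map negative layer indices (counting from the end) to positive ones."""
    return [i if i >= 0 else num_layers + i for i in layer_ids]

# range(-1, -32, -1) yields the 31 ids -1..-31 (len == 31, not 32).
ids = list(range(-1, -32, -1))
print(normalize_layer_ids(ids, 32)[:3])       # normalised against the true layer count: [31, 30, 29]
print(normalize_layer_ids(ids, len(ids))[:3]) # normalised against len(ids): [30, 29, 28]
```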
Thanks for these PRs, I've had to make similar changes. A couple of comments:

To apply the model templates, I just did this …
Hi! Thanks for taking the time. (note: there is a mistake in your message's formatting)
Can't wait to get vgel's take on this to know what I should improve to make it upstream.
Yeah, that makes sense. And I think it's kind of an open question. For example, I've found that the default repeng method doesn't work so well with thinking models, but if you add "You are an honest person. Let me think step by step" then it works better. There are other papers that use pairs of completions instead of pairs of prompts. It seems to work worse... but it seems more elegant in theory. https://github.com/IBM/activation-steering & https://arxiv.org/abs/2308.10248 . There MUST be a way to get it to work reliably... but I haven't found it yet. The reason I think it's better is that you're not "leading the witness" by telling the model that it is a good person; you're just giving it an example of outputs, so it should be much more general and less specific.
Same! https://github.com/wassname/activation_store I'm interested in what else you have noticed and tried, even if it's just anecdotal or tentative?
These are some of the things I've noticed, and some of my current opinions …
Also, this current PR includes two bugfixes which seem valuable.

FYI I've forked and made a bunch of similar changes to this repo (work with more models, and fix the position_ids bug), but I haven't had time to tidy them up into PRs. I thought I would mention them here in case you or vgel revisit this and want to browse the changes: #67 contains the following bugfix, improved handling of the shape of `position_ids`: …
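For illustration only, here is a hedged guess at the kind of shape handling a `position_ids` fix can involve; many HF-style models expect `position_ids` of shape `(batch, seq_len)`, so a 1-D sequence is commonly broadcast per batch item. This is a sketch under that assumption (with plain lists standing in for tensors), not the fork's actual code:

```python
def ensure_batched_position_ids(position_ids, batch_size):
    """Expand 1-D position ids [0, 1, ..., seq-1] to one row per batch item.

    If the ids are already batched (a list of lists), leave them unchanged.
    """
    if position_ids and not isinstance(position_ids[0], list):
        return [list(position_ids) for _ in range(batch_size)]
    return position_ids

print(ensure_batched_position_ids([0, 1, 2], 2))  # [[0, 1, 2], [0, 1, 2]]
```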
Maybe a way would be: …
Do you think it would work? I think my testing rig could maybe be useful to settle this.
The answer to this is in the README of my fork: here. But I haven't finished what I want to try. I want to re-run the whole grid search on multiple models, but am testing one last feature before that: do you have insights into how I should rescale the directions? What I mean is that when I extract the direction, I think it makes sense to try to make the magnitude of the direction match the activations in that layer, because I have a feeling that umap tends to work but needs more strength than other methods, which I find odd. So I added a rescaling argument but struggle to get decent results so far. Do you have any thoughts about this?
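To pin down what I mean by rescaling, here is a toy sketch (names and the list-based vectors are illustrative, not the actual rescaling argument in my fork): scale the extracted direction so its L2 norm matches the mean L2 norm of the hidden states at that layer.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def rescale_direction(direction, layer_activations):
    """Scale `direction` so its norm equals the mean activation norm of the layer."""
    mean_act_norm = sum(l2_norm(a) for a in layer_activations) / len(layer_activations)
    scale = mean_act_norm / l2_norm(direction)
    return [x * scale for x in direction]

acts = [[3.0, 4.0], [6.0, 8.0]]  # toy activations with norms 5 and 10
d = rescale_direction([1.0, 0.0], acts)
print(l2_norm(d))  # 7.5
```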
Interesting. I just took a look and don't find it super obvious to figure out how to implement this in repeng so with my limited time I'll pass. But if you make a PR I'm super interested.
My main problem is that I have just one 1060 and one 1080 so training is absolutely out of the question. I'm budgeting for a 3090 though because I can't go on like this.
But in practice how can you find something it would say? Or do you mean just to try to match the vibe or something?
Hmm, I think what could definitely work is training a rank-1 LoRA (this is an alternative method to activation steering that people are starting to use; it's very powerful), because you are essentially training it to not need the system prompt. For activation steering, I'm not sure. Instead of training it to perform new behavior, you are trying to find meaningful representations of existing behavior and change them. Once you remove the system prompt, you might go from realistic to unrealistic behavior for the model. It's hard to predict, but worth trying imo. Perhaps the pairs could be (without sys prompt, same completion with sys prompt), and that way you are moving the hidden states towards the same behaviour without a sys prompt. (Perhaps this is what you meant all along.)
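The pairing idea above could be sketched like this (the function and template are hypothetical, not repeng's dataset format): each pair contrasts the same completion with and without the system prompt, so the extracted direction points from "no sys prompt" toward "sys prompt" behaviour.

```python
def make_sysprompt_pairs(sys_prompt, completions, template="{sys}\n{completion}"):
    """Build (without_sys_prompt, with_sys_prompt) pairs over identical completions."""
    pairs = []
    for completion in completions:
        with_sys = template.format(sys=sys_prompt, completion=completion)
        without_sys = template.format(sys="", completion=completion).lstrip()
        pairs.append((without_sys, with_sys))
    return pairs

print(make_sysprompt_pairs("You are an honest person.", ["I think that..."]))
```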
Yeah, either look at the avg_logprob of a sequence (cheap), or have it generate its own completion (expensive).
Indeed, I am also GPU poooor *single tear*. I got a 3090 Ti when Ethereum mining stopped, but it's not enough. It's never enough.
Yes indeed.
My intuition fails me. As you can see from my GitHub readme I'm self-taught, so would you by any chance have compact (time-wise) online resources for me to grok this? My take is that the logprobs here are the last logit in the network, so taking the average gives us a measure of how surprised the model was by the words. So I guess if you generate multiple times you can try to favor the examples with the highest logprob, but otherwise you can't use the avg_logprob "signal" to guide your generation, right?
That's what I'm planning on buying in the coming months.

Edit: I'll change my grid_search.py to also store the avg_logprob and investigate a bit. Thanks!
Hmm it's been so long since I learned this stuff. Back in the day I read the Deep Learning Book (https://www.deeplearningbook.org/contents/prob.html) and did the cs231n course... but these days an LLM might be able to give you information that's more tailored to your background.
Well, I'd say you get it, as this is pretty much right, imo. Yeah, logprobs are often the last token, but in this case we are taking the average logprob of each token in the whole input sequence (e.g. the persona + suffix). So we are trying to measure, for an input sentence that the model reads, how likely would it be to generate it on its own? If it's something it would never say, is it really worth "guiding" it not to do something?
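For clarity, the avg_logprob idea above can be sketched like this: average the log-probability the model assigns to each token of the input sequence. The logits below are made up; in practice they come from a forward pass over the sequence, with each position's logits scoring the next token.

```python
import math

def log_softmax(logits, index):
    """Log-probability of the token at `index` under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z

def avg_logprob(per_position_logits, token_ids):
    """Mean log-prob of `token_ids`, one logits vector per position."""
    lps = [log_softmax(l, t) for l, t in zip(per_position_logits, token_ids)]
    return sum(lps) / len(lps)

# Toy example: uniform logits over a 4-token vocab at two positions.
print(avg_logprob([[0.0] * 4, [0.0] * 4], [1, 2]))  # = -log(4)
```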
It's pretty good to be able to run 8b models; the ones smaller than that kind of suck.
This looks good; I've been wanting to add something like layer_zones. Will pick at least that off and add you as co-author for 0.5. For chat templates I'll need to take a closer look at what you did. I'm leery of adding too much experiment-specific logic, but I guess having a "default" …
Thanks a lot. I really appreciate your thoughtfulness regarding crediting.

Regarding chat templates: it was so much of a pain to handle model-specific quirks regarding templates etc. that I ended up convinced that it makes sense to include it somewhere in repeng. A middle ground could be to put it in another file like …

Re-reading the code, your …

Edit: of course, let me know if I can be of help.
Hi!
This is a polished version of #55 I made a while ago.
It mainly brings 3 features:

- chat messages (with `role` and `content`)
- model templates
- `layer_zones` instead of `layer_ids`. For example `[[0.1, 0.5]]` means we control the layers whose relative depth is between 0.1 (included) and 0.5 (not included).

I took the liberty of adding loguru for some logging, but I can scrap that if you prefer.
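The zone semantics described above can be sketched like this, for reviewers. It is a toy illustration assuming relative depth = layer_index / num_layers; the PR's exact implementation may differ.

```python
def zones_to_layer_ids(layer_zones, num_layers):
    """Collect the layer indices whose relative depth falls in any [start, end) zone."""
    ids = []
    for start, end in layer_zones:
        for i in range(num_layers):
            depth = i / num_layers
            if start <= depth < end:
                ids.append(i)
    return sorted(set(ids))

print(zones_to_layer_ids([[0.1, 0.5]], 10))  # [1, 2, 3, 4]
```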
All the tests are passing, although I had to add some values for the LLMs, but they remain sound. I also added a test for the `make_dataset` function.

Hopefully, this will make `repeng` easier to work with when benchmarking my ideas over several setups and models. Let me know if you want me to modify anything in this PR.
My plan is to now implement some mechanism to dump the logits to a file so that we can use commodity hardware, add more pairs of examples, and use more advanced pair manipulations (clustering, etc.).
Edit: my ongoing work will take place in that fork, and if I find other mistakes I'll push them here as long as #65 is not merged. I implemented a bunch of features, notably h5py caching of activations to make it easier to run on low-end hardware, as this way I don't have to hold all the activations in memory.
If you could find the time to check out the README of the research fork and tell me what you think of the features I added, I'd be very interested :) I intend to make all my changes upstream, so I'd like to know what you think of each!