Implementation of dynamic temperature sampling as seen in KoboldCpp #4972
A note on usage: pass
This PR should be directly compatible with the example:
However, we could expose this as an input parameter. Let me know if that is something we want, and I can make an update.
It should be. ooba has its own implementation of DynaTemp, and from what I've heard, the exponent has made larger DynaTemp ranges (1-5) useful while remaining coherent.
made an update exposing the |
this avoids counting the time taken stats twice Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
…into dynamic-temp
Is there a reason to prefer llama_sample_entropy over llama_sample_greedy_dynamic_temp or llama_sample_hhi? From what I read, the HHI variant was supposed to be superior, at least in theory. I haven't tried the others myself.
I ported the implementation from @kalomaze, who came up with this, so all credit goes to him. See the related issue here: #3483. It has been implemented in KoboldCpp, SillyTavern, Oobabooga, and other frontends, and from talking to the users, it seems to have gotten a very positive response.
I've not heard of "greedy dynamic temp" or "hhi". I searched the current repo for those two functions but can't find them. Are they implemented already?
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Those were older alternative dynatemp methods. Older revisions of kalomaze's koboldcpp have them. This branch has all three implemented: https://github.com/kalomaze/koboldcpp/tree/dec-15-updated
This PR contains the implementation from https://github.com/kalomaze/koboldcpp/tree/v1.54-ui-dynatemp-fix (KoboldCpp from 2 weeks ago). I believe this one should be the latest?
The implementation of DynaTemp is correct (Kalo can chime in if I'm wrong). That said, does it make sense to pass in a range instead of just a min and a max? I bring it up because when I originally built the UI into kcpp, Kalo wanted a min and a max. Concedo wanted a range for the UI, so it shifted that way, but maybe the original min/max iteration makes more sense here? Also, I'm using this branch as a base for one of Kalo's new experiments, and I noticed server.cpp has nowhere to take in the DynaTemp range.
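On the range versus min/max question, the two forms are easy to convert between. Here is a minimal sketch, assuming the range is applied symmetrically around the base temperature and clamped at zero. The names `TempBounds` and `bounds_from_range` are hypothetical and are not taken from this PR or from KoboldCpp.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical pair of bounds handed to a dynamic temperature sampler.
struct TempBounds {
    float min_temp;
    float max_temp;
};

// Assumption: the range is a symmetric offset around the base temperature,
// and a temperature below zero is clamped to zero.
TempBounds bounds_from_range(float temp, float range) {
    assert(range >= 0.0f);
    return { std::max(0.0f, temp - range), std::max(0.0f, temp + range) };
}
```

With a base temperature of 1.0 and a range of 0.5, this gives a min of 0.5 and a max of 1.5. A min/max interface can express asymmetric bounds that a single range value cannot, which may be the argument for the original min/max form.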
Ah, I guess I haven't been keeping up to date on the release notes. The last time I read about the different methods, Kalomaze said Gini/HHI sampling seemed superior. I implemented it in my own llama.cpp fork around when noisy sampling was added, and I also implemented noisy sampling. But kalomaze said in December that entropy sampling was preferred and removed the other methods three weeks ago; I wasn't aware.
This reverts commit 5eaf996.
I can't figure out how to use it. Is it something I need to change in the source code?
@Rotatingxenomorph simply set the field For example, if you want |
This reverts commit 8c7db9a.
How do you use this? Passing --dynatemp_range when running results in "error: unknown argument: --dynatemp-range", and changing dynatemp_range and dynatemp_exponent in sampling.h doesn't affect generation.
Could we please have this added to main for use as a CLI argument?
as mentioned above. |
Of course! PR here: #5295
* implemented dynamic temperature sampling from koboldcpp
* removed trailing whitespace
* removed unused temp parameter in llama_sample_entropy
* exposed exponent_val in dynamic temp sampler
* added debug check for printf statements
* use nullptr in llama_sample_softmax call during llama_sample_entropy

  this avoids counting the time taken stats twice

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* return earlier if there is only 1 candidate (i.e. max_entropy == 0)
* reformat 't' case in llama_sample_queue

  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* check for one or zero candidates case in llama_sample_entropy

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
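For readers trying to follow what the commits above describe, here is a minimal standalone sketch of entropy-based dynamic temperature. It is not the code from this PR. The function name `dynatemp_for_logits`, its signature, and the exact mapping (normalized entropy raised to the exponent, then scaled between the min and max temperature) are assumptions based on the discussion in this thread. It works on a plain vector of logits rather than llama.cpp's candidate arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: picks a temperature for one set of logits.
// Low entropy (a confident distribution) maps towards min_temp,
// high entropy (an uncertain distribution) maps towards max_temp.
float dynatemp_for_logits(const std::vector<float> & logits,
                          float min_temp, float max_temp, float exponent) {
    assert(min_temp <= max_temp);

    // With zero or one candidate the maximum entropy is 0, so there is
    // nothing to scale. This sketch returns min_temp as a placeholder.
    if (logits.size() <= 1) {
        return min_temp;
    }

    // Numerically stable softmax: subtract the largest logit first.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }

    // Shannon entropy of the softmax distribution.
    float entropy = 0.0f;
    for (float p : probs) {
        p /= sum;
        if (p > 0.0f) {
            entropy -= p * std::log(p);
        }
    }

    // A uniform distribution over n candidates has entropy log(n).
    const float max_entropy = std::log(static_cast<float>(logits.size()));
    float normalized = entropy / max_entropy;
    normalized = std::min(1.0f, std::max(0.0f, normalized)); // guard rounding

    // Assumed mapping: the exponent bends the curve between min and max.
    return min_temp + (max_temp - min_temp) * std::pow(normalized, exponent);
}
```

Calling it with four equal logits and bounds of 0.5 and 1.5 returns a value at the top of the range, since a uniform distribution has maximal entropy. A sharply peaked set of logits returns a value close to the minimum. The logits would then be divided by the returned temperature before the final softmax.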
See issue here: #3483
@kalomaze I took the liberty of porting the dynamic temperature sampling method in KoboldCpp to llama.cpp, since your users seem to be heavily recommending it to me =D