
Implementation of dynamic temperature sampling as seen in KoboldCpp #4972

Merged
merged 11 commits on Jan 25, 2024

Conversation

l3utterfly
Contributor

See issue here: #3483

@kalomaze I took the liberty of porting the dynamic temperature sampling method from KoboldCpp to llama.cpp, since your users seem to be heavily recommending it to me =D

cebtenzzre self-requested a review January 16, 2024 17:11
@l3utterfly
Contributor Author

l3utterfly commented Jan 18, 2024

A note on usage: pass the dynatemp_range field in llama_sampling_params.

max_temp and min_temp are calculated simply by: temp +/- dynatemp_range.

This PR should be directly compatible with the example: main

float dynatemp_range = 0.00f; // 0.0 = disabled
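For illustration, a minimal sketch of how a caller might set these fields (field names as introduced by this PR; everything else left at its defaults, and the repo's common sampling header assumed to be on the include path):

```cpp
#include "sampling.h"  // common/sampling.h from this repo (assumed include path)

// Sketch only: enable dynamic temperature around a base temperature of 1.0.
llama_sampling_params sparams;
sparams.temp           = 1.0f;  // base temperature
sparams.dynatemp_range = 0.5f;  // temperature varies in [0.5, 1.5]; 0.0f disables dynatemp
```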

@l3utterfly
Contributor Author

exponent_val, which controls how entropy maps to the temperature range, is hard-coded as 1.0f for now. It is not exposed in KoboldCpp either.

However, we could potentially expose it as an input parameter. Let me know if that is wanted, and I can make an update.
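For reference, the general shape of the entropy-to-temperature mapping, written as a simplified standalone sketch (not the exact code in this PR):

```cpp
#include <cmath>
#include <vector>

// Sketch: map the entropy of the candidate distribution onto [min_temp, max_temp].
// `probs` are the softmax probabilities of the candidate tokens.
static float dynamic_temperature(const std::vector<float> & probs,
                                 float min_temp, float max_temp, float exponent_val) {
    float entropy = 0.0f;
    for (float p : probs) {
        if (p > 0.0f) {
            entropy -= p * std::log(p);  // Shannon entropy
        }
    }
    const float max_entropy = std::log((float) probs.size());  // entropy of a uniform distribution
    const float normalized  = max_entropy > 0.0f ? entropy / max_entropy : 0.0f;
    // exponent_val shapes the mapping; 1.0f gives a linear interpolation.
    return min_temp + (max_temp - min_temp) * std::pow(normalized, exponent_val);
}
```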

@Chanka0

Chanka0 commented Jan 18, 2024

exponent_val, which controls how entropy maps to the temperature range, is hard-coded as 1.0f for now. It is not exposed in KoboldCpp either.

However, we could potentially expose it as an input parameter. Let me know if that is wanted, and I can make an update.

It should be. ooba has its own implementation of DynaTemp, and from what I've heard the exponent has made larger dynatemp ranges (1-5) usable while remaining coherent.

@l3utterfly
Contributor Author

Made an update exposing exponent_val in the dynamic temp sampler.
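Continuing the sketch from the usage note above, and assuming the exposed value lands in the sampling params as dynatemp_exponent (field name assumed here, mirroring exponent_val in the sampler):

```cpp
// Sketch only: exponent values > 1.0f keep the temperature closer to min_temp
// unless the candidate distribution's entropy is high.
sparams.dynatemp_range    = 0.5f;
sparams.dynatemp_exponent = 2.0f;
```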

@cebtenzzre
Collaborator

cebtenzzre commented Jan 23, 2024

Is there a reason to prefer llama_sample_entropy over llama_sample_greedy_dynamic_temp or llama_sample_hhi? From what I read, the HHI variant was supposed to be superior, at least in theory - I haven't tried the others myself.

@l3utterfly
Contributor Author

l3utterfly commented Jan 24, 2024

I ported the implementation from @kalomaze, he is the guy who came up with this, so all credit goes to him. See related issue here: #3483

It has been implemented in KoboldCpp, SillyTavern, Oobabooga and other frontends, and from talking to the users it seems to have gotten a very positive response.

Is there a reason to prefer llama_sample_entropy over llama_sample_greedy_dynamic_temp or llama_sample_hhi?

I've not heard of "greedy dynamic temp" nor "hhi". I did a search in the current repo for those two functions but can't find them. Are they implemented already?

@Chanka0

Chanka0 commented Jan 24, 2024

Is there a reason to prefer llama_sample_entropy over llama_sample_greedy_dynamic_temp or llama_sample_hhi?

I've not heard of "greedy dynamic temp" nor "hhi". I did a search in the current repo for those two functions but can't find them. Are they implemented already?

Those were older alternative dynatemp methods; older revisions of kalomaze's koboldcpp have them. This branch has all three implemented: https://github.com/kalomaze/koboldcpp/tree/dec-15-updated
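For context, HHI here refers to the Herfindahl-Hirschman Index of the token distribution (the sum of squared probabilities), a measure of how concentrated the distribution is. Purely as an illustration of the idea, not kalomaze's actual code, such a measure could drive a temperature like this:

```cpp
#include <vector>

// Illustrative only: a concentrated distribution (HHI near 1) gets a low temperature,
// a flat distribution (HHI near 1/N) gets a high one.
static float hhi_temperature(const std::vector<float> & probs, float min_temp, float max_temp) {
    float hhi = 0.0f;
    for (float p : probs) {
        hhi += p * p;  // Herfindahl-Hirschman Index, in (0, 1]
    }
    return min_temp + (max_temp - min_temp) * (1.0f - hhi);
}
```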

@l3utterfly
Contributor Author

This PR contains the implementation from https://github.com/kalomaze/koboldcpp/tree/v1.54-ui-dynatemp-fix (KoboldCpp from 2 weeks ago). I believe this one should be the latest?

@AAbushady

AAbushady commented Jan 24, 2024

The implementation of DynaTemp is correct (Kalo can chime in if I'm wrong). That said, does it make sense to use a range instead of just passing in min and max? I bring it up because when I originally built the UI into kcpp, Kalo wanted a min and a max. Concedo wanted a range for the UI, so it shifted that way, but maybe the original min/max iteration makes more sense here?

Also, I'm using this branch as a base for one of Kalo's new experiments, and I noticed server.cpp has nowhere to take in the DynaTemp range.

@cebtenzzre
Collaborator

I've not heard of "greedy dynamic temp" nor "hhi". I did a search in the current repo for those two functions but can't find them. Are they implemented already?

Ah, I guess I haven't been keeping up to date on the release notes - the last time I was reading about the different methods, Kalomaze said Gini/HHI sampling seemed superior. I implemented it in my own llama.cpp fork around the time noisy sampling was added, along with noisy sampling itself.

But kalomaze said entropy sampling was preferred in December and removed the other methods three weeks ago - I wasn't aware.

ggerganov merged commit 5eaf996 into ggerganov:master Jan 25, 2024
38 of 45 checks passed
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jan 26, 2024
@Rotatingxenomorph

I can't figure out how to use it. Is it something I need to change in the source code?

@l3utterfly
Contributor Author

A note on usage: pass the dynatemp_range field in llama_sampling_params.

max_temp and min_temp are calculated simply by: temp +/- dynatemp_range.

This PR should be directly compatible with the example: main

float dynatemp_range = 0.00f; // 0.0 = disabled

@Rotatingxenomorph simply set the field dynatemp_range in llama_sampling_params to anything greater than 0.

For example, if you want max_temp = 1.5 and min_temp = 0.5, you would pass: temp = 1.0 and dynatemp_range = 0.5

l3utterfly deleted the dynamic-temp branch January 26, 2024 12:23
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jan 27, 2024
@Azirine

Azirine commented Jan 31, 2024

How do you use this? Passing --dynatemp_range when running results in "error: unknown argument: --dynatemp-range", and changing dynatemp_range and dynatemp_exponent in sampling.h doesn't affect generation.

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Feb 2, 2024

Could we please have this added to main for use as a CLI argument?

This PR should be directly compatible with the example: main
float dynatemp_range = 0.00f; // 0.0 = disabled

as mentioned above.

@l3utterfly
Contributor Author

l3utterfly commented Feb 3, 2024

Could we please have this added to main for use as a CLI argument?

This PR should be directly compatible with the example: main
float dynatemp_range = 0.00f; // 0.0 = disabled

as mentioned above.

Of course!

PR here: #5295

@Azirine @strawberrymelonpanda

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
* implemented dynamic temperature sampling from koboldcpp

* removed trailing whitespace

* removed unused temp parameter in llama_sample_entropy

* exposed exponent_val in dynamic temp sampler

* added debug check for printf statements

* use nullptr in llama_sample_softmax call during llama_sample_entropy

this avoids counting the time taken stats twice

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* return earlier if there is only 1 candidate (i.e. max_entropy == 0)

* reformat 't' case in llama_sample_queue

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* check for one or zero candidates case in llama_sample_entropy

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024