
Min P sampler implementation [alternative to Top P/Top K] #3841

Merged: 25 commits merged on Oct 31, 2023

Conversation

@kalomaze (Contributor) commented Oct 28, 2023

The way that this sampler works is:

  • Every candidate token has a probability attached to it, which is what we measure for consideration.
  • The base min_p value represents the starting required probability. (For example, 0.05 = only include tokens that are at least 5% probable.)
  • That requirement is then scaled by the probability of the top token in the list. So if the top token is 90% probable, the 5% requirement is multiplied by 0.9, giving 4.5%.
  • In other words, with the top token at 90% and base_min_p set to 0.05, only tokens that are at least 4.5% probable remain in the pool, before temperature is applied.
  • This method seems more effective at keeping the reasonable tokens than either Top P or Top K; a minimal code sketch of the filtering step follows this list.
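
For illustration, here is a minimal, self-contained sketch of that filtering step. It is not the actual llama.cpp code; the struct, function names, and example numbers are mine, and it assumes the candidate probabilities have already been computed via softmax.

```cpp
// Minimal sketch of the Min P filtering step described above (illustrative only).
#include <algorithm>
#include <cstdio>
#include <vector>

struct TokenProb {
    int   id;
    float p; // probability after softmax, before temperature
};

// Keep only tokens whose probability is at least base_min_p * p(top token).
std::vector<TokenProb> min_p_filter(std::vector<TokenProb> candidates, float base_min_p) {
    if (candidates.empty()) {
        return candidates;
    }
    // Find the probability of the most likely token.
    const float p_max = std::max_element(
        candidates.begin(), candidates.end(),
        [](const TokenProb & a, const TokenProb & b) { return a.p < b.p; })->p;

    const float threshold = base_min_p * p_max; // e.g. 0.05 * 0.90 = 0.045

    candidates.erase(
        std::remove_if(candidates.begin(), candidates.end(),
                       [&](const TokenProb & t) { return t.p < threshold; }),
        candidates.end());
    return candidates;
}

int main() {
    // Example distribution: top token at 90%, with a small tail below it.
    std::vector<TokenProb> cand = {{0, 0.90f}, {1, 0.06f}, {2, 0.03f}, {3, 0.01f}};
    for (const TokenProb & t : min_p_filter(cand, 0.05f)) {
        std::printf("token %d survives with p = %.3f\n", t.id, t.p);
    }
    // With base_min_p = 0.05 the threshold is 0.045, so only tokens 0 and 1 survive.
}
```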


Top P has a design flaw in that numerous tail-end tokens can be considered whenever the top tokens' scores aren't concentrated enough to reach the specified Top P value, while TFS and other novel sampler approaches aren't as easily interpretable or consistent as Top P. The primary purpose of the Min P sampler is to address both of these design flaws.

The current implementation is very rough around the edges code-wise, as I am not very experienced with C++, but I hope to polish it properly so it can be considered for merging. I have gotten improved results personally and positive feedback from other users, especially with regard to increased coherent creativity.

Mathematically, it is not as complex as TFS or other tail-search algorithms, but importantly, it is easy to understand in terms of how it impacts the probabilities. It is essentially a streamlined, linear version of Top A in design, yet it consistently outperforms Top P and Top K at removing tail-end tokens.

@kalomaze (Contributor, Author) commented Oct 28, 2023

The current implementation:

  • Checks whether Top P is set to 0.02 and, if so, triggers an override to use Min P sampling
  • Creates a .txt file, SamplerBaseMinP.txt, containing the base_min_p value if it doesn't already exist
  • Loads the value from that .txt file and performs the Min P calculation

This is of course suboptimal in a lot of ways, but while drafting sampler ideas I wanted to avoid touching the existing sampler stack order until I had a working solution. What would be the best way to integrate this if the objective is to avoid Top P's and Top K's flaws with a single improved sampler that isn't meant to be used in tandem with them? (Maybe they should be disabled when this is enabled, the way Mirostat disables other samplers?)
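
For reference, the stopgap described above could be wired up roughly like this. This is only an illustrative sketch, not the actual diff from this PR; the sentinel value 0.02 and the SamplerBaseMinP.txt file name come from the description above, and everything else is made up for the example.

```cpp
// Illustrative sketch of the temporary override mechanism described above.
#include <cstdio>
#include <fstream>

static float load_base_min_p(const char * path = "SamplerBaseMinP.txt") {
    const float default_min_p = 0.05f;   // fallback if the file is missing or unreadable
    std::ifstream in(path);
    float value = default_min_p;
    if (in && (in >> value)) {
        return value;                    // use the value stored in the .txt file
    }
    std::ofstream out(path);             // create the file so the value can be edited later
    out << default_min_p << "\n";
    return default_min_p;
}

static bool min_p_override_enabled(float top_p) {
    // Sentinel described above: Top P set to exactly 0.02 switches to Min P sampling.
    return top_p == 0.02f;
}

int main() {
    const float top_p = 0.02f;           // pretend the user passed this Top P value
    if (min_p_override_enabled(top_p)) {
        std::printf("Min P override active, base_min_p = %.3f\n", load_base_min_p());
    }
}
```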

@cebtenzzre marked this pull request as draft on October 28, 2023 at 22:58
@kalomaze (Contributor, Author) commented Oct 29, 2023

[image] A comparison between Top P and Min P when faced with absurdly high temperature scaling (no prompt formatting or anything, so not ideal model conditions; just a quick test)

tk-master added a commit to tk-master/llama-cpp-python that referenced this pull request Nov 16, 2023
My small contribution to this great project.

Ref: ggerganov/llama.cpp#3841

Closes: abetlen#911
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 17, 2023
abetlen pushed a commit to abetlen/llama-cpp-python that referenced this pull request Nov 21, 2023
* Added support for min_p

My small contribution to this great project.

Ref: ggerganov/llama.cpp#3841

Closes: #911

* Fix for negative temp (sample_softmax)
@kalomaze mentioned this pull request Nov 21, 2023
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
…gerganov#3841)

* Introduce the new Min-P sampler by @kalomaze
   The Min-P sampling method was designed as an alternative to Top-P, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token.

* Min-P enabled and set to 0.05 default

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
* Update server.cpp with min_p after it was introduced in ggerganov#3841

* Use spaces instead of tabs

* Update index.html.hpp after running deps.sh

* Fix test - fix line ending
@ivanstepanovftw (Collaborator) commented:

rep-pen=1.18

Some languages do not have full-word tokens, so you will end up penalizing subwords or individual characters.

@ZoomRmc commented Apr 24, 2024

Having experimented with using strictly the min-p sampler (all others turned off) for creative but structured writing with llama3-7b, I see very little sense in using the current default order (temperature last).

  • --samplers min_p;temperature (current default if top-k and top-p are turned off)
    In this configuration, there seems to be very little direct influence on creativity from setting various temp values. The output follows the prompted structure correctly and initially stays rather cohesive well into very high temp values (t: 4.0, min-p: 0.01), until some extremely unlikely garbage tokens slip through, which immediately destabilizes the output.
    In other words, I couldn't achieve any gradual control over the creativity of the output. Moreover, tweaking min-p alone doesn't change the creativity much either; at some point the output abruptly turns into gibberish.
  • --samplers temperature;min_p
    This seems far more controllable. Gradually increasing the temperature up to extreme values works as you'd expect: the output follows the bland > unremarkable > creative > typo-ridden > broken grammar > unstable curve in a more or less predictable way.
    On the other hand, this sampler order makes the two settings more interdependent and makes it easier to subtly influence the results in an unintended way.

My impression after brief testing: the current default order certainly provides more deterministic output for most cases, including uninformed tweaking of the sampler settings, but it probably limits the user's control.
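
For context, the two orders being compared were selected along these lines. These command lines are only illustrative: the binary name, model path, and values are placeholders, the exact flag set depends on the llama.cpp build, and the semicolon in the sampler list needs quoting in a shell.

```sh
# temperature applied last: Min P filters the raw distribution first
./main -m model.gguf --min-p 0.05 --temp 1.5 --samplers "min_p;temperature"

# temperature applied first: Min P filters the already-reshaped distribution
./main -m model.gguf --min-p 0.05 --temp 1.5 --samplers "temperature;min_p"
```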

Relevant: #4091

@ivanstepanovftw (Collaborator) commented:

NMS (Non-Maximum Suppression) from the image object detection task also uses a probability threshold, similar to Min P.

@Arcitec commented Aug 2, 2024

@ZoomRmc

Having experimented with using strictly the min-p sampler (all others turned off) for creative but structured writing with llama3-7b, I see very little sense in using the current default order (temperature last).

Your setup quoted above may very well give better results for you with your current settings, but the author of Min P clearly states the correct order for Temperature in his recommended "General Purpose" parameters.

It includes a note that says (paraphrased) "Temperature should be applied LAST, otherwise you will break Min P".

And that claim makes sense, because Temperature rescales the "probability scores" of all tokens by a non-linear amount. That completely interferes with Min P's purpose of excluding tokens that are not "likely enough".

By putting Temperature BEFORE Min P, you essentially bypass and break Min P's filtering!

Therefore, Temperature must come last when used with Min P.

I suspect that your overall settings were simply bad. Try applying the Min P author's settings from the image below:

[image: the Min P author's recommended "General Purpose" sampler settings]

The image is taken from the author's presentation.

@ZoomRmc commented Aug 2, 2024

You may be right, but I don't know what you mean by "your overall settings were simply bad". The only settings used were the ones given: only the min-p and temperature samplers were enabled, and values in the ranges [0.05..0.005] for min-p and [0.1..4.0] for temp were tried, with the subjective results described above.

@Arcitec commented Aug 3, 2024

@ZoomRmc A quick glance at the Min P author's settings above shows that the active samplers were Repetition Penalty, Min P and Temperature, in that order.

His original presentation (see the link above, under the image of his settings) explains that Temperature performs a non-linear rescaling of the probability values of every token.

So let's say a token was 5% likely to be the next token, and another token was 95% likely. A high temperature would rescale those non-linearly, so the 5% token might rise to around 25% while the 95% token drops to around 75%. This completely changes the probability values of every token.

Imagine what that does to Min P: it completely breaks it.

Because the purpose of Min P is to look at the most probable token and then set a dynamic cutoff at X% of that probability, in order to cut out all the random noise at the bottom of the list.

So it's unfortunately not even a question: Temperature must come after Min P, otherwise Min P no longer works as intended.
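
To make that interaction concrete, here is a small self-contained sketch (illustrative logits and values only, not llama.cpp code) showing how applying a high temperature before the relative Min P cut lets far more of the tail through than applying it after:

```cpp
// Illustrative only: softmax over a few logits, then count how many tokens
// clear a relative Min P threshold (p >= min_p * p_max), with temperature
// applied either before or after that cut.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static std::vector<double> softmax(const std::vector<double> & logits, double temp) {
    std::vector<double> p(logits.size());
    const double max_l = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp((logits[i] - max_l) / temp);
        sum += p[i];
    }
    for (double & x : p) {
        x /= sum;
    }
    return p;
}

static int count_survivors(const std::vector<double> & p, double min_p) {
    const double p_max = *std::max_element(p.begin(), p.end());
    int n = 0;
    for (double x : p) {
        if (x >= min_p * p_max) {
            ++n;
        }
    }
    return n;
}

int main() {
    const std::vector<double> logits = {8.0, 6.0, 3.0, 1.0, 0.0};
    const double min_p = 0.05;

    // "Temperature last": Min P sees the un-tempered distribution.
    const int kept_temp_last  = count_survivors(softmax(logits, 1.0), min_p);

    // "Temperature first": Min P sees a distribution already flattened by temp = 3.0.
    const int kept_temp_first = count_survivors(softmax(logits, 3.0), min_p);

    std::printf("tokens kept when temperature is applied last:  %d\n", kept_temp_last);   // 2
    std::printf("tokens kept when temperature is applied first: %d\n", kept_temp_first);  // 5
}
```

With these made-up logits, the cut keeps 2 tokens when it sees the raw distribution, but 5 tokens once the distribution has been flattened by a temperature of 3.0, which is exactly the tail-leakage effect described above.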

For what it's worth, I've tried those exact "General Purpose" settings that he recommended above (with Min P before Temperature), and my results are incredibly good. It was like upgrading the brain of my Llama 2 model. I really, really like it! Thanks everyone who implemented it in llama.cpp!

YuMJie pushed a commit to YuMJie/powerinfer that referenced this pull request Oct 25, 2024
* Update server.cpp with min_p after it was introduced in ggerganov/llama.cpp#3841

* Use spaces instead of tabs

* Update index.html.hpp after running deps.sh

* Fix test - fix line ending