Add top-p sampling #23
Conversation
- added tests
pcuenca left a comment
This is fantastic, @jkrukowski!
Let me try it out and check the implementation against the current one in transformers in case there are some details that could be incorporated, but this looks great already.
As a side comment, we could potentially implement the costly cumsum operation in Core ML as part of the model conversion, or using a Core ML pipeline. But using Accelerate should be more than enough for now!
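For context, nucleus (top-p) sampling sorts the distribution, keeps the smallest prefix whose cumulative probability reaches `topP`, and samples from that prefix. A minimal sketch in plain Swift (the function name and the loop-based cumulative sum are illustrative, not this PR's implementation):

```swift
/// Minimal top-p (nucleus) sampling sketch; illustrative only.
/// `probs` is assumed to hold softmaxed probabilities (sum > 0) and `topP` > 0.
func sampleTopP(probs: [Float], topP: Float) -> Int {
    // Sort token indices by probability, descending.
    let sorted = probs.enumerated().sorted { $0.element > $1.element }

    // Keep the smallest prefix whose cumulative probability reaches topP.
    var nucleus: [(offset: Int, element: Float)] = []
    var cumulative: Float = 0
    for pair in sorted {
        nucleus.append(pair)
        cumulative += pair.element
        if cumulative >= topP { break }
    }

    // Draw one token from the nucleus; sampling within [0, cumulative)
    // renormalizes implicitly.
    let r = Float.random(in: 0..<cumulative)
    var acc: Float = 0
    for pair in nucleus {
        acc += pair.element
        if r < acc { return pair.offset }
    }
    return nucleus[nucleus.count - 1].offset  // guard against float rounding
}
```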
pcuenca left a comment
I tested it and it works fine! As expected, it's a bit slow but we can try to optimize later. Additionally, top-k and top-p could potentially coexist as pointed out below, but we can also handle that in a new PR unless you want to tackle it now :)
if config.topK > 0 {
    let topK = Math.topK(arr: logits, k: config.topK)
    nextToken = Math.sample(indexes: topK.indexes, probs: topK.probs)
} else if config.topP < 1.0 {
If my understanding of this is correct, top-k can coexist with top-p: https://github.com/huggingface/transformers/blob/42017d82baa083da2bee3055fdac80c81ee97b8a/src/transformers/generation/utils.py#L805-L808
However, it could make sense to merge this PR now and make them coexist in a future one. What do you think?
I'd say let's merge it now; it seems logical to create a separate PR with a common interface to these two.
| fatalError("topP not implemented yet") | ||
| fatalError("not implemented yet") |
If we make top-k compatible with top-p, we'd do a single sample call on the selected tokens and remove this fatalError.
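A rough sketch of what that could look like, assuming the `Math.topK` / `Math.sample` helpers and `config` from the diff above, plus hypothetical `softmax` and `filterTopP` helpers (not existing library code):

```swift
/// Hypothetical top-p filter: sorts the distribution descending and keeps
/// the smallest prefix whose probability mass reaches `p`.
func filterTopP(indexes: [Int], probs: [Float], p: Float) -> (indexes: [Int], probs: [Float]) {
    let order = probs.indices.sorted { probs[$0] > probs[$1] }
    var keptIndexes: [Int] = []
    var keptProbs: [Float] = []
    var cumulative: Float = 0
    for i in order {
        keptIndexes.append(indexes[i])
        keptProbs.append(probs[i])
        cumulative += probs[i]
        if cumulative >= p { break }
    }
    return (keptIndexes, keptProbs)
}

// In the generation step: apply top-k first, then top-p (the same order
// transformers uses), and finish with a single sample call on whatever
// survives both filters -- which is what removes the fatalError.
var indexes = Array(0..<logits.count)
var probs = softmax(logits)  // assumed helper producing the full distribution
if config.topK > 0 {
    let topK = Math.topK(arr: logits, k: config.topK)
    (indexes, probs) = (topK.indexes, topK.probs)
}
if config.topP < 1.0 {
    (indexes, probs) = filterTopP(indexes: indexes, probs: probs, p: Float(config.topP))
}
nextToken = Math.sample(indexes: indexes, probs: probs)  // assumed to renormalize
```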
In this PR I've compared two different implementations here: https://github.com/jkrukowski/topp -- looks like using Accelerate to compute the cumulative sum gives it a speed boost.
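For reference, the Accelerate variant boils down to an inclusive prefix sum. A sketch of one way to get it with `vDSP_vrsum` (my reconstruction, not necessarily the benchmarked repo's actual code):

```swift
import Accelerate

/// Inclusive cumulative sum via Accelerate; hypothetical helper.
/// vDSP_vrsum computes C[n] = weight * (A[1] + ... + A[n]) with C[0] = 0,
/// i.e. a running sum that skips the first element; adding A[0] to every
/// output element turns it into the inclusive prefix sum we want.
func cumsum(_ x: [Float]) -> [Float] {
    guard !x.isEmpty else { return [] }
    var result = [Float](repeating: 0, count: x.count)
    var weight: Float = 1
    vDSP_vrsum(x, 1, &weight, &result, 1, vDSP_Length(x.count))
    vDSP.add(x[0], result, result: &result)
    return result
}
```

The vectorized running sum is presumably where the advantage over a scalar Swift loop comes from, especially at vocabulary-sized inputs.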