
Fixes the error in num_accepted_tokens calculation in reject sampler #2658

Closed

Conversation

sighingnow
Contributor

Given the accepted mask vector [True, False, True, True], num_accepted_tokens should be 1 instead of 3, since the last two tokens must be rejected once the 2nd token is rejected.

Thus, accepted.sum() is incorrect for num_accepted_tokens.

The reject sampler was merged in #2336, so @cadedaniel could you please help confirm this bugfix? Thanks!
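The difference between the two counting rules can be sketched in plain Python (the helper names are illustrative, not vLLM's actual implementation, which operates on torch tensors):

```python
from itertools import takewhile

def naive_accepted(mask):
    # Counts every True in the mask, as accepted.sum() does,
    # regardless of rejections earlier in the sequence.
    return sum(mask)

def prefix_accepted(mask):
    # Counts accepted tokens only up to the first rejection:
    # once a token is rejected, everything after it is discarded.
    return sum(1 for _ in takewhile(bool, mask))

mask = [True, False, True, True]
print(naive_accepted(mask))   # 3
print(prefix_accepted(mask))  # 1
```

The PR argues that the prefix-aware count is the right value for num_accepted_tokens; the maintainer's reply below explains why the naive count is intentional.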

Signed-off-by: Tao He <sighingnow@gmail.com>
@Yard1 Yard1 requested a review from cadedaniel January 30, 2024 12:34
@LiuXiaoxuanPKU
Collaborator

LiuXiaoxuanPKU commented Jan 30, 2024

@LiuXiaoxuanPKU LiuXiaoxuanPKU self-assigned this Jan 30, 2024
@cadedaniel
Collaborator

Hi @sighingnow and @LiuXiaoxuanPKU. Thanks for investigating this behavior and your interest in speculative decoding.

This is actually by design. To run speculative decoding in production, we want an empirical measurement of the quality of our draft model. We formulate this as the number of draft tokens that would be accepted by rejection sampling, assuming each token's prefix is also accepted. This gives us a good real-world proxy for how aligned the draft and target models are.

For the top-level metric on how many tokens are emitted by speculative decoding, one should use num_emitted_tokens. This is what should be used to model the reduction in user-perceived inter-token-latency (ITL), as ITL is approximately step_time / num_tokens_per_step.
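The distinction between the two metrics can be sketched in plain Python (count_metrics is a hypothetical helper for illustration, not vLLM's API):

```python
def count_metrics(accepted_mask):
    """accepted_mask: per-draft-token accept/reject decisions for one step."""
    # num_accepted_tokens: every individually accepted draft token,
    # assuming its prefix was accepted -- a draft-model quality proxy.
    num_accepted = sum(accepted_mask)
    # num_emitted_tokens: the accepted prefix actually kept, plus exactly
    # one token sampled from the target model (a recovered or bonus token).
    prefix = 0
    for ok in accepted_mask:
        if not ok:
            break
        prefix += 1
    num_emitted = prefix + 1
    return num_accepted, num_emitted

# Mask from the discussion above:
print(count_metrics([True, False, True, True]))  # (3, 2)
```

Under this formulation, num_emitted_tokens per step feeds directly into the ITL approximation step_time / num_tokens_per_step, while num_accepted_tokens tracks draft/target alignment.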

@sighingnow
Contributor Author

> num_accepted_tokens should be 4 here because 8 is a bonus token so it does not count.

Here, the 1 in the first sequence is a token generated by the target model rather than drafted by the draft model. Note that speculative decoding always produces at least 1 token per step, even when all draft tokens are rejected.

> For the top-level metric on how many tokens are emitted by speculative decoding, one should use num_emitted_tokens. This is what should be used to model the reduction in user-perceived inter-token-latency (ITL), as ITL is approximately step_time / num_tokens_per_step.

I see. Thanks for making it clear.

@sighingnow sighingnow closed this Jan 31, 2024
@sighingnow sighingnow deleted the ht/fixes-num-accepted-tokens branch January 31, 2024 02:21