🔴 Isolate prefill from generation loops #40652
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Yay, finally, can't wait to get this shipped 🎉
@zucchini-nlp look also at this beauty #40657, no more `if generation_mode=blabla` / `elif` / `elif` 💖 ✨ ✨
gante left a comment
Looks like it's going in the right direction, added a few comments :)
src/transformers/generation/utils.py
Outdated
```python
if is_prefill:
    outputs = self(**model_inputs, return_dict=True)
    is_prefill = False
# ...
if prefill_outputs is not None:
```
We can prepare the first round of inputs outside the decoding loop, which should result in a cleaner decoding loop.
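A minimal sketch of the suggested shape (stub callables standing in for the real prepare/forward/update helpers in `generation/utils.py`, not the actual transformers internals): the prefill forward pass runs once before the loop, so the loop body becomes a uniform single-step decode.

```python
# Hedged sketch of the reviewer's suggestion, with hypothetical stub callables.
def generate_with_hoisted_prefill(prepare_inputs, forward, update_kwargs, unfinished):
    # Prefill: first prepare + forward happen once, before the decoding loop.
    model_inputs = prepare_inputs()
    outputs = forward(model_inputs)       # full-prompt forward pass
    update_kwargs(outputs)

    # Decoding loop: every iteration is now an identical single-step decode.
    while unfinished():
        model_inputs = prepare_inputs()
        outputs = forward(model_inputs)   # one new token per step
        update_kwargs(outputs)
    return outputs
```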
It is not an easy task 😢:

- We can't move the prepare and forward calls to the bottom of the loop, since it is critical that `_has_unfinished_sequences` is checked just after updating `this_peer_finished`.
- Another reason for the prepare, forward, and update-kwargs calls to stay at the top is that DeepSpeed's `synced_gpus` mode needs to run only those calls over and over if this peer finished but the others didn't.
- With these constraints, we either need to unroll the whole first iteration of the loop or have some sort of `if` at the beginning of the loop that accounts for the prefill. I think the `if` is less complex.

So I have kept the existing approach with a boolean variable for readability (sketched below), WDYT?
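A hedged, self-contained sketch of the retained shape (stub helpers, not the real `_sample` code): the `is_prefill` flag keeps prepare/forward/update at the top of the loop, the finish check runs right after `this_peer_finished` is updated, and a finished peer under `synced_gpus` can keep replaying just the top of the loop.

```python
def decoding_loop_with_prefill_flag(prepare_inputs, forward, update_kwargs, step):
    this_peer_finished = False
    is_prefill = True
    # The loop condition stands in for transformers' `_has_unfinished_sequences`,
    # which with synced GPUs keeps looping until *all* peers are done.
    while not this_peer_finished:
        # prepare/forward/update stay at the top: DeepSpeed's synced_gpus mode
        # can re-run exactly these calls after this peer has finished.
        model_inputs = prepare_inputs(is_prefill)
        outputs = forward(model_inputs)  # full prompt on the 1st pass, 1 token after
        is_prefill = False
        update_kwargs(outputs)
        # `this_peer_finished` is updated here and checked immediately by the
        # loop condition, preserving the ordering constraint described above.
        this_peer_finished = step(outputs)
    return outputs
```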
Force-pushed from 5385359 to 44274e6
run-slow: gemma

This comment contains run-slow, running the specified jobs: models: ['models/gemma']
gante left a comment
Looks mostly good to me 👍
thanks @gante, last bits corrected!! looking much better :)
[For maintainers] Suggested jobs to run (before merge): run-slow: csm, dia, musicgen, rag
gante left a comment
LGTM, thank you for iterating 🤗
(but let me check slow tests before merging)
run-slow: csm, dia, musicgen, rag, bart, gpt2, llama
This comment contains models: ["models/bart", "models/csm", "models/dia", "models/gpt2", "models/llama", "models/musicgen", "models/rag"]
CI Results: ✅ No failing test specific to this PR 🎉!
thanks @gante all green! (you have to merge it yourself, I don't have permissions to merge anymore 😄)
* isolate-prefill: squash
* prefill inside decoding methods
* simplify autocompile helpers
As per the title, this isolates prefill from the individual decoding methods into a separate function.
The only exception is assisted generation, where there are no clear prefill/decoding steps: on each step, a batch of candidates with a variable number of new tokens goes through the model.
Breaking changes: the behavior of `_beam_search` and `_sample` changed for anyone subclassing them; a sketch of the new split follows.
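A hedged sketch (hypothetical names, not the exact new transformers API) of what the split means for subclassers: prefill now lives in its own step, and the decoding methods consume its outputs, so an override of `_sample` or `_beam_search` no longer owns the prefill forward pass.

```python
class GenerationMixinSketch:
    def _prefill(self, prompt):
        # Hypothetical isolated prefill: the full-prompt forward pass.
        return f"prefill({prompt})"

    def _sample(self, prefill_outputs, max_new_tokens):
        # Decoding only: subclasses overriding this no longer see the prefill.
        return [f"decode(step={i})" for i in range(max_new_tokens)]

    def generate(self, prompt, max_new_tokens=3):
        prefill_outputs = self._prefill(prompt)       # isolated prefill step
        return self._sample(prefill_outputs, max_new_tokens)
```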