
Conversation

@gante
Contributor

@gante gante commented Mar 12, 2025

What does this PR do?

See title.

For instance, Gemma 3 models have cache_implementation="hybrid" by default but, if we pass generation_config=GenerationConfig() (i.e. default parameters), the code will crash because a hybrid cache is not used. In other words, let's assume a user wants to use the base parameterization set by the model creators, and use model-specific defaults as opposed to global defaults.

Original testing script (by @NathanHB), crashing on main:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

def main():
    model = "google/gemma-3-1b-it"
    revision = "e735e8d98f6d2ccdb3bdfc43ac1c252bebb2527f"
    dtype = "bfloat16"
    tokenizer = AutoTokenizer.from_pretrained(model)
    model = AutoModelForCausalLM.from_pretrained(model, revision=revision, torch_dtype=dtype, device_map="cuda:0")
    prompt = """Solve the following math problem efficiently and clearly.  The last line of your response should be of the following format: 'Therefore, the final answer is: $\boxed{ANSWER}$. I hope it is correct' (without quotes) where ANSWER is just the final number or expression that solves the problem. Think step by step before answering.

Alice chooses a set $A$ of positive integers. Then Bob lists all finite nonempty sets $B$ of positive integers with the property that the maximum element of $B$ belongs to $A$. Bob's list has 2024 sets. Find the sum of the elements of A.
        """.strip()

    chat = [{
        "content": prompt,
        "role": "user",
    }]

    inputs = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    print(inputs)
    inputs = tokenizer(inputs, return_tensors="pt").to(model.device)
    print("=== DECODING ===")
    generation_config = GenerationConfig(max_new_tokens=2048, temperature=1.0, do_sample=True, top_k=64, top_p=0.95)
    outputs = model.generate(**inputs, generation_config=generation_config)
    outputs = tokenizer.decode(outputs[0], skip_special_tokens=False)

    print(outputs)

if __name__ == "__main__":
    main()

@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@github-actions github-actions bot marked this pull request as draft March 12, 2025 18:19
@gante gante marked this pull request as ready for review March 12, 2025 18:20
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

default_list: Union[LogitsProcessorList, StoppingCriteriaList],
custom_list: Union[LogitsProcessorList, StoppingCriteriaList],
) -> Union[LogitsProcessorList, StoppingCriteriaList]:
"""
Contributor Author

@gante gante Mar 12, 2025

The changes in this function are secondary to the main change:

  • whisper breaks because it both sets custom logits processors AND has the default flags in the generation config that would instantiate them
  • after the original change (inherit defaults from the model's generation config), we were throwing an exception here
  • after this secondary change, we only throw a warning and discard the logits processor instance created inside .generate() (i.e. we assume the user knows what they are doing when they pass logits_processors to .generate(), instead of crashing); a sketch of this behavior follows below
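
For future reference, a minimal sketch of the intended behavior, not the actual transformers code; detecting duplicates by processor type is an assumption made for illustration:

import warnings

def merge_criteria_or_processors(default_list, custom_list):
    """Hypothetical sketch: keep user-supplied processors, warn about (and drop)
    the equivalent ones instantiated from generation config flags."""
    final_list = type(default_list)()
    custom_types = {type(custom) for custom in custom_list}
    for default in default_list:
        if type(default) in custom_types:
            # Previously this situation raised an exception; now we warn and prefer
            # the instance the user explicitly passed to `.generate()`.
            warnings.warn(
                f"A custom {type(default).__name__} was passed to `.generate()`, "
                "overriding the one created from the generation config flags.",
                UserWarning,
            )
        else:
            final_list.append(default)
    final_list.extend(custom_list)
    return final_list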

Member

A tiny comment wouldn't hurt for future us; it isn't very easy to see why we do this without reading the PR description.

Also, I am not sure if this is required: aren't we restricting custom logits processors to only those that cannot be configured by the generation config? Something that is only defined by users for their use case

Contributor Author

Also, I am not sure if this is required: aren't we restricting custom logits processors to only those that cannot be configured by the generation config? Something that is only defined by users for their use case

The user always had the option of unsetting a flag and passing the corresponding processor. This change makes it less restrictive: if they pass both a flag and the corresponding processor, we keep the processor and ignore the flag (previously, we would throw an exception)

I'm not a super fan of the new behavior; I think the exception is less ambiguous (and thus preferable). However, it's needed to coexist with the main change in this PR, which is (IMO) more important.

I'll add a comment to clarify the evolution of this function, in case we need to trace back a related change :)

Member

yes, same feeling here. I would even restrict it to only custom logits processors in v5 to free us from the burden of maintaining correct priority/ordering etc. What is being fixed looks like a Pandora's box 😿

Contributor Author

@gante gante Mar 14, 2025

I would even restrict it to only custom logits processors in v5 to free us from the burden of maintaining correct priority/ordering etc.

I definitely don't want this; generate would become very uncomfortable to use 😅 A user always has the option of disabling generation_config flags and passing the processors in the order they want (see the example below). But most users don't want that level of control, and yet may still want an external processor
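
To illustrate that option, a hedged example (the model and parameter values are arbitrary) of skipping the generation-config flag and passing the processor directly:

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessorList,
    NoRepeatNGramLogitsProcessor,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")

# Instead of setting `no_repeat_ngram_size` in the generation config, pass the
# processor explicitly and control its position in the list ourselves.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    logits_processor=LogitsProcessorList([NoRepeatNGramLogitsProcessor(ngram_size=2)]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))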

Contributor Author

The reason transformers is so popular is that we enable complex use cases with a few lines of code

Member

Yes, I agree that this gives users more freedom to order processors the way they want by disabling the generation config. Though it is not very clear to me as a user what happens under the hood when I pass my own temperature processor or use the config. IMO we need a better docs page for advanced usage if we allow that much freedom and expect users to know what they are doing

Users almost never 100% know what they are doing, hence the open issues on GH 😆

@gante gante requested review from zucchini-nlp and removed request for ArthurZucker March 12, 2025 18:58
# The two outputs must match and their shape must be as expected
self._check_similar_generate_outputs(low_output, high_output)

@pytest.mark.generate
Contributor Author

(a few tests that failed in previous commits were incorrectly marked :) when parameterized is used, most decorators should be used after it)

Member

btw, how important is it to mark a skipped test as generate? I have no idea where we use those marks, except when trying to run only generation tests, in which case the skip doesn't help much

Contributor Author

not very important, it's more for long-term bookkeeping with the automated CI reports (how many tests we have per mark, how much time we spend on each mark, % of skips, ...)

@gante gante changed the title [Generation] When passing a custom generation_config, overwrite default values with the model's base generation_config [Generation, Gemma 3] When passing a custom generation_config, overwrite default values with the model's base generation_config Mar 13, 2025
Member

@zucchini-nlp zucchini-nlp left a comment

Interesting bug, given that the default cache implementation is None. So with a user-defined config, we're overriding everything back to None?

This PR can work as a short-term solution, but we're over-complicating things too much in generate imo. Setting generation config values to None is fine for most cases, but I see at least one edge case. Suppose a model saves a generation config with cache_implementation='static' and the user wants to override it by passing a config with cache_implementation explicitly set to None, because the user wants a dynamic cache. The future solution wouldn't work for this case.

The custom vs model config issue is also relevant to the pretrained config, and we kinda solve that issue by asking users to pass a kwargs dict, i.e. we're 100% sure which values the user wants to override. Just leaving it here as a random thought hehe, this would address the above edge case.

I totally agree we need a robust solution, but it seems like whatever we do might have edge cases and will be breaking 😢

@gante
Contributor Author

gante commented Mar 14, 2025

@zucchini-nlp I'm also not happy with the state of parameterization, but I disagree with a few of your points. Let me split my comment into parts, starting with why I believe this is the right change in the short term.

First, an overview of our current status:

  1. model Config and GenerationConfig are intertwined and we can't fully separate them without breaking BC (and it would be very breaking, old working code parametrizes generate through Config).
  2. We don't have per-model GenerationConfig, and we are piggybacking default parameterization through Config. More on this later, see long-term plans at the end.
  3. GenerationConfig has many flags, and it's not reasonable to expect the user to read the full docs.
  4. Likewise, it's not reasonable that a user knows the full compatibility between the model and generate.

Short-term issue

When a user creates a model Config, the initialization is, in essence, a diff to the model's default parameterization. In other words, if we do

config = LlamaConfig(hidden_size=512)

config will have all other fields set to model-specific defaults, because we have per-model Config classes. However, if we do

generation_config = GenerationConfig(do_sample=True)

the initialization is model-agnostic. In other words, initializing GenerationConfig this way loses all model-specific parameterization, ⚠️ even if the model config sets generate-specific args ⚠️. This shouldn't happen for two reasons:
a. The two configs have different assumptions: one follows the model's defaults, the other ignores them.
b. Because it lacks model information, a default GenerationConfig may cause generate to crash

This leaves us with two short-term solutions:
i. Add more model-level validation to inform the user which flags they need to change. This validation would need to be set in each modeling class.
ii. Shift away from the model-agnostic assumption of GenerationConfig where possible (i.e. when generation_config has visibility of the model).

This PR is (ii.) above.
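
Conceptually, (ii.) amounts to something like the following sketch; this is not the actual implementation, and comparing against the global defaults is a simplification for illustration:

from transformers import GenerationConfig

def apply_model_defaults(user_config: GenerationConfig, model_defaults: GenerationConfig) -> GenerationConfig:
    """Hypothetical sketch: fields the user left at their global default value
    inherit the model's own generation defaults instead."""
    global_defaults = GenerationConfig().to_dict()
    merged = user_config.to_dict()
    for key, model_value in model_defaults.to_dict().items():
        # The user never moved this flag away from the global default -> take the model default
        if key in global_defaults and merged.get(key) == global_defaults[key]:
            merged[key] = model_value
    return GenerationConfig(**merged)

# e.g. with a Gemma-3-like model default, a plain GenerationConfig() ends up hybrid:
model_defaults = GenerationConfig(cache_implementation="hybrid")
print(apply_model_defaults(GenerationConfig(), model_defaults).cache_implementation)  # "hybrid"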

Now, into your comment

Interesting bug, given that the default cache implementation is None. So with user defined config, we're overriding everything back to None?

This PR does not do that. If a user passes GenerationConfig(cache_implementation=None) or GenerationConfig() and the model has cache_implementation = foo by default, the resulting cache_implementation will be foo. The global default is replaced by the model default.

Suppose a model saves a generation config with cache_implementation='static' and the user wants to override it by passing a config with cache_implementation explicitly set to None, because the user wants a dynamic cache. The future solution wouldn't work for this case

It is simple to fix: we add the specific case of dynamic to the list of accepted values for cache_implementation, so users can override it. But there are definitely edge cases (e.g. the model owner saves num_beams=4 and the user sets num_beams=1). I consider these edge cases a smaller problem than the problem this PR fixes, which is a shift in the right direction (start any parameterization from model-specific defaults, like we do in Config).

In general, we haven't been rigorous in following the good pattern of defaulting to None, with None corresponding to "not set by the user", which would have made a PR like this free of conflicts.

Custom vs model config issue is also relevant to pretrained config, and we kinda solve that issue by asking users to pass kwargs dict, i.e. we're 100% sure which values user wants to override. Just leaving here as random though hehe, this would address the above edge case

That would move us in the opposite direction from the one I want 😉 With a well-defined config we can activate more advanced features like caching, hashing, etc., which are useful for advanced use cases (multi-device, compilation, ...)


Long-term issue

To wrap this very long comment: we're seeing more and more models with generation-specific parameterization, so we will definitely need model-specific GenerationConfig.

In a nutshell, the long-term plan is:

  1. Create model-specific GenerationConfig -- not so much to add new flags, but to hold the right defaults
  2. Create AutoGenerationConfig, whose from_pretrained() loads the right class
  3. Discourage the use of the generic GenerationConfig
  4. Now that we have model-specific GenerationConfig, deprecate setting any form of generate flags from Config and finally break BC with a long deprecation cycle
  5. Delete all shenanigans like this PR

But this will be a long piece of work, not something I can sort out in a few hours :) Meanwhile, I'd like us to move towards model-level generation defaults whenever possible.
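
Purely as a hypothetical illustration of steps 1 and 2 above (none of these classes exist in transformers today, and all names are made up; the sketch dispatches on an explicit model type rather than from_pretrained() to stay self-contained):

from transformers import GenerationConfig

class Gemma3GenerationConfig(GenerationConfig):
    """Hypothetical model-specific config: same flags, model-specific defaults."""
    def __init__(self, **kwargs):
        kwargs.setdefault("cache_implementation", "hybrid")
        super().__init__(**kwargs)

# Hypothetical registry, analogous to the existing Auto* classes
GENERATION_CONFIG_MAPPING = {"gemma3": Gemma3GenerationConfig}

class AutoGenerationConfig:
    """Hypothetical dispatcher: picks the config class from the model type."""
    @classmethod
    def for_model_type(cls, model_type: str, **kwargs) -> GenerationConfig:
        config_cls = GENERATION_CONFIG_MAPPING.get(model_type, GenerationConfig)
        return config_cls(**kwargs)

config = AutoGenerationConfig.for_model_type("gemma3", do_sample=True)
print(config.cache_implementation)  # "hybrid", without the user having to know about it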

@zucchini-nlp
Member

even if the model config sets generate-specific args

OMG, so many dependencies in the generation config. Yeah, this comment makes total sense and I agree with the solution for the short term. We have been adding a lot of stuff without checking for robustness on edge cases like gemma2. My main concern was about the long-term plan to make the generation config stable across model types, as we'll be getting more models with hardcoded generation values. At least on the VLM side, we use a static cache for image generation. Interestingly, using a dynamic cache degrades quality for some models, no idea why yet

Now that we have model-specific GenerationConfig, deprecate setting any form of generate flags from Config and finally break BC with a long deprecation cycle

THIS! ❤️ Love the plan, and looking forward to getting things sorted out. I feel like even just adding model-specific generation configs will solve many issues/hacky workarounds

This PR does not do that. If a user passes GenerationConfig(cache_implementation=None) or GenerationConfig() and the model has cache_implementation = foo by default, the resulting cache_implementation will be foo. The global default is replaced by the model default.

Yeah, I meant that before this PR we were setting all caches to None, which caused issues when generating. I was just trying to get at the root cause

Member

@zucchini-nlp zucchini-nlp left a comment

Approving, with a note to refactor in the long term. Thanks for digging into this issue, and for the detailed plan 💛

@gante
Contributor Author

gante commented Mar 15, 2025

@zucchini-nlp fyi, before merging, I've added:

  • a TODO with the long-term plan
  • the option to pass cache_implementation="dynamic"
  • a warning when the changes of this PR kick in (i.e. when the model default overwrites flags), explaining what to do if that is not desired (see the example below)
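
For example (illustrative; the token budget is arbitrary), a user who does not want the model's default hybrid cache can now opt out explicitly:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
inputs = tokenizer("Hello!", return_tensors="pt")

# cache_implementation="dynamic" explicitly opts out of the model's default
# ("hybrid" here), instead of the flag being silently replaced by the model default.
generation_config = GenerationConfig(cache_implementation="dynamic", max_new_tokens=20)
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))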

[Screenshot: the new warning shown when model defaults overwrite generation_config values]

@gante gante merged commit fc8764c into huggingface:main Mar 15, 2025
23 checks passed
Collaborator

@ArthurZucker ArthurZucker left a comment

nice that you jumped quickly on this one! thanks

Comment on lines +561 to +562
def test_generation_beyond_sliding_window_with_generation_config(self):
"""
Collaborator

very very nice thanks!

@gante gante deleted the inherit_values_from_base_generation_config branch March 17, 2025 10:36
@yaswanth19 yaswanth19 mentioned this pull request Mar 18, 2025
@gante gante added the for patch Tag issues / labels that should be included in the next patch label Mar 19, 2025