[Generation, Gemma 3] When passing a custom generation_config, overwrite default values with the model's base generation_config
#36684
Conversation
    default_list: Union[LogitsProcessorList, StoppingCriteriaList],
    custom_list: Union[LogitsProcessorList, StoppingCriteriaList],
) -> Union[LogitsProcessorList, StoppingCriteriaList]:
    """
The changes in this function are secondary to the main change:
- whisper breaks because it both sets custom logits processors AND has the default flags in the generation config to instantiate them
- after the original change (inherit defaults from the model's generation config), we were throwing an exception here
- after this secondary change, we only throw a warning and discard the logits processor instance created inside `.generate()` (i.e. assume the user knows what they are doing when they pass `logits_processors` to `.generate()`, instead of crashing); a rough sketch of the resulting merge behavior is shown below
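A hypothetical sketch of the merge semantics described above, not the actual transformers implementation: the function name and the warning text are made up, and only the high-level behavior (keep the user's processor, warn, drop the flag-derived default) comes from this thread.

```python
import warnings

from transformers import LogitsProcessorList


def merge_processor_lists(default_list: LogitsProcessorList, custom_list: LogitsProcessorList) -> LogitsProcessorList:
    """Keep user-provided processors when they clash with flag-derived defaults."""
    merged = LogitsProcessorList(default_list)
    for custom in custom_list:
        for i, default in enumerate(merged):
            if type(custom) is type(default):
                # Previously this situation raised an exception; now we warn and
                # discard the instance created inside `.generate()` from config flags.
                warnings.warn(
                    f"A {type(custom).__name__} was built from generation_config flags, but one was "
                    "also passed to `.generate()`; keeping the user-provided instance."
                )
                merged[i] = custom
                break
        else:
            merged.append(custom)
    return merged
```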
A tiny comment wouldn't hurt for future us; it isn't very easy to get why we do this without reading the PR description.
Also, I am not sure this is required: aren't we restricting custom logits processors to only those that cannot be configured by the generation config? Something that is only defined by users for their use case
> Also, I am not sure this is required: aren't we restricting custom logits processors to only those that cannot be configured by the generation config? Something that is only defined by users for their use case
The user always had the option of unsetting a flag and passing the corresponding processor. This change makes it less restrictive: if they pass both a flag and the corresponding processor, we keep the processor and ignore the flag (previously we would throw an exception).
I'm not a super fan of the new behavior; I think the exception is less ambiguous (and thus preferable). However, it's needed to coexist with the main change in this PR, which is (IMO) more important.
I'll add a comment to clarify the evolution of this function, in case we need to trace back a related change :)
yes, same feeling here. I would even restrict it to only custom logits processors in v5, to free us from the burden of maintaining correct priority/ordering etc. Looks like what is being fixed here is a Pandora's box 😿
> I would even restrict it to only custom logits processors in v5, to free us from the burden of maintaining correct priority/ordering etc.
I definitely don't want this; generate would become very uncomfortable to use 😅 A user always has the option of disabling `generation_config` flags and passing the processors in the order they want. But most users don't want that level of control, and yet may want an external processor
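A hedged sketch of the pattern described just above, i.e. keeping the `generation_config` flags for the common knobs while adding an external, user-defined processor on top; the `BanTokenProcessor` class and the token id are purely illustrative and not part of the PR.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList


class BanTokenProcessor(LogitsProcessor):
    """Illustrative external processor: forbid a single token id at every step."""

    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.banned_token_id] = -float("inf")
        return scores


# Flags (do_sample, temperature, ...) still come from generation_config; the external
# processor is simply added on top of the defaults built from those flags, e.g.:
# outputs = model.generate(
#     **inputs, do_sample=True, logits_processor=LogitsProcessorList([BanTokenProcessor(0)])
# )
```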
The reason transformers is so popular is that we enable complex use cases with a few lines of code
Yes, I agree that this gives users more freedom to order processors the way they want by disabling the generation config. Though I feel like it is not very clear to me, as a user, what happens under the hood when I pass my own temperature processor or use the config. IMO we need a better docs page for advanced usage if we allow that much freedom and expect users to know what they are doing
Users almost never 100% know what they are doing, hence the open issues on GH 😆
    # The two outputs must match and their shape must be as expected
    self._check_similar_generate_outputs(low_output, high_output)

    @pytest.mark.generate
(a few tests that failed in previous commits were incorrectly marked :) when `parameterized` is used, most decorators should be applied after it; a small ordering sketch is below)
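A minimal sketch of the ordering point, assuming the usual `parameterized` gotcha that decorators applied above `@parameterized.expand` do not reach the generated test methods; the test class and parameters here are made up for illustration.

```python
import unittest

import pytest
from parameterized import parameterized


class DummyGenerationTest(unittest.TestCase):
    # `@parameterized.expand` generates one test method per parameter set, so marks
    # such as `@pytest.mark.generate` should come after it (closer to the function)
    # so that every generated test carries the mark.
    @parameterized.expand([("greedy", False), ("sample", True)])
    @pytest.mark.generate
    def test_dummy_generate(self, name, do_sample):
        self.assertIsInstance(do_sample, bool)
```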
btw, how important is it to mark a skipped test as generate? I have no idea where we use those marks, unless when trying to run only generation tests, in which case the skip doesn't help much
not very important, it's more for long-term bookkeeping with the automated CI reports (how many tests we have of each mark, how much time we spend on each mark, % of skips, ...)
zucchini-nlp
left a comment
Interesting bug, given that the default cache implementation is None. So with a user-defined config, we're overriding everything back to None?
This PR can work as a short-term solution, but we're over-complicating things too much in generate imo. Setting generation config values to None is fine for most cases, but I see at least one edge case. Suppose a model saves a generation config with cache_implementation='static' and the user wants to override it by passing a config with cache_implementation explicitly set to None, because the user wants a dynamic cache. The future solution wouldn't work for this case.
The custom vs model config issue is also relevant to the pretrained config, and we kinda solve that issue by asking users to pass a kwargs dict, i.e. we're 100% sure which values the user wants to override. Just leaving this here as a random thought hehe, this would address the above edge case (a small sketch is below).
I totally agree we need a robust solution, but it seems like whatever we do might have edge cases and will be breaking 😢
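A hedged sketch of the kwargs-style override mentioned above, using `GenerationConfig.update`; the values are illustrative, and this is not the mechanism implemented in the PR, just the idea that explicitly passed keys are unambiguous.

```python
from transformers import GenerationConfig

# Model-saved defaults (illustrative, mirroring the edge case described above).
model_defaults = GenerationConfig(cache_implementation="static")

# Only the keys the user actually passes are touched, so an explicit None
# ("use the dynamic cache") cannot be confused with "left unset".
user_overrides = {"cache_implementation": None, "do_sample": False}
model_defaults.update(**user_overrides)

assert model_defaults.cache_implementation is None
assert model_defaults.do_sample is False
```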
@zucchini-nlp I'm also not happy with the state of parameterization, but I disagree with a few of your points. Let me split my comment into parts, starting with why I believe this is the right change in the short term. First, an overview of our current status:
Short-term issue

When a user creates a model config and a generation config, e.g.

    config = LlamaConfig(hidden_size=512)
    generation_config = GenerationConfig(do_sample=True)

the initialization is model-agnostic. In other words, initializing [...]. This leaves us with two short-term solutions: [...]. This PR is (ii.) above.

Now, into your comment:

This PR does not do that. If a user passes [...]

It is simple to fix: we add the specific case of [...]. In general, we haven't been rigorous following the good pattern of defaulting to [...].

That would move us in the opposite direction I want to move 😉 With a well-defined config we can activate more advanced features like caching, hashing, etc., which are useful for advanced use cases (multi-device, compilation, ...).

Long-term issue

To wrap this very long comment: we're seeing more and more models with generation-specific parameterization, so we will definitely need model-specific [...]. In a nutshell, the long-term plan is: [...]

But this will be a long piece of work, not something I can sort out in a few hours :) Meanwhile, I'd like us to move towards model-level generation defaults whenever possible.
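A hedged sketch of the merge direction described above (values the user left at their global defaults are taken from the model's base generation config instead); this is not the actual implementation in the PR, and the configs and values are illustrative.

```python
from transformers import GenerationConfig

global_defaults = GenerationConfig()                                   # library-wide defaults
model_base = GenerationConfig(cache_implementation="hybrid")           # model-saved defaults
user_config = GenerationConfig(do_sample=True)                         # user only cares about sampling

# Start from the model's base config, then apply only the values the user
# actually changed relative to the global defaults.
merged = GenerationConfig(**model_base.to_dict())
for key, value in user_config.to_dict().items():
    if value != getattr(global_defaults, key, None):
        setattr(merged, key, value)

assert merged.cache_implementation == "hybrid"  # inherited from the model
assert merged.do_sample is True                 # user override kept
```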
OMG, so many dependencies in the generation config. Yeah, this comment makes total sense, and I agree with the solution for the short term. We have been adding a lot of stuff without checking for robustness on edge cases like gemma2. My main concern was about the long-term plan to make the generation config stable across model types, as we'll be getting more models with hardcoded generation values. At least on the VLM side, we use a static cache for image generation. Interestingly, using a dynamic cache degrades quality for some models, no idea why yet
THIS! ❤️ Love the plan, and looking forward to getting things sorted out. I feel like even just adding model-specific generation configs will solve many issues/hacky workarounds
Yeah, I meant that before this PR we've been setting all cache to [...]
zucchini-nlp
left a comment
Approving, with the note to refactor long-term in the future. Thanks for digging into this issue, and for the detailed plan 💛
@zucchini-nlp fyi, before merging, I've added: [...]
ArthurZucker
left a comment
nice that you jumped quickly on this one! thanks
    def test_generation_beyond_sliding_window_with_generation_config(self):
        """
very very nice thanks!

What does this PR do?
See title.
For instance, Gemma 3 models have `cache_implementation="hybrid"` by default but, if we pass `generation_config=GenerationConfig()` (i.e. default parameters), the code will crash because a hybrid cache is not used. In other words, let's assume a user wants to use the base parameterization set by the model creators, i.e. model-specific defaults as opposed to global defaults.

Original testing script, crashing on `main` (by @NathanHB): [...]
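The original script is not reproduced here; below is a minimal sketch of the kind of repro the description points to, where the checkpoint name and prompt are assumptions and not taken from the PR.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "google/gemma-3-1b-it"  # assumed Gemma 3 checkpoint with cache_implementation="hybrid" defaults
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")

# Before this PR, a fresh GenerationConfig() replaced the model's generation defaults
# wholesale (dropping cache_implementation="hybrid"), and generation crashed.
# After this PR, values left unset in the custom config fall back to the model's
# base generation_config, so this call is expected to work.
outputs = model.generate(**inputs, generation_config=GenerationConfig(), max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```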