Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹
#32659
Conversation
TL;DR disabled TF conversion (the warning said it would be removed in v4.43), but kept the error message which redirects to the safetensor conversion guide :)
Added a note to myself to remove this file in the future (v4.47).
found in my quick scan for deprecations (CMD+F on "v4.41") -- let's use our latest docs to guide our users
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
amyeroberts
left a comment
Killing old code 🤩
Why not trigger this warning within the DynamicCache.from_legacy_cache call? That avoids repeating it in all the modeling code and avoids forgetting to add it to new modeling code.
We may want to use DynamicCache.from_legacy_cache in non-deprecated ways :p for instance, some optimum integrations use it
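For context, a minimal sketch of the pattern being discussed (not the actual transformers internals; the helper name prepare_cache is made up): the deprecation warning is emitted in the modeling code that receives a legacy tuple cache, so callers of DynamicCache.from_legacy_cache itself stay warning-free.

```python
import warnings

from transformers import DynamicCache

def prepare_cache(past_key_values):
    # Warn only when a legacy tuple-of-tuples cache reaches the modeling code.
    if isinstance(past_key_values, tuple):
        warnings.warn(
            "Passing a tuple of past key values is deprecated; pass a Cache instance instead.",
            FutureWarning,
        )
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
    # Callers that convert caches themselves (e.g. optimum integrations calling
    # from_legacy_cache directly) never hit this warning.
    return past_key_values
```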
This situation was not handled before this PR (and caused tests to fail, god bless tests 🙏)
src/transformers/modeling_utils.py
Outdated
Instead of warning (before this PR) / raising an exception (this PR, if the lines below were not added) when model.config holds things that should be in model.generation_config AND were explicitly set by the user (e.g. the user runs model.config.top_k = 5 before training), we now automagically move misplaced items ✨
Automagically moving things is not usually recommended, but this is a special case: model.save_pretrained is often run at the end of training jobs, and crashing here means that $$$ can be lost.
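As a hedged illustration of the scenario (checkpoint name and parameter are arbitrary; the actual moving logic lives inside save_pretrained):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Generation parameterization through the model config is the deprecated path...
model.config.top_k = 5

# ...but this often only surfaces at the end of an expensive training job.
# Instead of raising here, the misplaced value is moved to the generation config.
model.save_pretrained("finetuned-model")
```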
(@amyeroberts don't re-review yet, I'm going through failed tests -- mostly cases where we are explicitly setting
@gante I'm going to unsubscribe to avoid notifications for each push. Just ping / @ me to let me know when it's ready for review again!
Instead of documenting generate parameters in PretrainedConfig, which is deprecated, show a warning in the docs pointing towards GenerationConfig
We disallowed storing custom generate flags in the model config -> we can no longer set custom generate parameterization in the model tester [which is not necessarily bad, all models should be able to generate with their default parameters!] -> we can't hardcode our way out of edge cases
This change fixes the edge case where we set a token that's not in the vocab to be suppressed. The alternative would be to reconfigure the tester to have a vocab size larger than the default value for this parameter, >50k tokens. Given that it is pretty harmless, I went with this :)
Confirmed that there were no slow TF Whisper test regressions.
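A rough sketch of the idea, assuming the fix amounts to ignoring suppressed token ids that fall outside the vocabulary (the helper below is illustrative, not the actual logits-processor code):

```python
import numpy as np

def suppress_tokens(scores: np.ndarray, suppress_ids: list[int]) -> np.ndarray:
    # Only suppress ids that exist in the vocabulary, so an id larger than the
    # tester's tiny dummy vocab becomes a no-op instead of an indexing error.
    vocab_size = scores.shape[-1]
    valid_ids = [i for i in suppress_ids if 0 <= i < vocab_size]
    scores[..., valid_ids] = -np.inf
    return scores

# Dummy vocab of 100 tokens: the out-of-vocab suppressed id (>50k) is simply skipped.
scores = suppress_tokens(np.zeros((1, 100)), suppress_ids=[3, 50_256])
```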
Just so I understand this is to handle cases coming from our testing suite? I don't think we want to handle this in core modeling code: I'd argue things should fail if tokens are selected outside of the vocab. Is there anything we can modify in the tester instead?
@amyeroberts We can increase the vocabulary size of the tester to the original vocabulary size of 50+k tokens (instead of the dummy vocab size of 100).
Happy to do that instead :)
[Note: given TF's indexing, where GPU+no XLA gladly accepts out-of-bounds indexes, this logic actually streamlines the behavior across devices and/or XLA.]
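As a small aside on the TF quirk from the note above (this mirrors documented tf.gather behavior, not transformers code):

```python
import tensorflow as tf

scores = tf.zeros((1, 100))       # dummy vocab of 100 tokens
oob_ids = tf.constant([50_256])   # token id outside the vocab

# On CPU, gathering an out-of-range index raises an error; on GPU without XLA,
# tf.gather silently returns zeros for it (and XLA handles it differently again),
# so filtering invalid ids up front also unifies behavior across devices/XLA.
# gathered = tf.gather(scores, oob_ids, axis=-1)
```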
src/transformers/generation/utils.py
Outdated
this was actually incorrect: we want to detect changes against the default model config, not against the default generation parameters
(To recap: if a user updates the model config and the generation config was automatically generated from it, regenerate it and throw a warning)
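To make the recap concrete, a hedged sketch of the check (default_model_config and generate_fields are illustrative names; the real logic lives in generate's preparation code):

```python
import warnings

from transformers import GenerationConfig

def refresh_generation_config(model_config, default_model_config, generation_config):
    # Diff the *model config* against its defaults -- not against the default
    # generation parameters. Any change there means the auto-generated
    # generation config is stale and should be rebuilt, with a warning.
    generate_fields = ("max_length", "num_beams", "top_k")  # illustrative subset
    model_config_was_modified = any(
        getattr(model_config, field, None) != getattr(default_model_config, field, None)
        for field in generate_fields
    )
    if model_config_was_modified:
        warnings.warn(
            "Generation parameters in the model config were modified; regenerating "
            "the generation config from the model config."
        )
        return GenerationConfig.from_model_config(model_config)
    return generation_config
```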
I added a try/except for composite models, catching ValueError, which led to this discovery :) You'll see this pattern fixed in other composite models.
If we go check the PR that added these, we can read that this change was added to maybe fix a bug that happens sporadically 👀
I don't agree with the description of the problem given our current code base; I believe the root cause has been addressed. Furthermore, it's good to remove fudge factors 😉
[note: BART has a non-None forced_eos_token_id, for instance. This means our tests now run against the expected parameterization]
I think this was meant to make the prior version of test_generate_continue_from_past_key_values (touched in this PR) work.
No idea why this was here :)
this test is also failing on main -- perhaps because of concurrent merges?
Failing on main too.
I suspect this didn't get detected in the (recent) PR that added it because of our test fetcher, which only checks the main models on these wide changes (?)
Could you open an issue to track?
(The issue casts a wider net than skipping test_inputs_embeds_matches_input_ids_with_generate, as this skip is merely a symptom of the root problem :) )
@amyeroberts Ready 🙌 Notes:
amyeroberts
left a comment
Thanks for handling all of these updates!
A few questions / comments but overall looks good 🤗
tests/utils/test_modeling_utils.py
Outdated
What does it equal in this case? Is it unset or its old value?
If it's its old value, I'm worried this could cause confusion (do we raise an error if the value is in the model config?)
@amyeroberts that is a great question -- what gets written in model.config in this case. I agree with your comment (If it's its old value, I'm worried this could cause confusion), which was the actual behavior in the commit you reviewed, so I've made the following adjustments:
- Moving a parameter from `model.config` to `model.generation_config` sets that parameter to `None` in `model.config`
- Storing a generation parameter as `None` in `model.config` is now accepted (even if it is not the default value)
- Updated the warning message, explaining to the user why they are seeing it
- The test now also checks that the warning is thrown. It is also commented, so it should be simple to understand what should be happening.
🤗
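For reference, a hedged end-to-end sketch of the adjusted behavior (checkpoint name, parameter, and the expected values in the comments follow the bullets above rather than verified output):

```python
import tempfile
import warnings

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.config.top_k = 5  # generation parameter misplaced in the model config

with tempfile.TemporaryDirectory() as tmp_dir, warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Saving moves the misplaced value instead of raising, and explains why.
    model.save_pretrained(tmp_dir)

print(model.generation_config.top_k)  # expected: 5 (moved to the generation config)
print(model.config.top_k)             # expected: None (first bullet above)
print(len(caught) > 0)                # expected: True (the warning is thrown)
```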
Could you open an issue to track?
src/transformers/models/encoder_decoder/configuration_encoder_decoder.py
Outdated
I'm not sure I understand this - why would there be checks run on a config -> generation_config if the model isn't compatible with generate?
It seems like a bad pattern to need a generate-specific method added to the config if the model can't be used for generation
I understand and share your pain 😬 Allow me to elaborate.
The root issue is in the model config. A model config is agnostic to whether the model can or cannot generate, unless we hardcode it in the config class or access external objects.
When we call config.save_pretrained, we don't know whether we should check generation parameters. Since we want to move all generation-related logic to GenerationConfig, one of whose goals is to avoid clashes like this and allow proper validation, we have to assume all configs may be holding generate parameters. The current logic enforces that no custom generate parameters are held in the model config, so no new models parameterize generate through the model config.
AST is the exception. It doesn't support generate, but it has a model parameter whose name is also used in generate (max_length). If a user creates a model with a custom max_length, which should be allowed, the current save_pretrained would crash because it thinks it has a custom generate parameter. This workaround effectively skips the generate checks, since we know it can't generate.
Of the three alternatives I could see, this seemed the best solution -- especially since it is a temporary fix, to be removed after we untangle the two configs (v5.0?)
Considered fixes:
- skip generate-related checks on this specific class (implemented)
- add a parameter to ALL model config classes to store whether the model can generate
- add an argument to `config.save_pretrained` controlling whether to check `generate` parameters, receiving `model.can_generate()` when called from `model.save_pretrained`
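A self-contained toy sketch of the trade-off between the options (every name below is made up; the real checks live in PretrainedConfig.save_pretrained):

```python
GENERATE_PARAMS = {"max_length", "top_k", "num_beams"}  # illustrative subset

class ToyConfig:
    # Option 1, expressed as a class-level opt-out: configs of models that
    # cannot generate skip the misplaced-parameter checks entirely.
    skip_generation_checks = False

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def save_pretrained(self):
        if not self.skip_generation_checks:
            misplaced = GENERATE_PARAMS & set(vars(self))
            if misplaced:
                raise ValueError(f"Generate parameters found in the model config: {misplaced}")
        # ... here the real method would serialize config.json ...

class ToyASTConfig(ToyConfig):
    # AST cannot generate, but max_length is a genuine model parameter for it.
    skip_generation_checks = True

ToyASTConfig(max_length=1024).save_pretrained()  # ok: checks are skipped
# ToyConfig(top_k=5).save_pretrained()           # would raise: misplaced generate parameter
```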
Makes sense - thanks for taking the time to explain!
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
…decoder.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
CI was failing; rebased with main to check if that was the issue. EDIT: root cause identified, it's unrelated to this PR and is being fixed (the CI image needs an update)
amyeroberts
left a comment
Thanks for iterating and taking the time to answer all my qs.
As this touches so many models and core logic, we should ideally do a [run_all] run of slow tests to make sure everything is tip-top before merge
      self.assertTrue(torch.equal(p1, p2))

-     def test_modifying_model_config_causes_warning_saving_generation_config(self):
+     def test_modifying_model_config_gets_moved_to_generation_config(self):
Nice :)
Makes sense - thanks for taking the time to explain!
…eprecations in `generate`-related code 🧹 (huggingface#32659) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
What does this PR do?
See title :) I did a quick scan and updated things that fulfilled the following criteria:
Key changes: