Add LLaVa-1.6, bis #29586
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for opening @NielsRogge! I can see some model-related tests are failing - ping me for review once they're all passing.
@amyeroberts I've fixed all tests; only the documentation test is failing because the official checkpoint hasn't been uploaded yet.
Thanks for adding this as a new architecture - it's a lot cleaner this way!
@unittest.skip(reason="CPU offload is not yet supported")
def test_cpu_offload(self):
    pass
This is funny - if we have `_no_split_modules` set then it should be supported. Otherwise the model will automatically be skipped for this test, so we shouldn't need it.
Pinging @younesbelkada here regarding how `_no_split_modules` should be defined for llava/llava-next. This test fails, but it's not clear to me which class would need to be added to `_no_split_modules` to make it pass; this test is also skipped for llava.
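For context, here is a minimal sketch (with a hypothetical module name, not the final implementation) of what declaring `_no_split_modules` on the pretrained model class looks like; which modules llava/llava-next actually need is the open question in this thread:

from transformers import PreTrainedModel

class LlavaNextPreTrainedModel(PreTrainedModel):
    # Modules listed here are never split across devices when accelerate infers a
    # device_map, and the offload tests only run when this attribute is populated.
    _no_split_modules = ["LlavaNextVisionAttention"]  # hypothetical entry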
To be resolved
OK, as `_no_split_modules` are set in a copied model, let's just leave as-is for the moment.
Thanks for your review, I addressed all comments and uploaded all checkpoints to a collection. The documentation test fails; it seems like the machine crashed due to the size of the model (cc @ydshieh).
If you can make it use fewer resources (e.g. fp16, 4-bit, etc.), that would be great.
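For illustration, a rough sketch of what loading with lower resources could look like; the llava-next checkpoint name is an assumption, and the 4-bit pattern mirrors the llava integration test quoted later in this thread:

import torch
from transformers import LlavaNextForConditionalGeneration

# Half precision roughly halves the memory footprint...
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",  # assumed checkpoint name
    torch_dtype=torch.float16,
    device_map="auto",
)

# ...while 4-bit quantization (requires bitsandbytes) reduces it further.
model_4bit = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    load_in_4bit=True,
)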
@NielsRogge Please review the PR before asking for review. There are files which should not be included as part of the PR here.
Apologies, the scripts are removed.
Thanks for the continued work on this. It's almost ready, but some more coverage is needed for tests. In particular, the generation tests need to be added for the model
Regarding your comment about the `aspect_ratio_setting` argument: the other 2 settings ("clip" and "pad") are there to be used for a different model. Should I keep the argument?
I don't quite understand this statement. "This one" is referring to
Again, not sure which object you're talking about replacing, but if there's an incorrect implementation it would be better to fix it, possibly with some kind of deprecation cycle / legacy behaviour flag. If the issue is with the pad method, I don't see why we need to completely replace the image processor instead of updating it.
We should remove the `aspect_ratio_setting` argument. It doesn't make sense to have an image processor for a specific model which is preparing inputs for a different model that already has its own image processor.
Ok, I've removed the flag as it leads to confusion. Feel free to approve the PR, if possible.
Thanks for iterating!
PR can't be merged as-is as it introduces breaking changes to existing models: llava and vip llava
model_tied = model_class(config_tied)
params_tied = list(model_tied.parameters())
# Check that the embedding layer and decoding layer are the same in size and in value
# self.assertTrue(check_same_values(embeddings, decoding))
Why are a bunch of tests commented out here?
This is copied from the common test
@unittest.skip(reason="CPU offload is not yet supported")
def test_cpu_offload(self):
    pass
To be resolved
model_tied = model_class(config_tied)
params_tied = list(model_tied.parameters())
# Check that the embedding layer and decoding layer are the same in size and in value
# self.assertTrue(check_same_values(embeddings, decoding))
Same question here - it's better not to copy and have all this dead code
Opened #29737 to fix this
A rebase and `make fix-copies` should fix this now :)
model_tied = model_class(config_tied)
params_tied = list(model_tied.parameters())
# Check that the embedding layer and decoding layer are the same in size and in value
# self.assertTrue(check_same_values(embeddings, decoding))
Same here re commented out code
@@ -186,6 +184,188 @@ def test_training_gradient_checkpointing_use_reentrant(self):
    def test_training_gradient_checkpointing_use_reentrant_false(self):
        pass

    # Copied from tests.test_modeling_common.ModelTesterMixin.test_resize_tokens_embeddings with config.vocab_size->config.text_config.vocab_size
You can still set `config.vocab_size` as a parameter, even if it's not an input argument to the config's init. This would avoid having to have all of these copied tests for llava, vipllava and llava-next.
I've tried that, but that doesn't work. The test fails with:
# Check that resizing the token embeddings with a larger vocab size increases the model's vocab size
model_embed = model.resize_token_embeddings(model_vocab_size + 10)
self.assertEqual(model.config.vocab_size, model_vocab_size + 10)
AssertionError: 99 != 109
This is because when calling the `resize_token_embeddings` method, we are changing `config.text_config.vocab_size`, whereas `config.vocab_size` remains unchanged.
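A rough sketch (assumed, not the exact source) of the behaviour described above: the override resizes the embeddings and keeps only the nested text config in sync, so a top-level `config.vocab_size` keeps its old value and the common test's assertion fails.

from transformers import LlavaForConditionalGeneration

class PatchedLlava(LlavaForConditionalGeneration):
    def resize_token_embeddings(self, new_num_tokens=None, pad_to_multiple_of=None):
        model_embeds = super().resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
        # Only the nested text config tracks the resized embedding matrix here;
        # a top-level config.vocab_size attribute (if one existed) would go stale.
        self.config.text_config.vocab_size = model_embeds.num_embeddings
        return model_embeds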
That's because you removed the line that properly updates the config value in `resize_token_embeddings`.
Ok, but adding an attribute just to make a test pass sounds a bit weird, right? There's no need to add `vocab_size` as an attribute of the model config, only the text config needs to have it, which is why I've removed that line from `resize_token_embeddings`. Are you ok with leveraging copied from for these tests?
Yes, I agree we don't want to just add stuff to make the tests pass. I mistakenly thought `vocab_size` was still being set in the config for BC, in which case we would have still needed to update the parameter.
If we leave it as a property to deprecate (cf. notes in the vip llava config file) then we can leave the copied from headers here.
LGTM, let's just fix the last 2 nits!
        FutureWarning,
    )
    # set the vocab_size
    text_config["vocab_size"] = kwargs["vocab_size"]
No - this is still wrong and will break backwards compatibility
The vocab size from the kwargs was only ever set when `text_config` was None.
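A rough sketch (with assumed names) of that original behaviour: the `vocab_size` kwarg only fed into the default text config created when none was passed, and an explicit `text_config` was never overwritten by it.

from transformers.models.auto import CONFIG_MAPPING

def resolve_text_config(text_config=None, **kwargs):
    # Hypothetical helper illustrating the pre-PR logic described above.
    if text_config is None:
        return CONFIG_MAPPING["llama"](vocab_size=kwargs.get("vocab_size", 32000))
    if isinstance(text_config, dict):
        model_type = text_config.get("model_type", "llama")
        return CONFIG_MAPPING[model_type](**text_config)
    return text_config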
Just pushed an update which should resolve this. If not, please let me know how we can make it backwards compatible.
Thanks for the continued iterations!
Just a few final notes on BC for the config files
    # set the vocab_size
    text_config.vocab_size = kwargs["vocab_size"]

    self.text_config = text_config

    super().__init__(**kwargs)
Because it's a param which used to be part of the config, we'll need to deprecate `vocab_size` too:
def __init__(...):
    ...
    self._vocab_size = self.text_config.vocab_size

@property
def vocab_size(self):
    warnings.warn(
        "The `vocab_size` property is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.",
        FutureWarning,
    )
    return self._vocab_size

def to_dict(self):
    output = super().to_dict()
    output.pop("_vocab_size", None)
    return output
This should be added to the llava config too
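Assuming the sketch above is applied to the configs, usage would look roughly like this:

import warnings
from transformers import LlavaNextConfig

config = LlavaNextConfig()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = config.vocab_size  # should return text_config.vocab_size and emit a FutureWarning

assert any(issubclass(w.category, FutureWarning) for w in caught)
assert "_vocab_size" not in config.to_dict()  # the private attribute is not serialized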
if "vocab_size" in kwargs and not isinstance(text_config, dict): | ||
warnings.warn( | ||
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.", | ||
FutureWarning, | ||
) | ||
# set the vocab_size | ||
text_config["vocab_size"] = kwargs["vocab_size"] |
Accidentally left? This has the same BC issues as before and conflicts with the code below
if "vocab_size" in kwargs and not isinstance(text_config, dict): | |
warnings.warn( | |
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.", | |
FutureWarning, | |
) | |
# set the vocab_size | |
text_config["vocab_size"] = kwargs["vocab_size"] |
if "vocab_size" in kwargs: | ||
warnings.warn( | ||
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.", | ||
FutureWarning, | ||
) | ||
# set the vocab_size | ||
text_config.vocab_size = kwargs["vocab_size"] |
This might still result in some unexpected behaviour, as previously `self.vocab_size` was only set from the kwarg `vocab_size` if `text_config` was None, but in neither case was the text config overwritten by the passed-in argument.
Instead, rather than forcibly overwriting here, I would just have the warning at the top of the init if "vocab_size" is passed in:
def __init__(...):
    if "vocab_size" in kwargs:
        warnings.warn(
            "The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`. Passing this argument will have no effect",
            FutureWarning,
        )
if "vocab_size" in kwargs: | |
warnings.warn( | |
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.", | |
FutureWarning, | |
) | |
# set the vocab_size | |
text_config.vocab_size = kwargs["vocab_size"] |
if "vocab_size" in kwargs: | ||
warnings.warn( | ||
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.", | ||
FutureWarning, | ||
) | ||
# set the vocab_size | ||
text_config.vocab_size = kwargs["vocab_size"] |
Same comments as for vip llava:
- Just have a warning at the start of the init
- Add some deprecation for `vocab_size`
Pushed the changes requested so all good to merge once tests pass!
* First draft
* Fix tests, add docs
* Improve docstrings
* Fix test
* Address comments
* Address comments
* Remove vocab_size attribute
* Remove batch_size
* Address comment
* Add image processor tests
* Support fx
* Update docstring
* Add support for 34b
* Convert 34b model
* Add integration tests
* Update checkpoints
* Convert vicuna-13b, remove doc tests
* Remove script
* Remove file
* Address comments
* Improve docstrings
* Deprecate vocab_size
* Remove aspect_ratio_setting
* Address comments
* Update READMEs
* Add tips about chat templates
* Fix tests
* Deprecate vocab_size safely
* Update tests

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
a2a9516: there were some unprotected imports 😈
self.vision_feature_select_strategy = vision_feature_select_strategy
self.vision_feature_layer = vision_feature_layer
self.vocab_size = vocab_size
self = <tests.models.llava.test_modeling_llava.LlavaForConditionalGenerationIntegrationTest testMethod=test_small_model_integration_test_llama>
@slow
@require_bitsandbytes
def test_small_model_integration_test_llama(self):
# Let' s make sure we test the preprocessing to replace what is used
model_id = "llava-hf/llava-1.5-7b-hf"
> model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", load_in_4bit=True)
tests/models/llava/test_modeling_llava.py:390:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/transformers/modeling_utils.py:3006: in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
src/transformers/configuration_utils.py:609: in from_pretrained
return cls.from_dict(config_dict, **kwargs)
src/transformers/configuration_utils.py:761: in from_dict
config = cls(**config_dict)
src/transformers/models/llava/configuration_llava.py:140: in __init__
super().__init__(**kwargs)
src/transformers/configuration_utils.py:375: in __init__
raise err
src/transformers/configuration_utils.py:372: in __init__
setattr(self, key, value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = LlavaConfig {
"architectures": [
"LlavaForConditionalGeneration"
],
"ignore_index": -100,
"image_token_ind..._dim": 768,
"vocab_size": 32000
},
"vision_feature_layer": -2,
"vision_feature_select_strategy": "default"
}
key = 'vocab_size', value = 32064
def __setattr__(self, key, value):
if key in super().__getattribute__("attribute_map"):
key = super().__getattribute__("attribute_map")[key]
> super().__setattr__(key, value)
E AttributeError: can't set attribute
will fix in #29389, a setter is missing
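A minimal illustration (not the actual patch in #29389) of the missing setter: with only a getter on the deprecated property, loading an older config.json that still contains `vocab_size` hits `setattr` and raises exactly this `can't set attribute` error.

import warnings

class ExampleConfig:
    def __init__(self, vocab_size=32000):
        self._vocab_size = vocab_size

    @property
    def vocab_size(self):
        warnings.warn("`vocab_size` is deprecated; use `text_config.vocab_size` instead.", FutureWarning)
        return self._vocab_size

    # Without this setter, `config.vocab_size = 32064` (which happens when an old
    # config dict is loaded) raises AttributeError: can't set attribute.
    @vocab_size.setter
    def vocab_size(self, value):
        self._vocab_size = value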
sorry about this - thanks for addressing and fixing
This model gives an error when loading.
Hi @aliencaocao, could you open an issue detailing this error, with a minimal code snippet to reproduce? This will help us track what has and hasn't been addressed, and help people find it if they're experiencing something similar.
What does this PR do?
This PR adds LLaVa-1.6 (also called LLaVa-NeXT) as a standalone model, rather than modifying the existing modeling_llava.py. As discussed in #29012, this might be better.
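For illustration, a minimal usage sketch of the classes this PR introduces; the checkpoint name and prompt format below are assumptions rather than something confirmed in this thread:

import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))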
To do: