Add LLaVa-1.6, bis #29586

Merged: 29 commits merged into huggingface:main on Mar 20, 2024
Conversation

@NielsRogge (Contributor) commented Mar 11, 2024

What does this PR do?

This PR adds LLaVa-1.6 (also called LLaVa-NeXT) as a standalone model rather than modifying the existing modeling_llava.py. As discussed in #29012, this might be the better approach.

To do:

  • convert all checkpoints once the PR is approved, along with the corresponding integration test, which will also fix the failing documentation test.
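
For orientation, a minimal sketch of how the new classes are meant to be used once checkpoints are converted (the checkpoint id and prompt template are assumptions based on the llava-hf collection, not part of this PR):

```python
import requests
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

# Assumed checkpoint id; substitute any converted LLaVa-NeXT checkpoint.
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# Prompt format assumed for the Mistral-based checkpoint; other backbones use other templates.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```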

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator)

Thanks for opening this @NielsRogge! I can see some model-related tests are failing - ping me for review once they're all passing.

@NielsRogge (Contributor, Author)

@amyeroberts I've fixed all tests; only the documentation test is failing because the official checkpoint hasn't been uploaded yet.

@amyeroberts (Collaborator) left a comment

Thanks for adding this as a new architecture - it's a lot cleaner this way!

docs/source/en/model_doc/llava_next.md (resolved)
tests/models/llava_next/test_modeling_llava_next.py (outdated, resolved)
Comment on lines +214 to +216
@unittest.skip(reason="CPU offload is not yet supported")
def test_cpu_offload(self):
    pass
Collaborator
This is funny - if we have _no_split_modules set then it should be supported. Otherwise the model will automatically be skipped for this test, so we shouldn't need it

@NielsRogge (Contributor, Author) commented Mar 17, 2024

Pinging @younesbelkada here regarding how _no_split_modules should be defined for llava/llava-next. This test fails, but it's not clear to me which class would need to be added to _no_split_modules to make it pass; this test is also skipped for llava.
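
For context, _no_split_modules is a class attribute on the PreTrainedModel subclass; a rough sketch of how it is typically declared (the module names below are illustrative, not the ones ultimately used for llava-next):

```python
from transformers import LlavaNextConfig, PreTrainedModel


class LlavaNextPreTrainedModelSketch(PreTrainedModel):
    """Illustrative stand-in, not the class added in this PR."""

    config_class = LlavaNextConfig
    # Modules listed here are never split across devices when accelerate infers
    # a device map, which is what the cpu/disk offload tests rely on.
    _no_split_modules = ["CLIPEncoderLayer", "LlamaDecoderLayer"]
```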

Collaborator

To be resolved

Collaborator

OK, as _no_split_modules is set in a copied-from model, let's just leave this as-is for the moment.

tests/models/llava_next/test_modeling_llava_next.py (outdated, resolved)
tests/models/llava_next/test_modeling_llava_next.py (outdated, resolved)
@NielsRogge (Contributor, Author)

Thanks for your review; I've addressed all comments and uploaded all checkpoints to a collection.

The documentation test fails; it seems like the machine crashed due to the size of the model (cc @ydshieh).

@ydshieh (Collaborator) commented Mar 18, 2024

> Thanks for your review; I've addressed all comments and uploaded all checkpoints to a collection.

> The documentation test fails; it seems like the machine crashed due to the size of the model (cc @ydshieh).

If you can make it use fewer resources (e.g. fp16, 4-bit etc.), that would be great.
If nothing can avoid the crash, just add the file to utils/not_doctested.txt.
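
As an illustration of what lower-resource loading could look like here (the checkpoint id is an assumption; any converted LLaVa-NeXT checkpoint behaves the same way):

```python
import torch
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed checkpoint id

# Option 1: half precision, roughly halves the memory footprint vs fp32.
model = LlavaNextForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

# Option 2: 4-bit quantization via bitsandbytes, cuts memory further (GPU required).
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quantization_config
)
```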

@amyeroberts (Collaborator)

@NielsRogge Please review the PR before asking for review. There are files which should not be included as part of the PR here.

@NielsRogge (Contributor, Author)

Apologies, scripts are removed

@amyeroberts (Collaborator) left a comment

Thanks for the continued work on this. It's almost ready, but some more test coverage is needed. In particular, generation tests need to be added for the model.

@NielsRogge (Contributor, Author) commented Mar 19, 2024

Regarding your comment about the aspect_ratio_setting of LlavaNextImageProcessor: actually, only aspect_ratio_setting="anyres" is used for LLaVa-1.6 (LlavaNextForConditionalGeneration).

The other two settings ("clip" and "pad") are there to be used for LlavaForConditionalGeneration (the previous generation). This one currently uses CLIPImageProcessor, which is technically incorrect; it should use the "pad" setting of LlavaNextImageProcessor. But I assume we can't simply replace the image processor of LlavaForConditionalGeneration with a new one (cc @ArthurZucker).

Should I keep the aspect_ratio_setting, or remove it and only support "anyres" by default?

@amyeroberts (Collaborator)

> This one currently uses CLIPImageProcessor, which is technically incorrect; it should use the "pad" setting of LlavaNextImageProcessor. But I assume we can't simply replace the image processor of LlavaForConditionalGeneration with a new one (cc @ArthurZucker).

I don't quite understand this statement. Is "This one" referring to LlavaForConditionalGeneration or LlavaNextForConditionalGeneration?

> But I assume we can't simply replace the image processor of LlavaForConditionalGeneration with a new one (cc @ArthurZucker).

Again, not sure which object you're talking about replacing, but if there's an incorrect implementation it would be better to fix it, possibly with some kind of deprecation cycle / legacy behaviour flag. If the issue is with the pad method, I don't see why we need to completely replace the image processor, instead of updating pad

> Should I keep the aspect_ratio_setting, or remove it and only support "anyres" by default?

We should remove the aspect_ratio_setting flag entirely

It doesn't make sense to have an image processor for a specific model preparing inputs for a different model which already has its own image processor.

@NielsRogge (Contributor, Author)

Ok, I've removed the flag as it leads to confusion. Feel free to approve the PR, if possible.

@amyeroberts (Collaborator) left a comment

Thanks for iterating!

The PR can't be merged as-is, as it introduces breaking changes to existing models: llava and vipllava.

src/transformers/models/llava/configuration_llava.py (outdated, resolved)
src/transformers/models/llava/configuration_llava.py (outdated, resolved)
model_tied = model_class(config_tied)
params_tied = list(model_tied.parameters())
# Check that the embedding layer and decoding layer are the same in size and in value
# self.assertTrue(check_same_values(embeddings, decoding))
Collaborator

Why are a bunch of tests commented out here?

Contributor Author

This is copied from the common test

Comment on lines +214 to +216
@unittest.skip(reason="CPU offload is not yet supported")
def test_cpu_offload(self):
    pass
Collaborator

To be resolved

model_tied = model_class(config_tied)
params_tied = list(model_tied.parameters())
# Check that the embedding layer and decoding layer are the same in size and in value
# self.assertTrue(check_same_values(embeddings, decoding))
Collaborator

Same question here - it's better not to copy and have all this dead code

Contributor Author

Opened #29737 to fix this

Collaborator

A rebase and make fix-copies should fix this now :)

model_tied = model_class(config_tied)
params_tied = list(model_tied.parameters())
# Check that the embedding layer and decoding layer are the same in size and in value
# self.assertTrue(check_same_values(embeddings, decoding))
Collaborator

Same here re: commented-out code.

@@ -186,6 +184,188 @@ def test_training_gradient_checkpointing_use_reentrant(self):
def test_training_gradient_checkpointing_use_reentrant_false(self):
pass

# Copied from tests.test_modeling_common.ModelTesterMixin.test_resize_tokens_embeddings with config.vocab_size->config.text_config.vocab_size
Collaborator

You can still set config.vocab_size as an attribute, even if it's not an input argument to the config's init. This would avoid having to have all of these copied tests for llava, vipllava and llava-next.

@NielsRogge (Contributor, Author) commented Mar 19, 2024

I've tried that, but that doesn't work. The test fails with:

# Check that resizing the token embeddings with a larger vocab size increases the model's vocab size
model_embed = model.resize_token_embeddings(model_vocab_size + 10)
self.assertEqual(model.config.vocab_size, model_vocab_size + 10)
AssertionError: 99 != 109

This is because when calling the resize_token_embeddings method, we are changing config.text_config.vocab_size, whereas config.vocab_size remains unchanged.
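
A small self-contained illustration of the mismatch being described (the config values are arbitrary and only serve the example):

```python
from transformers import LlavaNextConfig

# A composite config: the vocabulary size lives on the nested text_config.
config = LlavaNextConfig(text_config={"vocab_size": 99})

# resize_token_embeddings effectively does this on the config side: it bumps
# the nested text_config...
config.text_config.vocab_size = 99 + 10

# ...while the common test asserts on the top-level attribute, which is not
# updated in step with text_config, hence the "99 != 109" failure above.
print(config.text_config.vocab_size)        # 109
print(getattr(config, "vocab_size", None))  # stale or absent, depending on version
```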

Collaborator

That's because you removed the line that properly updates the config value in resize_token_embeddings

@NielsRogge (Contributor, Author) commented Mar 20, 2024

Ok, but adding an attribute just to make a test pass sounds a bit weird, right? There's no need to add vocab_size as an attribute of the model config, only the text config needs to have it, which is why I've removed that line from resize_token_embeddings. Are you ok with leveraging # Copied from for these tests?

Collaborator

Yes, I agree we don't want to just add stuff to make the tests pass. I mistakenly thought vocab_size was still being set in the config for BC, in which case we would have still needed to update the parameter.

If we leave it as a property to deprecate (cf. the notes in the vipllava config file), then we can leave the copied-from headers here.

src/transformers/models/vipllava/configuration_vipllava.py (outdated, resolved)
src/transformers/models/vipllava/configuration_vipllava.py (outdated, resolved)
@ArthurZucker (Collaborator) left a comment

LGTM, let's just fix the last 2 nits!

FutureWarning,
)
# set the vocab_size
text_config["vocab_size"] = kwargs["vocab_size"]
Collaborator

No - this is still wrong and will break backwards compatibility

Collaborator

vocab_size from the kwargs was only ever set when text_config was None.

@NielsRogge (Contributor, Author) commented Mar 20, 2024

Just pushed an update which should resolve this. If not, please let me know how we can make it backwards compatible.

@amyeroberts (Collaborator) left a comment

Thanks for the continued iterations!

Just a few final notes on BC for the config files

model_tied = model_class(config_tied)
params_tied = list(model_tied.parameters())
# Check that the embedding layer and decoding layer are the same in size and in value
# self.assertTrue(check_same_values(embeddings, decoding))
Collaborator

A rebase and make fix-copies should fix this now :)

@@ -186,6 +184,188 @@ def test_training_gradient_checkpointing_use_reentrant(self):
def test_training_gradient_checkpointing_use_reentrant_false(self):
pass

# Copied from tests.test_modeling_common.ModelTesterMixin.test_resize_tokens_embeddings with config.vocab_size->config.text_config.vocab_size
Collaborator

Yes, I agree we don't want to just add stuff to make the tests pass. I mistakenly thought vocab_size was still being set in the config for BC, in which case we would have still needed to update the parameter.

If we leave it as a property to deprecate (cf. the notes in the vipllava config file), then we can leave the copied-from headers here.

Comment on lines +214 to +216
@unittest.skip(reason="CPU offload is not yet supported")
def test_cpu_offload(self):
    pass
Collaborator

OK, as _no_split_modules is set in a copied-from model, let's just leave this as-is for the moment.

# set the vocab_size
text_config.vocab_size = kwargs["vocab_size"]

self.text_config = text_config

super().__init__(**kwargs)
Collaborator

Because it's a param which used to be part of the config, we'll need to deprecate vocab_size too:

def __init__(...):
    ...
    self._vocab_size = self.text_config.vocab_size

@property
def vocab_size(self):
    warnings.warn(
        "The `vocab_size` property is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.",
        FutureWarning,
    )
    return self._vocab_size

def to_dict(self):
    output = super().to_dict()
    output.pop("_vocab_size", None)
    return output

This should be added to the llava config too

Comment on lines 98 to 104
if "vocab_size" in kwargs and not isinstance(text_config, dict):
warnings.warn(
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.",
FutureWarning,
)
# set the vocab_size
text_config["vocab_size"] = kwargs["vocab_size"]
Collaborator

Accidentally left in? This has the same BC issues as before and conflicts with the code below.

Suggested change
if "vocab_size" in kwargs and not isinstance(text_config, dict):
warnings.warn(
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.",
FutureWarning,
)
# set the vocab_size
text_config["vocab_size"] = kwargs["vocab_size"]

Comment on lines 130 to 136
if "vocab_size" in kwargs:
warnings.warn(
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.",
FutureWarning,
)
# set the vocab_size
text_config.vocab_size = kwargs["vocab_size"]
Collaborator

This might still result in some unexpected behaviour, as previously self.vocab_size was only set from the kwarg vocab_size if text_config was None, but in neither case was the text config overwritten by the passed-in argument.

Instead, rather than forcibly overwriting here, I would just have the warning at the top of the init if "vocab_size" is passed in:

def __init__(...):
    if "vocab_size" in kwargs: 
        warnings.warn(
            "The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`. Passing this argument will have no effect",
            FutureWarning,
        )
Suggested change
if "vocab_size" in kwargs:
warnings.warn(
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.",
FutureWarning,
)
# set the vocab_size
text_config.vocab_size = kwargs["vocab_size"]

Comment on lines 130 to 136
if "vocab_size" in kwargs:
warnings.warn(
"The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`.",
FutureWarning,
)
# set the vocab_size
text_config.vocab_size = kwargs["vocab_size"]
Collaborator

Same comments as for vip llava:

  • Just have a warning at the start of the init
  • Add some deprecation for vocab_size

@amyeroberts (Collaborator) left a comment

Pushed the requested changes, so all good to merge once tests pass!

@amyeroberts amyeroberts merged commit d91fd7f into huggingface:main Mar 20, 2024
22 checks passed
ArthurZucker pushed a commit that referenced this pull request Mar 20, 2024
* First draft

* Fix tests, add docs

* Improve docstrings

* Fix test

* Address comments

* Address comments

* Remove vocab_size attribute

* Remove batch_size

* Address comment

* Add image processor tests

* Support fx

* Update docstring

* Add support for 34b

* Convert 34b model

* Add integration tests

* Update checkpoints

* Convert vicuna-13b, remove doc tests

* Remove script

* Remove file

* Address comments

* Improve docstrings

* Deprecate vocab_size

* Remove aspect_ratio_setting

* Address comments

* Update READMEs

* Add tips about chat templates

* Fix tests

* Deprecate vocab_size safely

* Update tests

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
@ArthurZucker (Collaborator)

a2a9516 there were some unprotected imports 😈
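
For readers unfamiliar with the term, "protecting" imports in transformers means guarding soft dependencies so that importing the library still works when they aren't installed; a generic sketch of the pattern (not the exact fix in a2a9516):

```python
from transformers.utils import is_torch_available, is_vision_available

# Only import heavy optional dependencies when they are actually present,
# so `import transformers` keeps working in minimal environments.
if is_torch_available() and is_vision_available():
    import torch
    from PIL import Image
```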

self.vision_feature_select_strategy = vision_feature_select_strategy
self.vision_feature_layer = vision_feature_layer
self.vocab_size = vocab_size
Contributor

self = <tests.models.llava.test_modeling_llava.LlavaForConditionalGenerationIntegrationTest testMethod=test_small_model_integration_test_llama>

    @slow
    @require_bitsandbytes
    def test_small_model_integration_test_llama(self):
        # Let' s make sure we test the preprocessing to replace what is used
        model_id = "llava-hf/llava-1.5-7b-hf"
    
>       model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", load_in_4bit=True)

tests/models/llava/test_modeling_llava.py:390: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/transformers/modeling_utils.py:3006: in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
src/transformers/configuration_utils.py:609: in from_pretrained
    return cls.from_dict(config_dict, **kwargs)
src/transformers/configuration_utils.py:761: in from_dict
    config = cls(**config_dict)
src/transformers/models/llava/configuration_llava.py:140: in __init__
    super().__init__(**kwargs)
src/transformers/configuration_utils.py:375: in __init__
    raise err
src/transformers/configuration_utils.py:372: in __init__
    setattr(self, key, value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = LlavaConfig {
  "architectures": [
    "LlavaForConditionalGeneration"
  ],
  "ignore_index": -100,
  "image_token_ind..._dim": 768,
    "vocab_size": 32000
  },
  "vision_feature_layer": -2,
  "vision_feature_select_strategy": "default"
}

key = 'vocab_size', value = 32064

    def __setattr__(self, key, value):
        if key in super().__getattribute__("attribute_map"):
            key = super().__getattribute__("attribute_map")[key]
>       super().__setattr__(key, value)
E       AttributeError: can't set attribute

@fxmarty (Contributor) commented Mar 22, 2024

Will fix in #29389; a setter is missing.
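
A minimal sketch of the kind of setter that is missing (a toy stand-in, not the actual LlavaConfig and not the exact patch in #29389):

```python
import warnings


class ConfigWithDeprecatedVocabSize:
    """Toy config showing the property/setter pair."""

    def __init__(self, vocab_size=32000):
        self._vocab_size = vocab_size

    @property
    def vocab_size(self):
        warnings.warn(
            "The `vocab_size` attribute is deprecated; use `text_config.vocab_size` instead.",
            FutureWarning,
        )
        return self._vocab_size

    @vocab_size.setter
    def vocab_size(self, value):
        # Without this setter, `setattr(config, "vocab_size", 32064)` during
        # config deserialization raises "AttributeError: can't set attribute".
        self._vocab_size = value
```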

Collaborator

sorry about this - thanks for addressing and fixing

@aliencaocao (Contributor)

When loading, this model gives "The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function." I am using device_map='auto'. Any way to fix this?
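
For reference, the warning is typically emitted during device-map inference when loading roughly like this (the checkpoint id is an assumption, not taken from the report above):

```python
from transformers import LlavaNextForConditionalGeneration

# device_map="auto" makes accelerate infer a device map before dispatch, which
# is where the "model weights are not tied" warning comes from.
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",  # assumed checkpoint id
    device_map="auto",
)
```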

@amyeroberts (Collaborator)

Hi @aliencaocao, could you open an issue detailing this error and a minimal code snippet to reproduce? This will help us track what has and hasn't been addressed and help people find it if they're experiencing something similar.

@aliencaocao (Contributor)

#30001

itazap pushed a commit that referenced this pull request May 14, 2024