Add Idefics2 #30253

Merged
131 commits merged on Apr 15, 2024
Changes from 1 commit
Commits (131)
e536f6a
Merge pull request #9 from huggingface/update
molbap Mar 4, 2024
ef8c0fb
Merge branch 'main' of github.com:huggingface/new-model-addition
ArthurZucker Mar 30, 2024
79277fe
Initial add model additions
amyeroberts Feb 26, 2024
6c89a99
Test
amyeroberts Feb 26, 2024
2e1155b
All weights loading
amyeroberts Feb 27, 2024
661b794
Can perform full forward pass
amyeroberts Feb 27, 2024
dd0a3d2
Local and remote the same
amyeroberts Feb 27, 2024
d124863
Matching local and remote
amyeroberts Mar 1, 2024
7aff0b7
Fixup
amyeroberts Mar 1, 2024
799dc71
Idefics2Model importable; fixup docstrings
amyeroberts Mar 1, 2024
dbebbb1
Don't skip by default
amyeroberts Mar 1, 2024
465e3ed
Remove deprecated use_resampler arg
amyeroberts Mar 1, 2024
ae5b94d
Remove self.config
amyeroberts Mar 1, 2024
7983e93
DecoupledLinear takes config
amyeroberts Mar 1, 2024
0a00064
Tidy up
amyeroberts Mar 1, 2024
6e4ff1b
Enable eager attention and tidy up
amyeroberts Mar 1, 2024
1aa8f7a
Most tests passing
amyeroberts Mar 1, 2024
ea4bf34
Update for batch of processed images
amyeroberts Mar 4, 2024
b6a92da
Add image processor
amyeroberts Mar 4, 2024
0d09b95
Update doc pages
amyeroberts Mar 4, 2024
3c11158
Update conversion script
amyeroberts Mar 4, 2024
c6d4559
Remove erroneous breakpoint
amyeroberts Mar 5, 2024
c6275e9
Remove accidental spelling change
amyeroberts Mar 5, 2024
5dd0071
Update to reflect changes on hub - make generate work
amyeroberts Mar 5, 2024
015356b
Fix up
amyeroberts Mar 5, 2024
8c50169
Image processor tests
amyeroberts Mar 5, 2024
da389b8
Update tests
amyeroberts Mar 5, 2024
e8b131d
Add a processor
amyeroberts Mar 5, 2024
2fc3ff3
Add a processor
amyeroberts Mar 6, 2024
e06740c
Update convert script
amyeroberts Mar 6, 2024
083e82b
Update modeling file - remove fixmes
amyeroberts Mar 6, 2024
256fa30
Bug fix
amyeroberts Mar 7, 2024
0fd5400
Add processing test
amyeroberts Mar 7, 2024
f537f27
Use processor
amyeroberts Mar 7, 2024
d14485a
Fix up
amyeroberts Mar 7, 2024
02371e9
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 11, 2024
7fba70a
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 11, 2024
0987d15
Fix test
amyeroberts Mar 12, 2024
78ba577
Update config - PR comments and defaults align with checkpoint
amyeroberts Mar 12, 2024
971dd72
Reviewer comments
amyeroberts Mar 12, 2024
d7dfec9
Add copied froms for flash attention
amyeroberts Mar 12, 2024
097f402
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 18, 2024
1370836
Apply suggestions from code review
amyeroberts Mar 21, 2024
9dff742
Remove qk_layer_norm and freeze_layers functionality
amyeroberts Mar 21, 2024
0e1be29
Fix
amyeroberts Mar 21, 2024
c334307
Remove freeze_layer options from config
amyeroberts Mar 21, 2024
e5b5bc4
Sync with upstream main
amyeroberts Mar 21, 2024
ec867d8
Fix attention shapes siglip
amyeroberts Mar 22, 2024
0019bf1
Remove Llava-next refs - TO REBASE
amyeroberts Mar 24, 2024
b0e4081
Use AutoModel for text model
amyeroberts Mar 24, 2024
863b2ee
Add comment to explain vision embeddings
amyeroberts Mar 24, 2024
68990f8
Fix issue with tie_word_embeddings
amyeroberts Mar 25, 2024
e1456a0
Address review comments
amyeroberts Mar 25, 2024
f4b45d3
Fix and fix up
amyeroberts Mar 25, 2024
ffb2de3
Chat templates for idefics
amyeroberts Mar 27, 2024
700119d
Fix copies
amyeroberts Mar 27, 2024
cefdd1d
Fix
amyeroberts Mar 27, 2024
4823ecd
Add layer norms to FA2
amyeroberts Mar 27, 2024
2de1098
Fix tests
amyeroberts Mar 27, 2024
5205bba
Apply suggestions from code review
amyeroberts Apr 2, 2024
7edaff5
Fix
amyeroberts Apr 2, 2024
a7a0a2c
Review comments
amyeroberts Apr 2, 2024
16f7666
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Apr 2, 2024
e3a22e4
Update inputs merger
amyeroberts Apr 2, 2024
1c397b1
Merge weights in correct order
amyeroberts Apr 2, 2024
182ea5f
Update convert script
amyeroberts Apr 3, 2024
0ba4cc4
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 3, 2024
65bf223
Update template
amyeroberts Apr 3, 2024
84ea6e8
Model code examples (fix idefics too)
amyeroberts Apr 3, 2024
ee548af
More review comments
amyeroberts Apr 3, 2024
649563b
Tidy up
amyeroberts Apr 3, 2024
4c4f315
Update processing
amyeroberts Apr 3, 2024
f95e76b
Fix attention mask preparation
amyeroberts Apr 3, 2024
eae3f08
Update inputs_merger inputs
amyeroberts Apr 3, 2024
3043e40
Vectorize inputs_merger
amyeroberts Apr 3, 2024
914fa74
Update src/transformers/models/idefics2/__init__.py
amyeroberts Apr 8, 2024
877109a
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Apr 8, 2024
9cde5c2
Review comments
amyeroberts Apr 8, 2024
3307e6b
saying bye to the `qk_layer_norms`
VictorSanh Apr 7, 2024
ecaac39
Simplify
amyeroberts Apr 8, 2024
366d21d
Update latents
amyeroberts Apr 8, 2024
5312a80
Remove erroneous readme changes
amyeroberts Apr 8, 2024
9d1078b
Return images when applying chat template
amyeroberts Apr 8, 2024
b1e2f42
Fix bug - prompt images are for a single sample
amyeroberts Apr 9, 2024
09796a3
Update src/transformers/models/idefics2/modeling_idefics2.py
VictorSanh Apr 10, 2024
eaff6e6
image splitting
VictorSanh Apr 8, 2024
0034f84
fix test
VictorSanh Apr 8, 2024
e2845b1
some more comment
VictorSanh Apr 8, 2024
3ae2a1b
some comment
VictorSanh Apr 8, 2024
833a802
Apply suggestions from code review
VictorSanh Apr 9, 2024
502c3dc
Update src/transformers/models/idefics2/image_processing_idefics2.py
VictorSanh Apr 11, 2024
e8ca7b3
Update processor
amyeroberts Apr 10, 2024
4bde406
Update model tests
amyeroberts Apr 10, 2024
fea200e
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
33e51a6
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
fcad4e4
Don't add BOS in template
amyeroberts Apr 10, 2024
1dc90f0
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
0e16d4a
Remove index in examples
amyeroberts Apr 11, 2024
107693a
Update tests to reflect #13
amyeroberts Apr 11, 2024
cd4f76a
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 11, 2024
31945ee
PR comment - consistent typing
amyeroberts Apr 12, 2024
4ab5e1d
Update readme and model doc
amyeroberts Apr 12, 2024
d8c5045
Update docs
amyeroberts Apr 12, 2024
e8b9751
Update checkpoint references
amyeroberts Apr 12, 2024
b5a7622
Update examples
amyeroberts Apr 12, 2024
7ee8681
Fix and update tests
amyeroberts Apr 12, 2024
31c6634
Small addition
amyeroberts Apr 12, 2024
75f59ef
Update tests - remove copied from as no ignore placement copy could b…
amyeroberts Apr 12, 2024
b5ad135
Update example
amyeroberts Apr 12, 2024
419fba2
small fixes
VictorSanh Apr 13, 2024
ea3838e
Update docs/source/en/model_doc/idefics2.md
amyeroberts Apr 14, 2024
301e1c5
Update docs/source/en/model_doc/idefics2.md
amyeroberts Apr 14, 2024
7b1c4dc
Update README.md
amyeroberts Apr 14, 2024
5be2feb
Connector model as bridge
amyeroberts Apr 12, 2024
34eb76b
Fix up
amyeroberts Apr 14, 2024
7c73ede
Fix up
amyeroberts Apr 14, 2024
455dccf
Don't pass model inputs for generation kwargs update
amyeroberts Apr 15, 2024
3bbd272
IDEFICS-2 -> Idefics2
VictorSanh Apr 15, 2024
b122cb2
Merge pull request #18 from huggingface/vs/name-change
VictorSanh Apr 15, 2024
5414a02
Remove config archive name
amyeroberts Apr 15, 2024
3d84654
IDEFICS-2 -> Idefics2
amyeroberts Apr 15, 2024
8739092
Add back llava-next
amyeroberts Apr 15, 2024
779a8f8
Update readmes
amyeroberts Apr 15, 2024
f8c5301
Add requirements for processor tester
amyeroberts Apr 15, 2024
661b93b
Use custom convert_to_rgb to avoid possible BC
amyeroberts Apr 15, 2024
0efb5e8
Fix doc example
amyeroberts Apr 15, 2024
fed24d1
Fix doc example
amyeroberts Apr 15, 2024
541ce14
Skip model doc tests - as model too large
amyeroberts Apr 15, 2024
2a563b2
More doc example - account for image splitting
amyeroberts Apr 15, 2024
26c8a55
Update src/transformers/image_transforms.py
amyeroberts Apr 15, 2024
16c8317
Fix config doctest
amyeroberts Apr 15, 2024
Chat templates for idefics
amyeroberts committed Apr 14, 2024
commit ffb2de347630cf8f97a0bb237864f622913e2191
246 changes: 234 additions & 12 deletions src/transformers/models/idefics2/processing_idefics2.py
@@ -16,17 +16,35 @@
Processor class for IDEFICS2.
"""

-from typing import List, Optional, Union
+from functools import lru_cache
+from typing import TYPE_CHECKING, Dict, List, Optional, Union

from packaging import version

from ...feature_extraction_utils import BatchFeature
-from ...image_utils import ImageInput, is_valid_image
+from ...image_utils import ImageInput, is_valid_image, load_image
from ...processing_utils import ProcessorMixin
-from ...tokenization_utils_base import BatchEncoding, PaddingStrategy, TextInput, TruncationStrategy
-from ...utils import TensorType
+from ...tokenization_utils_base import AddedToken, BatchEncoding, PaddingStrategy, TextInput, TruncationStrategy
+from ...utils import TensorType, logging


if TYPE_CHECKING:
from ...pipelines.conversational import Conversation


logger = logging.get_logger(__name__)


def is_url(val) -> bool:
return isinstance(val, str) and val.startswith("http")


def is_image_or_image_url(elem):
return is_url(elem) or is_valid_image(elem)


def _is_str_or_image(elem):
-    return isinstance(elem, (str)) or is_valid_image(elem)
+    return isinstance(elem, (str)) or is_image_or_image_url(elem)


def build_string_from_input(prompt, image_seq_len, bos_token, image_token, fake_image_token):
@@ -99,15 +117,18 @@ def __init__(self, image_processor, tokenizer=None, image_seq_len: int = 64, **k
if tokenizer is None:
raise ValueError("You need to specify a `tokenizer`.")

-        self.fake_image_token = "<fake_token_around_image>"
-        self.image_token = "<image>"
+        self.fake_image_token = AddedToken("<fake_token_around_image>", normalized=False, special=True)
+        self.image_token = AddedToken("<image>", normalized=False, special=True)
+        self.end_of_utterance_token = AddedToken("<end_of_utterance>", normalized=False, special=True)
self.image_seq_len = image_seq_len

-        tokens_to_add = {"additional_special_tokens": [self.fake_image_token, self.image_token]}
+        tokens_to_add = {
+            "additional_special_tokens": [self.fake_image_token, self.image_token, self.end_of_utterance_token]
+        }
tokenizer.add_special_tokens(tokens_to_add)

-        bad_words_ids = tokenizer.convert_tokens_to_ids([self.image_token, self.fake_image_token])
-        self.bad_words_ids = [[id_] for id_ in bad_words_ids]
+        # Stores a Jinja template that formats chat histories into tokenizable strings
+        self.chat_template = kwargs.pop("chat_template", None)

super().__init__(image_processor, tokenizer)

@@ -158,8 +179,15 @@ def __call__(
)
inputs.update(text_inputs)

-        # Extract the images from the prompts
-        images = [[elem for elem in prompt if is_valid_image(elem)] for prompt in prompts]
+        # Extract the images from the prompts, loading them if necessary
+        images = []
+        for prompt in prompts:
+            for elem in prompt:
+                if is_valid_image(elem):
+                    images.append(elem)
+                elif is_url(elem):
+                    images.append(load_image(elem))

image_inputs = self.image_processor(images, return_tensors=return_tensors)
inputs.update(image_inputs)

@@ -184,3 +212,197 @@ def model_input_names(self):
tokenizer_input_names = self.tokenizer.model_input_names
image_processor_input_names = self.image_processor.model_input_names
return list(dict.fromkeys(tokenizer_input_names + image_processor_input_names))

# Copied from transformers.tokenization_utils_base.PreTrainedTokenizerBase.apply_chat_template
def apply_chat_template(
self,
conversation: Union[List[Dict[str, str]], "Conversation"],
chat_template: Optional[str] = None,
add_generation_prompt: bool = False,
tokenize: bool = True,
padding: bool = False,
truncation: bool = False,
max_length: Optional[int] = None,
return_tensors: Optional[Union[str, TensorType]] = None,
return_dict: bool = False,
**tokenizer_kwargs,
) -> Union[str, List[int]]:
"""
Converts a Conversation object or a list of dictionaries with `"role"` and `"content"` keys to a list of token
ids.

This method forwards all its arguments to LlamaTokenizerFast's [`~PreTrainedTokenizer.apply_chat_template`]. Please
refer to the docstring of this method for more information.

It is intended for use with chat models, and will read the tokenizer's chat_template attribute to
determine the format and control tokens to use when converting. When chat_template is None, it will fall back
to the default_chat_template specified at the class level.

Args:
conversation (Union[List[Dict[str, str]], "Conversation"]): A Conversation object or list of dicts
with "role" and "content" keys, representing the chat history so far.
chat_template (str, *optional*): A Jinja template to use for this conversion. If
this is not passed, the model's default chat template will be used instead.
add_generation_prompt (bool, *optional*): Whether to end the prompt with the token(s) that indicate
the start of an assistant message. This is useful when you want to generate a response from the model.
Note that this argument will be passed to the chat template, and so it must be supported in the
template for this argument to have any effect.
tokenize (`bool`, defaults to `True`):
Whether to tokenize the output. If `False`, the output will be a string.
padding (`bool`, defaults to `False`):
Whether to pad sequences to the maximum length. Has no effect if tokenize is `False`.
truncation (`bool`, defaults to `False`):
Whether to truncate sequences at the maximum length. Has no effect if tokenize is `False`.
max_length (`int`, *optional*):
Maximum length (in tokens) to use for padding or truncation. Has no effect if tokenize is `False`. If
not specified, the tokenizer's `max_length` attribute will be used as a default.
return_tensors (`str` or [`~utils.TensorType`], *optional*):
If set, will return tensors of a particular framework. Has no effect if tokenize is `False`. Acceptable
values are:
- `'tf'`: Return TensorFlow `tf.Tensor` objects.
- `'pt'`: Return PyTorch `torch.Tensor` objects.
- `'np'`: Return NumPy `np.ndarray` objects.
- `'jax'`: Return JAX `jnp.ndarray` objects.
return_dict (`bool`, *optional*, defaults to `False`):
Whether to return a dictionary with named outputs. Has no effect if tokenize is `False`.
**tokenizer_kwargs: Additional kwargs to pass to the tokenizer.

Returns:
`List[int]`: A list of token ids representing the tokenized chat so far, including control tokens. This
output is ready to pass to the model, either directly or via methods like `generate()`.
"""

if hasattr(conversation, "messages"):
# Indicates it's a Conversation object
conversation = conversation.messages

# priority: `chat_template` argument > `tokenizer.chat_template` > `tokenizer.default_chat_template`
if chat_template is None:
if self.chat_template is not None:
chat_template = self.chat_template
else:
chat_template = self.default_chat_template

# Compilation function uses a cache to avoid recompiling the same template
compiled_template = self._compile_jinja_template(chat_template)

# Ignore copy
rendered = compiled_template.render(
messages=conversation,
add_generation_prompt=add_generation_prompt,
image_tokens=self.image_token.content * self.image_seq_len,
**self.tokenizer.special_tokens_map,
)
# We do a hack here - it's not possible to express the same if/else logic as build_string_from_input in Jinja, so
# we just collapse the cases where <fake_token_around_image> has been added twice in a row
rendered = rendered.replace(
f"{self.fake_image_token.content}{self.fake_image_token.content}", f"{self.fake_image_token.content}"
)

if padding is True:
padding = "max_length" # There's only one sequence here, so "longest" makes no sense
if tokenize:
if return_dict:
# Ignore copy
return self.tokenizer(
rendered,
padding=padding,
truncation=truncation,
max_length=max_length,
add_special_tokens=False,
return_tensors=return_tensors,
**tokenizer_kwargs,
)
else:
# Ignore copy
return self.tokenizer.encode(
rendered,
padding=padding,
truncation=truncation,
max_length=max_length,
add_special_tokens=False,
return_tensors=return_tensors,
**tokenizer_kwargs,
)
else:
return rendered

@lru_cache
# Copied from transformers.tokenization_utils_base.PreTrainedTokenizerBase._compile_jinja_template
def _compile_jinja_template(self, chat_template):
try:
import jinja2
from jinja2.exceptions import TemplateError
from jinja2.sandbox import ImmutableSandboxedEnvironment
except ImportError:
raise ImportError("apply_chat_template requires jinja2 to be installed.")

if version.parse(jinja2.__version__) <= version.parse("3.0.0"):
raise ImportError(
"apply_chat_template requires jinja2>=3.0.0 to be installed. Your version is " f"{jinja2.__version__}."
)

def raise_exception(message):
raise TemplateError(message)

jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
jinja_env.globals["raise_exception"] = raise_exception
# Ignore copy
jinja_env.filters["is_image"] = is_image_or_image_url
return jinja_env.from_string(chat_template)

@property
def default_chat_template(self):
"""
This template formats inputs in the form of a chat history. For each message in the chat history:
* the template will output the role of the speaker followed by the content of the message.
* content can be a single string or a list of strings and images.
* If a content element is an image, the template will output a sequence of <image> tokens, with a <fake_token_around_image> token before and after each image.
* The template will output an <end_of_utterance> token at the end of each message.

Example:

```python
messages = [
{"role": "user", "content": ["What is in this Image?", image1, "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG"]},
{"role": "assistant", "content": "This picture depicts Idefix, the dog of Obelix in Asterix and Obelix. Idefix is running on the ground."},
{"role": "user", "content": ["And who is that?"]},
]
```

Will create outputs like:
```
User:What is in this Image?<fake_token_around_image><image><image><image><fake_token_around_image><image><image><image><fake_token_around_image><end_of_utterance>
Assistant:This picture depicts Idefix, the dog of Obelix in Asterix and Obelix. Idefix is running on the ground.<end_of_utterance>
User:And who is that?<end_of_utterance>
Assistant:
```
"""
logger.warning_once(
"\nNo chat template is defined for this processor - using a default chat template "
"that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
)
# fmt: off
return (
"{% for message in messages %}"
"{% if message is iterable and message is not string %}"
"{{message['role'].capitalize() + ':'}}"
"{% for content_elem in message.content %}"
"{% if content_elem | is_image %}"
"{{'<fake_token_around_image>' + image_tokens + '<fake_token_around_image>'}}"
"{% else %}"
"{{content_elem}}"
"{% endif %}"
"{% endfor %}"
"<end_of_utterance>\n"
"{% else %}"
"{{message['role'].capitalize() + ':' + message['content'] + '<end_of_utterance>' + '\n'}}"
"{% endif %}"
"{% endfor %}"
"{% if add_generation_prompt %}"
"{{ 'Assistant:\n' }}"
"{% endif %}"
)
# fmt: on
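
To make the template's behavior concrete, here is a minimal, self-contained sketch that renders the default chat template with jinja2 directly, so it runs without loading any model weights. The simplified `is_image_or_image_url` predicate (which here only recognizes http URLs, unlike the real helper that also accepts PIL images), the example messages, and `image_seq_len = 2` are illustrative assumptions, not part of this commit.

```python
# Hypothetical standalone demo of the default chat template above, including
# the post-render hack that collapses doubled <fake_token_around_image> tokens.
from jinja2.sandbox import ImmutableSandboxedEnvironment


def is_image_or_image_url(elem):
    # Simplified stand-in for the helper in the diff: only http URLs count here.
    return isinstance(elem, str) and elem.startswith("http")


env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
env.filters["is_image"] = is_image_or_image_url

# Template string copied from default_chat_template above.
template = env.from_string(
    "{% for message in messages %}"
    "{% if message is iterable and message is not string %}"
    "{{message['role'].capitalize() + ':'}}"
    "{% for content_elem in message.content %}"
    "{% if content_elem | is_image %}"
    "{{'<fake_token_around_image>' + image_tokens + '<fake_token_around_image>'}}"
    "{% else %}"
    "{{content_elem}}"
    "{% endif %}"
    "{% endfor %}"
    "<end_of_utterance>\n"
    "{% else %}"
    "{{message['role'].capitalize() + ':' + message['content'] + '<end_of_utterance>' + '\n'}}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ 'Assistant:\n' }}"
    "{% endif %}"
)

messages = [
    {
        "role": "user",
        "content": [
            "What do these show?",
            "https://example.com/a.jpg",  # two adjacent images trigger the dedup hack
            "https://example.com/b.jpg",
        ],
    },
    {"role": "assistant", "content": "Two pictures."},
]

# image_tokens mirrors self.image_token.content * self.image_seq_len with image_seq_len = 2
rendered = template.render(messages=messages, add_generation_prompt=True, image_tokens="<image>" * 2)
# Mirror of the post-processing hack: collapse doubled fake tokens from back-to-back images
rendered = rendered.replace("<fake_token_around_image><fake_token_around_image>", "<fake_token_around_image>")
print(rendered)
# User:What do these show?<fake_token_around_image><image><image><fake_token_around_image><image><image><fake_token_around_image><end_of_utterance>
# Assistant:Two pictures.<end_of_utterance>
# Assistant:
```
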
35 changes: 35 additions & 0 deletions tests/models/idefics2/test_processing_idefics2.py
@@ -133,3 +133,38 @@ def test_process_interleaved_images_prompts(self):
self.assertEqual(inputs['pixel_values'].shape, (2, 2, 3, 767, 980))
self.assertEqual(inputs['pixel_attention_mask'].shape, (2, 2, 767, 980))
# fmt: on

def test_apply_chat_template(self):
# Messages contain content which is a mix of strings, images and image URLs
messages = [
{
"role": "user",
"content": [
"What do these images show?",
self.image1,
"https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG",
],
},
{
"role": "assistant",
"content": "The first image shows the statue of Liberty in New York. The second image picture depicts Idefix, the dog of Obelix in Asterix and Obelix.",
},
{"role": "user", "content": ["And who is that?"]},
]

processor = self.processor
old_seq_len = processor.image_seq_len
# Make short sequence length to test that the fake tokens are added correctly
processor.image_seq_len = 2
rendered = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

expected_rendered = (
"User:What do these images show?<fake_token_around_image><image><image><fake_token_around_image><image><image><fake_token_around_image><end_of_utterance>\n"
"Assistant:The first image shows the statue of Liberty in New York. The second image picture depicts Idefix, the dog of Obelix in Asterix and Obelix.<end_of_utterance>\n"
"User:And who is that?<end_of_utterance>\n"
"Assistant:\n"
)

self.assertEqual(rendered, expected_rendered)
# Set back to prevent tests from being stateful
processor.image_seq_len = old_seq_len
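
As a usage note, the call pattern this test exercises looks roughly like the following end to end. This is a sketch only: the checkpoint id `HuggingFaceM4/idefics2-8b` is an assumption (the released checkpoints later adopted a typed `{"type": "text"/"image"}` content schema), so the raw-list message format below reflects the API shape at this commit rather than the final one.

```python
# Hypothetical end-to-end use of the processor-level chat template added in this commit.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")

messages = [
    {
        "role": "user",
        "content": [
            "What is in this image?",
            "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG",
        ],
    },
]

# tokenize=False returns the rendered prompt string; tokenize=True would instead
# return token ids ready to pass to the model.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)
```
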