Add Idefics2 #30253

Merged
merged 131 commits on Apr 15, 2024
Changes from 1 commit
Commits
131 commits
e536f6a
Merge pull request #9 from huggingface/update
molbap Mar 4, 2024
ef8c0fb
Merge branch 'main' of github.com:huggingface/new-model-addition
ArthurZucker Mar 30, 2024
79277fe
Initial add model additions
amyeroberts Feb 26, 2024
6c89a99
Test
amyeroberts Feb 26, 2024
2e1155b
All weights loading
amyeroberts Feb 27, 2024
661b794
Can perform full forward pass
amyeroberts Feb 27, 2024
dd0a3d2
Local and remote the same
amyeroberts Feb 27, 2024
d124863
Matching local and remote
amyeroberts Mar 1, 2024
7aff0b7
Fixup
amyeroberts Mar 1, 2024
799dc71
Idefics2Model importable; fixup docstrings
amyeroberts Mar 1, 2024
dbebbb1
Don't skip by default
amyeroberts Mar 1, 2024
465e3ed
Remove deprecated use_resampler arg
amyeroberts Mar 1, 2024
ae5b94d
Remove self.config
amyeroberts Mar 1, 2024
7983e93
DecoupledLinear takes config
amyeroberts Mar 1, 2024
0a00064
Tidy up
amyeroberts Mar 1, 2024
6e4ff1b
Enable eager attention and tidy up
amyeroberts Mar 1, 2024
1aa8f7a
Most tests passing
amyeroberts Mar 1, 2024
ea4bf34
Update for batch of processed images
amyeroberts Mar 4, 2024
b6a92da
Add image processor
amyeroberts Mar 4, 2024
0d09b95
Update doc pages
amyeroberts Mar 4, 2024
3c11158
Update conversion script
amyeroberts Mar 4, 2024
c6d4559
Remove erroneous breakpoint
amyeroberts Mar 5, 2024
c6275e9
Remove accidendtal spelling change
amyeroberts Mar 5, 2024
5dd0071
Update to reflect changes on hub - make generate work
amyeroberts Mar 5, 2024
015356b
Fix up
amyeroberts Mar 5, 2024
8c50169
Image processor tests
amyeroberts Mar 5, 2024
da389b8
Update tests
amyeroberts Mar 5, 2024
e8b131d
Add a processor
amyeroberts Mar 5, 2024
2fc3ff3
Add a processor
amyeroberts Mar 6, 2024
e06740c
Update convert script
amyeroberts Mar 6, 2024
083e82b
Update modeling file - remove fixmes
amyeroberts Mar 6, 2024
256fa30
Bug fix
amyeroberts Mar 7, 2024
0fd5400
Add processing test
amyeroberts Mar 7, 2024
f537f27
Use processor
amyeroberts Mar 7, 2024
d14485a
Fix up
amyeroberts Mar 7, 2024
02371e9
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 11, 2024
7fba70a
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 11, 2024
0987d15
Fix test
amyeroberts Mar 12, 2024
78ba577
Update config - PR comments and defaults align with checkpoint
amyeroberts Mar 12, 2024
971dd72
Reviewer comments
amyeroberts Mar 12, 2024
d7dfec9
Add copied froms for flahs attention
amyeroberts Mar 12, 2024
097f402
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 18, 2024
1370836
Apply suggestions from code review
amyeroberts Mar 21, 2024
9dff742
Remove qk_layer_norm and freeze_layers functionality
amyeroberts Mar 21, 2024
0e1be29
Fix
amyeroberts Mar 21, 2024
c334307
Remove freeze_layer options from config
amyeroberts Mar 21, 2024
e5b5bc4
Sync with upstream main
amyeroberts Mar 21, 2024
ec867d8
Fix attention shapes siglip
amyeroberts Mar 22, 2024
0019bf1
Remove Llava-next refs - TO REBASE
amyeroberts Mar 24, 2024
b0e4081
Use AutoModel for text model
amyeroberts Mar 24, 2024
863b2ee
Add comment to explain vision embeddings
amyeroberts Mar 24, 2024
68990f8
Fix issue with tie_word_embeddings
amyeroberts Mar 25, 2024
e1456a0
Address review comments
amyeroberts Mar 25, 2024
f4b45d3
Fix and fix up
amyeroberts Mar 25, 2024
ffb2de3
Chat templates for idefics
amyeroberts Mar 27, 2024
700119d
Fix copies
amyeroberts Mar 27, 2024
cefdd1d
Fix
amyeroberts Mar 27, 2024
4823ecd
Add layer norms to FA2
amyeroberts Mar 27, 2024
2de1098
Fix tests
amyeroberts Mar 27, 2024
5205bba
Apply suggestions from code review
amyeroberts Apr 2, 2024
7edaff5
Fix
amyeroberts Apr 2, 2024
a7a0a2c
Review comments
amyeroberts Apr 2, 2024
16f7666
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Apr 2, 2024
e3a22e4
Update inputs merger
amyeroberts Apr 2, 2024
1c397b1
Merge weights in correct order
amyeroberts Apr 2, 2024
182ea5f
Update convert script
amyeroberts Apr 3, 2024
0ba4cc4
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 3, 2024
65bf223
Update template
amyeroberts Apr 3, 2024
84ea6e8
Model code examples (fix idefics too)
amyeroberts Apr 3, 2024
ee548af
More review comments
amyeroberts Apr 3, 2024
649563b
Tidy up
amyeroberts Apr 3, 2024
4c4f315
Update processing
amyeroberts Apr 3, 2024
f95e76b
Fix attention mask preparation
amyeroberts Apr 3, 2024
eae3f08
Update inputs_merger inputs
amyeroberts Apr 3, 2024
3043e40
Vectorize inputs_merger
amyeroberts Apr 3, 2024
914fa74
Update src/transformers/models/idefics2/__init__.py
amyeroberts Apr 8, 2024
877109a
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Apr 8, 2024
9cde5c2
Review comments
amyeroberts Apr 8, 2024
3307e6b
saying bye to the `qk_layer_norms`
VictorSanh Apr 7, 2024
ecaac39
Simplify
amyeroberts Apr 8, 2024
366d21d
Update latents
amyeroberts Apr 8, 2024
5312a80
Remove erroneuous readme changes
amyeroberts Apr 8, 2024
9d1078b
Return images when applying chat template
amyeroberts Apr 8, 2024
b1e2f42
Fix bug - prompt images are for a single sample
amyeroberts Apr 9, 2024
09796a3
Update src/transformers/models/idefics2/modeling_idefics2.py
VictorSanh Apr 10, 2024
eaff6e6
image splitting
VictorSanh Apr 8, 2024
0034f84
fix test
VictorSanh Apr 8, 2024
e2845b1
some more comment
VictorSanh Apr 8, 2024
3ae2a1b
some comment
VictorSanh Apr 8, 2024
833a802
Apply suggestions from code review
VictorSanh Apr 9, 2024
502c3dc
Update src/transformers/models/idefics2/image_processing_idefics2.py
VictorSanh Apr 11, 2024
e8ca7b3
Update processor
amyeroberts Apr 10, 2024
4bde406
Update model tests
amyeroberts Apr 10, 2024
fea200e
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
33e51a6
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
fcad4e4
Don't add BOS in template
amyeroberts Apr 10, 2024
1dc90f0
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
0e16d4a
Remove index in examples
amyeroberts Apr 11, 2024
107693a
Update tests to reflect #13
amyeroberts Apr 11, 2024
cd4f76a
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 11, 2024
31945ee
PR comment - consistent typing
amyeroberts Apr 12, 2024
4ab5e1d
Update readme and model doc
amyeroberts Apr 12, 2024
d8c5045
Update docs
amyeroberts Apr 12, 2024
e8b9751
Update checkpoint references
amyeroberts Apr 12, 2024
b5a7622
Update examples
amyeroberts Apr 12, 2024
7ee8681
Fix and update tests
amyeroberts Apr 12, 2024
31c6634
Small addition
amyeroberts Apr 12, 2024
75f59ef
Update tests - remove copied from as no ignore placement copy could b…
amyeroberts Apr 12, 2024
b5ad135
Update example
amyeroberts Apr 12, 2024
419fba2
small fixes
VictorSanh Apr 13, 2024
ea3838e
Update docs/source/en/model_doc/idefics2.md
amyeroberts Apr 14, 2024
301e1c5
Update docs/source/en/model_doc/idefics2.md
amyeroberts Apr 14, 2024
7b1c4dc
Update README.md
amyeroberts Apr 14, 2024
5be2feb
Connector model as bridge
amyeroberts Apr 12, 2024
34eb76b
Fix up
amyeroberts Apr 14, 2024
7c73ede
Fix up
amyeroberts Apr 14, 2024
455dccf
Don't pass model inputs for generation kwargs update
amyeroberts Apr 15, 2024
3bbd272
IDEFICS-2 -> Idefics2
VictorSanh Apr 15, 2024
b122cb2
Merge pull request #18 from huggingface/vs/name-change
VictorSanh Apr 15, 2024
5414a02
Remove config archive name
amyeroberts Apr 15, 2024
3d84654
IDEFICS-2 -> Idefics2
amyeroberts Apr 15, 2024
8739092
Add back llava-next
amyeroberts Apr 15, 2024
779a8f8
Update readmes
amyeroberts Apr 15, 2024
f8c5301
Add requirements for processor tester
amyeroberts Apr 15, 2024
661b93b
Use custom convert_to_rgb to avoid possible BC
amyeroberts Apr 15, 2024
0efb5e8
Fix doc example
amyeroberts Apr 15, 2024
fed24d1
Fix doc example
amyeroberts Apr 15, 2024
541ce14
Skip model doc tests - as model to large
amyeroberts Apr 15, 2024
2a563b2
More doc example - account for image splitting
amyeroberts Apr 15, 2024
26c8a55
Update src/transformers/image_transforms.py
amyeroberts Apr 15, 2024
16c8317
Fix config doctest
amyeroberts Apr 15, 2024
Add a processor
amyeroberts committed Apr 14, 2024
commit 2fc3ff3c5ed6373a6acd36628a6559bde98e59bd
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/idefics2.md
@@ -53,3 +53,8 @@ The original code can be found [here](https://huggingface.co/HuggingFaceM4/idefi
## Idefics2ImageProcessor
[[autodoc]] Idefics2ImageProcessor
- preprocess


## Idefics2Processor
[[autodoc]] Idefics2Processor
- __call__
2 changes: 2 additions & 0 deletions src/transformers/__init__.py
@@ -2440,6 +2440,7 @@
"Idefics2ForConditionalGeneration",
"Idefics2Model",
"Idefics2PreTrainedModel",
"Idefics2Processor",
]
)
_import_structure["models.imagegpt"].extend(
@@ -7157,6 +7158,7 @@
Idefics2ForConditionalGeneration,
Idefics2Model,
Idefics2PreTrainedModel,
Idefics2Processor,
)
from .models.imagegpt import (
IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST,
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
@@ -61,6 +61,7 @@
("groupvit", "CLIPProcessor"),
("hubert", "Wav2Vec2Processor"),
("idefics", "IdeficsProcessor"),
("idefics2", "Idefics2Processor"),
("instructblip", "InstructBlipProcessor"),
("kosmos-2", "Kosmos2Processor"),
("layoutlmv2", "LayoutLMv2Processor"),
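With the new `"idefics2"` entry registered in `processing_auto.py`, `AutoProcessor` can resolve an Idefics2 checkpoint to `Idefics2Processor`. A minimal sketch; the checkpoint id is the placeholder used in the docstring example later in this diff, not a value guaranteed by this change:

```python
from transformers import AutoProcessor

# Placeholder checkpoint id (borrowed from the docstring example in this PR). Resolution goes
# through the model_type -> processor-class mapping that this change extends.
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2")
print(type(processor).__name__)  # expected: Idefics2Processor
```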
2 changes: 2 additions & 0 deletions src/transformers/models/idefics2/__init__.py
@@ -40,6 +40,7 @@
"Idefics2PreTrainedModel",
"Idefics2Model",
]
_import_structure["processing_idefics2"] = ["Idefics2Processor"]

if TYPE_CHECKING:
from .configuration_idefics2 import IDEFICS2_PRETRAINED_CONFIG_ARCHIVE_MAP, Idefics2Config
@@ -64,6 +65,7 @@
Idefics2Model,
Idefics2PreTrainedModel,
)
from .processing_idefics2 import Idefics2Processor


else:
193 changes: 94 additions & 99 deletions src/transformers/models/idefics2/processing_idefics2.py
@@ -16,13 +16,59 @@
Processor class for IDEFICS2.
"""

from ...processing_utils import ProcessorMixin
from ...feature_extraction_utils import BatchFeature
from typing import List, Optional, Union

from ...feature_extraction_utils import BatchFeature
from ...image_utils import ImageInput, is_valid_image
from ...processing_utils import ProcessorMixin
from ...tokenization_utils import AddedToken
from ...tokenization_utils_base import BatchEncoding, PaddingStrategy, TextInput, TruncationStrategy
from ...utils import TensorType


def build_string_from_input(prompt, image_seq_len, bos_token, image_token, fake_image_token):
"""
Builds a string from the input prompt and image tokens.

class IdeficsProcessor(ProcessorMixin):
For example, for the call:

build_string_from_input(
prompt=["Initial str", img1, img2, "mid str", img3],
image_seq_len=2,
bos_token="<s>",
image_token="<im>",
fake_image_token="<fake>"
)

The output will be:

"<s>Initial str<fake><im><im><fake><im><im><fake>mid str<fake><im><im><fake>"

Args:
prompt (`List[Union[str, ImageInput]]`): The input prompt.
image_seq_len (`int`): The length of the image sequence.
bos_token (`str`): The beginning of sentence token.
image_token (`str`): The image token.
fake_image_token (`str`): The fake image token.
"""
s = f"{bos_token}"
open_image_tag = False
for elem in prompt:
if is_valid_image(elem):
s += f"{fake_image_token}{image_token * image_seq_len}"
open_image_tag = True
else:
if open_image_tag:
s += f"{fake_image_token}"
open_image_tag = False
s += elem
if open_image_tag:
s += f"{fake_image_token}"
return s
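Read on its own, the helper above can be exercised directly; a minimal sketch, assuming `build_string_from_input` is in scope and that plain NumPy arrays stand in for images (anything `is_valid_image` accepts works):

```python
import numpy as np

# Stand-in "images"; is_valid_image accepts NumPy arrays as well as PIL images.
img1 = np.zeros((32, 32, 3), dtype=np.uint8)
img2 = np.zeros((32, 32, 3), dtype=np.uint8)

s = build_string_from_input(
    prompt=["Initial str", img1, img2],
    image_seq_len=2,
    bos_token="<s>",
    image_token="<im>",
    fake_image_token="<fake>",
)
print(s)  # <s>Initial str<fake><im><im><fake><im><im><fake>
```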



class Idefics2Processor(ProcessorMixin):
r"""
Constructs a IDEFICS2 processor which wraps a LLama tokenizer and IDEFICS2 image processor into a single processor.

@@ -34,124 +80,73 @@ class IdeficsProcessor(ProcessorMixin):
An instance of [`Idefics2ImageProcessor`]. The image processor is a required input.
tokenizer (`LlamaTokenizerFast`):
An instance of [`LlamaTokenizerFast`]. The tokenizer is a required input.
image_size (`int`, *optional*, defaults to 224): Image size (assuming a square image)
"""

attributes = ["image_processor", "tokenizer"]
image_processor_class = "Idefics2ImageProcessor"
tokenizer_class = "LlamaTokenizerFast"

def __init__(self, image_processor, tokenizer=None, image_size=224, add_end_of_utterance_token=None, **kwargs):
def __init__(self, image_processor, tokenizer=None, image_seq_len: int = 64, **kwargs):
if image_processor is None:
raise ValueError("You need to specify an `image_processor`.")
if tokenizer is None:
raise ValueError("You need to specify a `tokenizer`.")

self.fake_image_token = "<fake_token_around_image>"
self.image_token = "<image>"
self.image_seq_len = image_seq_len

tokens_to_add = [
AddedToken(self.fake_image_token, lstrip=True, rstrip=False, normalized=False),
AddedToken(self.image_token, lstrip=True, rstrip=False, normalized=False)
]
tokenizer.add_tokens(tokens_to_add)

super().__init__(image_processor, tokenizer)

def __call__(
self,
prompts: Union[List[TextInput], List[List[TextInput]]] = None,
images: Optional[Union[ImageInput, List[ImageInput], List[List[ImageInput]]]] = None,
prompts: Union[List[TextInput], List[List[TextInput]]],
image_seq_len: Optional[int] = None,
padding: Union[bool, str, PaddingStrategy] = False,
truncation: Union[bool, str, TruncationStrategy] = None,
max_length: Optional[int] = None,
return_tensors: Optional[Union[str, TensorType]] = TensorType.PYTORCH,
return_tensors: Optional[Union[str, TensorType]] = None,
) -> BatchEncoding:
"""This method takes batched or non-batched prompts made of text and images and converts them into prompts that
the model was trained on and prepares the image pixel values for the model to process.

Args:
prompts (`Union[List[TextInput], [List[List[TextInput]]]]`):
either a single prompt or a batched list of prompts.
images (`Union[ImageInput, List[ImageInput], List[List[ImageInput]]]`, *optional*):
either a single image or a batched list of images to process
padding (`bool`, `str` or [`~utils.PaddingStrategy`], *optional*, defaults to `False`):
Select a strategy to pad the returned sequences (according to the model's padding side and padding
index) among:
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
sequence if provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
lengths).
max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above).
truncation (`bool`, *optional*):
Activates truncation to cut input sequences longer than `max_length` to `max_length`.
return_tensors (`str` or `TensorType`, *optional*, defaults to `TensorType.PYTORCH`):
The type of tensors to return. Can be one of:
- `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.

Returns:
a dict with entries: `input_ids`, `attention_mask`, `pixel_values`, `image_attention_mask` which can be
directly passed to `model.generate`

Example:

```python
checkpoint = "HuggingFaceM4/idefics2"
processor = AutoProcessor.from_pretrained(checkpoint)
url = "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg"
img = processor.image_processor.fetch_images([url])[0]

image1 = Image.open(
BytesIO(
requests.get(
"https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
).content
)
)
image2 = Image.open(
BytesIO(requests.get("https://cdn.britannica.com/59/94459-050-DBA42467/Skyline-Chicago.jpg").content)
)
image3 = Image.open(
BytesIO(
requests.get(
"https://thumbs.dreamstime.com/b/golden-gate-bridge-san-francisco-purple-flowers-california-echium-candicans-36805947.jpg"
).content
)
)
raw_images = [
[image1],
[image2, image3],
]
prompts = [
"<fake_token_around_image>{image_seq}<fake_token_around_image>In this image, we see",
"bla bla<fake_token_around_image>{image_seq}<fake_token_around_image>{image_seq}<fake_token_around_image>",
""" """
image_seq_len = image_seq_len if image_seq_len is not None else self.image_seq_len

if isinstance(prompts, list) and not isinstance(prompts[0], list):
prompts = [prompts]

# Build the string from the input prompt and image tokens
prompt_strings = [
build_string_from_input(
prompt=prompt,
image_seq_len=image_seq_len,
bos_token=self.tokenizer.bos_token,
image_token=self.image_token,
fake_image_token=self.fake_image_token
) for prompt in prompts
]

inputs = processor(prompts, return_tensors="pt")
generated_ids = model.generate(**inputs, max_length=100)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
"""
inputs = BatchFeature()
text_inputs = self.tokenizer(
text=prompt_strings,
add_special_tokens=False,
padding=padding,
truncation=truncation,
max_length=max_length,
return_tensors=return_tensors,
)
inputs.update(text_inputs)

inputs = BatchFeature({}, return_tensors=return_tensors)

if prompts is not None:
if isinstance(prompts, str):
prompts = [prompts]

# Add to BOS token to the prompts
prompts = [f"{self.tokenizer.bos_token}{p}" for p in prompts]

text_inputs = self.tokenizer(
text=prompts,
add_special_tokens=False,
padding=padding,
truncation=truncation,
max_length=max_length,
return_tensors=return_tensors,
)
inputs.update(text_inputs)

if images is not None:
image_inputs = self.image_processor(
images,
return_tensors=return_tensors,
)
inputs.update(image_inputs)
# Extract the images from the prompts
images = [
[elem for elem in prompt if is_valid_image(elem)] for prompt in prompts
]
image_inputs = self.image_processor(images, return_tensors=return_tensors)
inputs.update(image_inputs)

return inputs

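As of this commit, `__call__` takes prompts that interleave strings and images rather than separate `prompts`/`images` arguments. A rough usage sketch under that assumption; the checkpoint id and image URL are the placeholders from the old docstring example, and the interface was revised again in later commits:

```python
import requests
from PIL import Image
from transformers import AutoProcessor

# Placeholder checkpoint id taken from the old docstring example.
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2")

# Placeholder image URL, also from the old docstring example.
url = "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# A single prompt: text interleaved with a PIL image. __call__ wraps it into a batch of one,
# builds the token string with build_string_from_input, and runs the image through the image processor.
prompt = ["In this image, we see", image]
inputs = processor(prompt, image_seq_len=64, return_tensors="pt")  # image_seq_len defaults to 64
print(inputs.keys())  # input_ids, attention_mask, plus the image-processor outputs (pixel_values, ...)
```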
7 changes: 7 additions & 0 deletions src/transformers/utils/dummy_pt_objects.py
@@ -4405,6 +4405,13 @@ def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])


class Idefics2Processor(metaclass=DummyObject):
_backends = ["torch"]

def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])


IMAGEGPT_PRETRAINED_MODEL_ARCHIVE_LIST = None


15 changes: 8 additions & 7 deletions tests/models/idefics2/test_image_processing_idefics2.py
@@ -91,22 +91,23 @@ def get_expected_values(self, image_inputs, batched=False):
assuming do_resize is set to True with a scalar size and size_divisor.
"""
if not batched:
size = self.size["shortest_edge"]
shortest_edge = self.size["shortest_edge"]
longest_edge = self.size["longest_edge"]
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
else:
h, w = image.shape[1], image.shape[2]

aspect_ratio = w / h
if w > h and w >= 980:
w = 980
if w > h and w >= longest_edge:
w = longest_edge
h = int(w / aspect_ratio)
elif h > w and h >= 980:
h = 980
elif h > w and h >= longest_edge:
h = longest_edge
w = int(h * aspect_ratio)
w = max(w, 378)
h = max(h, 378)
w = max(w, shortest_edge)
h = max(h, shortest_edge)
expected_height = h
expected_width = w
else:
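The rewritten expectation mirrors the image processor's resize rule: cap the longer side at `longest_edge` while keeping the aspect ratio, then clamp both sides up to at least `shortest_edge`. A standalone sketch of that computation, assuming the 378/980 values the hard-coded numbers previously encoded:

```python
def expected_resize(w: int, h: int, shortest_edge: int = 378, longest_edge: int = 980):
    """Cap the longer side at longest_edge while preserving aspect ratio,
    then make sure neither side drops below shortest_edge."""
    aspect_ratio = w / h
    if w > h and w >= longest_edge:
        w = longest_edge
        h = int(w / aspect_ratio)
    elif h > w and h >= longest_edge:
        h = longest_edge
        w = int(h * aspect_ratio)
    w = max(w, shortest_edge)
    h = max(h, shortest_edge)
    return h, w

print(expected_resize(2000, 1000))  # (490, 980)
```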