Add Idefics2 #30253

Merged · 131 commits · Apr 15, 2024
Changes from 1 commit
e536f6a
Merge pull request #9 from huggingface/update
molbap Mar 4, 2024
ef8c0fb
Merge branch 'main' of github.com:huggingface/new-model-addition
ArthurZucker Mar 30, 2024
79277fe
Initial add model additions
amyeroberts Feb 26, 2024
6c89a99
Test
amyeroberts Feb 26, 2024
2e1155b
All weights loading
amyeroberts Feb 27, 2024
661b794
Can perform full forward pass
amyeroberts Feb 27, 2024
dd0a3d2
Local and remote the same
amyeroberts Feb 27, 2024
d124863
Matching local and remote
amyeroberts Mar 1, 2024
7aff0b7
Fixup
amyeroberts Mar 1, 2024
799dc71
Idefics2Model importable; fixup docstrings
amyeroberts Mar 1, 2024
dbebbb1
Don't skip by default
amyeroberts Mar 1, 2024
465e3ed
Remove deprecated use_resampler arg
amyeroberts Mar 1, 2024
ae5b94d
Remove self.config
amyeroberts Mar 1, 2024
7983e93
DecoupledLinear takes config
amyeroberts Mar 1, 2024
0a00064
Tidy up
amyeroberts Mar 1, 2024
6e4ff1b
Enable eager attention and tidy up
amyeroberts Mar 1, 2024
1aa8f7a
Most tests passing
amyeroberts Mar 1, 2024
ea4bf34
Update for batch of processed images
amyeroberts Mar 4, 2024
b6a92da
Add image processor
amyeroberts Mar 4, 2024
0d09b95
Update doc pages
amyeroberts Mar 4, 2024
3c11158
Update conversion script
amyeroberts Mar 4, 2024
c6d4559
Remove erroneous breakpoint
amyeroberts Mar 5, 2024
c6275e9
Remove accidendtal spelling change
amyeroberts Mar 5, 2024
5dd0071
Update to reflect changes on hub - make generate work
amyeroberts Mar 5, 2024
015356b
Fix up
amyeroberts Mar 5, 2024
8c50169
Image processor tests
amyeroberts Mar 5, 2024
da389b8
Update tests
amyeroberts Mar 5, 2024
e8b131d
Add a processor
amyeroberts Mar 5, 2024
2fc3ff3
Add a processor
amyeroberts Mar 6, 2024
e06740c
Update convert script
amyeroberts Mar 6, 2024
083e82b
Update modeling file - remove fixmes
amyeroberts Mar 6, 2024
256fa30
Bug fix
amyeroberts Mar 7, 2024
0fd5400
Add processing test
amyeroberts Mar 7, 2024
f537f27
Use processor
amyeroberts Mar 7, 2024
d14485a
Fix up
amyeroberts Mar 7, 2024
02371e9
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 11, 2024
7fba70a
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 11, 2024
0987d15
Fix test
amyeroberts Mar 12, 2024
78ba577
Update config - PR comments and defaults align with checkpoint
amyeroberts Mar 12, 2024
971dd72
Reviewer comments
amyeroberts Mar 12, 2024
d7dfec9
Add copied froms for flahs attention
amyeroberts Mar 12, 2024
097f402
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Mar 18, 2024
1370836
Apply suggestions from code review
amyeroberts Mar 21, 2024
9dff742
Remove qk_layer_norm and freeze_layers functionality
amyeroberts Mar 21, 2024
0e1be29
Fix
amyeroberts Mar 21, 2024
c334307
Remove freeze_layer options from config
amyeroberts Mar 21, 2024
e5b5bc4
Sync with upstream main
amyeroberts Mar 21, 2024
ec867d8
Fix attention shapes siglip
amyeroberts Mar 22, 2024
0019bf1
Remove Llava-next refs - TO REBASE
amyeroberts Mar 24, 2024
b0e4081
Use AutoModel for text model
amyeroberts Mar 24, 2024
863b2ee
Add comment to explain vision embeddings
amyeroberts Mar 24, 2024
68990f8
Fix issue with tie_word_embeddings
amyeroberts Mar 25, 2024
e1456a0
Address review comments
amyeroberts Mar 25, 2024
f4b45d3
Fix and fix up
amyeroberts Mar 25, 2024
ffb2de3
Chat templates for idefics
amyeroberts Mar 27, 2024
700119d
Fix copies
amyeroberts Mar 27, 2024
cefdd1d
Fix
amyeroberts Mar 27, 2024
4823ecd
Add layer norms to FA2
amyeroberts Mar 27, 2024
2de1098
Fix tests
amyeroberts Mar 27, 2024
5205bba
Apply suggestions from code review
amyeroberts Apr 2, 2024
7edaff5
Fix
amyeroberts Apr 2, 2024
a7a0a2c
Review comments
amyeroberts Apr 2, 2024
16f7666
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Apr 2, 2024
e3a22e4
Update inputs merger
amyeroberts Apr 2, 2024
1c397b1
Merge weights in correct order
amyeroberts Apr 2, 2024
182ea5f
Update convert script
amyeroberts Apr 3, 2024
0ba4cc4
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 3, 2024
65bf223
Update template
amyeroberts Apr 3, 2024
84ea6e8
Model code examples (fix idefics too)
amyeroberts Apr 3, 2024
ee548af
More review comments
amyeroberts Apr 3, 2024
649563b
Tidy up
amyeroberts Apr 3, 2024
4c4f315
Update processing
amyeroberts Apr 3, 2024
f95e76b
Fix attention mask preparation
amyeroberts Apr 3, 2024
eae3f08
Update inputs_merger inputs
amyeroberts Apr 3, 2024
3043e40
Vectorize inputs_merger
amyeroberts Apr 3, 2024
914fa74
Update src/transformers/models/idefics2/__init__.py
amyeroberts Apr 8, 2024
877109a
Update src/transformers/models/idefics2/modeling_idefics2.py
amyeroberts Apr 8, 2024
9cde5c2
Review comments
amyeroberts Apr 8, 2024
3307e6b
saying bye to the `qk_layer_norms`
VictorSanh Apr 7, 2024
ecaac39
Simplify
amyeroberts Apr 8, 2024
366d21d
Update latents
amyeroberts Apr 8, 2024
5312a80
Remove erroneuous readme changes
amyeroberts Apr 8, 2024
9d1078b
Return images when applying chat template
amyeroberts Apr 8, 2024
b1e2f42
Fix bug - prompt images are for a single sample
amyeroberts Apr 9, 2024
09796a3
Update src/transformers/models/idefics2/modeling_idefics2.py
VictorSanh Apr 10, 2024
eaff6e6
image splitting
VictorSanh Apr 8, 2024
0034f84
fix test
VictorSanh Apr 8, 2024
e2845b1
some more comment
VictorSanh Apr 8, 2024
3ae2a1b
some comment
VictorSanh Apr 8, 2024
833a802
Apply suggestions from code review
VictorSanh Apr 9, 2024
502c3dc
Update src/transformers/models/idefics2/image_processing_idefics2.py
VictorSanh Apr 11, 2024
e8ca7b3
Update processor
amyeroberts Apr 10, 2024
4bde406
Update model tests
amyeroberts Apr 10, 2024
fea200e
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
33e51a6
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
fcad4e4
Don't add BOS in template
amyeroberts Apr 10, 2024
1dc90f0
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 10, 2024
0e16d4a
Remove index in examples
amyeroberts Apr 11, 2024
107693a
Update tests to reflect #13
amyeroberts Apr 11, 2024
cd4f76a
Update src/transformers/models/idefics2/processing_idefics2.py
amyeroberts Apr 11, 2024
31945ee
PR comment - consistent typing
amyeroberts Apr 12, 2024
4ab5e1d
Update readme and model doc
amyeroberts Apr 12, 2024
d8c5045
Update docs
amyeroberts Apr 12, 2024
e8b9751
Update checkpoint references
amyeroberts Apr 12, 2024
b5a7622
Update examples
amyeroberts Apr 12, 2024
7ee8681
Fix and update tests
amyeroberts Apr 12, 2024
31c6634
Small addition
amyeroberts Apr 12, 2024
75f59ef
Update tests - remove copied from as no ignore placement copy could b…
amyeroberts Apr 12, 2024
b5ad135
Update example
amyeroberts Apr 12, 2024
419fba2
small fixes
VictorSanh Apr 13, 2024
ea3838e
Update docs/source/en/model_doc/idefics2.md
amyeroberts Apr 14, 2024
301e1c5
Update docs/source/en/model_doc/idefics2.md
amyeroberts Apr 14, 2024
7b1c4dc
Update README.md
amyeroberts Apr 14, 2024
5be2feb
Connector model as bridge
amyeroberts Apr 12, 2024
34eb76b
Fix up
amyeroberts Apr 14, 2024
7c73ede
Fix up
amyeroberts Apr 14, 2024
455dccf
Don't pass model inputs for generation kwargs update
amyeroberts Apr 15, 2024
3bbd272
IDEFICS-2 -> Idefics2
VictorSanh Apr 15, 2024
b122cb2
Merge pull request #18 from huggingface/vs/name-change
VictorSanh Apr 15, 2024
5414a02
Remove config archive name
amyeroberts Apr 15, 2024
3d84654
IDEFICS-2 -> Idefics2
amyeroberts Apr 15, 2024
8739092
Add back llava-next
amyeroberts Apr 15, 2024
779a8f8
Update readmes
amyeroberts Apr 15, 2024
f8c5301
Add requirements for processor tester
amyeroberts Apr 15, 2024
661b93b
Use custom convert_to_rgb to avoid possible BC
amyeroberts Apr 15, 2024
0efb5e8
Fix doc example
amyeroberts Apr 15, 2024
fed24d1
Fix doc example
amyeroberts Apr 15, 2024
541ce14
Skip model doc tests - as model to large
amyeroberts Apr 15, 2024
2a563b2
More doc example - account for image splitting
amyeroberts Apr 15, 2024
26c8a55
Update src/transformers/image_transforms.py
amyeroberts Apr 15, 2024
16c8317
Fix config doctest
amyeroberts Apr 15, 2024
Image processor tests
amyeroberts committed Apr 14, 2024
commit 8c50169981e6733fa330e5bfc75adf67ad141b78
15 changes: 14 additions & 1 deletion src/transformers/image_transforms.py
@@ -20,6 +20,7 @@
 
 from .image_utils import (
     ChannelDimension,
+    ImageInput,
     get_channel_dimension_axis,
     get_image_size,
     infer_channel_dimension_format,
@@ -745,7 +746,19 @@ def _expand_for_data_format(values):
 
 
 # TODO (Amy): Accept 1/3/4 channel numpy array as input and return np.array as default
-def convert_to_rgb(image):
+def convert_to_rgb(image: ImageInput) -> ImageInput:
+    """
+    Converts an image to RGB format. Only converts if the image is of type PIL.Image.Image, otherwise returns the image
+    as is.
+    Args:
+        image (Image):
+            The image to convert.
+    """
+    requires_backends(convert_to_rgb, ["vision"])
+
+    if not isinstance(image, PIL.Image.Image):
+        return image
+
     # `image.convert("RGB")` would only work for .jpg images, as it creates a wrong background
     # for transparent images. The call to `alpha_composite` handles this case
     if image.mode == "RGB":
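The new early return means non-PIL inputs (e.g. numpy arrays) now pass through `convert_to_rgb` untouched instead of raising. A minimal standalone sketch of the patched behaviour, assuming Pillow is installed — the white-background composite mirrors the `alpha_composite` handling referenced in the comment, but this is an illustration, not the library source:

```python
import numpy as np
from PIL import Image


def convert_to_rgb_sketch(image):
    # Non-PIL inputs (e.g. numpy arrays) pass through untouched.
    if not isinstance(image, Image.Image):
        return image
    if image.mode == "RGB":
        return image
    # A bare .convert("RGB") renders transparent pixels against a wrong
    # background; compositing onto white first handles that case.
    rgba = image.convert("RGBA")
    background = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    return Image.alpha_composite(background, rgba).convert("RGB")


arr = np.zeros((4, 4, 3), dtype=np.uint8)
print(convert_to_rgb_sketch(arr) is arr)  # arrays are returned as-is
print(convert_to_rgb_sketch(Image.new("RGBA", (2, 2))).mode)
```

The passthrough matters for the Idefics2 pipeline because by the time `convert_to_rgb` runs, images may already have been turned into numpy arrays by earlier transforms.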
1 change: 0 additions & 1 deletion
@@ -332,7 +332,6 @@ def empty_image(size, input_data_format):
 
         for batch_idx in range(batch_size):
             for sample_idx, image in enumerate(images[batch_idx]):
-                print(batch_idx, sample_idx)
                 padded_images_list[batch_idx][sample_idx] = self._pad_image(
                     image,
                     pad_size,
263 changes: 263 additions & 0 deletions tests/models/idefics2/test_image_processing_idefics2.py
@@ -0,0 +1,263 @@
# coding=utf-8
# Copyright 2021 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import unittest

import numpy as np

from transformers.testing_utils import require_torch, require_vision
from transformers.utils import is_torch_available, is_vision_available

from ...test_image_processing_common import ImageProcessingTestMixin, prepare_image_inputs


if is_vision_available():
    from PIL import Image

    from transformers import Idefics2ImageProcessor


if is_torch_available():
    import torch


class Idefics2ImageProcessingTester(unittest.TestCase):
    def __init__(
        self,
        parent,
        batch_size=7,
        num_channels=3,
        num_images=1,
        image_size=18,
        min_resolution=30,
        max_resolution=400,
        do_resize=True,
        size=None,
        do_rescale=True,
        rescale_factor=1 / 255,
        do_normalize=True,
        image_mean=[0.5, 0.5, 0.5],
        image_std=[0.5, 0.5, 0.5],
        do_convert_rgb=True,
        do_pad=True,
    ):
        size = size if size is not None else {"shortest_edge": 378, "longest_edge": 980}
        self.parent = parent
        self.batch_size = batch_size
        self.num_channels = num_channels
        self.num_images = num_images
        self.image_size = image_size
        self.min_resolution = min_resolution
        self.max_resolution = max_resolution
        self.do_resize = do_resize
        self.size = size
        self.do_normalize = do_normalize
        self.image_mean = image_mean
        self.image_std = image_std
        self.do_rescale = do_rescale
        self.rescale_factor = rescale_factor
        self.do_convert_rgb = do_convert_rgb
        self.do_pad = do_pad

    def prepare_image_processor_dict(self):
        return {
            "do_convert_rgb": self.do_convert_rgb,
            "do_resize": self.do_resize,
            "size": self.size,
            "do_rescale": self.do_rescale,
            "rescale_factor": self.rescale_factor,
            "do_normalize": self.do_normalize,
            "image_mean": self.image_mean,
            "image_std": self.image_std,
            "do_pad": self.do_pad,
        }

    def get_expected_values(self, image_inputs, batched=False):
        """
        This function computes the expected height and width when providing images to BridgeTowerImageProcessor,
        assuming do_resize is set to True with a scalar size and size_divisor.
        """
        if not batched:
            size = self.size["shortest_edge"]
            image = image_inputs[0]
            if isinstance(image, Image.Image):
                w, h = image.size
            else:
                h, w = image.shape[1], image.shape[2]

            aspect_ratio = w / h
            if w > h and w >= 980:
                w = 980
                h = int(w / aspect_ratio)
            elif h > w and h >= 980:
                h = 980
                w = int(h * aspect_ratio)
            w = max(w, 378)
            h = max(h, 378)
            expected_height = h
            expected_width = w
        else:
            expected_values = []
            for images in image_inputs:
                for image in images:
                    expected_height, expected_width = self.get_expected_values([image])
                    expected_values.append((expected_height, expected_width))
            expected_height = max(expected_values, key=lambda item: item[0])[0]
            expected_width = max(expected_values, key=lambda item: item[1])[1]

        return expected_height, expected_width

    def expected_output_image_shape(self, images):
        height, width = self.get_expected_values(images, batched=True)
        return self.num_images, self.num_channels, height, width

    def prepare_image_inputs(
        self,
        batch_size=None,
        min_resolution=None,
        max_resolution=None,
        num_channels=None,
        num_images=None,
        size_divisor=None,
        equal_resolution=False,
        numpify=False,
        torchify=False,
    ):
        """This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
        or a list of PyTorch tensors if one specifies torchify=True.

        One can specify whether the images are of the same resolution or not.
        """
        assert not (numpify and torchify), "You cannot specify both numpy and PyTorch tensors at the same time"

        batch_size = batch_size if batch_size is not None else self.batch_size
        min_resolution = min_resolution if min_resolution is not None else self.min_resolution
        max_resolution = max_resolution if max_resolution is not None else self.max_resolution
        num_channels = num_channels if num_channels is not None else self.num_channels
        num_images = num_images if num_images is not None else self.num_images

        images_list = []
        for i in range(batch_size):
            images = []
            for j in range(num_images):
                if equal_resolution:
                    width = height = max_resolution
                else:
                    # To avoid getting image width/height 0
                    if size_divisor is not None:
                        # If `size_divisor` is defined, the image needs to have width/size >= `size_divisor`
                        min_resolution = max(size_divisor, min_resolution)
                    width, height = np.random.choice(np.arange(min_resolution, max_resolution), 2)
                images.append(np.random.randint(255, size=(num_channels, width, height), dtype=np.uint8))
            images_list.append(images)

        if not numpify and not torchify:
            # PIL expects the channel dimension as last dimension
            images_list = [[Image.fromarray(np.moveaxis(image, 0, -1)) for image in images] for images in images_list]

        if torchify:
            images_list = [[torch.from_numpy(image) for image in images] for images in images_list]

        return images_list


@require_torch
@require_vision
class Idefics2ImageProcessingTest(ImageProcessingTestMixin, unittest.TestCase):
    image_processing_class = Idefics2ImageProcessor if is_vision_available() else None

    def setUp(self):
        self.image_processor_tester = Idefics2ImageProcessingTester(self)

    @property
    def image_processor_dict(self):
        return self.image_processor_tester.prepare_image_processor_dict()

    def test_image_processor_properties(self):
        image_processing = self.image_processing_class(**self.image_processor_dict)
        self.assertTrue(hasattr(image_processing, "do_convert_rgb"))
        self.assertTrue(hasattr(image_processing, "do_resize"))
        self.assertTrue(hasattr(image_processing, "size"))
        self.assertTrue(hasattr(image_processing, "do_rescale"))
        self.assertTrue(hasattr(image_processing, "rescale_factor"))
        self.assertTrue(hasattr(image_processing, "do_normalize"))
        self.assertTrue(hasattr(image_processing, "image_mean"))
        self.assertTrue(hasattr(image_processing, "image_std"))
        self.assertTrue(hasattr(image_processing, "do_pad"))

    def test_call_numpy(self):
        # Initialize image_processing
        image_processing = self.image_processing_class(**self.image_processor_dict)
        # create random numpy tensors
        image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, numpify=True)
        for sample_images in image_inputs:
            for image in sample_images:
                self.assertIsInstance(image, np.ndarray)

        # Test not batched input
        encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))

        # Test batched
        encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
        self.assertEqual(
            tuple(encoded_images.shape), (self.image_processor_tester.batch_size, *expected_output_image_shape)
        )

    def test_call_pil(self):
        # Initialize image_processing
        image_processing = self.image_processing_class(**self.image_processor_dict)
        # create random PIL images
        image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False)
        for images in image_inputs:
            for image in images:
                self.assertIsInstance(image, Image.Image)

        # Test not batched input
        encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))

        # Test batched
        encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
        self.assertEqual(
            tuple(encoded_images.shape), (self.image_processor_tester.batch_size, *expected_output_image_shape)
        )

    def test_call_pytorch(self):
        # Initialize image_processing
        image_processing = self.image_processing_class(**self.image_processor_dict)
        # create random PyTorch tensors
        image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, torchify=True)

        for images in image_inputs:
            for image in images:
                self.assertIsInstance(image, torch.Tensor)

        # Test not batched input
        encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))

        # Test batched
        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
        encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
        self.assertEqual(
            tuple(encoded_images.shape),
            (self.image_processor_tester.batch_size, *expected_output_image_shape),
        )
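The arithmetic in `get_expected_values` encodes the resize rule these tests assume: the longer side is capped at `longest_edge` (980) while preserving aspect ratio, then both sides are floored at `shortest_edge` (378). A standalone sketch of that rule, with a hypothetical helper name (the real logic lives in the tester above):

```python
def expected_resize(w: int, h: int, longest: int = 980, shortest: int = 378) -> tuple[int, int]:
    # Cap the longer side at `longest`, preserving aspect ratio.
    aspect_ratio = w / h
    if w > h and w >= longest:
        w = longest
        h = int(w / aspect_ratio)
    elif h > w and h >= longest:
        h = longest
        w = int(h * aspect_ratio)
    # Floor both sides at `shortest` (note this floor can distort aspect ratio).
    w = max(w, shortest)
    h = max(h, shortest)
    return h, w


print(expected_resize(1960, 980))  # (490, 980): wide image capped on its long side
print(expected_resize(200, 100))   # (378, 378): small image floored to the minimum
```

Note that exactly square images are never capped by this rule, mirroring the tester's `if w > h` / `elif h > w` branches, and that the batched path takes the per-dimension maximum over all images so a batch pads to a common shape.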